* Linux do_coredump() and SMP systems
From: Sudharsan Vijayaraghavan @ 2015-02-17 13:41 UTC
To: kernelnewbies
Hi All,
We are running the 3.8 kernel.
I have a unique scenario where we hit several issues in do_coredump().
We have an SMP system with thousands of cores, and each pthread is
pinned to one core. The main process containing these pthreads runs on
the first core.
Here is issue #1:
When one of the threads core dumps, we enter do_coredump(). Another
thread of the same process, running on a different core, can core dump
as well (before the SIGKILL sent as a consequence of the first core
dump has been delivered to it).
This allows do_coredump() to be entered more than once.
Once two threads have entered do_coredump(), one can kill the other
with SIGKILL, and the result is completely unpredictable; there is no
guarantee we will end up with two core files in the end.
The Linux kernel does not seem to handle this at all. Adding a spin
lock within do_coredump() would prevent multiple entries into
do_coredump().
I want to know whether the Linux kernel really does not handle the
above case, or am I missing something? Please clarify.
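For reference, here is a minimal userspace sketch of the race (my own
reproducer, not from the kernel tree; the CPU numbers are placeholders
and it needs -lpthread): two threads pinned to different cores fault at
nearly the same instant, so both may enter do_coredump().

	#define _GNU_SOURCE
	#include <pthread.h>
	#include <sched.h>

	static pthread_barrier_t go;

	static void *crasher(void *arg)
	{
		cpu_set_t set;

		CPU_ZERO(&set);
		CPU_SET((long)arg, &set);	/* pin this thread to one core */
		pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

		pthread_barrier_wait(&go);	/* release both threads together */
		*(volatile int *)0 = 0;		/* SIGSEGV -> do_coredump() */
		return NULL;
	}

	int main(void)
	{
		pthread_t t1, t2;

		pthread_barrier_init(&go, NULL, 2);
		pthread_create(&t1, NULL, crasher, (void *)0L);
		pthread_create(&t2, NULL, crasher, (void *)1L);
		pthread_join(t1, NULL);
		pthread_join(t2, NULL);
		return 0;
	}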
Issue #2:
Within do_coredump(), SIGKILL is sent to all threads in the process
other than the one running the core dump.
There is no guarantee that SIGKILL will be received immediately by all
threads in the process, which means the state of the threads
(particularly the per-thread backtraces) can be very different by then
compared to the time at which the offending thread initiated the core
dump.
This in turn means the per-thread backtraces in the generated core
dump will not be accurate.
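For context, this is roughly how that kill is delivered (abridged from
zap_process() in fs/coredump.c of 3.8-era kernels; a sketch, not the
exact source): the signal is merely queued and the thread woken, so
each thread keeps running until it next checks its pending signals.

	/* Abridged sketch of zap_process() (fs/coredump.c, ~3.8).
	 * SIGKILL is queued per thread and the thread is woken;
	 * delivery happens only when that thread next looks at its
	 * pending signals, hence the inherent delay. */
	static int zap_process(struct task_struct *start, int exit_code)
	{
		struct task_struct *t = start;
		int nr = 0;

		start->signal->flags = SIGNAL_GROUP_EXIT;
		start->signal->group_exit_code = exit_code;

		do {
			if (t != current && t->mm) {
				sigaddset(&t->pending.signal, SIGKILL);
				signal_wake_up(t, 1);
				nr++;
			}
		} while_each_thread(start, t);

		return nr;
	}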
Please confirm my understanding, and advise on how this problem can be
solved.
Thanks,
Sudharsan
* Linux do_coredump() and SMP systems
From: Greg KH @ 2015-02-17 15:12 UTC
To: kernelnewbies
On Tue, Feb 17, 2015 at 07:11:55PM +0530, Sudharsan Vijayaraghavan wrote:
> Hi All,
>
> We are running the 3.8 kernel.
That's pretty old and obsolete; why are you stuck with that version?
> I have a unique scenario where we hit several issues in do_coredump().
> We have an SMP system with thousands of cores, and each pthread is
> pinned to one core. The main process containing these pthreads runs on
> the first core.
>
> Here is issue #1:
> When one of the threads core dumps, we enter do_coredump(). Another
> thread of the same process, running on a different core, can core dump
> as well (before the SIGKILL sent as a consequence of the first core
> dump has been delivered to it).
> This allows do_coredump() to be entered more than once.
> Once two threads have entered do_coredump(), one can kill the other
> with SIGKILL, and the result is completely unpredictable; there is no
> guarantee we will end up with two core files in the end.
>
> The Linux kernel does not seem to handle this at all. Adding a spin
> lock within do_coredump() would prevent multiple entries into
> do_coredump().
>
> I want to know whether the Linux kernel really does not handle the
> above case, or am I missing something?
Odd, we should handle this just fine. Try emailing the developers
responsible for this code, and cc: the linux-kernel mailing list so
they can work it out.
thanks,
greg k-h
* Linux do_coredump() and SMP systems
From: Sudharsan Vijayaraghavan @ 2015-02-18 6:14 UTC
To: kernelnewbies
We are doing a prototype, and so many changes have gone into the
kernel that we are finding it difficult to upgrade to the latest
version immediately.
However, I ran through the code once again, and the kernel does indeed
handle it: the down_write(&mm->mmap_sem) in coredump_wait() makes sure
the second core dump is stopped, and core_waiters comes back negative.
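For anyone following along, a condensed sketch of that serialization
(based on coredump_wait() in fs/coredump.c as of 3.8, with error paths
and the inactive-wait loop elided): only the first thread to take
mmap_sem finds mm->core_state still NULL; any racing second dumper
bails out with -EBUSY.

	/* Condensed from coredump_wait() (fs/coredump.c, ~3.8). */
	static int coredump_wait(int exit_code, struct core_state *core_state)
	{
		struct task_struct *tsk = current;
		struct mm_struct *mm = tsk->mm;
		int core_waiters = -EBUSY;

		init_completion(&core_state->startup);
		core_state->dumper.task = tsk;
		core_state->dumper.next = NULL;

		down_write(&mm->mmap_sem);
		if (!mm->core_state)	/* are we the first dumper? */
			core_waiters = zap_threads(tsk, mm, core_state,
						   exit_code);
		up_write(&mm->mmap_sem);

		if (core_waiters > 0) {
			/* wait for the other threads to park in exit_mm() */
			wait_for_completion(&core_state->startup);
			/* ... wait_task_inactive() loop elided ... */
		}
		return core_waiters;
	}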
I will debug further. Thanks for confirming that the kernel handles
this scenario.
* Linux do_coredump() and SMP systems
From: Greg KH @ 2015-02-18 16:01 UTC
To: kernelnewbies
On Wed, Feb 18, 2015 at 11:44:32AM +0530, Sudharsan Vijayaraghavan wrote:
> We are doing a prototype, and so many changes have gone into the
> kernel that we are finding it difficult to upgrade to the latest
> version immediately.
What changes are you making to the kernel that keep you on such an old
version (3.8 is two years old now, and over 155 thousand changes have
gone into the kernel since then)?
> However, I ran through the code once again, and the kernel does indeed
> handle it: the down_write(&mm->mmap_sem) in coredump_wait() makes sure
> the second core dump is stopped, and core_waiters comes back negative.
Great, so it works now?
confused,
greg k-h
* Linux do_coredump() and SMP systems
From: Sudharsan Vijayaraghavan @ 2015-02-19 12:00 UTC
To: kernelnewbies
Hi Greg,
There is a plan to move to 3.14; right now the focus is to iron out
existing issues.
Now, with regard to the core dump issue: about 10% of the time we get
stuck in coredump_wait(), at
==> wait_for_completion(&core_state->startup);
I am analyzing exit_mm() to see what is going wrong here.
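For reference, this is the handshake that completion waits on (abridged
from exit_mm() in kernel/exit.c around 3.8; a sketch, not the exact
source): each dying thread that shares the dumping mm chains itself
onto core_state->dumper, and the last arrival fires the completion. A
hang here means some thread never reached this code.

	/* Abridged from exit_mm() (kernel/exit.c, ~3.8). */
	down_read(&mm->mmap_sem);
	core_state = mm->core_state;
	if (core_state) {
		struct core_thread self;

		up_read(&mm->mmap_sem);
		self.task = tsk;
		/* atomically link ourselves onto the dumper's list */
		self.next = xchg(&core_state->dumper.next, &self);
		/* last thread to arrive wakes the dumping thread */
		if (atomic_dec_and_test(&core_state->nr_threads))
			complete(&core_state->startup);

		/* sleep until coredump_finish() clears self.task */
		for (;;) {
			set_task_state(tsk, TASK_UNINTERRUPTIBLE);
			if (!self.task)
				break;
			schedule();
		}
		__set_task_state(tsk, TASK_RUNNING);
		down_read(&mm->mmap_sem);
	}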
I have one other question, which I am curious about.
In coredump_wait() there is a loop that waits for each task to become
inactive (i.e., no longer running on any core); we instrumented it
with pr_err() for debugging:
	ptr = core_state->dumper.next;
	while (ptr != NULL) {
		pr_err("pid: %d %s() calling wait_task_inactive() for pid: %d\n",
		       tsk->pid, __func__, ptr->task->pid);
		wait_task_inactive(ptr->task, 0);
		pr_err("pid: %d %s() wait_task_inactive() returned for pid: %d\n",
		       tsk->pid, __func__, ptr->task->pid);
		ptr = ptr->next;
	}
There is a delay between the crash and the actual generation of the
core dump due to the above loop.
On a multicore system it is quite possible that other threads of the
same process run on other cores in the meantime; as a consequence, the
address space, program counters, etc. can change.
Given this, the generated core dump will not reflect the state of the
process (the registers and mm of the various threads) as it was at the
time of the crash (of any thread or the main process).
Is my understanding correct? I am just probing for a way to get rid of
this discrepancy.
Thanks,
Sudharsan