From: ebiederm@xmission.com (Eric W. Biederman)
To: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: "Michel Dänzer" <michel@daenzer.net>,
	linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, David.Panariti@amd.com,
	oleg@redhat.com, Alexander.Deucher@amd.com,
	akpm@linux-foundation.org, Christian.Koenig@amd.com
Subject: Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.
Date: Tue, 24 Apr 2018 17:11:44 -0500
Message-ID: <87a7ts2r1b.fsf@xmission.com>
In-Reply-To: <27d7d15b-f7c3-2a0a-af85-eb243526ac88@amd.com> (Andrey Grodzovsky's message of "Tue, 24 Apr 2018 17:37:08 -0400")

Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> writes:

> On 04/24/2018 05:21 PM, Eric W. Biederman wrote:
>> Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> writes:
>>
>>> On 04/24/2018 03:44 PM, Daniel Vetter wrote:
>>>> On Tue, Apr 24, 2018 at 05:46:52PM +0200, Michel Dänzer wrote:
>>>>> Adding the dri-devel list, since this is driver independent code.
>>>>>
>>>>>
>>>>> On 2018-04-24 05:30 PM, Andrey Grodzovsky wrote:
>>>>>> Avoid calling wait_event_killable when you are possibly being called
>>>>>> from get_signal routine since in that case you end up in a deadlock
>>>>>> where you are alreay blocked in singla processing any trying to wait
>>>>> Multiple typos here, "[...] already blocked in signal processing and [...]"?
>>>>>
>>>>>
>>>>>> on a new signal.
>>>>>>
>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>> ---
>>>>>>    drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 +++--
>>>>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>> index 088ff2b..09fd258 100644
>>>>>> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>> @@ -227,9 +227,10 @@ void drm_sched_entity_do_release(struct drm_gpu_scheduler *sched,
>>>>>>    		return;
>>>>>>    	/**
>>>>>>    	 * The client will not queue more IBs during this fini, consume existing
>>>>>> -	 * queued IBs or discard them on SIGKILL
>>>>>> +	 * queued IBs or discard them when in death signal state since
>>>>>> +	 * wait_event_killable can't receive signals in that state.
>>>>>>    	*/
>>>>>> -	if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
>>>>>> +	if (current->flags & PF_SIGNALED)
>>>> You want fatal_signal_pending() here, instead of inventing your own broken
>>>> version.
>>> I rely on current->flags & PF_SIGNALED because this being set from
>>> within get_signal,
>> It doesn't mean that.  Unless you are called by do_coredump (you
>> aren't).
>
> Looking in latest code here
> https://elixir.bootlin.com/linux/v4.17-rc2/source/kernel/signal.c#L2449
> i see that current->flags |= PF_SIGNALED; is out side of
> if (sig_kernel_coredump(signr)) {...} scope

In small words.  You showed me the backtrace and I have read
the code.

PF_SIGNALED means you got killed by a signal.
get_signal
  do_coredump
  do_group_exit
    do_exit
       exit_signals
          sets PF_EXITING
       exit_mm
          calls fput on mmaps
             calls task_work_add
       exit_files
          calls fput on open files
             calls task_work_add
       exit_task_work
          task_work_run
             /* you are here */

So while strictly speaking you are inside of get_signal, it is not
meaningful to speak of yourself as within get_signal.

I am a little surprised to see task_work_run called so early.
I was mostly expecting it to happen when the dead task was
scheduling away, like normally happens.

Testing for PF_SIGNALED does not give you anything at all
that testing for PF_EXITING (the flag that says signal handling
is shut down) does not get you.

There is no point in distinguishing PF_SIGNALED from any other
path to do_exit.  do_exit never returns.
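
To make the flag argument concrete, here is a small userspace model in C.
It is only a sketch, not kernel code: struct model_task and the model_*
helpers are hypothetical names, and the flag values are what I recall
from include/linux/sched.h, so treat them as illustrative. It shows that
every path into do_exit ends up with PF_EXITING set, so a PF_EXITING test
covers the signal-killed case and the plain-exit case alike.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values, as recalled from include/linux/sched.h. */
#define PF_EXITING  0x00000004u	/* task is being shut down */
#define PF_SIGNALED 0x00000400u	/* task was killed by a signal */

/* Hypothetical stand-in for the one task_struct field used here. */
struct model_task {
	unsigned int flags;
};

/* Models do_exit(): every exit path marks the task as exiting. */
static void model_do_exit(struct model_task *t)
{
	assert(t != NULL);
	t->flags |= PF_EXITING;
}

/* Models a fatal signal: PF_SIGNALED is set, then the task exits. */
static void model_fatal_signal(struct model_task *t)
{
	assert(t != NULL);
	t->flags |= PF_SIGNALED;
	model_do_exit(t);
}

/* The only question a release path needs answered: is the task dying? */
static bool model_must_not_block(const struct model_task *t)
{
	assert(t != NULL);
	return (t->flags & PF_EXITING) != 0;
}
```

In this model a task that exits without a signal never has PF_SIGNALED
set, yet model_must_not_block() still reports it as dying.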

The task is dead.

Blocking indefinitely while shutting down a task is a bad idea.
Blocking indefinitely while closing a file descriptor is a bad idea.

The task has been killed; it can't get more dead.  SIGKILL is meaningless
at this point.

So you need a timeout, or not to wait at all.
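
The two options can be sketched as a userspace model in C. Everything
here is hypothetical (struct model_entity, release_entity, the poll
budget standing in for a timeout); it is not the scheduler's code. In
the kernel the bounded wait would more likely be a wait_event_timeout()
on the relevant wait queue, but that is an assumption about how a fix
could look, not something this thread settles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an entity with jobs still queued. */
struct model_entity {
	int queued_jobs;	/* jobs not yet consumed */
	int jobs_per_poll;	/* jobs retired per poll; 0 models a hang */
};

enum release_result { RELEASE_DRAINED, RELEASE_DISCARDED };

/* Let one poll interval of "time" pass, then report whether we are idle. */
static bool entity_poll_idle(struct model_entity *e)
{
	if (e->jobs_per_poll >= e->queued_jobs)
		e->queued_jobs = 0;
	else
		e->queued_jobs -= e->jobs_per_poll;
	return e->queued_jobs == 0;
}

/*
 * Release an entity without ever blocking indefinitely:
 *  - a dying task does not wait at all,
 *  - any other task waits for at most max_polls polls,
 *  - whatever is still queued afterwards is discarded.
 */
static enum release_result release_entity(struct model_entity *e,
					  bool task_exiting, int max_polls)
{
	assert(e != NULL && max_polls >= 0);

	if (e->queued_jobs == 0)
		return RELEASE_DRAINED;

	if (!task_exiting) {
		for (int i = 0; i < max_polls; i++)
			if (entity_poll_idle(e))
				return RELEASE_DRAINED;
	}

	/* Timed out, or not allowed to wait: drop the remaining jobs. */
	e->queued_jobs = 0;
	return RELEASE_DISCARDED;
}
```

With jobs_per_poll set to 0 (a hung ring in this model) the call still
returns after max_polls polls, which is the whole point of the bound.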


Eric
