From: ebiederm@xmission.com (Eric W. Biederman)
To: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
	Alexander.Deucher@amd.com, Christian.Koenig@amd.com,
	David.Panariti@amd.com, oleg@redhat.com,
	akpm@linux-foundation.org
Subject: Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.
Date: Tue, 24 Apr 2018 12:12:22 -0500
Message-ID: <877eowa5qh.fsf@xmission.com>
In-Reply-To: <8840ac96-50c4-f94d-eb7c-f007940163f3@amd.com> (Andrey Grodzovsky's message of "Tue, 24 Apr 2018 12:43:28 -0400")

Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> writes:

> On 04/24/2018 12:23 PM, Eric W. Biederman wrote:
>> Andrey Grodzovsky <andrey.grodzovsky@amd.com> writes:
>>
>>> Avoid calling wait_event_killable when you are possibly being called
>>> from the get_signal routine, since in that case you end up in a deadlock
>>> where you are already blocked in signal processing and trying to wait
>>> on a new signal.
>> I am curious what the problematic call path is here.
>
> Here is the problematic call stack
>
> [<0>] drm_sched_entity_fini+0x10a/0x3a0 [gpu_sched]
> [<0>] amdgpu_ctx_do_release+0x129/0x170 [amdgpu]
> [<0>] amdgpu_ctx_mgr_fini+0xd5/0xe0 [amdgpu]
> [<0>] amdgpu_driver_postclose_kms+0xcd/0x440 [amdgpu]
> [<0>] drm_release+0x414/0x5b0 [drm]
> [<0>] __fput+0x176/0x350
> [<0>] task_work_run+0xa1/0xc0
> [<0>] do_exit+0x48f/0x1280
> [<0>] do_group_exit+0x89/0x140
> [<0>] get_signal+0x375/0x8f0
> [<0>] do_signal+0x79/0xaa0
> [<0>] exit_to_usermode_loop+0x83/0xd0
> [<0>] do_syscall_64+0x244/0x270
> [<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [<0>] 0xffffffffffffffff
>
> On exit from system call you process all the signals you received and
> encounter a fatal signal which triggers process termination.
>
>>
>> In general waiting seems wrong when the process has already been
>> fatally killed as indicated by PF_SIGNALED.
>
> So indeed this patch avoids wait in this case.
>
>>
>> Returning -ERESTARTSYS seems wrong as nothing should make it back even
>> to the edge of userspace here.
>
> Can you clarify please - what should be returned here instead ?

__fput does not have a return code, and I don't see the return code of
release being used anywhere, so any return code is going to be lost.
So maybe return something that means something to the drm/kernel layer,
but don't expect your system call to be restarted, which is what
-ERESTARTSYS asks for.
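
To make the point about the lost return code concrete, here is a minimal
userspace sketch, not kernel code. Every name in it (file_model,
file_ops_model, fput_model, driver_release_model, ERESTARTSYS_MODEL) is
hypothetical and only models the shape of the call chain: the release hook
returns an int, but the caller is void and drops it.

```c
#include <stddef.h>

/* Assumption: stands in for the kernel-internal -ERESTARTSYS value. */
#define ERESTARTSYS_MODEL 512

struct file_model;

/* Hypothetical stand-in for a file_operations table with a release hook. */
struct file_ops_model {
	int (*release)(struct file_model *f);
};

struct file_model {
	const struct file_ops_model *ops;
	int release_calls;	/* counts how often release ran */
};

/* Models a driver release hook that asks for a syscall restart. */
int driver_release_model(struct file_model *f)
{
	f->release_calls++;
	return -ERESTARTSYS_MODEL;
}

/*
 * Models a final-put path: it returns void, so whatever the release
 * hook returns has nowhere to go and is dropped here.
 */
void fput_model(struct file_model *f)
{
	if (f->ops && f->ops->release)
		(void)f->ops->release(f);
}
```

In this model nothing above fput_model can ever observe the restart
request, which is the sense in which the return code is lost.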

Hmm.  When looking at the code that is merged versus whatever your
patch is against it gets even clearer.  The -ERESTARTSYS
return code doesn't even get out of drm_sched_entity_fini.

Caring at all about process state at that point is wrong: apart from
running in ``process'' context, where you can sleep, nothing there is
connected to a process.

Let me respectfully suggest that the wait_event_killable on that code
path is wrong.  Possibly you want a wait_event_timeout if you are very
nice.  But the code already has the logic necessary to handle what
happens if it can't sleep.
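
A minimal userspace sketch of the bounded-wait idea, assuming a simple
tick-based model rather than the real scheduler. The names entity_model,
bounded_drain and entity_teardown are made up for illustration. The return
convention of bounded_drain is modeled on wait_event_timeout: 0 if the
condition is still false when the timeout runs out, otherwise at least 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an entity's queue of pending jobs. */
struct entity_model {
	size_t queued;		/* jobs still waiting to run */
	size_t drain_per_tick;	/* jobs retired per tick; 0 models a hang */
};

/*
 * Wait at most timeout_ticks for the queue to drain.
 * Returns 0 on timeout with jobs still queued, otherwise the number of
 * ticks left (at least 1).
 */
long bounded_drain(struct entity_model *e, long timeout_ticks)
{
	assert(timeout_ticks >= 0);

	while (e->queued > 0 && timeout_ticks > 0) {
		size_t n = e->drain_per_tick < e->queued ?
			   e->drain_per_tick : e->queued;

		e->queued -= n;
		timeout_ticks--;
	}

	if (e->queued == 0)
		return timeout_ticks > 0 ? timeout_ticks : 1;
	return 0;
}

/*
 * Teardown that never depends on a signal: try a bounded drain, then
 * discard whatever is left. Returns the number of discarded jobs.
 */
size_t entity_teardown(struct entity_model *e, long timeout_ticks)
{
	size_t discarded = 0;

	if (bounded_drain(e, timeout_ticks) == 0) {
		discarded = e->queued;
		e->queued = 0;
	}
	return discarded;
}
```

In this model the teardown always finishes within timeout_ticks, whether
or not the queue makes progress, so nobody has to send a signal to
unblock it.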

So I think the justification needs to be why you are trying to sleep
there at all.

The progress guarantee needs to come from the gpu layer or the AMD
driver not from someone getting impatient and sending SIGKILL to
a dead process.


Eric


>>
>> Given that this is the only use of PF_SIGNALED outside of bsd process
>> accounting I find this code very suspicious.
>>
>> It looks like the code path that gets called during exit is buggy and
>> needs to be sorted out.
>>
>> Eric
>>
>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> ---
>>>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 +++--
>>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>> index 088ff2b..09fd258 100644
>>> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>> @@ -227,9 +227,10 @@ void drm_sched_entity_do_release(struct drm_gpu_scheduler *sched,
>>>   		return;
>>>   	/**
>>>   	 * The client will not queue more IBs during this fini, consume existing
>>> -	 * queued IBs or discard them on SIGKILL
>>> +	 * queued IBs or discard them when in death signal state since
>>> +	 * wait_event_killable can't receive signals in that state.
>>>   	*/
>>> -	if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
>>> +	if (current->flags & PF_SIGNALED)
>>>   		entity->fini_status = -ERESTARTSYS;
>>>   	else
>>>   		entity->fini_status = wait_event_killable(sched->job_scheduled,
