AMD-GFX Archive on lore.kernel.org
From: ebiederm@xmission.com (Eric W. Biederman)
To: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: David.Panariti@amd.com, "Michel Dänzer" <michel@daenzer.net>,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	oleg@redhat.com, amd-gfx@lists.freedesktop.org,
	Alexander.Deucher@amd.com, akpm@linux-foundation.org,
	Christian.Koenig@amd.com
Subject: Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.
Date: Wed, 25 Apr 2018 10:29:47 -0500
Message-ID: <87a7trwbh0.fsf@xmission.com>
In-Reply-To: <94828a42-02dd-29ad-a3d0-dc4c0cc82ddb@amd.com> (Andrey Grodzovsky's message of "Wed, 25 Apr 2018 09:08:08 -0400")

Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> writes:

> On 04/25/2018 03:14 AM, Daniel Vetter wrote:
>> On Tue, Apr 24, 2018 at 05:37:08PM -0400, Andrey Grodzovsky wrote:
>>>
>>> On 04/24/2018 05:21 PM, Eric W. Biederman wrote:
>>>> Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> writes:
>>>>
>>>>> On 04/24/2018 03:44 PM, Daniel Vetter wrote:
>>>>>> On Tue, Apr 24, 2018 at 05:46:52PM +0200, Michel Dänzer wrote:
>>>>>>> Adding the dri-devel list, since this is driver independent code.
>>>>>>>
>>>>>>>
>>>>>>> On 2018-04-24 05:30 PM, Andrey Grodzovsky wrote:
>>>>>>>> Avoid calling wait_event_killable when you are possibly being called
>>>>>>>> from get_signal routine since in that case you end up in a deadlock
>>>>>>>> where you are alreay blocked in singla processing any trying to wait
>>>>>>> Multiple typos here, "[...] already blocked in signal processing and [...]"?
>>>>>>>
>>>>>>>
>>>>>>>> on a new signal.
>>>>>>>>
>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 +++--
>>>>>>>>     1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>>>> index 088ff2b..09fd258 100644
>>>>>>>> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>>>> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>>>> @@ -227,9 +227,10 @@ void drm_sched_entity_do_release(struct drm_gpu_scheduler *sched,
>>>>>>>>     		return;
>>>>>>>>     	/**
>>>>>>>>     	 * The client will not queue more IBs during this fini, consume existing
>>>>>>>> -	 * queued IBs or discard them on SIGKILL
>>>>>>>> +	 * queued IBs or discard them when in death signal state since
>>>>>>>> +	 * wait_event_killable can't receive signals in that state.
>>>>>>>>     	*/
>>>>>>>> -	if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
>>>>>>>> +	if (current->flags & PF_SIGNALED)
>>>>>> You want fatal_signal_pending() here, instead of inventing your own broken
>>>>>> version.
>>>>> I rely on current->flags & PF_SIGNALED because this being set from
>>>>> within get_signal,
>>>> It doesn't mean that.  Unless you are called by do_coredump (you
>>>> aren't).
>>> Looking in latest code here
>>> https://elixir.bootlin.com/linux/v4.17-rc2/source/kernel/signal.c#L2449
>>> i see that current->flags |= PF_SIGNALED; is out side of
>>> if (sig_kernel_coredump(signr)) {...} scope
>> Ok I read some more about this, and I guess you go through process exit
>> and then eventually close. But I'm not sure.
>>
>> The code in drm_sched_entity_fini also looks strange: You unpark the
>> scheduler thread before you remove all the IBs. At least from the comment
>> that doesn't sound like what you want to do.
>
> I think it should be safe for the dying scheduler entity since before that (in
> drm_sched_entity_do_release) we set its runqueue to NULL
> so no new jobs will be dequeued from it by the scheduler thread.
>
>>
>> But in general, PF_SIGNALED is really something deeply internal to the
>> core (used for some book-keeping and accounting). The drm scheduler is the
>> only thing looking at it, so smells like a layering violation. I suspect
>> (but without knowing what you're actually trying to achive here can't be
>> sure) you want to look at something else.
>>
>> E.g. PF_EXITING seems to be used in a lot more places to cancel stuff
>> that's no longer relevant when a task exits, not PF_SIGNALED. There's the
>> TIF_MEMDIE flag if you're hacking around issues with the oom-killer.
>>
>> This here on the other hand looks really fragile, and probably only does
>> what you want to do by accident.
>> -Daniel
>
> Yes, that's what Eric also said, and in the V2 patches I will try to change to
> PF_EXITING.
>
> Another issue is changing wait_event_killable to wait_event_timeout, where I
> need to understand what timeout value is acceptable for all the drivers using
> the scheduler, or maybe it should come as a property of drm_sched_entity.

It would not surprise me if you could pick a large value like 1 second
and issue a warning if that timeout ever triggers.  It sounds like the
condition where we wait indefinitely today is because something went
wrong in the driver.
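
To make the idea concrete, here is a rough userspace sketch of "wait with a
generous timeout and warn if it ever expires".  It is only an analogue: the
kernel code would use wait_event_timeout on the scheduler's wait queue, not
a polling loop.  The names entity_model, queued_jobs, now_ms and
entity_wait_idle are made up for this illustration and do not exist in the
scheduler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical model of a scheduler entity; not the real struct drm_sched_entity. */
struct entity_model {
	int queued_jobs;	/* jobs still waiting to be consumed */
};

/* Monotonic clock in milliseconds, so wall-clock adjustments do not affect the wait. */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Wait until the entity has no queued jobs or timeout_ms has elapsed.
 * Returns true if the entity went idle, false if the timeout expired.
 * A timeout is treated as a sign that something went wrong, so it is
 * reported instead of being waited out indefinitely.
 */
bool entity_wait_idle(const struct entity_model *e, long timeout_ms)
{
	const struct timespec poll = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */
	long long deadline;

	assert(timeout_ms >= 0);
	deadline = now_ms() + timeout_ms;

	while (e->queued_jobs > 0) {
		if (now_ms() >= deadline) {
			fprintf(stderr,
				"warning: entity still busy after %ld ms, possible driver problem\n",
				timeout_ms);
			return false;
		}
		nanosleep(&poll, NULL);
	}
	return true;
}
```

A caller would pass something like 1000 for timeout_ms and discard the
remaining jobs when the function returns false.  Whether the value should
be fixed or come from the entity is the open question raised above.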

Eric
