From mboxrd@z Thu Jan 1 00:00:00 1970 From: Oleg Nesterov Subject: Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process. Date: Tue, 1 May 2018 16:35:24 +0200 Message-ID: <20180501143524.GA13017@redhat.com> References: <877eowa5qh.fsf@xmission.com> <20180425135552.GD7592@redhat.com> <20180425171757.GA10441@redhat.com> <874ljyu98e.fsf@xmission.com> <20180430160006.GB10583@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Return-path: Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org To: Andrey Grodzovsky Cc: christian.koenig@amd.com, "Eric W. Biederman" , David.Panariti@amd.com, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org, Alexander.Deucher@amd.com, akpm@linux-foundation.org List-Id: amd-gfx.lists.freedesktop.org On 04/30, Andrey Grodzovsky wrote: > > On 04/30/2018 12:00 PM, Oleg Nesterov wrote: > >On 04/30, Andrey Grodzovsky wrote: > >>What about changing PF_SIGNALED to  PF_EXITING in > >>drm_sched_entity_do_release > >> > >>-       if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL) > >>+      if ((current->flags & PF_EXITING) && current->exit_code == SIGKILL) > >let me repeat, please don't use task->exit_code. And in fact this check is racy > > > >But this doesn't matter. Say, we can trivially add SIGNAL_GROUP_KILLED_BY_SIGKILL, > >or do something else, > > Can you explain where is the race and what is a possible alternative then ? Oh. I mentioned this race automatically, because I am pedant ;) Let me repeat that this doesn't really matter, and let me remind that the caller of fop->release can be completely unrelated process, say $cat /proc/pid/fdinfo. And in any case ->exit_code should not be used outside of ptrace/exit paths. OK, the race. Consider a process P with a main thread M and a sub-thread T. T does pthread_exit(), enters do_exit() and gets a preemption before exit_files(). The process is killed by SIGKILL. M calls do_group_exit(), do_exit() and passes exit_files(). However, it doesn't call close_files() because T has another reference. T resumes, calls close_files(), fput(), etc, and then exit_task_work(), so it can finally call ->release() with current->exit_code == 0 desptite the fact the process was killed. Again, again, this doesn't matter. We can distinguish killed-or-not, by SIGKILL- or-not. But I still do not think we actually need this. At least in ->release() paths, ->flush() may differ. Oleg.