From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 01468C433F5 for ; Mon, 17 Jan 2022 16:09:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237453AbiAQQJ4 convert rfc822-to-8bit (ORCPT ); Mon, 17 Jan 2022 11:09:56 -0500 Received: from out01.mta.xmission.com ([166.70.13.231]:58430 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240818AbiAQQJk (ORCPT ); Mon, 17 Jan 2022 11:09:40 -0500 Received: from in02.mta.xmission.com ([166.70.13.52]:38868) by out01.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1n9UZe-00HJYy-1g; Mon, 17 Jan 2022 09:09:38 -0700 Received: from ip68-110-24-146.om.om.cox.net ([68.110.24.146]:44298 helo=email.froward.int.ebiederm.org.xmission.com) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1n9UZc-00FVMc-Oj; Mon, 17 Jan 2022 09:09:37 -0700 From: "Eric W. Biederman" To: Olivier Langlois Cc: Linus Torvalds , Heiko Carstens , Linux Kernel Mailing List , "" , Linux API , Alexey Gladkov , Kyle Huey , Oleg Nesterov , Kees Cook , Al Viro , Jens Axboe , Pavel Begunkov References: <87a6ha4zsd.fsf@email.froward.int.ebiederm.org> <20211213225350.27481-1-ebiederm@xmission.com> <87sfu3b7wm.fsf@email.froward.int.ebiederm.org> <87ilurwjju.fsf@email.froward.int.ebiederm.org> <87o84juwhg.fsf@email.froward.int.ebiederm.org> <57dfc87c7dd5a2f9f9841bba1185336016595ef7.camel@trillion01.com> <87lezmrxlq.fsf@email.froward.int.ebiederm.org> <87mtk2qf5s.fsf@email.froward.int.ebiederm.org> <87h7a5kgan.fsf@email.froward.int.ebiederm.org> <991211d94c6dc0ad3501cd9f830cdee916b982b3.camel@trillion01.com> Date: Mon, 17 Jan 2022 10:09:28 -0600 In-Reply-To: <991211d94c6dc0ad3501cd9f830cdee916b982b3.camel@trillion01.com> (Olivier Langlois's message of "Sat, 15 Jan 2022 14:23:34 -0500") Message-ID: <87ee56e43r.fsf@email.froward.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT X-XM-SPF: eid=1n9UZc-00FVMc-Oj;;;mid=<87ee56e43r.fsf@email.froward.int.ebiederm.org>;;;hst=in02.mta.xmission.com;;;ip=68.110.24.146;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX19RCqiXptE75sBjTr0WM3bA53Ca83Pe8DA= X-SA-Exim-Connect-IP: 68.110.24.146 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: Re: [PATCH 1/8] signal: Make SIGKILL during coredumps an explicit special case X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: linux-api@vger.kernel.org Olivier Langlois writes: > On Fri, 2022-01-14 at 18:12 -0600, Eric W. Biederman wrote: >> Linus Torvalds writes: >> >> > On Tue, Jan 11, 2022 at 10:51 AM Eric W. Biederman >> > wrote: >> > > >> > > +       while ((n == -ERESTARTSYS) && >> > > test_thread_flag(TIF_NOTIFY_SIGNAL)) { >> > > +               tracehook_notify_signal(); >> > > +               n = __kernel_write(file, addr, nr, &pos); >> > > +       } >> > >> > This reads horribly wrongly to me. >> > >> > That "tracehook_notify_signal()" thing *has* to be renamed before >> > we >> > have anything like this that otherwise looks like "this will just >> > loop >> > forever". >> > >> > I'm pretty sure we've discussed that "tracehook" thing before - the >> > whole header file is misnamed, and most of the functions in theer >> > are >> > too. >> > >> > As an ugly alternative, open-code it, so that it's clear that "yup, >> > that clears the TIF_NOTIFY_SIGNAL flag". >> >> A cleaner alternative looks like to modify the pipe code to use >> wake_up_XXX instead of wake_up_interruptible_XXX and then have code >> that does pipe_write_killable instead of pipe_write_interruptible. > > Do not forget that the problem might not be limited to the pipe FS as > Oleg Nesterov pointed out here: > > https://lore.kernel.org/io-uring/20210614141032.GA13677@redhat.com/ > > This is why I did like your patch fixing __dump_emit. If the only > problem is the tracehook_notify_signal() function unclear name, that > should be addressed instead of trying to fix the problem in a different > way. It might be that the fix is to run a portion of the exit_to_userspace loop that does: if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) handle_signal_work(regs, ti_work); I am deep in brainstorm mode trying to find something that comes out clean. Oleg is right that while to be POSIX compliant and otherwise compatible with traditional unix behavior sleeps in filesystems need to be uninterruptible. NFS has not always provided that compatibility. >> There is also a question of how all of this should interact with the >> freezer, as I think changing from interruptible to killable means >> that >> the coredumps became unfreezable. >> >> I am busily simmering this on my back burner and I hope I can come up >> with something sensible. > > IMHO, fixing the problem on the emit function side has the merit of > being future proof if something else than io_uring in the future would > raise the TIF_NOTIFY_SIGNAL flag > > but I am wondering why no one commented anything about my proposal of > cancelling io_uring before generating the core dump therefore stopping > it to flip TIF_NOTIFY_SIGNAL while the core dump is generated. > > Is there something wrong with my proposed approach? > https://lore.kernel.org/lkml/cover.1629655338.git.olivier@trillion01.com/ > > It did flawlessly created many dozens of io_uring app core dumps in the > last months for me... >From my perspective I am not at all convinced that io_uring is the only culprit. Beyond that the purpose of a coredump is to snapshot the process as it is, before anything is shutdown so that someone can examine the coredump and figure out what failed. Running around changing the state of the process has a very real chance of hiding what is going wrong. Further your change requires that there be a place for io_uring to clean things up. Given that fundamentally that seems like the wrong thing to me I am not interested in making it easy to what looks like the wrong thing. All of this may be perfection being the enemy of the good (especially as your io_uring magic happens as a special case in do_coredump). My work in this area is to remove hacks so I can be convinced the code works 100% of the time so unfortunately I am not interested in pick up a change that is only good enough. Someone else like Andrew Morton might be. None of that changes the fact that tracehook_notify_signal needs to be renamed. That effects your approach and my proof of concept approach. So renaming tracehook_notify_signal just needs to be done. Eric