From: ebiederm@xmission.com (Eric W. Biederman)
To: Miklos Szeredi
Cc: Alexey Gladkov, LKML, linux-fsdevel@vger.kernel.org, Alexey Gladkov
Subject: Re: [RESEND PATCH v3] fuse: Abort waiting for a response if the daemon receives a fatal signal
Date: Wed, 11 Nov 2020 01:42:43 -0600
Message-ID: <87d00ks5jg.fsf@x220.int.ebiederm.org>
In-Reply-To: (Miklos Szeredi's message of "Mon, 9 Nov 2020 21:24:13 +0100")
References: <1e796f9e008fb78fb96358ff74f39bd4865a7c88.1604926010.git.gladkov.alexey@gmail.com> <87v9ee2wer.fsf@x220.int.ebiederm.org>
List-ID: linux-fsdevel@vger.kernel.org

Miklos Szeredi writes:

> On Mon, Nov 9, 2020 at 7:54 PM Eric W. Biederman wrote:
>>
>> Miklos Szeredi writes:
>>
>> > On Mon, Nov 9, 2020 at 1:48 PM Alexey Gladkov wrote:
>> >>
>> >> This patch removes one kind of the deadlocks inside the fuse daemon.
>> >> The problem appears when the fuse daemon itself makes a file
>> >> operation on its filesystem and receives a fatal signal.
>> >>
>> >> This deadlock can be interrupted via the fusectl filesystem. But if
>> >> you have many fuse mountpoints, it will be difficult to figure out
>> >> which connection to break.
>> >>
>> >> This patch aborts the connection if the fuse server receives a fatal
>> >> signal.
>> >
>> > The patch itself might be acceptable, but I have some questions.
>> >
>> > The logic of this patch says:
>> >
>> > "If a task having the fuse device open in its fd table receives
>> > SIGKILL (and the filesystem was initially mounted in a non-init user
>> > namespace), then abort the filesystem operation."
>> >
>> > You just say "server" instead of "task having the fuse device open in
>> > its fd table", which is sloppy to say the least. It might also lead
>> > to regressions, although I agree that it's unlikely.
>> >
>> > Also, how is this solving any security issue? Just create the request
>> > loop using two fuse filesystems and the deadlock avoidance has just
>> > been circumvented. So AFAICS "selling" this as a CVE fix is not
>> > appropriate.
>>
>> The original report came in with a CVE on it, so referencing that CVE
>> seems reasonable, even if the issue isn't particularly serious. It is
>> very annoying not to be able to kill processes with SIGKILL or the OOM
>> killer.
>>
>> You have a good point about the looping issue. I wonder if there is a
>> way to enhance this comparatively simple approach to prevent the more
>> complex scenario you mention.
>
> Let's take a concrete example:
>
> - task A is "server" for fuse fs a
> - task B is "server" for fuse fs b
> - task C: chmod(/a/x, ...)
> - task A: read UNLINK request
> - task A: chmod(/b/x, ...)
> - task B: read UNLINK request
> - task B: chmod(/a/x, ...)
>
> Now B is blocking on the i_mutex on x, A is waiting for a reply from B,
> and C is holding the i_mutex on x and waiting for a reply from A.
>
> At this point B is truly uninterruptible (and I'm not betting large
> sums on Al accepting killable VFS locks patches), so killing B is out.
>
> Killing A with this patch does nothing, since A does not have b's dev
> fd in its fdtable.
>
> Killing C again does nothing, since it has no fuse dev fd at all.
>
>> Does tweaking the code to close every connection represented by a fuse
>> file descriptor after a SIGKILL has been delivered create any problems?
>
> In the above example are you suggesting that SIGKILL on A would abort
> "a" from fs b's code? Yeah, that would work, I guess. Poking into
> another instance this way sounds pretty horrid, though.

Yes. That is what I am suggesting. Layering purity it does not have.
It is also fragile, as it only handles interactions between fuse
instances.

The advantage is that it is a very small amount of code. I think there
is enough care to get a small change like that in (with a big fat
comment describing why it is imperfect). I don't know if there is
enough care to get the general solution (you describe below)
implemented and merged in any kind of timely manner.

>> > What's the reason for making this user-ns only? If we drop the
>> > security aspect, then I don't see any reason not to do this
>> > unconditionally.
>>
>> > Also note, there's a proper solution for making fuse requests always
>> > killable, and that is to introduce a shadow locking that ensures
>> > correct fs operation in the face of requests that have returned and
>> > released their respective VFS locks. Now this would be a much more
>> > complex solution, but also a much more correct one, not having issues
>> > with correctly defining what a server is (which is not a solvable
>> > problem).
>>
>> Is this the solution that was removed at some point from fuse,
>> or are you talking about something else?
>>
>> I think you are talking about adding a set of fuse-specific locks
>> so fuse does not need to rely on the VFS locks. I don't quite have
>> enough insight to see that bigger problem, so if you can expand in
>> more detail I would appreciate it.
>
> Okay, so the problem with making the wait_event() at the end of
> request_wait_answer() killable is that it would allow compromising the
> server's integrity by unlocking the VFS level lock (which protects the
> fs) while the server hasn't yet finished the request.
>
> The way this would be solvable is to add a fuse level lock for each
> VFS level lock.
> That lock would be taken before the request is sent
> to userspace and would be released when the answer is received.
> Normally there would be zero contention on these shadow locks, but if
> a request is forcibly killed, then the VFS lock is released and the
> shadow lock now protects the filesystem.
>
> This wouldn't solve the case where a fuse fs is deadlocked on a VFS
> lock (e.g. task B), but would allow tasks blocked directly on a fuse
> filesystem to be killed (e.g. task A or C, both of which would unwind
> the deadlock).

Are we just talking about the inode lock here?

I am trying to figure out if this is a straightforward change, or if it
will take a fair amount of work. If the change is just wordy, we can
probably do the good version and call fuse well and truly fixed. But I
don't currently see the problem well enough to know what the good
change would look like even on a single code path.

Eric