From mboxrd@z Thu Jan  1 00:00:00 1970
From: ebiederm@xmission.com (Eric W. Biederman)
To: Al Viro
Cc: Giuseppe Scrivano, Andrew Morton, LKML, alexander.deucher@amd.com,
	broonie@kernel.org, chris@chris-wilson.co.uk, David Miller,
	deepa.kernel@gmail.com, Greg KH, luc.vanoostenryck@gmail.com,
	lucien xin, Ingo Molnar, Neil Horman,
	syzkaller-bugs@googlegroups.com, Vladislav Yasevich
References: <20171219101440.19736-1-gscrivan@redhat.com>
	<20171219114819.GQ21978@ZenIV.linux.org.uk>
	<20171219153225.GA14771@ZenIV.linux.org.uk>
	<874lomhcwb.fsf@redhat.com>
	<87vah2ftn8.fsf@redhat.com>
	<20171219201411.GT21978@ZenIV.linux.org.uk>
	<8737465qxn.fsf@xmission.com>
	<20171219224054.GV21978@ZenIV.linux.org.uk>
Date: Tue, 19 Dec 2017 17:36:14 -0600
In-Reply-To: <20171219224054.GV21978@ZenIV.linux.org.uk> (Al Viro's message of
	"Tue, 19 Dec 2017 22:40:54 +0000")
Message-ID: <87609247f5.fsf@xmission.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [PATCH linux-next] mqueue: fix IPC namespace use-after-free
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Al Viro writes:

> On Tue, Dec 19, 2017 at 03:49:24PM -0600, Eric W. Biederman wrote:
>> > what would you be delaying?  kmem_cache_alloc() for struct mount and assignments
>> > to its fields?  That's noise; if anything, I would expect the main cost with
>> > a plenty of containers to be in sget() scanning the list of mqueue superblocks.
>> > And we can get rid of that, while we are at it - to hell with mount_ns(), with
>> > that approach we can just use mount_nodev() instead.  The logics in
>> > mq_internal_mount() will deal with multiple instances - if somebody has already
>> > triggered creation of internal mount, all subsequent calls in that ipcns will
>> > end up avoiding kern_mount_data() entirely.  And if you have two callers
>> > racing - sure, you will get two superblocks.
>> > Not for long, though - the first one to get to setting ->mq_mnt
>> > (serialized on mq_lock) wins, the second loses
>> > and promptly destroys his vfsmount and superblock.  I seriously suspect that
>> > variant below would cut down on the cost a whole lot more - as it is, we have
>> > the total of O(N^2) spent in the loop inside of sget_userns() when we create
>> > N ipcns and mount in each of those; this patch should cut that to
>> > O(N)...
>>
>> If that is where the cost is, is there any point in delaying creating
>> the internal mount at all?
>
> We won't know without the profiles...  Incidentally, is there any point in
> using mount_ns() for procfs?  Similar scheme (with ->proc_mnt instead of
> ->mq_mnt, of course) would live with mount_nodev() just fine, and it's
> definitely less costly - we don't bother with the loop in sget_userns()
> at all that way.

The mechanisms mqueuefs and proc use for dealing with a filesystem that
continues to be mounted/referenced after the namespace exits are
different.

Proc actually takes a reference on the pid namespace so it is easier to
work with.  pid_ns_prepare_proc and pid_ns_release_proc are the
namespace side of that dependency.

So yes we could look at a local cache in the namespace and all would be
well for proc.  I don't know what we would use for locking when we start
allowing more than one path to set it.  atomic_cmpxchg(&proc_mnt, NULL)?

That makes me suspect we could have a common helper that does the work.

I do know that the reason I moved proc to mount_ns is that it had simply
been open coding that function.

Eric
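
The "first one to set the pointer wins, the loser destroys its copy"
scheme discussed above, done with a compare-and-swap instead of a lock,
can be sketched in userspace C11.  This is only an illustration under
stated assumptions, not kernel code: struct fake_mnt, struct fake_ns,
fake_mount_create(), fake_mount_destroy() and ns_get_mount() are
hypothetical stand-ins for the vfsmount, the per-namespace ->proc_mnt /
->mq_mnt slot and the mount/unmount paths, and C11
atomic_compare_exchange_strong() stands in for whatever primitive the
kernel side would actually use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vfsmount; not the kernel type. */
struct fake_mnt {
	int id;
};

/* Hypothetical stand-in for the per-namespace cache slot. */
struct fake_ns {
	_Atomic(struct fake_mnt *) mnt;
};

static int created, destroyed;	/* bookkeeping for the demo only */

/* Hypothetical "create the internal mount" step. */
static struct fake_mnt *fake_mount_create(void)
{
	struct fake_mnt *m = malloc(sizeof(*m));

	if (m)
		m->id = ++created;
	return m;
}

/* Hypothetical "tear down vfsmount and superblock" step. */
static void fake_mount_destroy(struct fake_mnt *m)
{
	assert(m != NULL);
	destroyed++;
	free(m);
}

/*
 * Return the namespace's cached mount, creating it on first use.
 * If two callers race, each creates its own instance; the first
 * compare-and-swap to hit the NULL slot wins, and the loser destroys
 * its instance and returns the winner's instead.
 */
static struct fake_mnt *ns_get_mount(struct fake_ns *ns)
{
	struct fake_mnt *expected = NULL;
	struct fake_mnt *m = atomic_load(&ns->mnt);

	if (m)
		return m;		/* fast path: already published */

	m = fake_mount_create();
	if (!m)
		return NULL;

	if (atomic_compare_exchange_strong(&ns->mnt, &expected, m))
		return m;		/* slot was NULL: we won */

	fake_mount_destroy(m);		/* we lost: drop our instance */
	return expected;		/* the failed CAS stored the winner here */
}
```

Once the slot is set, every later ns_get_mount() call on the same
fake_ns returns the same pointer without creating anything, which is
the "subsequent calls avoid the creation entirely" property from the
quoted text.  A losing racer pays for one wasted create/destroy pair;
that is the price of not holding a lock across the creation.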