From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935065AbbI2RjA (ORCPT ); Tue, 29 Sep 2015 13:39:00 -0400 Received: from out01.mta.xmission.com ([166.70.13.231]:40095 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932527AbbI2Riw (ORCPT ); Tue, 29 Sep 2015 13:38:52 -0400 From: ebiederm@xmission.com (Eric W. Biederman) To: Oleg Nesterov Cc: Konstantin Khlebnikov , linux-api@vger.kernel.org, containers@lists.linux-foundation.org, linux-kernel@vger.kernel.org, Roman Gushchin , Serge Hallyn , Chen Fan , Andrew Morton , Linus Torvalds , =?utf-8?Q?St=C3=A9phane?= Graber References: <20150925135246.27620.97496.stgit@buzz> <20150925175654.GA12504@redhat.com> <871tdi8pqj.fsf@x220.int.ebiederm.org> <20150929164315.GA16734@redhat.com> Date: Tue, 29 Sep 2015 12:30:39 -0500 In-Reply-To: <20150929164315.GA16734@redhat.com> (Oleg Nesterov's message of "Tue, 29 Sep 2015 18:43:15 +0200") Message-ID: <874mid16bk.fsf@x220.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-AID: U2FsdGVkX1/cGHsLGkflwbI6i3Nkj+WaSPLUXLDFGYE= X-SA-Exim-Connect-IP: 67.3.201.231 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.0 TVD_RCVD_IP Message was received from an IP address * 0.5 XMGappySubj_01 Very gappy subject * 1.5 XMNoVowels Alpha-numberic number with no vowels * 1.0 XMGappySubj_02 Gappier still * 0.0 T_TM2_M_HEADER_IN_MSG BODY: T_TM2_M_HEADER_IN_MSG * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.4701] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa03 1397; Body=1 Fuz1=1 Fuz2=1] * 0.0 T_TooManySym_01 4+ unique symbols in subject * 0.0 T_TooManySym_03 6+ unique symbols in subject * 0.0 T_TooManySym_02 5+ unique symbols in subject X-Spam-DCC: XMission; sa03 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: **;Oleg Nesterov X-Spam-Relay-Country: X-Spam-Timing: total 922 ms - load_scoreonly_sql: 0.06 (0.0%), signal_user_changed: 4 (0.4%), parse: 1.62 (0.2%), extract_message_metadata: 6 (0.7%), get_uri_detail_list: 4 (0.4%), tests_pri_-1000: 6 (0.7%), tests_pri_-950: 2 (0.2%), tests_pri_-900: 1.56 (0.2%), tests_pri_-400: 42 (4.5%), check_bayes: 40 (4.3%), tests_pri_0: 845 (91.7%), tests_pri_500: 8 (0.8%), rewrite_mail: 0.00 (0.0%) Subject: Re: [PATCH 0/1] ns: introduce proc_get_ns_by_fd() X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Wed, 24 Sep 2014 11:00:52 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Oleg Nesterov writes: > On 09/28, Eric W. Biederman wrote: >> >> Oleg Nesterov writes: >> >> > Honestly, I do not really like the new helper... I understand this >> > is subjective, so I won't insist. But how about 1/1? We do not need >> > fd/file at all. With this patch your sys_getvpid() can just use >> > proc_get_ns_by_fd(fd, CLONE_NEWPID) and put_pid_ns(). >> > >> > Eric, what do you think? >> >> At some level I don't care this is not exposed to userspace. > > I agree, this is minor. But you know, the kernel is already complicated > too much, we should try to simplify/cleanup everything we can ;) > >> Of the existing uses several of them sleep, which unfortunately means we >> can not use rcu locking for everything. The network namespace ones wind >> up taking a reference to struct net because the have the legacy pid case >> to deal with. Which makes we can not use fdget for all callers either. > > And that is why 1/1 adds proc_get_ns_by_fd/get_type which can also be used > by the network namespace. > >> For this translate_pid rcu locking is sufficient, rcu locking is easy >> and doing any more than rcu locking just seems silly. So let me >> respectfully suggest. >> >> struct ns_common *ns_by_fd_rcu(int fd, int type) >> { >> struct files_struct *files = current->files; >> struct file *file; >> struct ns_common *ns; >> void *ret; >> >> file = fcheck_files(files, fd); >> if (!file) >> return ERR_PTR(-EBADF); >> >> if (file->f_mode & FMODE_PATH) >> return ERR_PTR(-EINVAL); >> >> if (file->f_op != &ns_file_operations) >> return ERR_PTR(-EINVAL); >> >> ns = get_proc_ns(file_inode(file)); >> if (ns->ops->type != type) >> return ERR_PTR(-EINVAL); >> >> return ns; >> } > > OK, I won't insist, this too looks better to me than proc_ns_fdget(&fd_ref). > > And in any case fcheck_files() makes more sense than fdget(), somehow I did > not think about this when I sent 1/1. > > Hmm. and after the quick look at cleanup_net() I can't understand whether > get_net_ns_by_fd() can use ns_by_fd_rcu() + maybe_get_net(to_net_ns()) or > not... Can it? Some of those places need a reference that allows them to sleep, and the code is shared with the legacy pid case so with an addition of get_net we can use ns_by_fd_rcu(). There are cases like setns that could use ns_by_fd_rcu() with code reording. We can implement get_net_ns_by_fd as: struct net *get_net_ns_by_fd(int fd) { struct net *net; rcu_read_lock(); net = net_ns_by_fd_rcu(fd); if (!IS_ERR(net)) get_net(net); rcu_read_unlock(); return net; } Which means we can achieve code sharing with the pure rcu version as a base. If the networking code did not have the legacy pid case to handle I would want to do something with struct fd, as the file descriptor already keeps the struct net reference alive and struct fd allows for sleeping. Eric