From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932483AbcHBUoH (ORCPT );
	Tue, 2 Aug 2016 16:44:07 -0400
Received: from fieldses.org ([173.255.197.46]:50068 "EHLO fieldses.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753059AbcHBUn5 (ORCPT );
	Tue, 2 Aug 2016 16:43:57 -0400
Date: Tue, 2 Aug 2016 16:34:06 -0400
From: "J. Bruce Fields" 
To: Jeff Layton 
Cc: "Eric W. Biederman" , Nikolay Borisov ,
	viro@zeniv.linux.org.uk, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, containers@lists.linux-foundation.org,
	serge.hallyn@canonical.com
Subject: Re: [RFC PATCH] locks: Show only file_locks created in the same pidns as current process
Message-ID: <20160802203406.GE15324@fieldses.org>
References: <1470148943-21835-1-git-send-email-kernel@kyup.com>
	<87r3a7qhy0.fsf@x220.int.ebiederm.org>
	<20160802174003.GD11767@fieldses.org>
	<87invjq97h.fsf@x220.int.ebiederm.org>
	<20160802194437.GD15324@fieldses.org>
	<1470168082.15226.14.camel@poochiereds.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1470168082.15226.14.camel@poochiereds.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Aug 02, 2016 at 04:01:22PM -0400, Jeff Layton wrote:
> On Tue, 2016-08-02 at 15:44 -0400, J. Bruce Fields wrote:
> > On Tue, Aug 02, 2016 at 02:09:22PM -0500, Eric W. Biederman wrote:
> > > "J. Bruce Fields" writes:
> > > > On Tue, Aug 02, 2016 at 11:00:39AM -0500, Eric W. Biederman wrote:
> > > > > Nikolay Borisov writes:
> > > > > > Currently when /proc/locks is read it will show all the file locks
> > > > > > which are currently created on the machine. On containers hosted
> > > > > > on busy servers, this means that doing lsof can be very slow.
> > > > > > I observed up to 5-second stalls reading 50k locks, while the
> > > > > > container itself had only a small number of relevant entries. Fix
> > > > > > it by filtering the locks listed by the pidns of the current
> > > > > > process and the process which created the lock.
> > > > >
> > > > > The locks always confuse me, so I am not 100% certain that
> > > > > connecting locks to a pid namespace is appropriate.
> > > > >
> > > > > That said, if you are going to filter by pid namespace, please use
> > > > > the pid namespace of proc, not the pid namespace of the process
> > > > > reading the file.
> > > >
> > > > Oh, that makes sense, thanks.
> > > >
> > > > What does /proc/mounts use, out of curiosity?  The mount namespace
> > > > that /proc was originally mounted in?
> > >
> > > /proc/mounts -> /proc/self/mounts
> >
> > D'oh, I knew that.
> >
> > > /proc/[pid]/mounts lists mounts from the mount namespace of the
> > > appropriate process.
> > >
> > > That is another way to go, but it is a tread-carefully thing: changing
> > > things that way makes it easy to trip apparmor or selinux rules and be
> > > surprised you broke someone's userspace in a way that prevents booting.
> > > Although I suspect /proc/locks isn't too bad.
> >
> > OK, thanks.
> >
> > /proc/[pid]/locks might be confusing.  I'd expect it to be "all the
> > locks owned by this task", rather than "all the locks owned by pids in
> > the same pid namespace", or whatever criterion we choose.
> >
> > Uh, I'm still trying to think of the Obviously Right solution here, and
> > it's not coming.
> >
> > --b.
>
> I'm a little leery of changing how this works. It has always been
> maintained as a legacy interface, so do we run the risk of breaking
> something if we turn it into a per-namespace thing?

The namespace work is all about making interfaces per-namespace. I guess
it works as long as it contributes to the illusion that each container is
its own machine.
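(Editorial aside: Eric's /proc/mounts point above is easy to check on a
typical Linux box. The file is a symlink into the reading process's own
/proc/self/, which is exactly why every mount namespace gets its own view
through the same path. A minimal demonstration:)

```shell
# /proc/mounts resolves through /proc/self/, so whichever process reads
# it sees the mount table of its own mount namespace.
readlink /proc/mounts
# → self/mounts
```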
Thinking about it, I might be sold on the per-pidns approach (with
Eric's modification to use the pidns of /proc, not the reader). My
complaint about not being able to see conflicting locks would apply just
as well to conflicts from nfs locks held by other clients. A disk
filesystem shared across multiple containers is a little like an nfs
filesystem shared between nfs clients.

That'd solve this immediate problem without requiring an lsof upgrade as
well.

> This also doesn't solve the problem of slow traversal in the
> init_pid_ns -- only in a container.
>
> I also can't help but feel that /proc/locks is just showing its age. It
> was fine in the late 90's, but its limitations are just becoming more
> apparent as things get more complex. It was never designed for
> performance, as you end up thrashing several spinlocks when reading it.
>
> Maybe it's time to think about presenting this info in another way? A
> global view of all locks on the system is interesting, but maybe it
> would be better to present it more granularly somehow?

But, yes, that might be a good idea.

--b.

> I guess I should go look at what lsof actually does with this info...
>
> --
> Jeff Layton
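(Editorial aside for readers following the thread: the /proc/locks records
under discussion have a stable textual format, documented in proc(5), and
userspace can already do a crude per-pid filter while reading it. A
minimal sketch in Python — the field layout is from proc(5); the helper
names are illustrative, not from any patch in this thread:)

```python
# Sketch: parse /proc/locks records and keep only those owned by a given
# set of PIDs -- a userspace approximation of the per-pidns filtering
# discussed above. Field layout per proc(5):
#   <ord>: <TYPE> <ADVISORY|MANDATORY> <READ|WRITE> <pid> <maj:min:inode> <start> <end>

def parse_lock(line):
    """Parse one /proc/locks record; returns None for '->' waiter lines."""
    fields = line.split()
    if "->" in fields[:2]:          # a lock blocked waiting on the one above it
        return None
    return {
        "type": fields[1],          # POSIX, FLOCK, OFDLCK, LEASE
        "mode": fields[3],          # READ or WRITE
        "pid": int(fields[4]),      # owner PID (as seen in the reader's pidns)
        "file": fields[5],          # major:minor:inode of the locked file
        "start": fields[6],
        "end": fields[7],           # byte offset, or "EOF"
    }

def locks_for_pids(lines, pids):
    """Keep only records whose owner PID is in `pids`."""
    return [r for r in (parse_lock(l) for l in lines)
            if r is not None and r["pid"] in pids]

if __name__ == "__main__":
    # Against the live file on Linux, e.g. locks held by PID 1:
    with open("/proc/locks") as f:
        print(locks_for_pids(f, {1}))
```

Note that this still pays the full cost of generating the whole file on
the kernel side — which is exactly the stall Nikolay is measuring — so it
only trims what userspace has to look at, not what the kernel has to emit.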