From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nikolay Borisov
Subject: Re: [PATCH v2] locks: Filter /proc/locks output on proc pid ns
Date: Wed, 3 Aug 2016 17:33:48 +0300
Message-ID: <57A200CC.3060702@kyup.com>
References: <1470148943-21835-1-git-send-email-kernel@kyup.com>
 <1470209710-30022-1-git-send-email-kernel@kyup.com>
 <1470232012.18285.4.camel@poochiereds.net>
 <57A1FCE5.3040206@kyup.com>
 <20160803142850.GA27072@fieldses.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20160803142850.GA27072-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: "J. Bruce Fields"
Cc: Andrey Vagin,
 containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
 linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org,
 linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 Jeff Layton,
 xemul-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org,
 viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org
List-Id: containers.vger.kernel.org

On 08/03/2016 05:28 PM, J. Bruce Fields wrote:
> On Wed, Aug 03, 2016 at 05:17:09PM +0300, Nikolay Borisov wrote:
>>
>>
>> On 08/03/2016 04:46 PM, Jeff Layton wrote:
>>> On Wed, 2016-08-03 at 10:35 +0300, Nikolay Borisov wrote:
>>>> On busy container servers reading /proc/locks shows all the locks
>>>> created by all clients. This can cause large latency spikes. In my
>>>> case I observed lsof taking up to 5-10 seconds while processing around
>>>> 50k locks. Fix this by limiting the locks shown only to those created
>>>> in the same pidns as the one the proc was mounted in. When reading
>>>> /proc/locks from the init_pid_ns show everything.
>>>>
>>>> Signed-off-by: Nikolay Borisov
>>>> ---
>>>>  fs/locks.c | 6 ++++++
>>>>  1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/fs/locks.c b/fs/locks.c
>>>> index ee1b15f6fc13..751673d7f7fc 100644
>>>> --- a/fs/locks.c
>>>> +++ b/fs/locks.c
>>>> @@ -2648,9 +2648,15 @@ static int locks_show(struct seq_file *f, void *v)
>>>>  {
>>>>  	struct locks_iterator *iter = f->private;
>>>>  	struct file_lock *fl, *bfl;
>>>> +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>>>> +	struct pid_namespace *current_pidns = task_active_pid_ns(current);
>>>>
>>>>  	fl = hlist_entry(v, struct file_lock, fl_link);
>>>>
>>>> +	if ((current_pidns != &init_pid_ns) && fl->fl_nspid
>>>
>>> Ok, so when you read from a process that's in the init_pid_ns
>>> namespace, then you'll get the whole pile of locks, even when reading
>>> this from a filesystem that was mounted in a different pid_ns?
>>>
>>> That seems odd to me if so. Any reason not to just uniformly use the
>>> proc_pidns here?
>>
>> [CCing some people from openvz/CRIU]
>>
>> My train of thought was "we should have means which would be the one
>> universal truth about everything and this would be a process in the
>> init_pid_ns".
>
> OK, but why not make that means be "mount proc from the init_pid_ns and
> read /proc/locks there". So just replace current_pidns with proc_pidns
> in the above. I think that's all Jeff was suggesting.

Oh, you are right. Silly me, yes, I'm happy with this and I will send a
patch.

>
> --b.
>
>> I don't have strong preference as long as I'm not breaking
>> userspace. As I said before - I think the CRIU guys might be using that
>> interface.
>>
>>>
>>>> +		&& (proc_pidns != ns_of_pid(fl->fl_nspid)))
>>>> +		return 0;
>>>> +
>>>>  	lock_get_status(f, fl, iter->li_pos, "");
>>>>
>>>>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
>>>
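
To make the difference between the v2 check and the suggested one concrete, here is a minimal user-space sketch of the two predicates. It is not kernel code: struct pid_ns_model, struct lock_model, visible_v2() and visible_suggested() are hypothetical stand-ins invented for illustration, and the "suggested" variant is only my reading of "replace current_pidns with proc_pidns", not the patch that was eventually sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pid_namespace (illustration only). */
struct pid_ns_model {
	int id;
};

/* Hypothetical stand-in for struct file_lock.
 * owner_ns == NULL models a lock whose fl_nspid is not set. */
struct lock_model {
	const struct pid_ns_model *owner_ns;
};

/* v2 check: filtering is skipped whenever the *reader* is in the
 * initial pid namespace, regardless of which procfs mount is read. */
static bool visible_v2(const struct pid_ns_model *init_ns,
		       const struct pid_ns_model *reader_ns,
		       const struct pid_ns_model *proc_ns,
		       const struct lock_model *lock)
{
	assert(lock != NULL);
	if (reader_ns != init_ns && lock->owner_ns != NULL &&
	    proc_ns != lock->owner_ns)
		return false;
	return true;
}

/* Suggested check (assumed form): the result depends only on the pid
 * namespace the procfs instance was mounted in. */
static bool visible_suggested(const struct pid_ns_model *init_ns,
			      const struct pid_ns_model *proc_ns,
			      const struct lock_model *lock)
{
	assert(lock != NULL);
	if (proc_ns != init_ns && lock->owner_ns != NULL &&
	    proc_ns != lock->owner_ns)
		return false;
	return true;
}
```

In this model, a reader in the initial namespace that opens a container's procfs sees a host-owned lock under visible_v2() but not under visible_suggested(). A procfs mounted from the initial namespace shows every lock in both variants.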