linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	David Howells <dhowells@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Peter Zijlstra <peterz@infradead.org>,
	Sasha Levin <levinsasha928@gmail.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/5] introduce proc_inode->pid_entry
Date: Sun, 10 Aug 2014 21:23:24 +0200	[thread overview]
Message-ID: <20140810192324.GA31721@redhat.com> (raw)
In-Reply-To: <87d2caod0g.fsf@x220.int.ebiederm.org>

[-- Attachment #1: Type: text/plain, Size: 2335 bytes --]

On 08/08, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > To me it looks a bit annoying that task_mmu.c needs 6 seq_operations's and
> > 6 file_operations's to handle /proc/pid/*maps*. And _only_ because ->show()
> > differs.
> >
> > Eric, et al, what do you think? At least something like 1-3 looks like a
> > good cleanup imho. And afaics we can do more cleanups on top.
>
>
> I see where you are getting annoyed.
>
> Taking a quick look at task_mmu.c  It looks like the tgid vs pid logic
> to decided which stack or stacks to display is simply incorrect.

Yes, probably, but please forget this for now. Because,

> Given where you are starting I think tack_mmu.c code that decides
> when/which stack deserves a serious audit.

Yes. And more. At least this code needs more cleanups. ->task should
die or at least we should avoid get/put_task_struct, ->pid can die too,
hold_task_mempolicy() doesn't look correct (at least the "prevent changing
our mempolicy while show_numa_maps" comment and down_write() in
do_set_mempolicy(). I am going to try to cleanup this a bit after I change
task_nommu.c to avoid mm_access() in m_start().

But this is off-topic,

> At a practical level moving pid_entry into the proc inode is ugly
> especially for the hack that is is_tgid_pid_entry.

Well, I am not going to argue with maintainer, mostly simply because
I do not understand this code enough ;)

But let me say that I disagree. We already have ->fd, and ->sysctl*.
I see nothing wrong if *id_base_stuff files have more info for free.

And imo proc_inode->sysctl* is much worse. Simply because they are
not private to fs/proc/proc_sysctl.c.

Could you please look into the attached mbox? I am just curious if we
can do something like this in a clean way. In particular, please look at
"Note:" comment in 3/3. Perhaps proc_sys_init() can do proc_get_inode(),
initialize/instantiate it...

To clarify, of course it is not that I want to shrink sizeof(proc_inode).
Just to me it doesn't look clean that proc_evict_inode() has to do
sysctl_head_put(), grab_header() assumes that ->sysctl == NULL means
sysctl_table_root.*, etc.

> That test could be implemented more easily by looking at the parent
> directories inode operations and seeing if they are
> proc_root_inode_operations.

Hmm. Looking at inode? How?

Oleg.

[-- Attachment #2: U.m --]
[-- Type: text/plain, Size: 4885 bytes --]

>From a4c94461ae18b2d6cc2e8a9cfb042d0b6cc46e86 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg@redhat.com>
Date: Sat, 9 Aug 2014 17:25:53 +0200
Subject: [PATCH 1/3] proc: introduce proc_sys_evict_inode()

Preparation. Shift the sysctl_head_put() code from proc_evict_inode()
into the new trivial helper, make sysctl_head_put() static.

---
 fs/proc/inode.c       |    8 ++------
 fs/proc/internal.h    |    4 ++--
 fs/proc/proc_sysctl.c |   13 ++++++++++++-
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 0adbc02..9f6715a 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -31,7 +31,6 @@
 static void proc_evict_inode(struct inode *inode)
 {
 	struct proc_dir_entry *de;
-	struct ctl_table_header *head;
 	const struct proc_ns_operations *ns_ops;
 	void *ns;
 
@@ -45,11 +44,8 @@ static void proc_evict_inode(struct inode *inode)
 	de = PROC_I(inode)->pde;
 	if (de)
 		pde_put(de);
-	head = PROC_I(inode)->sysctl;
-	if (head) {
-		RCU_INIT_POINTER(PROC_I(inode)->sysctl, NULL);
-		sysctl_head_put(head);
-	}
+
+	proc_sys_evict_inode(inode);
 	/* Release any associated namespace */
 	ns_ops = PROC_I(inode)->ns.ns_ops;
 	ns = PROC_I(inode)->ns.ns;
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 78784cd..7ce3262 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -238,10 +238,10 @@ extern int proc_setup_self(struct super_block *);
  */
 #ifdef CONFIG_PROC_SYSCTL
 extern int proc_sys_init(void);
-extern void sysctl_head_put(struct ctl_table_header *);
+extern void proc_sys_evict_inode(struct inode *inode);
 #else
 static inline void proc_sys_init(void) { }
-static inline void sysctl_head_put(struct ctl_table_header *head) { }
+static inline void proc_sys_evict_inode(struct inode *inode) { }
 #endif
 
 /*
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 7129046..2fa67e7 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -256,7 +256,7 @@ static void sysctl_head_get(struct ctl_table_header *head)
 	spin_unlock(&sysctl_lock);
 }
 
-void sysctl_head_put(struct ctl_table_header *head)
+static void sysctl_head_put(struct ctl_table_header *head)
 {
 	spin_lock(&sysctl_lock);
 	if (!--head->count)
@@ -264,6 +264,17 @@ void sysctl_head_put(struct ctl_table_header *head)
 	spin_unlock(&sysctl_lock);
 }
 
+void proc_sys_evict_inode(struct inode *inode)
+{
+	struct ctl_table_header *head;
+
+	head = PROC_I(inode)->sysctl;
+	if (head) {
+		RCU_INIT_POINTER(PROC_I(inode)->sysctl, NULL);
+		sysctl_head_put(head);
+	}
+}
+
 static struct ctl_table_header *sysctl_head_grab(struct ctl_table_header *head)
 {
 	BUG_ON(!head);
-- 
1.5.5.1


>From e0b42f4770cd76069e4e70e48e4e7bfa84fab4bf Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg@redhat.com>
Date: Sat, 9 Aug 2014 17:37:31 +0200
Subject: [PATCH 2/3] proc: change proc_sys_evict_inode() to check inode->i_op

Change proc_sys_evict_inode() to verify that this inode is really
a /proc/sys inode before using ->sysctl.

---
 fs/proc/proc_sysctl.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 2fa67e7..863aaee 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -268,6 +268,10 @@ void proc_sys_evict_inode(struct inode *inode)
 {
 	struct ctl_table_header *head;
 
+	if (inode->i_op != &proc_sys_inode_operations &&
+	    inode->i_op != &proc_sys_dir_operations)
+		return;
+
 	head = PROC_I(inode)->sysctl;
 	if (head) {
 		RCU_INIT_POINTER(PROC_I(inode)->sysctl, NULL);
-- 
1.5.5.1


>From 9c8e98fc45f091d5d2a4bfc6dcd86b67275160a1 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg@redhat.com>
Date: Sun, 10 Aug 2014 17:37:25 +0200
Subject: [PATCH 3/3] proc: place proc_inode->sysctl* and proc_inode->fd into a union

->sysctl* and ->fd members can share the memory, add a union.

Note: unfortunately proc_alloc_inode() still has to initialize ->sysctl*
members. And the only (afaics) reason is that proc_mkdir("sys") relies
on ->sysctl == NULL which means sysctl_table_root in grab_header(). It
would be nice to initialize these members after proc_get_inode("sys")
somehow and remove the special "NULL means global root" case in proc_sys
paths somehow, but I do not see how.

---
 fs/proc/internal.h |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 7ce3262..8d35ac0 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -60,11 +60,15 @@ union proc_op {
 
 struct proc_inode {
 	struct pid *pid;
-	int fd;
-	union proc_op op;
 	struct proc_dir_entry *pde;
-	struct ctl_table_header *sysctl;
-	struct ctl_table *sysctl_entry;
+	union proc_op op;
+	union {
+		struct {
+			struct ctl_table_header *sysctl;
+			struct ctl_table *sysctl_entry;
+		};
+		int fd;
+	};
 	struct proc_ns ns;
 	struct inode vfs_inode;
 };
-- 
1.5.5.1


      parent reply	other threads:[~2014-08-10 19:26 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-08 18:57 [RFC PATCH 0/5] introduce proc_inode->pid_entry Oleg Nesterov
2014-08-08 18:57 ` [RFC PATCH 1/5] proc: intoduce proc_inode->pid_entry and is_tgid_pid_entry() Oleg Nesterov
2014-08-08 20:05   ` Cyrill Gorcunov
2014-08-09 14:28     ` Oleg Nesterov
2014-08-10  7:16       ` Cyrill Gorcunov
2014-08-08 18:57 ` [RFC PATCH 2/5] proc: unify proc_tgid_stat() and proc_tid_stat() Oleg Nesterov
2014-08-08 18:57 ` [RFC PATCH 3/5] proc: kill *_tid_*maps* stuff Oleg Nesterov
2014-08-08 18:57 ` [RFC PATCH 4/5] proc: introduce pid_entry_name() Oleg Nesterov
2014-08-08 18:58 ` [RFC PATCH 5/5] proc: unify proc_pid_*maps* stuff Oleg Nesterov
2014-08-08 22:03 ` [RFC PATCH 0/5] introduce proc_inode->pid_entry Eric W. Biederman
2014-08-08 22:11   ` Eric W. Biederman
2014-08-10 19:23   ` Oleg Nesterov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140810192324.GA31721@redhat.com \
    --to=oleg@redhat.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=gorcunov@openvz.org \
    --cc=kirill@shutemov.name \
    --cc=levinsasha928@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).