netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next] nfnetlink_queue: enable PID info retrieval
@ 2016-06-09 20:50 Saeed Mahameed
  2016-06-09 21:17 ` Eric Dumazet
  2016-06-09 21:35 ` Florian Westphal
  0 siblings, 2 replies; 9+ messages in thread
From: Saeed Mahameed @ 2016-06-09 20:50 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, netfilter-devel, Yevgeny Petrilin, Andre Melkoumian,
	Matthew Finlay, Saeed Mahameed, Pablo Neira Ayuso,
	Patrick McHardy, Jozsef Kadlecsik

From: Matthew Finlay <matt@mellanox.com>

Allow the netlink_queue_module to get the PID associated with an outgoing
connection. Finding the PID based on the tuple in userspace is expensive.
This additional attribute makes it convenient and efficient to get the PID
associated with the outgoing connection in userspace, without the need to
parse procfs.

Signed-off-by: Matthew Finlay <matt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
CC: Pablo Neira Ayuso <pablo@netfilter.org>
CC: Patrick McHardy <kaber@trash.net>
CC: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
---
 include/linux/fs.h                             |  1 +
 include/uapi/linux/netfilter/nfnetlink_queue.h |  4 +++-
 net/netfilter/nfnetlink_queue.c                | 25 +++++++++++++++++++++++++
 net/socket.c                                   |  1 +
 4 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index dd28814..f6e0ae3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -871,6 +871,7 @@ extern struct block_device *I_BDEV(struct inode *inode);
 struct fown_struct {
 	rwlock_t lock;          /* protects pid, uid, euid fields */
 	struct pid *pid;	/* pid or -pgrp where SIGIO should be sent */
+	struct pid *sock_pid;	/* pid of the process that created the socket */
 	enum pid_type pid_type;	/* Kind of process group SIGIO should be sent to */
 	kuid_t uid, euid;	/* uid/euid of process setting the owner */
 	int signum;		/* posix.1b rt signal to be delivered on IO */
diff --git a/include/uapi/linux/netfilter/nfnetlink_queue.h b/include/uapi/linux/netfilter/nfnetlink_queue.h
index ae30841..87379ae 100644
--- a/include/uapi/linux/netfilter/nfnetlink_queue.h
+++ b/include/uapi/linux/netfilter/nfnetlink_queue.h
@@ -60,6 +60,7 @@ enum nfqnl_attr_type {
 	NFQA_SECCTX,			/* security context string */
 	NFQA_VLAN,			/* nested attribute: packet vlan info */
 	NFQA_L2HDR,			/* full L2 header */
+	NFQA_PID,			/* __s32 sk pid */
 
 	__NFQA_MAX
 };
@@ -114,7 +115,8 @@ enum nfqnl_attr_config {
 #define NFQA_CFG_F_GSO				(1 << 2)
 #define NFQA_CFG_F_UID_GID			(1 << 3)
 #define NFQA_CFG_F_SECCTX			(1 << 4)
-#define NFQA_CFG_F_MAX				(1 << 5)
+#define NFQA_CFG_F_PID				(1 << 5)
+#define NFQA_CFG_F_MAX				(1 << 6)
 
 /* flags for NFQA_SKB_INFO */
 /* packet appears to have wrong checksums, but they are ok */
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index aa93877..b7a7f5a3 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -278,6 +278,24 @@ nla_put_failure:
 	return -1;
 }
 
+static int nfqnl_put_sk_pid(struct sk_buff *skb, struct sock *sk)
+{
+	struct pid *sk_pid;
+	int err = 0;
+
+	if (!sk_fullsock(sk))
+		return 0;
+
+	read_lock_bh(&sk->sk_callback_lock);
+	if (sk->sk_socket && sk->sk_socket->file) {
+		sk_pid = sk->sk_socket->file->f_owner.sock_pid;
+		if (sk_pid)
+			err = nla_put_be32(skb, NFQA_PID, htonl(pid_nr(sk_pid)));
+	}
+	read_unlock_bh(&sk->sk_callback_lock);
+	return err;
+}
+
 static u32 nfqnl_get_sk_secctx(struct sk_buff *skb, char **secdata)
 {
 	u32 seclen = 0;
@@ -440,6 +458,9 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 			size += nla_total_size(seclen);
 	}
 
+	if (queue->flags & NFQA_CFG_F_PID)
+		size += nla_total_size(sizeof(int32_t)); /* pid */
+
 	skb = alloc_skb(size, GFP_ATOMIC);
 	if (!skb) {
 		skb_tx_error(entskb);
@@ -570,6 +591,10 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 	    nfqnl_put_sk_uidgid(skb, entskb->sk) < 0)
 		goto nla_put_failure;
 
+	if ((queue->flags & NFQA_CFG_F_PID) && entskb->sk &&
+	    nfqnl_put_sk_pid(skb, entskb->sk) < 0)
+		goto nla_put_failure;
+
 	if (seclen && nla_put(skb, NFQA_SECCTX, seclen, secdata))
 		goto nla_put_failure;
 
diff --git a/net/socket.c b/net/socket.c
index a1bd161..67de200 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -382,6 +382,7 @@ struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname)
 	}
 
 	sock->file = file;
+	file->f_owner.sock_pid  = find_get_pid(task_pid_nr(current));
 	file->f_flags = O_RDWR | (flags & O_NONBLOCK);
 	file->private_data = sock;
 	return file;
-- 
2.8.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next] nfnetlink_queue: enable PID info retrieval
  2016-06-09 20:50 [PATCH net-next] nfnetlink_queue: enable PID info retrieval Saeed Mahameed
@ 2016-06-09 21:17 ` Eric Dumazet
  2016-06-10 14:29   ` David Laight
  2016-06-09 21:35 ` Florian Westphal
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2016-06-09 21:17 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev, netfilter-devel, Yevgeny Petrilin,
	Andre Melkoumian, Matthew Finlay, Pablo Neira Ayuso,
	Patrick McHardy, Jozsef Kadlecsik

On Thu, 2016-06-09 at 23:50 +0300, Saeed Mahameed wrote:
> From: Matthew Finlay <matt@mellanox.com>


> diff --git a/net/socket.c b/net/socket.c
> index a1bd161..67de200 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -382,6 +382,7 @@ struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname)
>  	}
>  
>  	sock->file = file;
> +	file->f_owner.sock_pid  = find_get_pid(task_pid_nr(current));
>  	file->f_flags = O_RDWR | (flags & O_NONBLOCK);
>  	file->private_data = sock;
>  	return file;

Wow, that is a serious memory leak weapon (of struct pid)

Why don't you store the pid value, instead of a pointer ?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next] nfnetlink_queue: enable PID info retrieval
  2016-06-09 20:50 [PATCH net-next] nfnetlink_queue: enable PID info retrieval Saeed Mahameed
  2016-06-09 21:17 ` Eric Dumazet
@ 2016-06-09 21:35 ` Florian Westphal
  2016-06-09 22:21   ` Daniel Borkmann
  1 sibling, 1 reply; 9+ messages in thread
From: Florian Westphal @ 2016-06-09 21:35 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev, netfilter-devel, Yevgeny Petrilin,
	Andre Melkoumian, Matthew Finlay, Pablo Neira Ayuso,
	Patrick McHardy, Jozsef Kadlecsik

Saeed Mahameed <saeedm@mellanox.com> wrote:
> index a1bd161..67de200 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -382,6 +382,7 @@ struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname)
>  	}
>  
>  	sock->file = file;
> +	file->f_owner.sock_pid  = find_get_pid(task_pid_nr(current));
>  	file->f_flags = O_RDWR | (flags & O_NONBLOCK);
>  	file->private_data = sock;
>  	return file;

This looks like this leaks sock_pid reference...?

(find_get_pid -> get_pid -> atomic_inc() , I don't see a put_pid in the
 patch)

Can't comment further than this since I'm not familiar with vfs; e.g.
I can't say if fown_struct is right place or not, or if this approach
even works when creating process has exited after fork, etc.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next] nfnetlink_queue: enable PID info retrieval
  2016-06-09 21:35 ` Florian Westphal
@ 2016-06-09 22:21   ` Daniel Borkmann
  2016-06-09 23:22     ` Daniel Borkmann
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Borkmann @ 2016-06-09 22:21 UTC (permalink / raw)
  To: Florian Westphal, Saeed Mahameed
  Cc: David S. Miller, netdev, netfilter-devel, Yevgeny Petrilin,
	Andre Melkoumian, Matthew Finlay, Pablo Neira Ayuso,
	Patrick McHardy, Jozsef Kadlecsik

On 06/09/2016 11:35 PM, Florian Westphal wrote:
> Saeed Mahameed <saeedm@mellanox.com> wrote:
>> index a1bd161..67de200 100644
>> --- a/net/socket.c
>> +++ b/net/socket.c
>> @@ -382,6 +382,7 @@ struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname)
>>   	}
>>
>>   	sock->file = file;
>> +	file->f_owner.sock_pid  = find_get_pid(task_pid_nr(current));
>>   	file->f_flags = O_RDWR | (flags & O_NONBLOCK);
>>   	file->private_data = sock;
>>   	return file;
>
> This looks like this leaks sock_pid reference...?
>
> (find_get_pid -> get_pid -> atomic_inc() , I don't see a put_pid in the
>   patch)
>
> Can't comment further than this since I'm not familiar with vfs; e.g.
> I can't say if fown_struct is right place or not, or if this approach
> even works when creating process has exited after fork, etc.

Or ... if you xmit the fd via unix domain socket to a different process
and initial owner terminates, which should give you invalid information
then; afaik, this would just increase struct file's refcnt and hand out
an unused fdnum ( get_unused_fd_flags() + fd_install(), etc).
For extending 'struct fown_struct', you probably also need to Cc fs folks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next] nfnetlink_queue: enable PID info retrieval
  2016-06-09 22:21   ` Daniel Borkmann
@ 2016-06-09 23:22     ` Daniel Borkmann
  2016-06-10  6:40       ` Daniel Wagner
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Borkmann @ 2016-06-09 23:22 UTC (permalink / raw)
  To: Florian Westphal, Saeed Mahameed
  Cc: David S. Miller, netdev, netfilter-devel, Yevgeny Petrilin,
	Andre Melkoumian, Matthew Finlay, Pablo Neira Ayuso,
	Patrick McHardy, Jozsef Kadlecsik, john.fastabend, daniel.wagner,
	hannes, tj, viro

On 06/10/2016 12:21 AM, Daniel Borkmann wrote:
> On 06/09/2016 11:35 PM, Florian Westphal wrote:
>> Saeed Mahameed <saeedm@mellanox.com> wrote:
>>> index a1bd161..67de200 100644
>>> --- a/net/socket.c
>>> +++ b/net/socket.c
>>> @@ -382,6 +382,7 @@ struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname)
>>>       }
>>>
>>>       sock->file = file;
>>> +    file->f_owner.sock_pid  = find_get_pid(task_pid_nr(current));
>>>       file->f_flags = O_RDWR | (flags & O_NONBLOCK);
>>>       file->private_data = sock;
>>>       return file;
>>
>> This looks like this leaks sock_pid reference...?
>>
>> (find_get_pid -> get_pid -> atomic_inc() , I don't see a put_pid in the
>>   patch)
>>
>> Can't comment further than this since I'm not familiar with vfs; e.g.
>> I can't say if fown_struct is right place or not, or if this approach
>> even works when creating process has exited after fork, etc.
>
> Or ... if you xmit the fd via unix domain socket to a different process
> and initial owner terminates, which should give you invalid information
> then; afaik, this would just increase struct file's refcnt and hand out
> an unused fdnum ( get_unused_fd_flags() + fd_install(), etc).
> For extending 'struct fown_struct', you probably also need to Cc fs folks.

[ Cc'ing John, Daniel, et al ]

Btw, while I just looked at scm_detach_fds(), I think commits ...

  * 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
  * d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")

... might not be correct, maybe I'm missing something ...? Lets say process A
has a socket fd that it sends via SCM_RIGHTS to process B. Process A was the
one that called sk_alloc() originally. Now in scm_detach_fds() we install a new
fd for process B pointing to the same sock (file's private_data) and above commits
update the cached socket cgroup data for net_cls/net_prio to the new process B.
So, if process A for example still sends data over that socket, skbs will then
wrongly match on B's cgroup membership instead of A's, no?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next] nfnetlink_queue: enable PID info retrieval
  2016-06-09 23:22     ` Daniel Borkmann
@ 2016-06-10  6:40       ` Daniel Wagner
  2016-06-15 18:47         ` Tejun Heo
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Wagner @ 2016-06-10  6:40 UTC (permalink / raw)
  To: Daniel Borkmann, Florian Westphal, Saeed Mahameed
  Cc: David S. Miller, netdev, netfilter-devel, Yevgeny Petrilin,
	Andre Melkoumian, Matthew Finlay, Pablo Neira Ayuso,
	Patrick McHardy, Jozsef Kadlecsik, john.fastabend, hannes, tj,
	viro

Hi Daniel,

> [ Cc'ing John, Daniel, et al ]
> 
> Btw, while I just looked at scm_detach_fds(), I think commits ...
> 
>  * 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set
> correctly")
>  * d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set
> correctly")
> 
> ... might not be correct, maybe I'm missing something ...? Lets say
> process A
> has a socket fd that it sends via SCM_RIGHTS to process B. Process A was
> the
> one that called sk_alloc() originally. Now in scm_detach_fds() we
> install a new
> fd for process B pointing to the same sock (file's private_data) and
> above commits
> update the cached socket cgroup data for net_cls/net_prio to the new
> process B.
> So, if process A for example still sends data over that socket, skbs
> will then
> wrongly match on B's cgroup membership instead of A's, no?

I can't remember the details right now (need to read up again but I wont
have time till Wednesday).

>From your analysis I would say that is not the desired effect. A should
match against its own cgroup and not the one of B.

cheers,
daniel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH net-next] nfnetlink_queue: enable PID info retrieval
  2016-06-09 21:17 ` Eric Dumazet
@ 2016-06-10 14:29   ` David Laight
  2016-06-10 15:31     ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: David Laight @ 2016-06-10 14:29 UTC (permalink / raw)
  To: 'Eric Dumazet', Saeed Mahameed
  Cc: David S. Miller, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org, Yevgeny Petrilin,
	Andre Melkoumian, Matthew Finlay, Pablo Neira Ayuso,
	Patrick McHardy, Jozsef Kadlecsik

From: Eric Dumazet
> Sent: 09 June 2016 22:17
> On Thu, 2016-06-09 at 23:50 +0300, Saeed Mahameed wrote:
> > From: Matthew Finlay <matt@mellanox.com>
> 
> 
> > diff --git a/net/socket.c b/net/socket.c
> > index a1bd161..67de200 100644
> > --- a/net/socket.c
> > +++ b/net/socket.c
> > @@ -382,6 +382,7 @@ struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname)
> >  	}
> >
> >  	sock->file = file;
> > +	file->f_owner.sock_pid  = find_get_pid(task_pid_nr(current));
> >  	file->f_flags = O_RDWR | (flags & O_NONBLOCK);
> >  	file->private_data = sock;
> >  	return file;
> 
> Wow, that is a serious memory leak weapon (of struct pid)
> 
> Why don't you store the pid value, instead of a pointer ?

Since the numeric 'pid' values can be reused (with no grace time) you'd
need to hold a reference to the pid structure (added in about 2.6.27) instead.
Which is just a smaller memory leak!
(and annoyingly a non-gpl driver can't drop a reference to it).

	David


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next] nfnetlink_queue: enable PID info retrieval
  2016-06-10 14:29   ` David Laight
@ 2016-06-10 15:31     ` Eric Dumazet
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2016-06-10 15:31 UTC (permalink / raw)
  To: David Laight
  Cc: Saeed Mahameed, David S. Miller, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org, Yevgeny Petrilin,
	Andre Melkoumian, Matthew Finlay, Pablo Neira Ayuso,
	Patrick McHardy, Jozsef Kadlecsik

On Fri, 2016-06-10 at 14:29 +0000, David Laight wrote:
> From: Eric Dumazet
> > Sent: 09 June 2016 22:17
> > On Thu, 2016-06-09 at 23:50 +0300, Saeed Mahameed wrote:
> > > From: Matthew Finlay <matt@mellanox.com>
> > 
> > 
> > > diff --git a/net/socket.c b/net/socket.c
> > > index a1bd161..67de200 100644
> > > --- a/net/socket.c
> > > +++ b/net/socket.c
> > > @@ -382,6 +382,7 @@ struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname)
> > >  	}
> > >
> > >  	sock->file = file;
> > > +	file->f_owner.sock_pid  = find_get_pid(task_pid_nr(current));
> > >  	file->f_flags = O_RDWR | (flags & O_NONBLOCK);
> > >  	file->private_data = sock;
> > >  	return file;
> > 
> > Wow, that is a serious memory leak weapon (of struct pid)
> > 
> > Why don't you store the pid value, instead of a pointer ?
> 
> Since the numeric 'pid' values can be reused (with no grace time) you'd
> need to hold a reference to the pid structure (added in about 2.6.27) instead.
> Which is just a smaller memory leak!

Smaller than what ?


I fail to see how keeping a reference on the pid structure of the
process who created a socket can be useful, other than some optional
LSM.

A socket can be given via af_unix to another process.

Original process might have died.

Keeping a ref on the pid wont prevent this.

So the 'pid' here looks as a pure hint/info. Better store it directly
and avoid all the ref counting and indirection games that are going to
slow down nfnetlink quite a lot with all these extra cache line misses
and locks.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next] nfnetlink_queue: enable PID info retrieval
  2016-06-10  6:40       ` Daniel Wagner
@ 2016-06-15 18:47         ` Tejun Heo
  0 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2016-06-15 18:47 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Daniel Borkmann, Florian Westphal, Saeed Mahameed,
	David S. Miller, netdev, netfilter-devel, Yevgeny Petrilin,
	Andre Melkoumian, Matthew Finlay, Pablo Neira Ayuso,
	Patrick McHardy, Jozsef Kadlecsik, john.fastabend, hannes, viro

Hello,

On Fri, Jun 10, 2016 at 08:40:34AM +0200, Daniel Wagner wrote:
> > [ Cc'ing John, Daniel, et al ]
> > 
> > Btw, while I just looked at scm_detach_fds(), I think commits ...
> > 
> >  * 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set
> > correctly")
> >  * d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set
> > correctly")
> > 
> > ... might not be correct, maybe I'm missing something ...? Lets say
> > process A
> > has a socket fd that it sends via SCM_RIGHTS to process B. Process A was
> > the
> > one that called sk_alloc() originally. Now in scm_detach_fds() we
> > install a new
> > fd for process B pointing to the same sock (file's private_data) and
> > above commits
> > update the cached socket cgroup data for net_cls/net_prio to the new
> > process B.
> > So, if process A for example still sends data over that socket, skbs
> > will then
> > wrongly match on B's cgroup membership instead of A's, no?
> 
> I can't remember the details right now (need to read up again but I wont
> have time till Wednesday).
> 
> From your analysis I would say that is not the desired effect. A should
> match against its own cgroup and not the one of B.

We don't have a good answer for resources which are shared across
different cgroups.  It is often too expensive to track such sharing
accurately and crude approximation (creator-owned, last-used or
whatever) is used widely even outside cgroup.  Different cgroup
controllers tried different approaches but most are settling down for
creator ownership with exceptions for high impact cases.

I don't think there's a solution which satifies all cases here.  Given
that, doing the minimum amount of work (not worrying about SCM_RIGHTS
transfers) is the right thing to do, but we've had this re-labeling
since 2012, so leaving as-is is likely the best option at this point.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-06-15 18:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-09 20:50 [PATCH net-next] nfnetlink_queue: enable PID info retrieval Saeed Mahameed
2016-06-09 21:17 ` Eric Dumazet
2016-06-10 14:29   ` David Laight
2016-06-10 15:31     ` Eric Dumazet
2016-06-09 21:35 ` Florian Westphal
2016-06-09 22:21   ` Daniel Borkmann
2016-06-09 23:22     ` Daniel Borkmann
2016-06-10  6:40       ` Daniel Wagner
2016-06-15 18:47         ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).