Re: [PATCH net] netlink, mmap: transform mmap skb into full skb on taps

netdev.vger.kernel.org archive mirror

From: Daniel Borkmann <daniel@iogearbox.net>
To: David Miller <davem@davemloft.net>
Cc: chamaken@gmail.com, fw@strlen.de, netdev@vger.kernel.org
Subject: Re: [PATCH net] netlink, mmap: transform mmap skb into full skb on taps
Date: Fri, 11 Sep 2015 22:35:08 +0200
Message-ID: <55F33AFC.1080206@iogearbox.net>
In-Reply-To: <20150911.124205.1992250802389107845.davem@davemloft.net>

On 09/11/2015 09:42 PM, David Miller wrote:
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Fri, 11 Sep 2015 12:25:45 +0200
>
>> Even calling into skb_clone() is already an issue in itself, as the data
>> area is a user space buffer, and both skb_clone() and skb_copy()
>> access the skb_shinfo() area. :/ So in that regard, netlink mmap skbs are
>> even more restrictive in what we can do than netlink large skbs.
>
> Indeed, this is fatal.
>
> So we'd still need something special like your
> netlink_to_full_skb_clone to elide trying to touch the skb_shinfo
> area.
>
> I thought briefly about somehow cobbling up extra space in the ring
> entries so we could have a real skb_shinfo() there, but that's illegal
> too as the user could scribble all over it randomly while we interpret
> the contents.  We don't own that memory.  So this doesn't work.

Yes, agreed.
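To spell out why the shared info is untouchable here: skb_shinfo() is not a separate allocation, it sits at skb->head + skb->end, directly behind the data area in the same buffer. The user-space model below is a sketch under that assumption only; all model_* names are hypothetical and struct skb_shared_info is reduced to a single dataref counter. It shows that when head points into a ring frame mapped by user space, the shared info lands inside that user-writable frame as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for struct skb_shared_info. */
struct model_shinfo {
	int dataref;
};

/* Hypothetical, heavily reduced stand-in for struct sk_buff. As with
 * NET_SKBUFF_DATA_USES_OFFSET, end is an offset relative to head. */
struct model_skb {
	unsigned char *head;
	size_t end;
};

/* Models skb_shinfo(): the shared info lives at head + end, directly
 * behind the data area and inside the same buffer. */
static struct model_shinfo *model_skb_shinfo(const struct model_skb *skb)
{
	assert(skb != NULL && skb->head != NULL);
	return (struct model_shinfo *)(skb->head + skb->end);
}

/* Returns true if the shared info of skb starts inside
 * [region, region + len), e.g. a ring frame mapped into user space. */
static bool model_shinfo_in_region(const struct model_skb *skb,
				   const unsigned char *region, size_t len)
{
	uintptr_t info = (uintptr_t)model_skb_shinfo(skb);
	uintptr_t start = (uintptr_t)region;

	return info >= start && info < start + len;
}
```

In this model, any user write to the tail of the frame directly changes what a clone or copy would read as dataref, which is exactly the memory the kernel does not own.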

> We could rename the clone_preserves_destructor and have it also mean
> that the SKB lacks frags and skb_shinfo() should not be inspected.
>
> Something like this:
[...]
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2738d35..898c53d 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
[...]
> @@ -2220,7 +2221,8 @@ static inline void skb_orphan(struct sk_buff *skb)
>    */
>   static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
>   {
> -	if (likely(!(skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY)))
> +	if (likely(!(skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) ||
> +		   skb->private_buffers))

(These two checks would need to be swapped, so that skb->private_buffers
is tested before skb_shinfo() gets dereferenced.)

>   		return 0;
>   	return skb_copy_ubufs(skb, gfp_mask);
>   }
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index dad4dd3..54f9d6e 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -825,7 +825,10 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
>   	n->hdr_len = skb->nohdr ? skb_headroom(skb) : skb->hdr_len;
>   	n->cloned = 1;
>   	n->nohdr = 0;
> -	n->destructor = NULL;
> +	if (!skb->private_buffers)
> +		n->destructor = NULL;
> +	else
> +		C(destructor);
>   	C(tail);
>   	C(end);
>   	C(head);

We would also have to conditionally skip this dataref increment in
__skb_clone():

   atomic_inc(&(skb_shinfo(skb)->dataref));

Thus, the issue here is that while netlink_alloc_large_skb() and
netlink_ring_setup_skb() would both set skb->private_buffers = 1,
the large skb case actually needs to inspect the dataref count
(which it can legally do) to properly release the vmalloc'ed
area again, while the mmap case must not even touch it. So if I
see this correctly, it's unfortunately not possible to combine
the two cases in a single flag. :/

If there's a good case to burn this flag outside of netlink, e.g. for
vmalloc-backed memory on skbs, it could be solved like that, while
the mmap case would be declared netlink's problem. ;) I currently don't
have a better idea than to copy the mmap skbs, hmmm.
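The copying fallback, in the spirit of the patch in the subject line, could look roughly like this user-space sketch: the payload of the ring frame is copied once into a freshly allocated, kernel-owned buffer that carries its own shared info, and only that copy goes to the taps. The helper names and structures are hypothetical and malloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct model_shinfo {
	int dataref;
};

/* A "full" buffer: data followed by its own, kernel-owned shared info. */
struct model_full_skb {
	unsigned char *data;
	size_t len;
	struct model_shinfo *shinfo;
};

/* Copies len payload bytes out of a (user-mapped) ring frame into a new
 * full buffer; nothing beyond the payload is read. Returns 0 on success. */
static int model_mmap_to_full(const unsigned char *frame_data, size_t len,
			      struct model_full_skb *out)
{
	const size_t align = _Alignof(struct model_shinfo);
	size_t off = (len + align - 1) / align * align;	/* keep shinfo aligned */
	unsigned char *buf;

	assert(out != NULL);
	buf = malloc(off + sizeof(struct model_shinfo));
	if (!buf)
		return -1;

	memcpy(buf, frame_data, len);	/* single read of the user frame */
	out->data = buf;
	out->len = len;
	out->shinfo = (struct model_shinfo *)(buf + off);
	out->shinfo->dataref = 1;	/* kernel-owned, safe to clone later */
	return 0;
}

static void model_full_free(struct model_full_skb *skb)
{
	free(skb->data);
	skb->data = NULL;
	skb->shinfo = NULL;
}
```

Once copied, later user writes to the frame no longer affect what the taps see, at the cost of one copy per delivered message.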

[...]
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 7f86d3b..523adac 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -854,6 +855,14 @@ static void netlink_ring_set_copied(struct sock *sk, struct sk_buff *skb)
>   #define netlink_mmap_sendmsg(sk, msg, dst_portid, dst_group, scm)	0
>   #endif /* CONFIG_NETLINK_MMAP */
>
> +static bool skb_can_release_head(struct sk_buff *skb)
> +{
> +	if (!skb->cloned ||
> +	    !atomic_dec_return(&(skb_shinfo(skb)->dataref)))
> +		return true;
> +	return false;
> +}
> +
>   static void netlink_skb_destructor(struct sk_buff *skb)
>   {
>   #ifdef CONFIG_NETLINK_MMAP
> @@ -866,31 +875,35 @@ static void netlink_skb_destructor(struct sk_buff *skb)
[...]


Thread overview: 7+ messages
2015-09-10 18:05 [PATCH net] netlink, mmap: transform mmap skb into full skb on taps Daniel Borkmann
2015-09-11  5:11 ` David Miller
2015-09-11 10:25   ` Daniel Borkmann
2015-09-11 19:42     ` David Miller
2015-09-11 20:35       ` Daniel Borkmann [this message]
2015-09-11 21:34         ` David Miller
2015-09-11 22:18           ` Daniel Borkmann