From: Patrick McHardy <kaber@trash.net>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netdev@vger.kernel.org, davem@davemloft.net, eric.dumazet@gmail.com
Subject: Re: [PATCHv2 net-next] netlink: allow large data transfers from user-space
Date: Mon, 3 Jun 2013 19:01:37 +0200
Message-ID: <20130603170136.GA23920@macbook.localnet>
In-Reply-To: <1370277599-27072-1-git-send-email-pablo@netfilter.org>
On Mon, Jun 03, 2013 at 06:39:59PM +0200, Pablo Neira Ayuso wrote:
> I can hit ENOBUFS in the sendmsg() path with a large batch that is
> composed of many netlink messages. Here that limit is 8 MBytes of
> skbuff data area as kmalloc does not manage to get more than that.
>
> While discussing atomic rule-set for nftables with Patrick McHardy,
> we decided to put all rule-set updates that need to be applied
> atomically in one single batch to simplify the existing approach.
> However, as explained above, the existing netlink code limits us
> to a maximum of ~20000 rules that fit in one single batch without
> hitting ENOBUFS. iptables does not have such limitation as it is
> using vmalloc.
>
> This patch adds netlink_alloc_large_skb() which is only used in
> the netlink_sendmsg() path. It uses alloc_skb if the memory
> requested is <= one memory page, that should be the common case
> for most subsystems, else vmalloc for higher memory allocations.
I know I suggested to do this - just wondering right now, how will we
indicate to userspace that a change has been applied atomically when
sending notifications? Not sure whether it matters unless userspace
will be able to get a dump while we're in the middle of updating
the ruleset. I guess that won't be possible, right?
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
> v1: initial version
> v2: Use NLMSG_GOODSIZE instead of PAGE_SIZE, suggested by Eric Dumazet.
>
> net/netlink/af_netlink.c | 37 +++++++++++++++++++++++++++++++++++--
> 1 file changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 12ac6b4..7c71d07 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -750,6 +750,10 @@ static void netlink_skb_destructor(struct sk_buff *skb)
>  		skb->data = NULL;
>  	}
>  #endif
> +	if (is_vmalloc_addr(skb->head)) {
> +		vfree(skb->head);
> +		skb->data = NULL;
> +	}
>  	if (skb->sk != NULL)
>  		sock_rfree(skb);
>  }
> @@ -1420,6 +1424,35 @@ struct sock *netlink_getsockbyfilp(struct file *filp)
>  	return sock;
>  }
>
> +static struct sk_buff *netlink_alloc_large_skb(unsigned int size)
> +{
> +	struct sk_buff *skb;
> +	void *data;
> +
> +	if (size <= NLMSG_GOODSIZE)
> +		return alloc_skb(size, GFP_KERNEL);
> +
> +	skb = alloc_skb_head(GFP_KERNEL);
> +	if (skb == NULL)
> +		return NULL;
> +
> +	data = vmalloc(size);
> +	if (data == NULL)
> +		goto err;
> +
> +	skb->head = data;
> +	skb->data = data;
> +	skb_reset_tail_pointer(skb);
> +	skb->end = skb->tail + size;
> +	skb->len = 0;
> +	skb->destructor = netlink_skb_destructor;
> +
> +	return skb;
> +err:
> +	kfree_skb(skb);
> +	return NULL;
> +}
> +
>  /*
>   * Attach a skb to a netlink socket.
>   * The caller must hold a reference to the destination socket. On error, the
> @@ -1510,7 +1543,7 @@ static struct sk_buff *netlink_trim(struct sk_buff *skb, gfp_t allocation)
>  		return skb;
>
>  	delta = skb->end - skb->tail;
> -	if (delta * 2 < skb->truesize)
> +	if (is_vmalloc_addr(skb->head) || delta * 2 < skb->truesize)
>  		return skb;
>
>  	if (skb_shared(skb)) {
> @@ -2096,7 +2129,7 @@ static int netlink_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  	if (len > sk->sk_sndbuf - 32)
>  		goto out;
>  	err = -ENOBUFS;
> -	skb = alloc_skb(len, GFP_KERNEL);
> +	skb = netlink_alloc_large_skb(len);
>  	if (skb == NULL)
>  		goto out;
>
>
> --
> 1.7.10.4
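
For readers without a kernel tree at hand, the size-threshold idea in the
quoted patch (small requests use the regular allocator, large ones fall back
to a page-mapped allocation, and the free path must undo whichever one was
used) can be sketched in user space. This is only an analogy under stated
assumptions, not the patch itself: struct buf, buf_alloc(), buf_free() and
SMALL_BUF_LIMIT are hypothetical names, malloc() stands in for alloc_skb(),
and mmap() stands in for vmalloc(). User space has no counterpart to
is_vmalloc_addr(), so the sketch records the allocation type in a flag.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical threshold, standing in for NLMSG_GOODSIZE. */
#define SMALL_BUF_LIMIT 4096u

/* Hypothetical user-space stand-in for the skb head/data area. */
struct buf {
	void *head;	/* start of the data area */
	size_t size;	/* usable bytes in the data area */
	bool mapped;	/* true if head came from mmap(), not malloc() */
};

/* Small requests use malloc(); larger ones use an anonymous mapping. */
static struct buf *buf_alloc(size_t size)
{
	struct buf *b = calloc(1, sizeof(*b));

	if (b == NULL)
		return NULL;

	if (size <= SMALL_BUF_LIMIT) {
		b->head = malloc(size ? size : 1);
		if (b->head == NULL)
			goto err;
	} else {
		void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			goto err;
		b->head = p;
		b->mapped = true;
	}
	b->size = size;
	return b;
err:
	free(b);
	return NULL;
}

/* The free path must match the allocator that produced the data area. */
static void buf_free(struct buf *b)
{
	if (b == NULL)
		return;
	if (b->mapped)
		munmap(b->head, b->size);
	else
		free(b->head);
	free(b);
}
```

A caller would request a buffer with buf_alloc(len) and release it with
buf_free(), without needing to know which allocator served the request.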