netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Set truesize in pskb_expand_head
@ 2004-09-24 23:20 Herbert Xu
  2004-09-24 23:33 ` David S. Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2004-09-24 23:20 UTC (permalink / raw)
  To: David S. Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 396 bytes --]

Hi:

In order for the skb trimming to work, we need to set truesize in
pskb_expand_head.  This patch does exactly that.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 342 bytes --]

===== net/core/skbuff.c 1.37 vs edited =====
--- 1.37/net/core/skbuff.c	2004-08-03 12:20:05 +10:00
+++ edited/net/core/skbuff.c	2004-09-25 09:15:23 +10:00
@@ -541,6 +541,7 @@
 
 	off = (data + nhead) - skb->head;
 
+	skb->truesize = size + sizeof(struct sk_buff);
 	skb->head     = data;
 	skb->end      = data + size;
 	skb->data    += off;

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Set truesize in pskb_expand_head
  2004-09-24 23:20 Set truesize in pskb_expand_head Herbert Xu
@ 2004-09-24 23:33 ` David S. Miller
  2004-09-24 23:35   ` Herbert Xu
  0 siblings, 1 reply; 8+ messages in thread
From: David S. Miller @ 2004-09-24 23:33 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev

On Sat, 25 Sep 2004 09:20:53 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> In order for the skb trimming to work, we need to set truesize in
> pskb_expand_head.  This patch does exactly that.

I tried to warn people about this earlier in the netlink
NLMSG_GOODSIZE thread.... ho hum.

This change mucks up socket buffer accounting.

If you change skb->truesize, then when the kfree_skb(skb)
happens a different skb->truesize will be subtracted from
the socket buffer allocation than what was used when the
skb was first charged to the socket.

Basically, once a skb might potentially be charged to a
socket, you cannot change truesize unless you are extremely
careful.

This is another reason why I recommended to use a scratch
buffer, btw :-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Set truesize in pskb_expand_head
  2004-09-24 23:33 ` David S. Miller
@ 2004-09-24 23:35   ` Herbert Xu
  2004-09-24 23:37     ` Herbert Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2004-09-24 23:35 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

On Fri, Sep 24, 2004 at 04:33:28PM -0700, David S. Miller wrote:
> On Sat, 25 Sep 2004 09:20:53 +1000
> Herbert Xu <herbert@gondor.apana.org.au> wrote:
> 
> > In order for the skb trimming to work, we need to set truesize in
> > pskb_expand_head.  This patch does exactly that.
> 
> I tried to warn people about this earlier in the netlink
> NLMSG_GOODSIZE thread.... ho hum.
> 
> This change mucks up socket buffer accounting.
> 
> If you change skb->truesize, then when the kfree_skb(skb)
> happens a different skb->truesize will be subtracted from
> the socket buffer allocation than what was used when the
> skb was first charged to the socket.

Dave, it hasn't been charged yet...
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Set truesize in pskb_expand_head
  2004-09-24 23:35   ` Herbert Xu
@ 2004-09-24 23:37     ` Herbert Xu
  2004-09-24 23:45       ` David S. Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2004-09-24 23:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

On Sat, Sep 25, 2004 at 09:35:55AM +1000, herbert wrote:
> 
> Dave, it hasn't been charged yet...

Well at least not in the netlink case.  But you're that we can't do this
in general.  I'll adjust it right after the pskb_expand_head call then.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Set truesize in pskb_expand_head
  2004-09-24 23:37     ` Herbert Xu
@ 2004-09-24 23:45       ` David S. Miller
  2004-09-25  0:05         ` Herbert Xu
  0 siblings, 1 reply; 8+ messages in thread
From: David S. Miller @ 2004-09-24 23:45 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev

On Sat, 25 Sep 2004 09:37:13 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Sat, Sep 25, 2004 at 09:35:55AM +1000, herbert wrote:
> > 
> > Dave, it hasn't been charged yet...
> 
> Well at least not in the netlink case.  But you're that we can't do this
> in general.  I'll adjust it right after the pskb_expand_head call then.

I still am hesitant about your idea.

You're willing to do a very likely re-alloc every transaction
where as I'm suggesting a single allocation for the socket at
creation time which is used for every transaction on that
socket.

Let's talk more precisely, perhaps I'm missing something,
my scheme:

   Single PAGE_SIZE scratch buffer per socket, results in one
   allocation at socket creation time, and one SKB allocation
   while building a response.

your scheme:

   Allocate full PAGE_SIZE SKB for each netlink transaction,
   reallocate data area at end of building each transaction.

So, two data area kmalloc()'s per netlink response compared
to one for mine.

Your scheme doesn't even avoid the copy.  What's the advantage?
Furthermore my per-socket PAGE_SIZE scratch area is likely to
get very good cache hit rates making the copy very inexpensive.

Comments?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Set truesize in pskb_expand_head
  2004-09-24 23:45       ` David S. Miller
@ 2004-09-25  0:05         ` Herbert Xu
  2004-09-27 19:27           ` David S. Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2004-09-25  0:05 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 2251 bytes --]

On Fri, Sep 24, 2004 at 04:45:40PM -0700, David S. Miller wrote:
> 
> I still am hesitant about your idea.

Alright let's talk more about it :)

> You're willing to do a very likely re-alloc every transaction
> where as I'm suggesting a single allocation for the socket at
> creation time which is used for every transaction on that
> socket.

Yes.  However, what we really care about is not how many
reallocations there are, but how likely are they going to fail.

In your scheme, every netlink socket gets a page regardless
of whether it's going to be used (think of a socket for
multicast reception only).  That means at the point just
before the realloc, your scheme would have allocated at least
as much memory as my method, if not more.  Therefore I'd
argue that the realloc is more likely to fail for you.

Let's also consider what happens when it does fail.  In your
scheme you've got to drop the packet.  In mine, you just continue
as we do now.
 
> your scheme:
> 
>    Allocate full PAGE_SIZE SKB for each netlink transaction,
>    reallocate data area at end of building each transaction.

That's not quite true.  Some users (e.g., tcp_diag) can and do
accurately estimate the final message size.  They do not require
a realloc/copy at all.

> So, two data area kmalloc()'s per netlink response compared
> to one for mine.

Now I admit that is one weakness in my scheme.  However, I'd
argue that the performance cost of doing one extra kmalloc is
lost when compared to the rest of the work in filling up the
skb and copying it, no?

> Your scheme doesn't even avoid the copy.  What's the advantage?

Actually, it does avoid the copy for all dumped packets except the
final one as well as those packets that are near the right size already.

> Furthermore my per-socket PAGE_SIZE scratch area is likely to
> get very good cache hit rates making the copy very inexpensive.

True.  But surely it is better to not copy at all where possible?

Here is the real patch.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 948 bytes --]

===== net/netlink/af_netlink.c 1.53 vs edited =====
--- 1.53/net/netlink/af_netlink.c	2004-09-22 12:32:44 +10:00
+++ edited/net/netlink/af_netlink.c	2004-09-25 09:53:19 +10:00
@@ -536,12 +536,25 @@
 	sock_put(sk);
 }
 
+static inline void netlink_trim(struct sk_buff *skb, int allocation)
+{
+	int delta = skb->end - skb->tail;
+
+	if (delta * 2 < skb->truesize)
+		return;
+	if (pskb_expand_head(skb, 0, -delta, allocation))
+		return;
+	skb->truesize -= delta;
+}
+
 int netlink_unicast(struct sock *ssk, struct sk_buff *skb, u32 pid, int nonblock)
 {
 	struct sock *sk;
 	int err;
 	long timeo;
 
+	netlink_trim(skb, gfp_any());
+
 	timeo = sock_sndtimeo(ssk, nonblock);
 retry:
 	sk = netlink_getsockbypid(ssk, pid);
@@ -587,6 +600,8 @@
 	struct sk_buff *skb2 = NULL;
 	int protocol = ssk->sk_protocol;
 	int failure = 0, delivered = 0;
+
+	netlink_trim(skb, allocation);
 
 	/* While we sleep in clone, do not allow to change socket list */
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Set truesize in pskb_expand_head
  2004-09-25  0:05         ` Herbert Xu
@ 2004-09-27 19:27           ` David S. Miller
  2004-09-27 21:29             ` Herbert Xu
  0 siblings, 1 reply; 8+ messages in thread
From: David S. Miller @ 2004-09-27 19:27 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev

On Sat, 25 Sep 2004 10:05:03 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Here is the real patch.

Ok.. idea wise.  But, why are you subtracting 'delta' from
skb->truesize?  That's the amount actually used not the
amount trimmed by pskb_expand_head().

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Set truesize in pskb_expand_head
  2004-09-27 19:27           ` David S. Miller
@ 2004-09-27 21:29             ` Herbert Xu
  0 siblings, 0 replies; 8+ messages in thread
From: Herbert Xu @ 2004-09-27 21:29 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

On Mon, Sep 27, 2004 at 12:27:00PM -0700, David S. Miller wrote:
> 
> Ok.. idea wise.  But, why are you subtracting 'delta' from
> skb->truesize?  That's the amount actually used not the
> amount trimmed by pskb_expand_head().

delta = skb->end - skb->tail which is the amount we want to trim.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2004-09-27 21:29 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-09-24 23:20 Set truesize in pskb_expand_head Herbert Xu
2004-09-24 23:33 ` David S. Miller
2004-09-24 23:35   ` Herbert Xu
2004-09-24 23:37     ` Herbert Xu
2004-09-24 23:45       ` David S. Miller
2004-09-25  0:05         ` Herbert Xu
2004-09-27 19:27           ` David S. Miller
2004-09-27 21:29             ` Herbert Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).