netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] truesize lies
@ 2011-10-09 19:02 Eric Dumazet
  2011-10-09 19:25 ` Andi Kleen
  2011-10-09 21:27 ` David Miller
  0 siblings, 2 replies; 6+ messages in thread
From: Eric Dumazet @ 2011-10-09 19:02 UTC (permalink / raw)
  To: netdev

I noticed on my laptop a difference if I use wifi or wired internet
connectivity.

On wifi mode, I can see netstat -s giving sign of bad rcvbuf tuning :

   371 packets collapsed in receive queue due to low socket buffer

After some analysis, I found full sized tcp skbs (len=1440) had a 2864
bytes truesize on my wifi adapter. Ouch...

On tg3 adapter, truesize is MTU + NET_SKB_PAD + sizeof(sk_buff).

On some devices (say... NIU, but other drivers have same problem :
bnx2x, ), truesize is sizeof(sk_buff) + frame_length.

So if 'truesize' really means to account exact memory cost of frames
(including the empty space after used part), I would say :

1) NIU is lying (it should account in niu_rx_skb_append() not the used
part of the page, but reserved part : PAGE_SIZE/rbr_blocks_per_page)

2) Autotuning is a bit pessimistic : It works only if truesize is MSS +
sizeof(sk_buff) + 16 + MAX_TCP_HEADER


I guess most driver writers adjust truesize to please TCP stack and get
nice performance numbers.

skb->truesize = frame_len + sizeof(sk_buff);

What should we do exactly ?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] truesize lies
  2011-10-09 19:02 [RFC] truesize lies Eric Dumazet
@ 2011-10-09 19:25 ` Andi Kleen
  2011-10-09 21:27 ` David Miller
  1 sibling, 0 replies; 6+ messages in thread
From: Andi Kleen @ 2011-10-09 19:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Eric Dumazet <eric.dumazet@gmail.com> writes:

> After some analysis, I found full sized tcp skbs (len=1440) had a 2864
> bytes truesize on my wifi adapter. Ouch...

It's probably more like 4096. slab rounds to powers of two.

> What should we do exactly ?

Give up? It's always been a lie.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] truesize lies
  2011-10-09 19:02 [RFC] truesize lies Eric Dumazet
  2011-10-09 19:25 ` Andi Kleen
@ 2011-10-09 21:27 ` David Miller
  2011-10-09 21:55   ` Eric Dumazet
  1 sibling, 1 reply; 6+ messages in thread
From: David Miller @ 2011-10-09 21:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 09 Oct 2011 21:02:28 +0200

> I guess most driver writers adjust truesize to please TCP stack and get
> nice performance numbers.
> 
> skb->truesize = frame_len + sizeof(sk_buff);
> 
> What should we do exactly ?

We should pick a specification that accounts for all the metadata as well
as unused data areas properly, and enforce this in netif_receive_skb().

That will get the inaccurate cases fixed real fast.

We've had similar problems in the past and dealt with it in the same
exact way.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] truesize lies
  2011-10-09 21:27 ` David Miller
@ 2011-10-09 21:55   ` Eric Dumazet
  2011-10-09 22:10     ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2011-10-09 21:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Le dimanche 09 octobre 2011 à 17:27 -0400, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 09 Oct 2011 21:02:28 +0200
> 
> > I guess most driver writers adjust truesize to please TCP stack and get
> > nice performance numbers.
> > 
> > skb->truesize = frame_len + sizeof(sk_buff);
> > 
> > What should we do exactly ?
> 
> We should pick a specification that accounts for all the metadata as well
> as unused data areas properly, and enforce this in netif_receive_skb().
> 
> That will get the inaccurate cases fixed real fast.
> 
> We've had similar problems in the past and dealt with it in the same
> exact way.


Some drivers splits a page in two (or more) pieces, so we cant know what
was really reserved by a driver for paged skbs.

Only thing we could enforce at the moment is the proper accounting for
skb head :

WARN_ON(skb->truesize < (skb_end_pointer(skb) - skb->head) + sizeof(sk_buff) + skb->data_len);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] truesize lies
  2011-10-09 21:55   ` Eric Dumazet
@ 2011-10-09 22:10     ` David Miller
  2011-10-10  6:54       ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2011-10-09 22:10 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 09 Oct 2011 23:55:20 +0200

> Some drivers splits a page in two (or more) pieces, so we cant know what
> was really reserved by a driver for paged skbs.
> 
> Only thing we could enforce at the moment is the proper accounting for
> skb head :
> 
> WARN_ON(skb->truesize < (skb_end_pointer(skb) - skb->head) + sizeof(sk_buff) + skb->data_len);

This is partly true.

Drivers that use page pools divide pages up into different pools, each
with some specific block size.  At least this is how NIU works.

So NIU knows exactly how much of the block is logically part of that
SKB yet unused.

And if we really wanted to we could add a frag[].reserved field that
keeps track of this for debugging.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] truesize lies
  2011-10-09 22:10     ` David Miller
@ 2011-10-10  6:54       ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2011-10-10  6:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Le dimanche 09 octobre 2011 à 18:10 -0400, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 09 Oct 2011 23:55:20 +0200
> 
> > Some drivers splits a page in two (or more) pieces, so we cant know what
> > was really reserved by a driver for paged skbs.
> > 
> > Only thing we could enforce at the moment is the proper accounting for
> > skb head :
> > 
> > WARN_ON(skb->truesize < (skb_end_pointer(skb) - skb->head) + sizeof(sk_buff) + skb->data_len);
> 
> This is partly true.
> 
> Drivers that use page pools divide pages up into different pools, each
> with some specific block size.  At least this is how NIU works.
> 
> So NIU knows exactly how much of the block is logically part of that
> SKB yet unused.
> 
> And if we really wanted to we could add a frag[].reserved field that
> keeps track of this for debugging.

Hmm, this would still be possible for a driver to lie and consume lot of
ram on 64bit arches.

How about using a "struct page" field then, since for a given page, we
can assume/enforce it was divided in equal units (kind of a negative
order) by a driver.

Instead of using alloc_page(order) a driver could use

alloc_page_truesize(gfp_t mask, int page_order, int subunit_order)

For example, an x86 driver using 4KB page splitted in two pieces would
use alloc_page_truesize(gfp, 0, 1)

To get frag[i].reserved (aka truesize), we then would do

static inline int frag_truesize(const skb_frag_t *frag)
{
	return compound_head(frag->page)->truesize;
}

(Not even sure compound_head() is even necessary here, but I see NIU
uses it in niu_rbr_add_page() ?)

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-10-10  6:55 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-09 19:02 [RFC] truesize lies Eric Dumazet
2011-10-09 19:25 ` Andi Kleen
2011-10-09 21:27 ` David Miller
2011-10-09 21:55   ` Eric Dumazet
2011-10-09 22:10     ` David Miller
2011-10-10  6:54       ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).