* [PATCH] xen/netfront: improve truesize tracking
@ 2012-12-18 13:51 Ian Campbell
2012-12-18 14:10 ` Konrad Rzeszutek Wilk
2012-12-18 15:12 ` Eric Dumazet
0 siblings, 2 replies; 22+ messages in thread
From: Ian Campbell @ 2012-12-18 13:51 UTC (permalink / raw)
To: netdev
Cc: Ian Campbell, Sander Eikelenboom, Konrad Rzeszutek Wilk, annie li,
xen-devel
Using RX_COPY_THRESHOLD is incorrect if the SKB is actually smaller
than that. We have already accounted for this in
NETFRONT_SKB_CB(skb)->pull_to so use that instead.
Fixes WARN_ON from skb_try_coalesce.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: annie li <annie.li@oracle.com>
Cc: xen-devel@lists.xensource.com
Cc: netdev@vger.kernel.org
Cc: stable@kernel.org # 3.7.x only
---
drivers/net/xen-netfront.c | 15 +++++----------
1 files changed, 5 insertions(+), 10 deletions(-)
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index caa0110..b06ef81 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -971,17 +971,12 @@ err:
* overheads. Here, we add the size of the data pulled
* in xennet_fill_frags().
*
- * We also adjust for any unused space in the main
- * data area by subtracting (RX_COPY_THRESHOLD -
- * len). This is especially important with drivers
- * which split incoming packets into header and data,
- * using only 66 bytes of the main data area (see the
- * e1000 driver for example.) On such systems,
- * without this last adjustement, our achievable
- * receive throughout using the standard receive
- * buffer size was cut by 25%(!!!).
+ * We also adjust for the __pskb_pull_tail done in
+ * handle_incoming_queue which pulls data from the
+ * frags into the head area, which is already
+ * accounted in RX_COPY_THRESHOLD.
*/
- skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
+ skb->truesize += skb->data_len - NETFRONT_SKB_CB(skb)->pull_to;
skb->len += skb->data_len;
if (rx->flags & XEN_NETRXF_csum_blank)
--
1.7.2.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2012-12-18 13:51 [PATCH] xen/netfront: improve truesize tracking Ian Campbell
@ 2012-12-18 14:10 ` Konrad Rzeszutek Wilk
2012-12-18 14:15 ` Ian Campbell
2012-12-18 15:12 ` Eric Dumazet
1 sibling, 1 reply; 22+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-12-18 14:10 UTC (permalink / raw)
To: Ian Campbell; +Cc: netdev, Sander Eikelenboom, annie li, xen-devel
On Tue, Dec 18, 2012 at 01:51:51PM +0000, Ian Campbell wrote:
> Using RX_COPY_THRESHOLD is incorrect if the SKB is actually smaller
> than that. We have already accounted for this in
> NETFRONT_SKB_CB(skb)->pull_to so use that instead.
>
> Fixes WARN_ON from skb_try_coalesce.
This should probably be also on the stable tree for 3.7 at least?
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Sander Eikelenboom <linux@eikelenboom.it>
^^ - Reported-by:
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
^^ - Acked-by:
> Cc: annie li <annie.li@oracle.com>
> Cc: xen-devel@lists.xensource.com
> Cc: netdev@vger.kernel.org
> Cc: stable@kernel.org # 3.7.x only
> ---
> drivers/net/xen-netfront.c | 15 +++++----------
> 1 files changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index caa0110..b06ef81 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -971,17 +971,12 @@ err:
> * overheads. Here, we add the size of the data pulled
> * in xennet_fill_frags().
> *
> - * We also adjust for any unused space in the main
> - * data area by subtracting (RX_COPY_THRESHOLD -
> - * len). This is especially important with drivers
> - * which split incoming packets into header and data,
> - * using only 66 bytes of the main data area (see the
> - * e1000 driver for example.) On such systems,
> - * without this last adjustement, our achievable
> - * receive throughout using the standard receive
> - * buffer size was cut by 25%(!!!).
> + * We also adjust for the __pskb_pull_tail done in
> + * handle_incoming_queue which pulls data from the
> + * frags into the head area, which is already
> + * accounted in RX_COPY_THRESHOLD.
> */
> - skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
> + skb->truesize += skb->data_len - NETFRONT_SKB_CB(skb)->pull_to;
> skb->len += skb->data_len;
>
> if (rx->flags & XEN_NETRXF_csum_blank)
> --
> 1.7.2.5
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2012-12-18 14:10 ` Konrad Rzeszutek Wilk
@ 2012-12-18 14:15 ` Ian Campbell
0 siblings, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2012-12-18 14:15 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: netdev@vger.kernel.org, Sander Eikelenboom, annie li,
xen-devel@lists.xensource.com
On Tue, 2012-12-18 at 14:10 +0000, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 18, 2012 at 01:51:51PM +0000, Ian Campbell wrote:
> > Using RX_COPY_THRESHOLD is incorrect if the SKB is actually smaller
> > than that. We have already accounted for this in
> > NETFRONT_SKB_CB(skb)->pull_to so use that instead.
> >
> > Fixes WARN_ON from skb_try_coalesce.
>
> This should probably be also on the stable tree for 3.7 at least?
Yes, hence "Cc: stable@kernel.org # 3.7.x only" below ;-)
> >
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Sander Eikelenboom <linux@eikelenboom.it>
> ^^ - Reported-by:
>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ^^ - Acked-by:
>
> > Cc: annie li <annie.li@oracle.com>
> > Cc: xen-devel@lists.xensource.com
> > Cc: netdev@vger.kernel.org
> > Cc: stable@kernel.org # 3.7.x only
> > ---
> > drivers/net/xen-netfront.c | 15 +++++----------
> > 1 files changed, 5 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index caa0110..b06ef81 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -971,17 +971,12 @@ err:
> > * overheads. Here, we add the size of the data pulled
> > * in xennet_fill_frags().
> > *
> > - * We also adjust for any unused space in the main
> > - * data area by subtracting (RX_COPY_THRESHOLD -
> > - * len). This is especially important with drivers
> > - * which split incoming packets into header and data,
> > - * using only 66 bytes of the main data area (see the
> > - * e1000 driver for example.) On such systems,
> > - * without this last adjustement, our achievable
> > - * receive throughout using the standard receive
> > - * buffer size was cut by 25%(!!!).
> > + * We also adjust for the __pskb_pull_tail done in
> > + * handle_incoming_queue which pulls data from the
> > + * frags into the head area, which is already
> > + * accounted in RX_COPY_THRESHOLD.
> > */
> > - skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
> > + skb->truesize += skb->data_len - NETFRONT_SKB_CB(skb)->pull_to;
> > skb->len += skb->data_len;
> >
> > if (rx->flags & XEN_NETRXF_csum_blank)
> > --
> > 1.7.2.5
> >
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2012-12-18 13:51 [PATCH] xen/netfront: improve truesize tracking Ian Campbell
2012-12-18 14:10 ` Konrad Rzeszutek Wilk
@ 2012-12-18 15:12 ` Eric Dumazet
2012-12-18 15:26 ` Ian Campbell
1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2012-12-18 15:12 UTC (permalink / raw)
To: Ian Campbell
Cc: netdev, Sander Eikelenboom, Konrad Rzeszutek Wilk, annie li,
xen-devel
On Tue, 2012-12-18 at 13:51 +0000, Ian Campbell wrote:
> Using RX_COPY_THRESHOLD is incorrect if the SKB is actually smaller
> than that. We have already accounted for this in
> NETFRONT_SKB_CB(skb)->pull_to so use that instead.
>
> Fixes WARN_ON from skb_try_coalesce.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Sander Eikelenboom <linux@eikelenboom.it>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: annie li <annie.li@oracle.com>
> Cc: xen-devel@lists.xensource.com
> Cc: netdev@vger.kernel.org
> Cc: stable@kernel.org # 3.7.x only
> ---
> drivers/net/xen-netfront.c | 15 +++++----------
> 1 files changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index caa0110..b06ef81 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -971,17 +971,12 @@ err:
> * overheads. Here, we add the size of the data pulled
> * in xennet_fill_frags().
> *
> - * We also adjust for any unused space in the main
> - * data area by subtracting (RX_COPY_THRESHOLD -
> - * len). This is especially important with drivers
> - * which split incoming packets into header and data,
> - * using only 66 bytes of the main data area (see the
> - * e1000 driver for example.) On such systems,
> - * without this last adjustement, our achievable
> - * receive throughout using the standard receive
> - * buffer size was cut by 25%(!!!).
> + * We also adjust for the __pskb_pull_tail done in
> + * handle_incoming_queue which pulls data from the
> + * frags into the head area, which is already
> + * accounted in RX_COPY_THRESHOLD.
> */
> - skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
> + skb->truesize += skb->data_len - NETFRONT_SKB_CB(skb)->pull_to;
> skb->len += skb->data_len;
>
> if (rx->flags & XEN_NETRXF_csum_blank)
But skb truesize is not what you think.
You must account the exact memory used by this skb, not only the used
part of it.
At the very minimum, it should be
skb->truesize += skb->data_len;
But it really should be the allocated size of the fragment.
If its a page, then its a page, even if you use one single byte in it.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2012-12-18 15:12 ` Eric Dumazet
@ 2012-12-18 15:26 ` Ian Campbell
2012-12-18 16:13 ` Eric Dumazet
2012-12-19 11:34 ` Sander Eikelenboom
0 siblings, 2 replies; 22+ messages in thread
From: Ian Campbell @ 2012-12-18 15:26 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev@vger.kernel.org, Sander Eikelenboom, Konrad Rzeszutek Wilk,
annie li, xen-devel@lists.xensource.com
On Tue, 2012-12-18 at 15:12 +0000, Eric Dumazet wrote:
> On Tue, 2012-12-18 at 13:51 +0000, Ian Campbell wrote:
> > Using RX_COPY_THRESHOLD is incorrect if the SKB is actually smaller
> > than that. We have already accounted for this in
> > NETFRONT_SKB_CB(skb)->pull_to so use that instead.
> >
> > Fixes WARN_ON from skb_try_coalesce.
> >
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Sander Eikelenboom <linux@eikelenboom.it>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Cc: annie li <annie.li@oracle.com>
> > Cc: xen-devel@lists.xensource.com
> > Cc: netdev@vger.kernel.org
> > Cc: stable@kernel.org # 3.7.x only
> > ---
> > drivers/net/xen-netfront.c | 15 +++++----------
> > 1 files changed, 5 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index caa0110..b06ef81 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -971,17 +971,12 @@ err:
> > * overheads. Here, we add the size of the data pulled
> > * in xennet_fill_frags().
> > *
> > - * We also adjust for any unused space in the main
> > - * data area by subtracting (RX_COPY_THRESHOLD -
> > - * len). This is especially important with drivers
> > - * which split incoming packets into header and data,
> > - * using only 66 bytes of the main data area (see the
> > - * e1000 driver for example.) On such systems,
> > - * without this last adjustement, our achievable
> > - * receive throughout using the standard receive
> > - * buffer size was cut by 25%(!!!).
> > + * We also adjust for the __pskb_pull_tail done in
> > + * handle_incoming_queue which pulls data from the
> > + * frags into the head area, which is already
> > + * accounted in RX_COPY_THRESHOLD.
> > */
> > - skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
> > + skb->truesize += skb->data_len - NETFRONT_SKB_CB(skb)->pull_to;
> > skb->len += skb->data_len;
> >
> > if (rx->flags & XEN_NETRXF_csum_blank)
>
>
> But skb truesize is not what you think.
Indeed, it seems I was completely backwards about what it means!
> You must account the exact memory used by this skb, not only the used
> part of it.
>
> At the very minimum, it should be
>
> skb->truesize += skb->data_len;
>
> But it really should be the allocated size of the fragment.
>
> If its a page, then its a page, even if you use one single byte in it.
So actually we want += PAGE_SIZE * skb_shinfo(skb)->nr_frags ?
Sander, can you try that change?
Ian.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2012-12-18 15:26 ` Ian Campbell
@ 2012-12-18 16:13 ` Eric Dumazet
2012-12-18 16:22 ` Ian Campbell
2012-12-19 11:34 ` Sander Eikelenboom
1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2012-12-18 16:13 UTC (permalink / raw)
To: Ian Campbell
Cc: netdev@vger.kernel.org, Sander Eikelenboom, Konrad Rzeszutek Wilk,
annie li, xen-devel@lists.xensource.com
On Tue, 2012-12-18 at 15:26 +0000, Ian Campbell wrote:
> So actually we want += PAGE_SIZE * skb_shinfo(skb)->nr_frags ?
>
I dont know what are the real frag sizes in your case.
Some drivers allocate a full page for an ethernet frame, others use half
of a page, it really depends.
As the frag ABI doesnt contain real size, its ok in this case to account
the actual frag size.
(skb->data_len in your driver)
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2012-12-18 16:13 ` Eric Dumazet
@ 2012-12-18 16:22 ` Ian Campbell
2012-12-18 16:35 ` Eric Dumazet
0 siblings, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2012-12-18 16:22 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev@vger.kernel.org, Sander Eikelenboom, Konrad Rzeszutek Wilk,
annie li, xen-devel@lists.xensource.com
On Tue, 2012-12-18 at 16:13 +0000, Eric Dumazet wrote:
> On Tue, 2012-12-18 at 15:26 +0000, Ian Campbell wrote:
>
> > So actually we want += PAGE_SIZE * skb_shinfo(skb)->nr_frags ?
> >
>
> I dont know what are the real frag sizes in your case.
I think it's a page, see xennet_alloc_rx_buffers and the alloc_page
therein.
> Some drivers allocate a full page for an ethernet frame, others use half
> of a page, it really depends.
>
> As the frag ABI doesnt contain real size, its ok in this case to account
> the actual frag size.
>
> (skb->data_len in your driver)
I guess I'm a bit confused by what truesize means again then ;-),
because in that case the original patch is correct although it would
have been less confusing to do:
skb->truesize += skb->data_len;
in xennet_poll() and then do the subtraction of
NETFRONT_SKB_CB(skb)->pull_to in handle_incoming_queue() where we
actually do the pull up.
Unless __pskb_pull_tail does that adjustment for us, but if it does I
can't see where.
Ian.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2012-12-18 16:22 ` Ian Campbell
@ 2012-12-18 16:35 ` Eric Dumazet
0 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2012-12-18 16:35 UTC (permalink / raw)
To: Ian Campbell
Cc: netdev@vger.kernel.org, Sander Eikelenboom, Konrad Rzeszutek Wilk,
annie li, xen-devel@lists.xensource.com
On Tue, 2012-12-18 at 16:22 +0000, Ian Campbell wrote:
> On Tue, 2012-12-18 at 16:13 +0000, Eric Dumazet wrote:
> > On Tue, 2012-12-18 at 15:26 +0000, Ian Campbell wrote:
> >
> > > So actually we want += PAGE_SIZE * skb_shinfo(skb)->nr_frags ?
> > >
> >
> > I dont know what are the real frag sizes in your case.
>
> I think it's a page, see xennet_alloc_rx_buffers and the alloc_page
> therein.
>
If they are order-0 pages, then PAGE_SIZE * nr_frags is OK.
> > Some drivers allocate a full page for an ethernet frame, others use half
> > of a page, it really depends.
> >
> > As the frag ABI doesnt contain real size, its ok in this case to account
> > the actual frag size.
> >
> > (skb->data_len in your driver)
>
> I guess I'm a bit confused by what truesize means again then ;-),
> because in that case the original patch is correct although it would
> have been less confusing to do:
> skb->truesize += skb->data_len;
> in xennet_poll() and then do the subtraction of
> NETFRONT_SKB_CB(skb)->pull_to in handle_incoming_queue() where we
> actually do the pull up.
>
> Unless __pskb_pull_tail does that adjustment for us, but if it does I
> can't see where.
Thats because skb frags only contain :
a page pointer.
An offset
A size. (Exact number or used bytes in this frag)
And not the 'originally allocated size. It could be 256, 768, 2048,
4096, 65536 bytes, nobody but the driver really knows.
So when we pull X bytes from a fragment to skb->head, there is no way to
remember what was the original size of the fragment.
Only the driver allocating the frag knows its truesize.
Once skb is given to the stack, we lose this information, and rely on
skb->truesize being an accurate estimation.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2012-12-18 15:26 ` Ian Campbell
2012-12-18 16:13 ` Eric Dumazet
@ 2012-12-19 11:34 ` Sander Eikelenboom
2012-12-19 16:17 ` Eric Dumazet
1 sibling, 1 reply; 22+ messages in thread
From: Sander Eikelenboom @ 2012-12-19 11:34 UTC (permalink / raw)
To: Ian Campbell
Cc: Eric Dumazet, netdev@vger.kernel.org, Konrad Rzeszutek Wilk,
annie li, xen-devel@lists.xensource.com
Tuesday, December 18, 2012, 4:26:38 PM, you wrote:
> On Tue, 2012-12-18 at 15:12 +0000, Eric Dumazet wrote:
>> On Tue, 2012-12-18 at 13:51 +0000, Ian Campbell wrote:
>> > Using RX_COPY_THRESHOLD is incorrect if the SKB is actually smaller
>> > than that. We have already accounted for this in
>> > NETFRONT_SKB_CB(skb)->pull_to so use that instead.
>> >
>> > Fixes WARN_ON from skb_try_coalesce.
>> >
>> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>> > Cc: Sander Eikelenboom <linux@eikelenboom.it>
>> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> > Cc: annie li <annie.li@oracle.com>
>> > Cc: xen-devel@lists.xensource.com
>> > Cc: netdev@vger.kernel.org
>> > Cc: stable@kernel.org # 3.7.x only
>> > ---
>> > drivers/net/xen-netfront.c | 15 +++++----------
>> > 1 files changed, 5 insertions(+), 10 deletions(-)
>> >
>> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> > index caa0110..b06ef81 100644
>> > --- a/drivers/net/xen-netfront.c
>> > +++ b/drivers/net/xen-netfront.c
>> > @@ -971,17 +971,12 @@ err:
>> > * overheads. Here, we add the size of the data pulled
>> > * in xennet_fill_frags().
>> > *
>> > - * We also adjust for any unused space in the main
>> > - * data area by subtracting (RX_COPY_THRESHOLD -
>> > - * len). This is especially important with drivers
>> > - * which split incoming packets into header and data,
>> > - * using only 66 bytes of the main data area (see the
>> > - * e1000 driver for example.) On such systems,
>> > - * without this last adjustement, our achievable
>> > - * receive throughout using the standard receive
>> > - * buffer size was cut by 25%(!!!).
>> > + * We also adjust for the __pskb_pull_tail done in
>> > + * handle_incoming_queue which pulls data from the
>> > + * frags into the head area, which is already
>> > + * accounted in RX_COPY_THRESHOLD.
>> > */
>> > - skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
>> > + skb->truesize += skb->data_len - NETFRONT_SKB_CB(skb)->pull_to;
>> > skb->len += skb->data_len;
>> >
>> > if (rx->flags & XEN_NETRXF_csum_blank)
>>
>>
>> But skb truesize is not what you think.
> Indeed, it seems I was completely backwards about what it means!
>> You must account the exact memory used by this skb, not only the used
>> part of it.
>>
>> At the very minimum, it should be
>>
>> skb->truesize += skb->data_len;
>>
>> But it really should be the allocated size of the fragment.
>>
>> If its a page, then its a page, even if you use one single byte in it.
> So actually we want += PAGE_SIZE * skb_shinfo(skb)->nr_frags ?
> Sander, can you try that change?
Hi Ian,
It ran overnight and i haven't seen the warn_once trigger.
(but i also didn't with the previous patch)
--
Sander
> Ian.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2012-12-19 11:34 ` Sander Eikelenboom
@ 2012-12-19 16:17 ` Eric Dumazet
2012-12-20 12:51 ` Sander Eikelenboom
0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2012-12-19 16:17 UTC (permalink / raw)
To: Sander Eikelenboom
Cc: Ian Campbell, netdev@vger.kernel.org, Konrad Rzeszutek Wilk,
annie li, xen-devel@lists.xensource.com
On Wed, 2012-12-19 at 12:34 +0100, Sander Eikelenboom wrote:
> Hi Ian,
>
> It ran overnight and i haven't seen the warn_once trigger.
> (but i also didn't with the previous patch)
>
As I said, the miminum value to not trigger the warning was what Ian
patch was doing, but it was still a not accurate estimation.
Doing the real accounting might trigger slow transferts, or dropped
packets because of socket limits (SNDBUF / RCVBUF) being hit sooner.
So the real question was : If accounting for full pages, is your
applications run as smooth as before, with no huge performance
regression ?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2012-12-19 16:17 ` Eric Dumazet
@ 2012-12-20 12:51 ` Sander Eikelenboom
2012-12-20 14:23 ` [Xen-devel] " Sander Eikelenboom
2012-12-20 15:39 ` Eric Dumazet
0 siblings, 2 replies; 22+ messages in thread
From: Sander Eikelenboom @ 2012-12-20 12:51 UTC (permalink / raw)
To: Eric Dumazet
Cc: Ian Campbell, netdev@vger.kernel.org, Konrad Rzeszutek Wilk,
annie li, xen-devel@lists.xensource.com
Wednesday, December 19, 2012, 5:17:49 PM, you wrote:
> On Wed, 2012-12-19 at 12:34 +0100, Sander Eikelenboom wrote:
>> Hi Ian,
>>
>> It ran overnight and i haven't seen the warn_once trigger.
>> (but i also didn't with the previous patch)
>>
> As I said, the miminum value to not trigger the warning was what Ian
> patch was doing, but it was still a not accurate estimation.
> Doing the real accounting might trigger slow transferts, or dropped
> packets because of socket limits (SNDBUF / RCVBUF) being hit sooner.
> So the real question was : If accounting for full pages, is your
> applications run as smooth as before, with no huge performance
> regression ?
Ok i have added some extra debug info (see diff's below), the code still uses the old calculation for truesize (in the hope to trigger the warn_on_once again), but also calculates the variants IanC came up with.
I haven't got a clear test case to trigger the warn_on_once, it happens just every once in a while during my normal usage and i'm not a netperf expert :-)
So at the moment i haven't been able to trigger the warn_on_once yet, but the results so far do seem to shed some light ..
- The first variant (current code) seems to be the most effcient and a good estimation *most* of the the, but sometimes triggers the warn_on_once in skb_try_coalesce.
- The first variant (current code) seems to always substract from the truesize for small packets.
- The second variant always seems keep the truesize as is for most of the small network traffic, but it also seems to work ok for larger packets.
- The third variant seems to be a pretty wasteful estimation.
So the last variant seems to be rather wasteful, and the second one the most accurate so far.
Eric:
From the warn_on_once, delta should be smaller than len, but probably they should be as close together as possible.
When you say "accurate estimation", what would be a acceptable difference between DELTA and LEN ?
[ 116.965062] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[ 117.094538] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.094707] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.094869] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.095058] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.095216] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.096102] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.096311] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.096373] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.150398] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.150459] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.536901] eth0: mtu:1500 data_len:53642 len before:0 len after:53642 truesize before:896 truesize after:54282 nr_frags:14 variant1:53386(54282) variant2:53386(54282) variant3:57344(58240)
[ 117.537463] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
[ 117.537915] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
[ 117.538543] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18634(19530) variant3:24576(25472)
[ 117.539223] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
[ 117.539283] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:2 variant1:7050(7946) variant2:7050(7946) variant3:8192(9088)
[ 117.539403] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:2
[ 117.540035] eth0: mtu:1500 data_len:4410 len before:0 len after:4410 truesize before:896 truesize after:5050 nr_frags:3 variant1:4154(5050) variant2:4304(5200) variant3:12288(13184)
[ 117.540153] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
[ 121.981917] net_ratelimit: 27 callbacks suppressed
[ 121.981960] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[ 122.985019] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[ 123.988308] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[ 124.991961] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[ 125.995003] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[ 126.998324] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index c26e28b..8833e38 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -964,6 +964,7 @@ static int xennet_poll(struct napi_struct *napi, int budget)
struct sk_buff_head tmpq;
unsigned long flags;
int err;
+ int tsz,len;
spin_lock(&np->rx_lock);
@@ -1037,9 +1038,22 @@ err:
* receive throughout using the standard receive
* buffer size was cut by 25%(!!!).
*/
- skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
+
+
+
+
+ tsz = skb->truesize;
+ len = skb->len;
+ /* skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; */
+ skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
skb->len += skb->data_len;
+ net_warn_ratelimited("%s: mtu:%d data_len:%d len before:%d len after:%d truesize before:%d truesize after:%d nr_frags:%d variant1:%d(%d) variant2:%d(%d) variant3:%d(%d) \n",
+ skb->dev->name, skb->dev->mtu, skb->data_len, len, skb->len,tsz, skb->truesize, skb_shinfo(skb)->nr_frags,
+ skb->data_len - RX_COPY_THRESHOLD, tsz + skb->data_len - RX_COPY_THRESHOLD ,
+ skb->data_len - NETFRONT_SKB_CB(skb)->pull_to, tsz + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to,
+ PAGE_SIZE * skb_shinfo(skb)->nr_frags, tsz + (PAGE_SIZE * skb_shinfo(skb)->nr_frags));
+
if (rx->flags & XEN_NETRXF_csum_blank)
skb->ip_summed = CHECKSUM_PARTIAL;
else if (rx->flags & XEN_NETRXF_data_validated)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3ab989b..6d0cd86 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3471,6 +3471,16 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
WARN_ON_ONCE(delta < len);
+ if(delta < len) {
+ net_warn_ratelimited("to: %s from: %s skb_try_coalesce: DELTA < LEN delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
+ to->dev->name, from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
+ }
+
+ if (delta > len && delta - len > 100) {
+ net_warn_ratelimited("to: %s from: %s skb_try_coalesce: DELTA - LEN > 100 delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
+ to->dev->name,from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
+ }
+
memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
skb_shinfo(from)->frags,
skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [Xen-devel] [PATCH] xen/netfront: improve truesize tracking
2012-12-20 12:51 ` Sander Eikelenboom
@ 2012-12-20 14:23 ` Sander Eikelenboom
2012-12-20 14:58 ` Sander Eikelenboom
2012-12-20 15:39 ` Eric Dumazet
1 sibling, 1 reply; 22+ messages in thread
From: Sander Eikelenboom @ 2012-12-20 14:23 UTC (permalink / raw)
To: Sander Eikelenboom
Cc: Eric Dumazet, netdev@vger.kernel.org, annie li,
xen-devel@lists.xensource.com, Ian Campbell,
Konrad Rzeszutek Wilk
Thursday, December 20, 2012, 1:51:39 PM, you wrote:
> Wednesday, December 19, 2012, 5:17:49 PM, you wrote:
>> On Wed, 2012-12-19 at 12:34 +0100, Sander Eikelenboom wrote:
>>> Hi Ian,
>>>
>>> It ran overnight and i haven't seen the warn_once trigger.
>>> (but i also didn't with the previous patch)
>>>
>> As I said, the miminum value to not trigger the warning was what Ian
>> patch was doing, but it was still a not accurate estimation.
>> Doing the real accounting might trigger slow transferts, or dropped
>> packets because of socket limits (SNDBUF / RCVBUF) being hit sooner.
>> So the real question was : If accounting for full pages, is your
>> applications run as smooth as before, with no huge performance
>> regression ?
> Ok i have added some extra debug info (see diff's below), the code still uses the old calculation for truesize (in the hope to trigger the warn_on_once again), but also calculates the variants IanC came up with.
> I haven't got a clear test case to trigger the warn_on_once, it happens just every once in a while during my normal usage and i'm not a netperf expert :-)
> So at the moment i haven't been able to trigger the warn_on_once yet, but the results so far do seem to shed some light ..
> - The first variant (current code) seems to be the most effcient and a good estimation *most* of the the, but sometimes triggers the warn_on_once in skb_try_coalesce.
> - The first variant (current code) seems to always substract from the truesize for small packets.
> - The second variant always seems keep the truesize as is for most of the small network traffic, but it also seems to work ok for larger packets.
> - The third variant seems to be a pretty wasteful estimation.
> So the last variant seems to be rather wasteful, and the second one the most accurate so far.
> Eric:
> From the warn_on_once, delta should be smaller than len, but probably they should be as close together as possible.
> When you say "accurate estimation", what would be a acceptable difference between DELTA and LEN ?
> [ 116.965062] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [ 117.094538] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.094707] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.094869] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.095058] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.095216] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.096102] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.096311] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.096373] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.150398] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.150459] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.536901] eth0: mtu:1500 data_len:53642 len before:0 len after:53642 truesize before:896 truesize after:54282 nr_frags:14 variant1:53386(54282) variant2:53386(54282) variant3:57344(58240)
> [ 117.537463] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
> [ 117.537915] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
> [ 117.538543] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18634(19530) variant3:24576(25472)
> [ 117.539223] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
> [ 117.539283] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:2 variant1:7050(7946) variant2:7050(7946) variant3:8192(9088)
> [ 117.539403] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:2
> [ 117.540035] eth0: mtu:1500 data_len:4410 len before:0 len after:4410 truesize before:896 truesize after:5050 nr_frags:3 variant1:4154(5050) variant2:4304(5200) variant3:12288(13184)
> [ 117.540153] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
> [ 121.981917] net_ratelimit: 27 callbacks suppressed
> [ 121.981960] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [ 122.985019] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [ 123.988308] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [ 124.991961] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [ 125.995003] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [ 126.998324] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index c26e28b..8833e38 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -964,6 +964,7 @@ static int xennet_poll(struct napi_struct *napi, int budget)
> struct sk_buff_head tmpq;
> unsigned long flags;
> int err;
> + int tsz,len;
> spin_lock(&np->rx_lock);
> @@ -1037,9 +1038,22 @@ err:
> * receive throughout using the standard receive
> * buffer size was cut by 25%(!!!).
> */
> - skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
> +
> +
> +
> +
> + tsz = skb->truesize;
> + len = skb->len;
> + /* skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; */
> + skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
> skb->len += skb->data_len;
> + net_warn_ratelimited("%s: mtu:%d data_len:%d len before:%d len after:%d truesize before:%d truesize after:%d nr_frags:%d variant1:%d(%d) variant2:%d(%d) variant3:%d(%d) \n",
> + skb->dev->name, skb->dev->mtu, skb->data_len, len, skb->len,tsz, skb->truesize, skb_shinfo(skb)->nr_frags,
> + skb->data_len - RX_COPY_THRESHOLD, tsz + skb->data_len - RX_COPY_THRESHOLD ,
> + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to, tsz + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to,
> + PAGE_SIZE * skb_shinfo(skb)->nr_frags, tsz + (PAGE_SIZE * skb_shinfo(skb)->nr_frags));
> +
> if (rx->flags & XEN_NETRXF_csum_blank)
> skb->ip_summed = CHECKSUM_PARTIAL;
> else if (rx->flags & XEN_NETRXF_data_validated)
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 3ab989b..6d0cd86 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3471,6 +3471,16 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> WARN_ON_ONCE(delta < len);
> + if(delta < len) {
> + net_warn_ratelimited("to: %s from: %s skb_try_coalesce: DELTA < LEN delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
> + to->dev->name, from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
> + }
> +
+ if (delta >> len && delta - len > 100) {
> + net_warn_ratelimited("to: %s from: %s skb_try_coalesce: DELTA - LEN > 100 delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
> + to->dev->name,from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
> + }
> +
> memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
> skb_shinfo(from)->frags,
> skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));
Ok i succeeded in triggering the warn_on_once, but it seems the extra debug info from netfront was just rate limited away for the offending packet :(
Dec 20 15:17:33 media kernel: [ 393.464062] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:33 media kernel: [ 393.464438] eth0: mtu:1500 data_len:762 len before:0 len after:762 truesize before:896 truesize after:1402 nr_frags:1 variant1:506(1402) variant2:506(1402) variant3:4096(4992)
Dec 20 15:17:33 media kernel: [ 393.465083] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:33 media kernel: [ 393.466114] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:33 media kernel: [ 393.467336] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 394.940211] ------------[ cut here ]------------
Dec 20 15:17:35 media kernel: [ 394.940259] WARNING: at net/core/skbuff.c:3472 skb_try_coalesce+0x3fc/0x470()
Dec 20 15:17:35 media kernel: [ 394.940282] Modules linked in:
Dec 20 15:17:35 media kernel: [ 394.940306] Pid: 2632, comm: glusterfs Not tainted 3.7.0-rc0-20121220-netfrontdebug1 #1
Dec 20 15:17:35 media kernel: [ 394.940330] Call Trace:
Dec 20 15:17:35 media kernel: [ 394.940343] <IRQ> [<ffffffff8106889a>] warn_slowpath_common+0x7a/0xb0
Dec 20 15:17:35 media kernel: [ 394.940384] [<ffffffff810688e5>] warn_slowpath_null+0x15/0x20
Dec 20 15:17:35 media kernel: [ 394.940409] [<ffffffff8184298c>] skb_try_coalesce+0x3fc/0x470
Dec 20 15:17:35 media kernel: [ 394.940434] [<ffffffff818fb049>] tcp_try_coalesce+0x69/0xc0
Dec 20 15:17:35 media kernel: [ 394.940458] [<ffffffff818fb0f4>] tcp_queue_rcv+0x54/0x100
Dec 20 15:17:35 media kernel: [ 394.940481] [<ffffffff8190029f>] ? tcp_mtup_init+0x2f/0x90
Dec 20 15:17:35 media kernel: [ 394.940504] [<ffffffff818ffbdb>] tcp_rcv_established+0x2bb/0x6a0
Dec 20 15:17:35 media kernel: [ 394.940528] [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
Dec 20 15:17:35 media kernel: [ 394.940551] [<ffffffff81907985>] tcp_v4_do_rcv+0x135/0x480
Dec 20 15:17:35 media kernel: [ 394.940576] [<ffffffff819b3532>] ? _raw_spin_lock_nested+0x42/0x50
Dec 20 15:17:35 media kernel: [ 394.940600] [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
Dec 20 15:17:35 media kernel: [ 394.940623] [<ffffffff8190862d>] tcp_v4_rcv+0x95d/0xb10
Dec 20 15:17:35 media kernel: [ 394.940666] [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
Dec 20 15:17:35 media kernel: [ 394.940694] [<ffffffff818e4d6a>] ip_local_deliver_finish+0x11a/0x230
Dec 20 15:17:35 media kernel: [ 394.940720] [<ffffffff818e4c95>] ? ip_local_deliver_finish+0x45/0x230
Dec 20 15:17:35 media kernel: [ 394.940745] [<ffffffff818e4eb8>] ip_local_deliver+0x38/0x80
Dec 20 15:17:35 media kernel: [ 394.940784] [<ffffffff818e447a>] ip_rcv_finish+0x15a/0x630
Dec 20 15:17:35 media kernel: [ 394.940807] [<ffffffff818e4b68>] ip_rcv+0x218/0x300
Dec 20 15:17:35 media kernel: [ 394.940829] [<ffffffff8184bf2d>] __netif_receive_skb+0x65d/0x8d0
Dec 20 15:17:35 media kernel: [ 394.940853] [<ffffffff8184ba15>] ? __netif_receive_skb+0x145/0x8d0
Dec 20 15:17:35 media kernel: [ 394.940889] [<ffffffff810b192d>] ? trace_hardirqs_on+0xd/0x10
Dec 20 15:17:35 media kernel: [ 394.940914] [<ffffffff810fecbb>] ? free_hot_cold_page+0x1ab/0x1e0
Dec 20 15:17:35 media kernel: [ 394.940939] [<ffffffff8184e4f8>] netif_receive_skb+0x28/0xf0
Dec 20 15:17:35 media kernel: [ 394.940964] [<ffffffff81843e83>] ? __pskb_pull_tail+0x253/0x340
Dec 20 15:17:35 media kernel: [ 394.941000] [<ffffffff8164fbb5>] xennet_poll+0xae5/0xed0
Dec 20 15:17:35 media kernel: [ 394.941024] [<ffffffff81080081>] ? wake_up_worker+0x1/0x30
Dec 20 15:17:35 media kernel: [ 394.941046] [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
Dec 20 15:17:35 media kernel: [ 394.941075] [<ffffffff8184ed66>] net_rx_action+0x136/0x260
Dec 20 15:17:35 media kernel: [ 394.941098] [<ffffffff81070551>] ? __do_softirq+0x71/0x1a0
Dec 20 15:17:35 media kernel: [ 394.941133] [<ffffffff810705a9>] __do_softirq+0xc9/0x1a0
Dec 20 15:17:35 media kernel: [ 394.941157] [<ffffffff819b623c>] call_softirq+0x1c/0x30
Dec 20 15:17:35 media kernel: [ 394.941179] [<ffffffff8100fdc5>] do_softirq+0x85/0xf0
Dec 20 15:17:35 media kernel: [ 394.941201] [<ffffffff8107041e>] irq_exit+0x9e/0xd0
Dec 20 15:17:35 media kernel: [ 394.941235] [<ffffffff81430b1f>] xen_evtchn_do_upcall+0x2f/0x40
Dec 20 15:17:35 media kernel: [ 394.941259] [<ffffffff819b629e>] xen_do_hypervisor_callback+0x1e/0x30
Dec 20 15:17:35 media kernel: [ 394.941279] <EOI> [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
Dec 20 15:17:35 media kernel: [ 394.941318] [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
Dec 20 15:17:35 media kernel: [ 394.941356] [<ffffffff8100890d>] ? xen_force_evtchn_callback+0xd/0x10
Dec 20 15:17:35 media kernel: [ 394.941381] [<ffffffff810092b2>] ? check_events+0x12/0x20
Dec 20 15:17:35 media kernel: [ 394.941405] [<ffffffff81009259>] ? xen_irq_enable_direct_reloc+0x4/0x4
Dec 20 15:17:35 media kernel: [ 394.941432] [<ffffffff819b3f6c>] ? _raw_spin_unlock_irq+0x3c/0x70
Dec 20 15:17:35 media kernel: [ 394.941473] [<ffffffff81095f83>] ? finish_task_switch+0x83/0xe0
Dec 20 15:17:35 media kernel: [ 394.941507] [<ffffffff81095f46>] ? finish_task_switch+0x46/0xe0
Dec 20 15:17:35 media kernel: [ 394.941533] [<ffffffff819b2434>] ? __schedule+0x444/0x880
Dec 20 15:17:35 media kernel: [ 394.941555] [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
Dec 20 15:17:35 media kernel: [ 394.941580] [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
Dec 20 15:17:35 media kernel: [ 394.941614] [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
Dec 20 15:17:35 media kernel: [ 394.941638] [<ffffffff819aff95>] ? __mutex_unlock_slowpath+0x135/0x1d0
Dec 20 15:17:35 media kernel: [ 394.941663] [<ffffffff819b2904>] ? schedule+0x24/0x70
Dec 20 15:17:35 media kernel: [ 394.941697] [<ffffffff819b179d>] ? schedule_hrtimeout_range_clock+0x11d/0x140
Dec 20 15:17:35 media kernel: [ 394.941725] [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
Dec 20 15:17:35 media kernel: [ 394.941748] [<ffffffff8118a558>] ? ep_poll+0xf8/0x3a0
Dec 20 15:17:35 media kernel: [ 394.941770] [<ffffffff819b4015>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
Dec 20 15:17:35 media kernel: [ 394.941808] [<ffffffff810b1818>] ? trace_hardirqs_on_caller+0xf8/0x200
Dec 20 15:17:35 media kernel: [ 394.941833] [<ffffffff819b17ce>] ? schedule_hrtimeout_range+0xe/0x10
Dec 20 15:17:35 media kernel: [ 394.941856] [<ffffffff8118a75a>] ? ep_poll+0x2fa/0x3a0
Dec 20 15:17:35 media kernel: [ 394.941878] [<ffffffff81098630>] ? try_to_wake_up+0x310/0x310
Dec 20 15:17:35 media kernel: [ 394.941913] [<ffffffff810b5b17>] ? lock_release+0x117/0x250
Dec 20 15:17:35 media kernel: [ 394.941938] [<ffffffff81165fd7>] ? fget_light+0xd7/0x140
Dec 20 15:17:35 media kernel: [ 394.941959] [<ffffffff81165f3a>] ? fget_light+0x3a/0x140
Dec 20 15:17:35 media kernel: [ 394.941981] [<ffffffff8118a8ce>] ? sys_epoll_wait+0xce/0xe0
Dec 20 15:17:35 media kernel: [ 394.942015] [<ffffffff819b4e69>] ? system_call_fastpath+0x16/0x1b
Dec 20 15:17:35 media kernel: [ 394.942036] ---[ end trace 6f3a832c9e91c8af ]---
Dec 20 15:17:35 media kernel: [ 394.942056] to: (null) from: (null) skb_try_coalesce: DELTA < LEN delta:22978 len:23168 from->truesize:23874 skb_headlen(from):0 skb_shinfo(to)->nr_frags:4 skb_shinfo(from)->nr_frags:6
Dec 20 15:17:35 media kernel: [ 394.968199] to: (null) from: (null) skb_try_coalesce: DELTA < LEN delta:14290 len:14480 from->truesize:15186 skb_headlen(from):0 skb_shinfo(to)->nr_frags:13 skb_shinfo(from)->nr_frags:4
Dec 20 15:17:35 media kernel: [ 395.262814] net_ratelimit: 371 callbacks suppressed
Dec 20 15:17:35 media kernel: [ 395.262858] eth0: mtu:1500 data_len:90 len before:0 len after:90 truesize before:896 truesize after:730 nr_frags:1 variant1:-166(730) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.264767] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.266193] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.268422] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.271617] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.274794] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.278104] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.281319] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.284454] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.287797] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.291121] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Xen-devel] [PATCH] xen/netfront: improve truesize tracking
2012-12-20 14:23 ` [Xen-devel] " Sander Eikelenboom
@ 2012-12-20 14:58 ` Sander Eikelenboom
0 siblings, 0 replies; 22+ messages in thread
From: Sander Eikelenboom @ 2012-12-20 14:58 UTC (permalink / raw)
To: Sander Eikelenboom
Cc: Eric Dumazet, netdev@vger.kernel.org, annie li,
xen-devel@lists.xensource.com, Ian Campbell,
Konrad Rzeszutek Wilk
Thursday, December 20, 2012, 3:23:26 PM, you wrote:
> Thursday, December 20, 2012, 1:51:39 PM, you wrote:
>> Wednesday, December 19, 2012, 5:17:49 PM, you wrote:
>>> On Wed, 2012-12-19 at 12:34 +0100, Sander Eikelenboom wrote:
>>>> Hi Ian,
>>>>
>>>> It ran overnight and i haven't seen the warn_once trigger.
>>>> (but i also didn't with the previous patch)
>>>>
>>> As I said, the miminum value to not trigger the warning was what Ian
>>> patch was doing, but it was still a not accurate estimation.
>>> Doing the real accounting might trigger slow transferts, or dropped
>>> packets because of socket limits (SNDBUF / RCVBUF) being hit sooner.
>>> So the real question was : If accounting for full pages, is your
>>> applications run as smooth as before, with no huge performance
>>> regression ?
>> Ok i have added some extra debug info (see diff's below), the code still uses the old calculation for truesize (in the hope to trigger the warn_on_once again), but also calculates the variants IanC came up with.
>> I haven't got a clear test case to trigger the warn_on_once, it happens just every once in a while during my normal usage and i'm not a netperf expert :-)
>> So at the moment i haven't been able to trigger the warn_on_once yet, but the results so far do seem to shed some light ..
>> - The first variant (current code) seems to be the most effcient and a good estimation *most* of the the, but sometimes triggers the warn_on_once in skb_try_coalesce.
>> - The first variant (current code) seems to always substract from the truesize for small packets.
>> - The second variant always seems keep the truesize as is for most of the small network traffic, but it also seems to work ok for larger packets.
>> - The third variant seems to be a pretty wasteful estimation.
>> So the last variant seems to be rather wasteful, and the second one the most accurate so far.
>> Eric:
>> From the warn_on_once, delta should be smaller than len, but probably they should be as close together as possible.
>> When you say "accurate estimation", what would be a acceptable difference between DELTA and LEN ?
>> [ 116.965062] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 117.094538] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.094707] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.094869] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.095058] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.095216] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.096102] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.096311] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.096373] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.150398] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.150459] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.536901] eth0: mtu:1500 data_len:53642 len before:0 len after:53642 truesize before:896 truesize after:54282 nr_frags:14 variant1:53386(54282) variant2:53386(54282) variant3:57344(58240)
>> [ 117.537463] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
>> [ 117.537915] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
>> [ 117.538543] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18634(19530) variant3:24576(25472)
>> [ 117.539223] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
>> [ 117.539283] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:2 variant1:7050(7946) variant2:7050(7946) variant3:8192(9088)
>> [ 117.539403] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:2
>> [ 117.540035] eth0: mtu:1500 data_len:4410 len before:0 len after:4410 truesize before:896 truesize after:5050 nr_frags:3 variant1:4154(5050) variant2:4304(5200) variant3:12288(13184)
>> [ 117.540153] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
>> [ 121.981917] net_ratelimit: 27 callbacks suppressed
>> [ 121.981960] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 122.985019] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 123.988308] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 124.991961] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 125.995003] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 126.998324] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index c26e28b..8833e38 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -964,6 +964,7 @@ static int xennet_poll(struct napi_struct *napi, int budget)
>> struct sk_buff_head tmpq;
>> unsigned long flags;
>> int err;
>> + int tsz,len;
>> spin_lock(&np->rx_lock);
>> @@ -1037,9 +1038,22 @@ err:
>> * receive throughout using the standard receive
>> * buffer size was cut by 25%(!!!).
>> */
>> - skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
>> +
>> +
>> +
>> +
>> + tsz = skb->truesize;
>> + len = skb->len;
>> + /* skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; */
>> + skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
>> skb->len += skb->data_len;
>> + net_warn_ratelimited("%s: mtu:%d data_len:%d len before:%d len after:%d truesize before:%d truesize after:%d nr_frags:%d variant1:%d(%d) variant2:%d(%d) variant3:%d(%d) \n",
>> + skb->dev->name, skb->dev->mtu, skb->data_len, len, skb->len,tsz, skb->truesize, skb_shinfo(skb)->nr_frags,
>> + skb->data_len - RX_COPY_THRESHOLD, tsz + skb->data_len - RX_COPY_THRESHOLD ,
>> + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to, tsz + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to,
>> + PAGE_SIZE * skb_shinfo(skb)->nr_frags, tsz + (PAGE_SIZE * skb_shinfo(skb)->nr_frags));
>> +
>> if (rx->flags & XEN_NETRXF_csum_blank)
>> skb->ip_summed = CHECKSUM_PARTIAL;
>> else if (rx->flags & XEN_NETRXF_data_validated)
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 3ab989b..6d0cd86 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3471,6 +3471,16 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>> WARN_ON_ONCE(delta < len);
>> + if(delta < len) {
>> + net_warn_ratelimited("to: %s from: %s skb_try_coalesce: DELTA < LEN delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
>> + to->dev->name, from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
>> + }
>> +
+ if (delta >>> len && delta - len > 100) {
>> + net_warn_ratelimited("to: %s from: %s skb_try_coalesce: DELTA - LEN > 100 delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
>> + to->dev->name,from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
>> + }
>> +
>> memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
>> skb_shinfo(from)->frags,
>> skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));
> Ok i succeeded in triggering the warn_on_once, but it seems the extra debug info from netfront was just rate limited away for the offending packet :(
> Dec 20 15:17:33 media kernel: [ 393.464062] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [ 393.464438] eth0: mtu:1500 data_len:762 len before:0 len after:762 truesize before:896 truesize after:1402 nr_frags:1 variant1:506(1402) variant2:506(1402) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [ 393.465083] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [ 393.466114] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [ 393.467336] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 394.940211] ------------[ cut here ]------------
> Dec 20 15:17:35 media kernel: [ 394.940259] WARNING: at net/core/skbuff.c:3472 skb_try_coalesce+0x3fc/0x470()
> Dec 20 15:17:35 media kernel: [ 394.940282] Modules linked in:
> Dec 20 15:17:35 media kernel: [ 394.940306] Pid: 2632, comm: glusterfs Not tainted 3.7.0-rc0-20121220-netfrontdebug1 #1
> Dec 20 15:17:35 media kernel: [ 394.940330] Call Trace:
> Dec 20 15:17:35 media kernel: [ 394.940343] <IRQ> [<ffffffff8106889a>] warn_slowpath_common+0x7a/0xb0
> Dec 20 15:17:35 media kernel: [ 394.940384] [<ffffffff810688e5>] warn_slowpath_null+0x15/0x20
> Dec 20 15:17:35 media kernel: [ 394.940409] [<ffffffff8184298c>] skb_try_coalesce+0x3fc/0x470
> Dec 20 15:17:35 media kernel: [ 394.940434] [<ffffffff818fb049>] tcp_try_coalesce+0x69/0xc0
> Dec 20 15:17:35 media kernel: [ 394.940458] [<ffffffff818fb0f4>] tcp_queue_rcv+0x54/0x100
> Dec 20 15:17:35 media kernel: [ 394.940481] [<ffffffff8190029f>] ? tcp_mtup_init+0x2f/0x90
> Dec 20 15:17:35 media kernel: [ 394.940504] [<ffffffff818ffbdb>] tcp_rcv_established+0x2bb/0x6a0
> Dec 20 15:17:35 media kernel: [ 394.940528] [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
> Dec 20 15:17:35 media kernel: [ 394.940551] [<ffffffff81907985>] tcp_v4_do_rcv+0x135/0x480
> Dec 20 15:17:35 media kernel: [ 394.940576] [<ffffffff819b3532>] ? _raw_spin_lock_nested+0x42/0x50
> Dec 20 15:17:35 media kernel: [ 394.940600] [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
> Dec 20 15:17:35 media kernel: [ 394.940623] [<ffffffff8190862d>] tcp_v4_rcv+0x95d/0xb10
> Dec 20 15:17:35 media kernel: [ 394.940666] [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
> Dec 20 15:17:35 media kernel: [ 394.940694] [<ffffffff818e4d6a>] ip_local_deliver_finish+0x11a/0x230
> Dec 20 15:17:35 media kernel: [ 394.940720] [<ffffffff818e4c95>] ? ip_local_deliver_finish+0x45/0x230
> Dec 20 15:17:35 media kernel: [ 394.940745] [<ffffffff818e4eb8>] ip_local_deliver+0x38/0x80
> Dec 20 15:17:35 media kernel: [ 394.940784] [<ffffffff818e447a>] ip_rcv_finish+0x15a/0x630
> Dec 20 15:17:35 media kernel: [ 394.940807] [<ffffffff818e4b68>] ip_rcv+0x218/0x300
> Dec 20 15:17:35 media kernel: [ 394.940829] [<ffffffff8184bf2d>] __netif_receive_skb+0x65d/0x8d0
> Dec 20 15:17:35 media kernel: [ 394.940853] [<ffffffff8184ba15>] ? __netif_receive_skb+0x145/0x8d0
> Dec 20 15:17:35 media kernel: [ 394.940889] [<ffffffff810b192d>] ? trace_hardirqs_on+0xd/0x10
> Dec 20 15:17:35 media kernel: [ 394.940914] [<ffffffff810fecbb>] ? free_hot_cold_page+0x1ab/0x1e0
> Dec 20 15:17:35 media kernel: [ 394.940939] [<ffffffff8184e4f8>] netif_receive_skb+0x28/0xf0
> Dec 20 15:17:35 media kernel: [ 394.940964] [<ffffffff81843e83>] ? __pskb_pull_tail+0x253/0x340
> Dec 20 15:17:35 media kernel: [ 394.941000] [<ffffffff8164fbb5>] xennet_poll+0xae5/0xed0
> Dec 20 15:17:35 media kernel: [ 394.941024] [<ffffffff81080081>] ? wake_up_worker+0x1/0x30
> Dec 20 15:17:35 media kernel: [ 394.941046] [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
> Dec 20 15:17:35 media kernel: [ 394.941075] [<ffffffff8184ed66>] net_rx_action+0x136/0x260
> Dec 20 15:17:35 media kernel: [ 394.941098] [<ffffffff81070551>] ? __do_softirq+0x71/0x1a0
> Dec 20 15:17:35 media kernel: [ 394.941133] [<ffffffff810705a9>] __do_softirq+0xc9/0x1a0
> Dec 20 15:17:35 media kernel: [ 394.941157] [<ffffffff819b623c>] call_softirq+0x1c/0x30
> Dec 20 15:17:35 media kernel: [ 394.941179] [<ffffffff8100fdc5>] do_softirq+0x85/0xf0
> Dec 20 15:17:35 media kernel: [ 394.941201] [<ffffffff8107041e>] irq_exit+0x9e/0xd0
> Dec 20 15:17:35 media kernel: [ 394.941235] [<ffffffff81430b1f>] xen_evtchn_do_upcall+0x2f/0x40
> Dec 20 15:17:35 media kernel: [ 394.941259] [<ffffffff819b629e>] xen_do_hypervisor_callback+0x1e/0x30
> Dec 20 15:17:35 media kernel: [ 394.941279] <EOI> [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
> Dec 20 15:17:35 media kernel: [ 394.941318] [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
> Dec 20 15:17:35 media kernel: [ 394.941356] [<ffffffff8100890d>] ? xen_force_evtchn_callback+0xd/0x10
> Dec 20 15:17:35 media kernel: [ 394.941381] [<ffffffff810092b2>] ? check_events+0x12/0x20
> Dec 20 15:17:35 media kernel: [ 394.941405] [<ffffffff81009259>] ? xen_irq_enable_direct_reloc+0x4/0x4
> Dec 20 15:17:35 media kernel: [ 394.941432] [<ffffffff819b3f6c>] ? _raw_spin_unlock_irq+0x3c/0x70
> Dec 20 15:17:35 media kernel: [ 394.941473] [<ffffffff81095f83>] ? finish_task_switch+0x83/0xe0
> Dec 20 15:17:35 media kernel: [ 394.941507] [<ffffffff81095f46>] ? finish_task_switch+0x46/0xe0
> Dec 20 15:17:35 media kernel: [ 394.941533] [<ffffffff819b2434>] ? __schedule+0x444/0x880
> Dec 20 15:17:35 media kernel: [ 394.941555] [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
> Dec 20 15:17:35 media kernel: [ 394.941580] [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
> Dec 20 15:17:35 media kernel: [ 394.941614] [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
> Dec 20 15:17:35 media kernel: [ 394.941638] [<ffffffff819aff95>] ? __mutex_unlock_slowpath+0x135/0x1d0
> Dec 20 15:17:35 media kernel: [ 394.941663] [<ffffffff819b2904>] ? schedule+0x24/0x70
> Dec 20 15:17:35 media kernel: [ 394.941697] [<ffffffff819b179d>] ? schedule_hrtimeout_range_clock+0x11d/0x140
> Dec 20 15:17:35 media kernel: [ 394.941725] [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
> Dec 20 15:17:35 media kernel: [ 394.941748] [<ffffffff8118a558>] ? ep_poll+0xf8/0x3a0
> Dec 20 15:17:35 media kernel: [ 394.941770] [<ffffffff819b4015>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
> Dec 20 15:17:35 media kernel: [ 394.941808] [<ffffffff810b1818>] ? trace_hardirqs_on_caller+0xf8/0x200
> Dec 20 15:17:35 media kernel: [ 394.941833] [<ffffffff819b17ce>] ? schedule_hrtimeout_range+0xe/0x10
> Dec 20 15:17:35 media kernel: [ 394.941856] [<ffffffff8118a75a>] ? ep_poll+0x2fa/0x3a0
> Dec 20 15:17:35 media kernel: [ 394.941878] [<ffffffff81098630>] ? try_to_wake_up+0x310/0x310
> Dec 20 15:17:35 media kernel: [ 394.941913] [<ffffffff810b5b17>] ? lock_release+0x117/0x250
> Dec 20 15:17:35 media kernel: [ 394.941938] [<ffffffff81165fd7>] ? fget_light+0xd7/0x140
> Dec 20 15:17:35 media kernel: [ 394.941959] [<ffffffff81165f3a>] ? fget_light+0x3a/0x140
> Dec 20 15:17:35 media kernel: [ 394.941981] [<ffffffff8118a8ce>] ? sys_epoll_wait+0xce/0xe0
> Dec 20 15:17:35 media kernel: [ 394.942015] [<ffffffff819b4e69>] ? system_call_fastpath+0x16/0x1b
> Dec 20 15:17:35 media kernel: [ 394.942036] ---[ end trace 6f3a832c9e91c8af ]---
> Dec 20 15:17:35 media kernel: [ 394.942056] to: (null) from: (null) skb_try_coalesce: DELTA < LEN delta:22978 len:23168 from->truesize:23874 skb_headlen(from):0 skb_shinfo(to)->nr_frags:4 skb_shinfo(from)->nr_frags:6
> Dec 20 15:17:35 media kernel: [ 394.968199] to: (null) from: (null) skb_try_coalesce: DELTA < LEN delta:14290 len:14480 from->truesize:15186 skb_headlen(from):0 skb_shinfo(to)->nr_frags:13 skb_shinfo(from)->nr_frags:4
> Dec 20 15:17:35 media kernel: [ 395.262814] net_ratelimit: 371 callbacks suppressed
> Dec 20 15:17:35 media kernel: [ 395.262858] eth0: mtu:1500 data_len:90 len before:0 len after:90 truesize before:896 truesize after:730 nr_frags:1 variant1:-166(730) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.264767] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.266193] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.268422] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.271617] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.274794] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.278104] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.281319] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.284454] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.287797] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.291121] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Hmm perhaps a better example, i have indented some perhaps interesting points:
Dec 20 14:12:57 media kernel: [ 794.895136] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
Dec 20 14:12:57 media kernel: [ 794.895431] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
Dec 20 14:12:57 media kernel: [ 794.895616] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18824(19720) variant3:24576(25472)
Dec 20 14:12:57 media kernel: [ 794.895804] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
Dec 20 14:12:57 media kernel: [ 794.895823] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:3 variant1:7050(7946) variant2:7050(7946) variant3:12288(13184)
Dec 20 14:12:57 media kernel: [ 794.895868] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:3
Dec 20 14:12:57 media kernel: [ 794.896133] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
Dec 20 14:12:57 media kernel: [ 794.896152] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
Dec 20 14:12:57 media kernel: [ 794.896200] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:1402 len:952 from->truesize:1658 skb_headlen(from):190 skb_shinfo(to)->nr_frags:6 skb_shinfo(from)->nr_frags:1
Dec 20 14:12:57 media kernel: [ 794.907232] eth0: mtu:1500 data_len:23234 len before:0 len after:23234 truesize before:896 truesize after:23874 nr_frags:7 variant1:22978(23874) variant2:22978(23874) variant3:28672(29568)
Dec 20 14:12:57 media kernel: [ 794.907517] eth0: mtu:1500 data_len:24682 len before:0 len after:24682 truesize before:896 truesize after:25322 nr_frags:7 variant1:24426(25322) variant2:24426(25322) variant3:28672(29568)
Dec 20 14:12:57 media kernel: [ 794.907693] eth0: mtu:1500 data_len:26130 len before:0 len after:26130 truesize before:896 truesize after:26770 nr_frags:7 variant1:25874(26770) variant2:25874(26770) variant3:28672(29568)
Dec 20 14:12:57 media kernel: [ 794.907882] eth0: mtu:1500 data_len:14546 len before:0 len after:14546 truesize before:896 truesize after:15186 nr_frags:5 variant1:14290(15186) variant2:14290(15186) variant3:20480(21376)
Dec 20 14:12:57 media kernel: [ 794.907901] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
Dec 20 14:12:57 media kernel: [ 794.907938] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:13482 len:13032 from->truesize:13738 skb_headlen(from):190 skb_shinfo(to)->nr_frags:6 skb_shinfo(from)->nr_frags:4
Dec 20 14:12:57 media kernel: [ 794.908191] eth0: mtu:1500 data_len:29026 len before:0 len after:29026 truesize before:896 truesize after:29666 nr_frags:9 variant1:28770(29666) variant2:28880(29776) variant3:36864(37760)
Dec 20 14:12:57 media kernel: [ 794.908386] eth0: mtu:1500 data_len:30474 len before:0 len after:30474 truesize before:896 truesize after:31114 nr_frags:8 variant1:30218(31114) variant2:30218(31114) variant3:32768(33664)
A1) Here we have a packet data_len: 5858 and truesize set to 6498 and nr_frags: 2
Dec 20 14:12:57 media kernel: [ 794.908560] eth0: mtu:1500 data_len:5858 len before:0 len after:5858 truesize before:896 truesize after:6498 nr_frags:2 variant1:5602(6498) variant2:5602(6498) variant3:8192(9088)
Dec 20 14:12:57 media kernel: [ 794.908581] eth0: mtu:1500 data_len:26130 len before:0 len after:26130 truesize before:896 truesize after:26770 nr_frags:7 variant1:25874(26770) variant2:25874(26770) variant3:28672(29568)
A2) That seems to end up in skb_try_coalesce, from->nr_frags is still 2, delta >> LEN in this case, no warning but perhaps wasteful ?
Dec 20 14:12:57 media kernel: [ 794.908616] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:6242 len:5792 from->truesize:6498 skb_headlen(from):190 skb_shinfo(to)->nr_frags:9 skb_shinfo(from)->nr_frags:2
Dec 20 14:12:57 media kernel: [ 794.908834] eth0: mtu:1500 data_len:33370 len before:0 len after:33370 truesize before:896 truesize after:34010 nr_frags:9 variant1:33114(34010) variant2:33114(34010) variant3:36864(37760)
B1) Here we have again a packet data_len: 5858 and truesize set to 6498, but nr_frags: 3 this time.
Dec 20 14:12:57 media kernel: [ 794.908992] eth0: mtu:1500 data_len:5858 len before:0 len after:5858 truesize before:896 truesize after:6498 nr_frags:3 variant1:5602(6498) variant2:5792(6688) variant3:12288(13184)
Dec 20 14:12:57 media kernel: [ 794.909012] eth0: mtu:1500 data_len:29026 len before:0 len after:29026 truesize before:896 truesize after:29666 nr_frags:8 variant1:28770(29666) variant2:28770(29666) variant3:32768(33664)
B2) That seems to end up in skb_try_coalesce, from->nr_frags is now 2 instead of 3, delta < LEN in this case, so it would have triggered the warn_on_once
Dec 20 14:12:57 media kernel: [ 794.909040] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA < LEN delta:5602 len:5792 from->truesize:6498 skb_headlen(from):0 skb_shinfo(to)->nr_frags:9 skb_shinfo(from)->nr_frags:2
Dec 20 14:12:57 media kernel: [ 794.909673] eth0: mtu:1500 data_len:1514 len before:0 len after:1514 truesize before:896 truesize after:2154 nr_frags:1 variant1:1258(2154) variant2:1258(2154) variant3:4096(4992)
Dec 20 14:12:57 media kernel: [ 794.909692] eth0: mtu:1500 data_len:522 len before:0 len after:522 truesize before:896 truesize after:1162 nr_frags:1 variant1:266(1162) variant2:266(1162) variant3:4096(4992)
Dec 20 14:12:57 media kernel: [ 794.909736] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:906 len:456 from->truesize:1162 skb_headlen(from):190 skb_shinfo(to)->nr_frags:2 skb_shinfo(from)->nr_frags:1
Dec 20 14:12:57 media kernel: [ 794.910205] eth0: mtu:1500 data_len:36266 len before:0 len after:36266 truesize before:896 truesize after:36906 nr_frags:10 variant1:36010(36906) variant2:36010(36906) variant3:40960(41856)
Dec 20 14:12:57 media kernel: [ 794.910706] eth0: mtu:1500 data_len:37714 len before:0 len after:37714 truesize before:896 truesize after:38354 nr_frags:10 variant1:37458(38354) variant2:37458(38354) variant3:40960(41856)
Dec 20 14:12:57 media kernel: [ 794.911472] eth0: mtu:1500 data_len:27578 len before:0 len after:27578 truesize before:896 truesize after:28218 nr_frags:8 variant1:27322(28218) variant2:27322(28218) variant3:32768(33664)
Dec 20 14:12:57 media kernel: [ 794.911695] eth0: mtu:1500 data_len:29026 len before:0 len after:29026 truesize before:896 truesize after:29666 nr_frags:9 variant1:28770(29666) variant2:28770(29666) variant3:36864(37760)
Dec 20 14:12:57 media kernel: [ 795.015511] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
Dec 20 14:12:57 media kernel: [ 795.015585] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:1402 len:952 from->truesize:1658 skb_headlen(from):190 skb_shinfo(to)->nr_frags:10 skb_shinfo(from)->nr_frags:1
Dec 20 14:12:57 media kernel: [ 795.015641] eth0: mtu:1500 data_len:10202 len before:0 len after:10202 truesize before:896 truesize after:10842 nr_frags:4 variant1:9946(10842) variant2:9946(10842) variant3:16384(17280)
Dec 20 14:12:57 media kernel: [ 795.015657] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
Dec 20 14:12:58 media kernel: [ 795.817824] net_ratelimit: 9 callbacks suppressed
--
Sander
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2012-12-20 12:51 ` Sander Eikelenboom
2012-12-20 14:23 ` [Xen-devel] " Sander Eikelenboom
@ 2012-12-20 15:39 ` Eric Dumazet
2012-12-21 11:21 ` Sander Eikelenboom
1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2012-12-20 15:39 UTC (permalink / raw)
To: Sander Eikelenboom
Cc: Ian Campbell, netdev@vger.kernel.org, Konrad Rzeszutek Wilk,
annie li, xen-devel@lists.xensource.com
On Thu, 2012-12-20 at 13:51 +0100, Sander Eikelenboom wrote:
> Eric:
> From the warn_on_once, delta should be smaller than len, but probably they should be as close together as possible.
> When you say "accurate estimation", what would be a acceptable difference between DELTA and LEN ?
I would use the most exact value, which is :
skb->truesize += nr_frags * PAGE_SIZE;
Then, if we can spot later a regression in some stacks, adapt the
limiting parameters. I did a lot of work in GRO and TCP stack to reduce
the memory, and further changes are possible.
We really want to account memory, because we want to control how memory
is used on our machines and don't let some users use more than the
amount that was allowed to them.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2012-12-20 15:39 ` Eric Dumazet
@ 2012-12-21 11:21 ` Sander Eikelenboom
2012-12-21 18:33 ` Rick Jones
0 siblings, 1 reply; 22+ messages in thread
From: Sander Eikelenboom @ 2012-12-21 11:21 UTC (permalink / raw)
To: Eric Dumazet
Cc: Ian Campbell, netdev@vger.kernel.org, Konrad Rzeszutek Wilk,
annie li, xen-devel@lists.xensource.com
Thursday, December 20, 2012, 4:39:28 PM, you wrote:
> On Thu, 2012-12-20 at 13:51 +0100, Sander Eikelenboom wrote:
>> Eric:
>> From the warn_on_once, delta should be smaller than len, but probably they should be as close together as possible.
>> When you say "accurate estimation", what would be a acceptable difference between DELTA and LEN ?
> I would use the most exact value, which is :
> skb->truesize += nr_frags * PAGE_SIZE;
> Then, if we can spot later a regression in some stacks, adapt the
> limiting parameters. I did a lot of work in GRO and TCP stack to reduce
> the memory, and further changes are possible.
> We really want to account memory, because we want to control how memory
> is used on our machines and don't let some users use more than the
> amount that was allowed to them.
Hi Eric and Ian,
I have ran some perfnet tests (although i'm not an expert, so i'm not sure i have done the right tests).
I you have better tests, please do say so ..
"current" is with netfront as is (skb->truesize += skb->data_len - RX_COPY_THRESHOLD;)
"patched" is with IanC's latest patch (skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags;)
Tested between domU and dom0 (bridged) on a system with only one guest. The results don't seem to differ very much.
+ netperf -H 192.168.1.1 -t TCP_STREAM -fK -i10,5 -l 60 -I95,5 -P1 -v2
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. KBytes/sec
current 87380 16384 16384 60.00 954438.38
patched 87380 16384 16384 60.00 975236.19
+ netperf -H 192.168.1.1 -t TCP_STREAM -fK -i10,5 -l 60 -I95,5 -P1 -v2 -- -s 50
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. KBytes/sec
current 87380 2048 2048 60.00 17614.79
patched 87380 2048 2048 60.00 17207.46
+ netperf -H 192.168.1.1 -t TCP_STREAM -fK -i10,5 -l 60 -I95,5 -P1 -v2 -- -s 50 -M 1432 -m 1432
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. KBytes/sec
current 87380 2048 1432 60.00 35.28
patched 87380 2048 1432 60.00 35.28
+ netperf -H 192.168.1.1 -t TCP_STREAM -fK -i10,5 -l 60 -I95,5 -P1 -v2 -- -s 9000
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. KBytes/sec
current 87380 18000 18000 60.00 157762.45
patched 87380 18000 18000 60.00 158606.02
+ netperf -H 192.168.1.1 -t TCP_STREAM -fK -i10,5 -l 60 -I95,5 -P1 -v2 -- -s 9000 -M 1432 -m 1432
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. KBytes/sec
current 87380 18000 1432 60.00 78567.39
patched 87380 18000 1432 60.00 78329.98
+ netperf -H 192.168.1.1 -t UDP_STREAM -fK -i10,5 -l 60 -I95,5 -P1 -v2
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # KBytes/sec
current 212992 65507 60.00 248771 0 265238.24
current 212992 60.00 214267 228450.01
patched 212992 65507 60.00 251188 0 267814.90
patched 212992 60.00 235101 250662.67
+ netperf -H 192.168.1.1 -t UDP_STREAM -fK -i10,5 -l 60 -I95,5 -P1 -v2 -- -s 50
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # KBytes/sec
current 2048 2048 60.00 1329653 0 44321.73
current 212992 60.00 1329650 44321.62
patched 2048 2048 60.00 1363257 0 45441.68
patched 212992 60.00 1363253 45441.57
+ netperf -H 192.168.1.1 -t UDP_STREAM -fK -i10,5 -l 60 -I95,5 -P1 -v2 -- -s 50 -M 1432 -m 1432
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # KBytes/sec
current 2048 1432 60.00 1516249 0 35339.61
current 212992 60.00 1516247 35339.56
patched 2048 1432 60.00 1483705 0 34581.11
patched 212992 60.00 1483701 34581.01
+ netperf -H 192.168.1.1 -t UDP_STREAM -fK -i10,5 -l 60 -I95,5 -P1 -v2 -- -s 9000
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # KBytes/sec
current 18000 18000 60.00 540410 26 158322.98
current 212992 60.00 540349 158305.24
patched 18000 18000 60.00 555449 32 162728.98
patched 212992 60.00 555392 162712.28
+ netperf -H 192.168.1.1 -t UDP_STREAM -fK -i10,5 -l 60 -I95,5 -P1 -v2 -- -s 9000 -M 1432 -m 1432
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # KBytes/sec
current 18000 1432 60.00 5144189 0 119896.95
current 212992 60.00 5138354 119760.96
patched 18000 1432 60.00 5104540 0 118972.85
patched 212992 60.00 5099802 118862.44
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2012-12-21 11:21 ` Sander Eikelenboom
@ 2012-12-21 18:33 ` Rick Jones
2013-01-03 20:40 ` Sander Eikelenboom
0 siblings, 1 reply; 22+ messages in thread
From: Rick Jones @ 2012-12-21 18:33 UTC (permalink / raw)
To: Sander Eikelenboom
Cc: Eric Dumazet, Ian Campbell, netdev@vger.kernel.org,
Konrad Rzeszutek Wilk, annie li, xen-devel@lists.xensource.com
I'm guessing that trusize checks matter more on the "inbound" path than
the outbound path? If that is indeed the case, then instead of, or in
addition to using the -s option to set the local (netperf side) socket
buffer size, you should use a -S option to set the remote (netserver
side) socket buffer size.
happy benchmarking,
rick jones
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2012-12-21 18:33 ` Rick Jones
@ 2013-01-03 20:40 ` Sander Eikelenboom
2013-01-07 13:41 ` Ian Campbell
0 siblings, 1 reply; 22+ messages in thread
From: Sander Eikelenboom @ 2013-01-03 20:40 UTC (permalink / raw)
To: Rick Jones
Cc: Eric Dumazet, Ian Campbell, netdev@vger.kernel.org,
Konrad Rzeszutek Wilk, annie li, xen-devel@lists.xensource.com
Friday, December 21, 2012, 7:33:43 PM, you wrote:
> I'm guessing that trusize checks matter more on the "inbound" path than
> the outbound path? If that is indeed the case, then instead of, or in
> addition to using the -s option to set the local (netperf side) socket
> buffer size, you should use a -S option to set the remote (netserver
> side) socket buffer size.
> happy benchmarking,
> rick jones
OK, ran them with -S as well:
"current" is with netfront as is (skb->truesize += skb->data_len - RX_COPY_THRESHOLD;)
"patched" is with IanC's latest patch (skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags;)
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. KBytes/sec
current 87380 16384 16384 60.00 963213.91
patched 87380 16384 16384 60.00 965904.01
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. KBytes/sec
current 2280 16384 16384 60.00 10029.44
patched 2280 16384 16384 60.00 10453.63
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. KBytes/sec
current 2280 16384 1432 60.00 10085.23
patched 2280 16384 1432 60.00 10089.98
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. KBytes/sec
current 18000 16384 16384 60.00 53644.35
patched 18000 16384 16384 60.00 51892.59
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. KBytes/sec
current 18000 16384 1432 60.00 37559.94
patched 18000 16384 1432 60.00 40630.66
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. KBytes/sec
current 28000 16384 16384 60.00 103766.68
patched 28000 16384 16384 60.00 93277.98
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. KBytes/sec
current 28000 16384 1432 60.00 73494.97
patched 28000 16384 1432 60.00 65312.94
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # KBytes/sec
current 212992 65507 60.00 252087 0 268773.83
current 212992 60.00 237435 253151.28
patched 212992 65507 60.00 254997 0 271876.61
patched 212992 60.00 226636 241637.51
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # KBytes/sec
current 212992 65507 60.00 252586 0 269305.73
current 2280 60.00 229371 244553.96
patched 212992 65507 60.00 256209 0 273168.32
patched 2280 60.00 201670 215019.54
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # KBytes/sec
current 212992 1432 60.00 5474009 0 127583.97
current 2280 60.00 4282184 99805.83
patched 212992 1432 60.00 5521903 0 128700.40
patched 2280 60.00 4522227 105400.70
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # KBytes/sec
current 212992 65507 60.00 251315 0 267950.39
current 18000 60.00 227856 242938.94
patched 212992 65507 60.00 256529 0 273509.57
patched 18000 60.00 224780 239658.47
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # KBytes/sec
current 212992 1432 60.00 5484882 0 127837.39
current 18000 60.00 5458678 127226.64
patched 212992 1432 60.00 5585209 0 130175.81
patched 18000 60.00 5560244 129593.94
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # KBytes/sec
current 212992 65507 60.00 250322 0 266891.35
current 28000 60.00 191711 204401.50
patched 212992 65507 60.00 257466 0 274508.96
patched 28000 60.00 241620 257613.85
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # KBytes/sec
current 212992 1432 60.00 5532780 0 128953.85
current 28000 60.00 5507094 128355.17
patched 212992 1432 60.00 5576824 0 129980.46
patched 28000 60.00 5555917 129493.16
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2013-01-03 20:40 ` Sander Eikelenboom
@ 2013-01-07 13:41 ` Ian Campbell
2013-01-07 14:11 ` Sander Eikelenboom
2013-01-07 16:17 ` Eric Dumazet
0 siblings, 2 replies; 22+ messages in thread
From: Ian Campbell @ 2013-01-07 13:41 UTC (permalink / raw)
To: Sander Eikelenboom
Cc: Rick Jones, Eric Dumazet, netdev@vger.kernel.org,
Konrad Rzeszutek Wilk, annie li, xen-devel@lists.xensource.com
On Thu, 2013-01-03 at 20:40 +0000, Sander Eikelenboom wrote:
> Friday, December 21, 2012, 7:33:43 PM, you wrote:
>
> > I'm guessing that trusize checks matter more on the "inbound" path than
> > the outbound path? If that is indeed the case, then instead of, or in
> > addition to using the -s option to set the local (netperf side) socket
> > buffer size, you should use a -S option to set the remote (netserver
> > side) socket buffer size.
>
> > happy benchmarking,
>
> > rick jones
>
>
> OK, ran them with -S as well:
Are these all from domU -> dom0 ? Did you try traffic going the other
way?
> "current" is with netfront as is (skb->truesize += skb->data_len - RX_COPY_THRESHOLD;)
> "patched" is with IanC's latest patch (skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags;)
skb->truesize += skb->data_len - NETFRONT_SKB_CB(skb)->pull_to; is
probably more interesting to compare against since we know the current
one is buggy.
These number generally look good, largely +/- 1%, often in favour of the
updated code but these two stand out as worrying:
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
> Recv Send Send
> Socket Socket Message Elapsed
> Size Size Size Time Throughput
> bytes bytes bytes secs. KBytes/sec
>
> current 18000 16384 1432 60.00 37559.94
> patched 18000 16384 1432 60.00 40630.66
>
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
> Recv Send Send
> Socket Socket Message Elapsed
> Size Size Size Time Throughput
> bytes bytes bytes secs. KBytes/sec
>
> current 28000 16384 16384 60.00 103766.68
> patched 28000 16384 16384 60.00 93277.98
That's at least a 10% slow down in both cases.
> UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
> Socket Message Elapsed Messages
> Size Size Time Okay Errors Throughput
> bytes bytes secs # # KBytes/sec
>
> current 212992 65507 60.00 252586 0 269305.73
> current 2280 60.00 229371 244553.96
> patched 212992 65507 60.00 256209 0 273168.32
> patched 2280 60.00 201670 215019.54
The recv numbers here aren't too pleasing either.
However, given that this fixes a real issue which people are seeing I'd
be inclined to go with it, at least for now.
Ian.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2013-01-07 13:41 ` Ian Campbell
@ 2013-01-07 14:11 ` Sander Eikelenboom
2013-01-07 14:22 ` Ian Campbell
2013-01-07 16:17 ` Eric Dumazet
1 sibling, 1 reply; 22+ messages in thread
From: Sander Eikelenboom @ 2013-01-07 14:11 UTC (permalink / raw)
To: Ian Campbell
Cc: Rick Jones, Eric Dumazet, netdev@vger.kernel.org,
Konrad Rzeszutek Wilk, annie li, xen-devel@lists.xensource.com
Monday, January 7, 2013, 2:41:03 PM, you wrote:
> On Thu, 2013-01-03 at 20:40 +0000, Sander Eikelenboom wrote:
>> Friday, December 21, 2012, 7:33:43 PM, you wrote:
>>
>> > I'm guessing that trusize checks matter more on the "inbound" path than
>> > the outbound path? If that is indeed the case, then instead of, or in
>> > addition to using the -s option to set the local (netperf side) socket
>> > buffer size, you should use a -S option to set the remote (netserver
>> > side) socket buffer size.
>>
>> > happy benchmarking,
>>
>> > rick jones
>>
>>
>> OK, ran them with -S as well:
> Are these all from domU -> dom0 ? Did you try traffic going the other
> way?
Yes running netperf in domU and netserver in dom0, but i must say i'm far from a netperf expert.
So i don't even know for sure if the tests i ran give a good picture.
>> "current" is with netfront as is (skb->truesize += skb->data_len - RX_COPY_THRESHOLD;)
>> "patched" is with IanC's latest patch (skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags;)
skb->>truesize += skb->data_len - NETFRONT_SKB_CB(skb)->pull_to; is
> probably more interesting to compare against since we know the current
> one is buggy.
Will see if i can run against that as well, although i thought Eric said to prefer the "skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags;"
> These number generally look good, largely +/- 1%, often in favour of the
> updated code but these two stand out as worrying:
>> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
>> Recv Send Send
>> Socket Socket Message Elapsed
>> Size Size Size Time Throughput
>> bytes bytes bytes secs. KBytes/sec
>>
>> current 18000 16384 1432 60.00 37559.94
>> patched 18000 16384 1432 60.00 40630.66
>>
>> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
>> Recv Send Send
>> Socket Socket Message Elapsed
>> Size Size Size Time Throughput
>> bytes bytes bytes secs. KBytes/sec
>>
>> current 28000 16384 16384 60.00 103766.68
>> patched 28000 16384 16384 60.00 93277.98
> That's at least a 10% slow down in both cases.
>> UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
>> Socket Message Elapsed Messages
>> Size Size Time Okay Errors Throughput
>> bytes bytes secs # # KBytes/sec
>>
>> current 212992 65507 60.00 252586 0 269305.73
>> current 2280 60.00 229371 244553.96
>> patched 212992 65507 60.00 256209 0 273168.32
>> patched 2280 60.00 201670 215019.54
> The recv numbers here aren't too pleasing either.
> However, given that this fixes a real issue which people are seeing I'd
> be inclined to go with it, at least for now.
> Ian.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2013-01-07 14:11 ` Sander Eikelenboom
@ 2013-01-07 14:22 ` Ian Campbell
2013-01-07 15:26 ` Eric Dumazet
0 siblings, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2013-01-07 14:22 UTC (permalink / raw)
To: Sander Eikelenboom
Cc: Rick Jones, Eric Dumazet, netdev@vger.kernel.org,
Konrad Rzeszutek Wilk, annie li, xen-devel@lists.xensource.com
On Mon, 2013-01-07 at 14:11 +0000, Sander Eikelenboom wrote:
> Monday, January 7, 2013, 2:41:03 PM, you wrote:
>
> > On Thu, 2013-01-03 at 20:40 +0000, Sander Eikelenboom wrote:
> >> Friday, December 21, 2012, 7:33:43 PM, you wrote:
> >>
> >> > I'm guessing that trusize checks matter more on the "inbound" path than
> >> > the outbound path? If that is indeed the case, then instead of, or in
> >> > addition to using the -s option to set the local (netperf side) socket
> >> > buffer size, you should use a -S option to set the remote (netserver
> >> > side) socket buffer size.
> >>
> >> > happy benchmarking,
> >>
> >> > rick jones
> >>
> >>
> >> OK, ran them with -S as well:
>
> > Are these all from domU -> dom0 ? Did you try traffic going the other
> > way?
>
> Yes running netperf in domU and netserver in dom0, but i must say i'm far from a netperf expert.
> So i don't even know for sure if the tests i ran give a good picture.
>
> >> "current" is with netfront as is (skb->truesize += skb->data_len - RX_COPY_THRESHOLD;)
> >> "patched" is with IanC's latest patch (skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags;)
>
> skb->>truesize += skb->data_len - NETFRONT_SKB_CB(skb)->pull_to; is
> > probably more interesting to compare against since we know the current
> > one is buggy.
>
> Will see if i can run against that as well, although i thought Eric
> said to prefer the "skb->truesize += PAGE_SIZE *
> skb_shinfo(skb)->nr_frags;"
Right, I meant to compare "PAGE_SIZE * skb_shinfo(skb)->nr_frags" vs
"skb->data_len - NETFRONT_SKB_CB(skb)->pull_to". TBH I trust Eric so I'm
inclined to just go with what he suggests.
Ian.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2013-01-07 14:22 ` Ian Campbell
@ 2013-01-07 15:26 ` Eric Dumazet
0 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2013-01-07 15:26 UTC (permalink / raw)
To: Ian Campbell
Cc: Sander Eikelenboom, Rick Jones, netdev@vger.kernel.org,
Konrad Rzeszutek Wilk, annie li, xen-devel@lists.xensource.com
On Mon, 2013-01-07 at 14:22 +0000, Ian Campbell wrote:
> On Mon, 2013-01-07 at 14:11 +0000, Sander Eikelenboom wrote:
> one is buggy.
> >
> > Will see if i can run against that as well, although i thought Eric
> > said to prefer the "skb->truesize += PAGE_SIZE *
> > skb_shinfo(skb)->nr_frags;"
>
> Right, I meant to compare "PAGE_SIZE * skb_shinfo(skb)->nr_frags" vs
> "skb->data_len - NETFRONT_SKB_CB(skb)->pull_to". TBH I trust Eric so I'm
> inclined to just go with what he suggests.
This is the more precise estimation, and it appears to not bring obvious
performance regression, so I definitely would favor this one ;)
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xen/netfront: improve truesize tracking
2013-01-07 13:41 ` Ian Campbell
2013-01-07 14:11 ` Sander Eikelenboom
@ 2013-01-07 16:17 ` Eric Dumazet
1 sibling, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2013-01-07 16:17 UTC (permalink / raw)
To: Ian Campbell
Cc: Sander Eikelenboom, Rick Jones, netdev@vger.kernel.org,
Konrad Rzeszutek Wilk, annie li, xen-devel@lists.xensource.com
On Mon, 2013-01-07 at 13:41 +0000, Ian Campbell wrote:
> > UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 (192.168.1.1) port 0 AF_INET : +/-2.500% @ 95% conf. : demo
> > Socket Message Elapsed Messages
> > Size Size Time Okay Errors Throughput
> > bytes bytes secs # # KBytes/sec
> >
> > current 212992 65507 60.00 252586 0 269305.73
> > current 2280 60.00 229371 244553.96
> > patched 212992 65507 60.00 256209 0 273168.32
> > patched 2280 60.00 201670 215019.54
>
> The recv numbers here aren't too pleasing either.
The number of packets that can be queued into UDP socket depends on
sk->sk_rcvbuf (SO_RCVBUF) and skb truesize.
So what we notice here are packet drops (netstat -s would give us the
total counters)
To absorb a burst of incoming messages, an application would have to set
an appropriate receive buffer.
In this case, RCVBUF value was set to a very minimum, basically not
allowing more than one queued packet.
TCP stack has a 'collapse' mode, which basically converts skbs in
receive queue (or ofo queue) to better filled ones (skb->len very close
to skb->truesize) when memory limits are about to be hit.
Its very expensive, as it adds one more copy stage, but it happens only
in rare circumstances. Of course, when a driver uses one page of 4096
bytes to store one 1514 bytes ethernet frame, it can happen more often.
netstat -s | grep collap
25292 packets collapsed in receive queue due to low socket buffer
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2013-01-07 16:17 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-18 13:51 [PATCH] xen/netfront: improve truesize tracking Ian Campbell
2012-12-18 14:10 ` Konrad Rzeszutek Wilk
2012-12-18 14:15 ` Ian Campbell
2012-12-18 15:12 ` Eric Dumazet
2012-12-18 15:26 ` Ian Campbell
2012-12-18 16:13 ` Eric Dumazet
2012-12-18 16:22 ` Ian Campbell
2012-12-18 16:35 ` Eric Dumazet
2012-12-19 11:34 ` Sander Eikelenboom
2012-12-19 16:17 ` Eric Dumazet
2012-12-20 12:51 ` Sander Eikelenboom
2012-12-20 14:23 ` [Xen-devel] " Sander Eikelenboom
2012-12-20 14:58 ` Sander Eikelenboom
2012-12-20 15:39 ` Eric Dumazet
2012-12-21 11:21 ` Sander Eikelenboom
2012-12-21 18:33 ` Rick Jones
2013-01-03 20:40 ` Sander Eikelenboom
2013-01-07 13:41 ` Ian Campbell
2013-01-07 14:11 ` Sander Eikelenboom
2013-01-07 14:22 ` Ian Campbell
2013-01-07 15:26 ` Eric Dumazet
2013-01-07 16:17 ` Eric Dumazet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).