* Increased multicast packet drops in 3.4
@ 2012-09-06 0:11 Shawn Bohrer
2012-09-06 6:07 ` Eric Dumazet
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Shawn Bohrer @ 2012-09-06 0:11 UTC (permalink / raw)
To: netdev; +Cc: eric.dumazet
I've been testing the 3.4 kernel against the 3.1 kernel and noticed
that my application experiences a noticeable increase in packet
drops. In this case I have 8 processes all listening
on the same multicast group and occasionally 1 or more of the
processes will report drops based on gaps in the sequence numbers on
the packets. One thing I find interesting is that some of the time 2
or 3 of the 8 processes will report that they missed the exact same
50+ packets. Since the other processes receive the packets I know
that they are making it to the machine and past the driver.
So far I have not been able to _see_ any OS counters increase when the
drops occur, but perhaps there is a location I have not yet checked.
I've been looking for drops in /proc/net/udp, /proc/net/snmp,
and /proc/net/dev.
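For reference, these are the kinds of checks being described; a rough sketch (the interface name is illustrative):

```shell
# UDP layer: InErrors/RcvbufErrors in this row would show socket drops.
grep '^Udp:' /proc/net/snmp
# Per-socket view: the final "drops" column counts rcvbuf drops.
cat /proc/net/udp
# Per-interface counters (header lines plus the NIC in question).
grep -E 'face|eth4' /proc/net/dev
```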
I've tried using dropwatch/drop_monitor, but it is awfully noisy even
after backporting many of the patches Eric Dumazet has contributed to
silence the false positives. Similarly, I set up trace-cmd/ftrace to
record skb:kfree_skb calls with a stack trace and had my application
stop the trace when a drop was reported. From these traces I see a
number of the following:
md_connector-12791 [014] 7952.982818: kfree_skb: skbaddr=0xffff880583bd7500 protocol=2048 location=0xffffffff813c930b
md_connector-12791 [014] 7952.982821: kernel_stack: <stack trace>
=> skb_release_data (ffffffff813c930b)
=> __kfree_skb (ffffffff813c934e)
=> skb_free_datagram_locked (ffffffff813ccca8)
=> udp_recvmsg (ffffffff8143335c)
=> inet_recvmsg (ffffffff8143cbfb)
=> sock_recvmsg_nosec (ffffffff813be80f)
=> __sys_recvmsg (ffffffff813bfe70)
=> __sys_recvmmsg (ffffffff813c2392)
=> sys_recvmmsg (ffffffff813c25b0)
=> system_call_fastpath (ffffffff8148cfd2)
Looking at the code, it does look like these could be the drops, since
I do not see any counters incremented in this code path. However, I'm
not very familiar with this code, so it could also be a false positive.
It does look like the above stack only gets called if
skb_has_frag_list(skb) is true; does this imply the packet was over one MTU
(1500)?
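One way to check whether datagrams larger than one MTU are actually arriving (independent of the code question) is the IP reassembly counters; a sketch:

```shell
# Pair up the field names and values of the two "Ip:" lines in
# /proc/net/snmp. ReasmReqds/ReasmOKs advancing while the flow is
# active means >MTU datagrams are being reassembled into frag-list skbs.
grep '^Ip:' /proc/net/snmp | awk '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i }
    NR == 2 { for (i = 2; i <= NF; i++)
                  if (name[i] ~ /^Reasm/) print name[i], $i }'
```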
I'd appreciate any input on possible causes/solutions for these drops,
or ways that I can further debug this issue to find the root cause of
the increase in drops on 3.4.
Thanks,
Shawn
--
---------------------------------------------------------------
This email, along with any attachments, is confidential. If you
believe you received this message in error, please contact the
sender immediately and delete all copies of the message.
Thank you.
* Re: Increased multicast packet drops in 3.4
2012-09-06 0:11 Increased multicast packet drops in 3.4 Shawn Bohrer
@ 2012-09-06 6:07 ` Eric Dumazet
2012-09-06 6:22 ` Eric Dumazet
2012-09-06 6:26 ` Eric Dumazet
2 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2012-09-06 6:07 UTC (permalink / raw)
To: Shawn Bohrer; +Cc: netdev
On Wed, 2012-09-05 at 19:11 -0500, Shawn Bohrer wrote:
> I've been testing the 3.4 kernel compared to the 3.1 kernel and
> noticed my application is experiencing a noticeable increase in packet
> drops compared to 3.1. In this case I have 8 processes all listening
> on the same multicast group and occasionally 1 or more of the
> processes will report drops based on gaps in the sequence numbers on
> the packets. One thing I find interesting is that some of the time 2
> or 3 of the 8 processes will report that they missed the exact same
> 50+ packets. Since the other processes receive the packets I know
> that they are making it to the machine and past the driver.
>
> So far I have not been able to _see_ any OS counters increase when the
> drops occur but perhaps there is a location that I have not yet
> looked. I've been looking for drops in /proc/net/udp /proc/net/snmp
> and /proc/net/dev.
>
> I've tried using dropwatch/drop_monitor but it is awfully noisy even
> after back porting many of the patches Eric Dumazet has contributed to
> silence the false positives. Similarly I setup trace-cmd/ftrace to
> record skb:kfree_skb calls with a stacktrace and had my application
> stop the trace when a drop was reported. From these traces I see a
> number of the following:
>
> md_connector-12791 [014] 7952.982818: kfree_skb: skbaddr=0xffff880583bd7500 protocol=2048 location=0xffffffff813c930b
> md_connector-12791 [014] 7952.982821: kernel_stack: <stack trace>
> => skb_release_data (ffffffff813c930b)
> => __kfree_skb (ffffffff813c934e)
> => skb_free_datagram_locked (ffffffff813ccca8)
> => udp_recvmsg (ffffffff8143335c)
> => inet_recvmsg (ffffffff8143cbfb)
> => sock_recvmsg_nosec (ffffffff813be80f)
> => __sys_recvmsg (ffffffff813bfe70)
> => __sys_recvmmsg (ffffffff813c2392)
> => sys_recvmmsg (ffffffff813c25b0)
> => system_call_fastpath (ffffffff8148cfd2)
>
> Looking at the code it does look like these could be the drops, since
> I do not see any counters incremented in this code path. However I'm
> not very familiar with this code so it could also be a false positive.
> It does look like the above stack only gets called if
> skb_has_frag_list(skb) does this imply the packet was over one MTU
> (1500)?
>
> I'd appreciate any input on possible causes/solutions for these drops.
> Or ways that I can further debug this issue to find the root cause of
> the increase in drops on 3.4.
>
> Thanks,
> Shawn
>
Thanks Shawn for this excellent report.
I am taking a look right now.
* Re: Increased multicast packet drops in 3.4
2012-09-06 0:11 Increased multicast packet drops in 3.4 Shawn Bohrer
2012-09-06 6:07 ` Eric Dumazet
@ 2012-09-06 6:22 ` Eric Dumazet
2012-09-06 13:03 ` Shawn Bohrer
2012-09-06 6:26 ` Eric Dumazet
2 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2012-09-06 6:22 UTC (permalink / raw)
To: Shawn Bohrer; +Cc: netdev
On Wed, 2012-09-05 at 19:11 -0500, Shawn Bohrer wrote:
> I've been testing the 3.4 kernel compared to the 3.1 kernel and
> noticed my application is experiencing a noticeable increase in packet
> drops compared to 3.1. In this case I have 8 processes all listening
> on the same multicast group and occasionally 1 or more of the
> processes will report drops based on gaps in the sequence numbers on
> the packets. One thing I find interesting is that some of the time 2
> or 3 of the 8 processes will report that they missed the exact same
> 50+ packets. Since the other processes receive the packets I know
> that they are making it to the machine and past the driver.
>
> So far I have not been able to _see_ any OS counters increase when the
> drops occur but perhaps there is a location that I have not yet
> looked. I've been looking for drops in /proc/net/udp /proc/net/snmp
> and /proc/net/dev.
>
> I've tried using dropwatch/drop_monitor but it is awfully noisy even
> after back porting many of the patches Eric Dumazet has contributed to
> silence the false positives. Similarly I setup trace-cmd/ftrace to
> record skb:kfree_skb calls with a stacktrace and had my application
> stop the trace when a drop was reported. From these traces I see a
> number of the following:
>
> md_connector-12791 [014] 7952.982818: kfree_skb: skbaddr=0xffff880583bd7500 protocol=2048 location=0xffffffff813c930b
> md_connector-12791 [014] 7952.982821: kernel_stack: <stack trace>
> => skb_release_data (ffffffff813c930b)
> => __kfree_skb (ffffffff813c934e)
> => skb_free_datagram_locked (ffffffff813ccca8)
> => udp_recvmsg (ffffffff8143335c)
> => inet_recvmsg (ffffffff8143cbfb)
> => sock_recvmsg_nosec (ffffffff813be80f)
> => __sys_recvmsg (ffffffff813bfe70)
> => __sys_recvmmsg (ffffffff813c2392)
> => sys_recvmmsg (ffffffff813c25b0)
> => system_call_fastpath (ffffffff8148cfd2)
>
> Looking at the code it does look like these could be the drops, since
> I do not see any counters incremented in this code path. However I'm
> not very familiar with this code so it could also be a false positive.
> It does look like the above stack only gets called if
> skb_has_frag_list(skb) does this imply the packet was over one MTU
> (1500)?
>
> I'd appreciate any input on possible causes/solutions for these drops.
> Or ways that I can further debug this issue to find the root cause of
> the increase in drops on 3.4.
>
> Thanks,
> Shawn
>
What NIC driver are you using?
Could you trace whether skb_copy_and_csum_datagram_iovec() or
skb_copy_datagram_iovec() returns an error (it could be EFAULT, for
example)?
If so, you could add some debugging to these functions to track the
exact error.
It seems the following patch is needed anyway:
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6f6d1ac..2c965c9 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1226,6 +1226,8 @@ try_again:
 
 	if (unlikely(err)) {
 		trace_kfree_skb(skb, udp_recvmsg);
+		if (!peeked)
+			UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
 		goto out_free;
 	}
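The extra debugging suggested above could look something like this against 3.4's net/core/datagram.c (a hypothetical sketch, not a submitted patch; the hunk location is illustrative):

```diff
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ ... skb_copy_datagram_iovec() error path ...
 fault:
+	pr_warn_ratelimited("%s: returning EFAULT, len=%d\n", __func__, len);
 	return -EFAULT;
```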
* Re: Increased multicast packet drops in 3.4
2012-09-06 0:11 Increased multicast packet drops in 3.4 Shawn Bohrer
2012-09-06 6:07 ` Eric Dumazet
2012-09-06 6:22 ` Eric Dumazet
@ 2012-09-06 6:26 ` Eric Dumazet
2 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2012-09-06 6:26 UTC (permalink / raw)
To: Shawn Bohrer; +Cc: netdev
On Wed, 2012-09-05 at 19:11 -0500, Shawn Bohrer wrote:
> One thing I find interesting is that some of the time 2
> or 3 of the 8 processes will report that they missed the exact same
> 50+ packets. Since the other processes receive the packets I know
> that they are making it to the machine and past the driver.
Could it be that the processes missing 50+ packets provide a
non-accessible area of memory, so that these recvmsg() calls return
EFAULT?
Are you using special memory areas, for example via mprotect(), or huge
pages?
* Re: Increased multicast packet drops in 3.4
2012-09-06 6:22 ` Eric Dumazet
@ 2012-09-06 13:03 ` Shawn Bohrer
2012-09-06 13:21 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: Shawn Bohrer @ 2012-09-06 13:03 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
On Thu, Sep 06, 2012 at 08:22:40AM +0200, Eric Dumazet wrote:
> On Wed, 2012-09-05 at 19:11 -0500, Shawn Bohrer wrote:
> > I've been testing the 3.4 kernel compared to the 3.1 kernel and
> > noticed my application is experiencing a noticeable increase in packet
> > drops compared to 3.1. In this case I have 8 processes all listening
> > on the same multicast group and occasionally 1 or more of the
> > processes will report drops based on gaps in the sequence numbers on
> > the packets. One thing I find interesting is that some of the time 2
> > or 3 of the 8 processes will report that they missed the exact same
> > 50+ packets. Since the other processes receive the packets I know
> > that they are making it to the machine and past the driver.
> >
> > So far I have not been able to _see_ any OS counters increase when the
> > drops occur but perhaps there is a location that I have not yet
> > looked. I've been looking for drops in /proc/net/udp /proc/net/snmp
> > and /proc/net/dev.
> >
> > I've tried using dropwatch/drop_monitor but it is awfully noisy even
> > after back porting many of the patches Eric Dumazet has contributed to
> > silence the false positives. Similarly I setup trace-cmd/ftrace to
> > record skb:kfree_skb calls with a stacktrace and had my application
> > stop the trace when a drop was reported. From these traces I see a
> > number of the following:
> >
> > md_connector-12791 [014] 7952.982818: kfree_skb: skbaddr=0xffff880583bd7500 protocol=2048 location=0xffffffff813c930b
> > md_connector-12791 [014] 7952.982821: kernel_stack: <stack trace>
> > => skb_release_data (ffffffff813c930b)
> > => __kfree_skb (ffffffff813c934e)
> > => skb_free_datagram_locked (ffffffff813ccca8)
> > => udp_recvmsg (ffffffff8143335c)
> > => inet_recvmsg (ffffffff8143cbfb)
> > => sock_recvmsg_nosec (ffffffff813be80f)
> > => __sys_recvmsg (ffffffff813bfe70)
> > => __sys_recvmmsg (ffffffff813c2392)
> > => sys_recvmmsg (ffffffff813c25b0)
> > => system_call_fastpath (ffffffff8148cfd2)
> >
> > Looking at the code it does look like these could be the drops, since
> > I do not see any counters incremented in this code path. However I'm
> > not very familiar with this code so it could also be a false positive.
> > It does look like the above stack only gets called if
> > skb_has_frag_list(skb) does this imply the packet was over one MTU
> > (1500)?
> >
> > I'd appreciate any input on possible causes/solutions for these drops.
> > Or ways that I can further debug this issue to find the root cause of
> > the increase in drops on 3.4.
> >
> > Thanks,
> > Shawn
> >
>
> What NIC driver are you using ?
$ sudo ethtool -i eth4
driver: mlx4_en
version: 2.0 (Dec 2011)
firmware-version: 2.10.700
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
This is the in-tree driver from 3.4.9.
[ sbohrer@berbox12:/home/sbohrer ]
$ /sbin/lspci | grep -i mell
05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
> Could you trace if skb_copy_and_csum_datagram_iovec() or
> skb_copy_datagram_iovec() returns an error (it could be EFAULT by
> example) ?
>
> If so, you could add some debugging to these functions to track what
> exact error it is
>
> It seems following patch is needed anyway :
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 6f6d1ac..2c965c9 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1226,6 +1226,8 @@ try_again:
>
> if (unlikely(err)) {
> trace_kfree_skb(skb, udp_recvmsg);
> + if (!peeked)
> + UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
> goto out_free;
> }
Sorry, I should have mentioned that it doesn't appear I'm hitting that
tracepoint. That tracepoint would have a location=udp_recvmsg and I
believe the stack trace would also start at udp_recvmsg. I didn't see
any of these in the traces I captured.
I think the one I'm hitting is the following with some of my own extra
annotation:
=> kfree_skb()
=> skb_drop_list()
=> skb_drop_fraglist()
> > => skb_release_data (ffffffff813c930b)
=> skb_release_all()
> > => __kfree_skb (ffffffff813c934e)
> > => skb_free_datagram_locked (ffffffff813ccca8)
> > => udp_recvmsg (ffffffff8143335c)
> > => inet_recvmsg (ffffffff8143cbfb)
> > => sock_recvmsg_nosec (ffffffff813be80f)
> > => __sys_recvmsg (ffffffff813bfe70)
> > => __sys_recvmmsg (ffffffff813c2392)
> > => sys_recvmmsg (ffffffff813c25b0)
> > => system_call_fastpath (ffffffff8148cfd2)
kfree_skb() has the trace_kfree_skb() call on net/core/skbuff.c:3283
I can of course still try your patch and double check that I'm not
hitting that one.
Thanks,
Shawn
* Re: Increased multicast packet drops in 3.4
2012-09-06 13:03 ` Shawn Bohrer
@ 2012-09-06 13:21 ` Eric Dumazet
2012-09-06 13:31 ` Eric Dumazet
2012-09-07 4:00 ` Shawn Bohrer
0 siblings, 2 replies; 10+ messages in thread
From: Eric Dumazet @ 2012-09-06 13:21 UTC (permalink / raw)
To: Shawn Bohrer; +Cc: netdev
On Thu, 2012-09-06 at 08:03 -0500, Shawn Bohrer wrote:
> On Thu, Sep 06, 2012 at 08:22:40AM +0200, Eric Dumazet wrote:
> > On Wed, 2012-09-05 at 19:11 -0500, Shawn Bohrer wrote:
> > > I've been testing the 3.4 kernel compared to the 3.1 kernel and
> > > noticed my application is experiencing a noticeable increase in packet
> > > drops compared to 3.1. In this case I have 8 processes all listening
> > > on the same multicast group and occasionally 1 or more of the
> > > processes will report drops based on gaps in the sequence numbers on
> > > the packets. One thing I find interesting is that some of the time 2
> > > or 3 of the 8 processes will report that they missed the exact same
> > > 50+ packets. Since the other processes receive the packets I know
> > > that they are making it to the machine and past the driver.
> > >
> > > So far I have not been able to _see_ any OS counters increase when the
> > > drops occur but perhaps there is a location that I have not yet
> > > looked. I've been looking for drops in /proc/net/udp /proc/net/snmp
> > > and /proc/net/dev.
> > >
> > > I've tried using dropwatch/drop_monitor but it is awfully noisy even
> > > after back porting many of the patches Eric Dumazet has contributed to
> > > silence the false positives. Similarly I setup trace-cmd/ftrace to
> > > record skb:kfree_skb calls with a stacktrace and had my application
> > > stop the trace when a drop was reported. From these traces I see a
> > > number of the following:
> > >
> > > md_connector-12791 [014] 7952.982818: kfree_skb: skbaddr=0xffff880583bd7500 protocol=2048 location=0xffffffff813c930b
> > > md_connector-12791 [014] 7952.982821: kernel_stack: <stack trace>
> > > => skb_release_data (ffffffff813c930b)
> > > => __kfree_skb (ffffffff813c934e)
> > > => skb_free_datagram_locked (ffffffff813ccca8)
> > > => udp_recvmsg (ffffffff8143335c)
> > > => inet_recvmsg (ffffffff8143cbfb)
> > > => sock_recvmsg_nosec (ffffffff813be80f)
> > > => __sys_recvmsg (ffffffff813bfe70)
> > > => __sys_recvmmsg (ffffffff813c2392)
> > > => sys_recvmmsg (ffffffff813c25b0)
> > > => system_call_fastpath (ffffffff8148cfd2)
> > >
> > > Looking at the code it does look like these could be the drops, since
> > > I do not see any counters incremented in this code path. However I'm
> > > not very familiar with this code so it could also be a false positive.
> > > It does look like the above stack only gets called if
> > > skb_has_frag_list(skb) does this imply the packet was over one MTU
> > > (1500)?
> > >
> > > I'd appreciate any input on possible causes/solutions for these drops.
> > > Or ways that I can further debug this issue to find the root cause of
> > > the increase in drops on 3.4.
> > >
> > > Thanks,
> > > Shawn
> > >
> >
> > What NIC driver are you using ?
>
> $ sudo ethtool -i eth4
> driver: mlx4_en
> version: 2.0 (Dec 2011)
> firmware-version: 2.10.700
> bus-info: 0000:05:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: no
> supports-register-dump: no
>
> This is the in tree driver from 3.4.9
>
> [ sbohrer@berbox12:/home/sbohrer ]
> $ /sbin/lspci | grep -i mell
> 05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
>
> > Could you trace if skb_copy_and_csum_datagram_iovec() or
> > skb_copy_datagram_iovec() returns an error (it could be EFAULT by
> > example) ?
> >
> > If so, you could add some debugging to these functions to track what
> > exact error it is
> >
> > It seems following patch is needed anyway :
> >
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 6f6d1ac..2c965c9 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -1226,6 +1226,8 @@ try_again:
> >
> > if (unlikely(err)) {
> > trace_kfree_skb(skb, udp_recvmsg);
> > + if (!peeked)
> > + UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
> > goto out_free;
> > }
>
> Sorry, I should have mentioned that it doesn't appear I'm hitting that
> tracepoint. That tracepoint would have a location=udp_recvmsg and I
> believe the stack trace would also start at udp_recvmsg. I didn't see
> any of these in the traces I captured.
>
> I think the one I'm hitting is the following with some of my own extra
> annotation:
>
> => kfree_skb()
> => skb_drop_list()
> => skb_drop_fraglist()
> > > => skb_release_data (ffffffff813c930b)
> => skb_release_all()
> > > => __kfree_skb (ffffffff813c934e)
> > > => skb_free_datagram_locked (ffffffff813ccca8)
> > > => udp_recvmsg (ffffffff8143335c)
> > > => inet_recvmsg (ffffffff8143cbfb)
> > > => sock_recvmsg_nosec (ffffffff813be80f)
> > > => __sys_recvmsg (ffffffff813bfe70)
> > > => __sys_recvmmsg (ffffffff813c2392)
> > > => sys_recvmmsg (ffffffff813c25b0)
> > > => system_call_fastpath (ffffffff8148cfd2)
>
> kfree_skb() has the trace_kfree_skb() call on net/core/skbuff.c:3283
>
> I can of course still try your patch and double check that I'm not
> hitting that one.
kfree_skb() can free a list of skbs, and we use a generic function to
do so, without forwarding the drop/not-drop status. So it's
unfortunate, but adding extra parameters just for the sake of
drop_monitor is not worth it. skb_drop_fraglist() doesn't know if the
parent skb was dropped or only freed, so it calls kfree_skb()
unconditionally rather than choosing between consume_skb() and
kfree_skb().
Are you receiving fragmented UDP frames?
I ask because with the latest kernels (linux-3.5), we should no longer
build a list of skbs, but a single skb with page fragments.
commit 3cc4949269e01f39443d0fcfffb5bc6b47878d45
Author: Eric Dumazet <edumazet@google.com>
Date: Sat May 19 03:02:20 2012 +0000
ipv4: use skb coalescing in defragmentation
ip_frag_reasm() can use skb_try_coalesce() to build optimized skb,
reducing memory used by them (truesize), and reducing number of cache
line misses and overhead for the consumer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Re: Increased multicast packet drops in 3.4
2012-09-06 13:21 ` Eric Dumazet
@ 2012-09-06 13:31 ` Eric Dumazet
2012-09-07 4:00 ` Shawn Bohrer
1 sibling, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2012-09-06 13:31 UTC (permalink / raw)
To: Shawn Bohrer; +Cc: netdev
On Thu, 2012-09-06 at 15:21 +0200, Eric Dumazet wrote:
>
> Are you receiving fragmented UDP frames ?
>
> I ask this because with latest kernels (linux-3.5), we should no longer
> build a list of skb, but a single skb with page fragments.
>
> commit 3cc4949269e01f39443d0fcfffb5bc6b47878d45
> Author: Eric Dumazet <edumazet@google.com>
> Date: Sat May 19 03:02:20 2012 +0000
>
> ipv4: use skb coalescing in defragmentation
>
> ip_frag_reasm() can use skb_try_coalesce() to build optimized skb,
> reducing memory used by them (truesize), and reducing number of cache
> line misses and overhead for the consumer.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
>
Unfortunately mlx4 pulls too many bytes from the frame to skb->head, so
it defeats coalescing completely.
Try the following patch (if you also try linux-3.5):
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 9d27e42..700e70e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -150,7 +150,7 @@ enum {
 #define ETH_LLC_SNAP_SIZE	8
 #define SMALL_PACKET_SIZE	(256 - NET_IP_ALIGN)
-#define HEADER_COPY_SIZE	(128 - NET_IP_ALIGN)
+#define HEADER_COPY_SIZE	ETH_HLEN
 #define MLX4_LOOPBACK_TEST_PAYLOAD	(HEADER_COPY_SIZE - ETH_HLEN)
 #define MLX4_EN_MIN_MTU	46
* Re: Increased multicast packet drops in 3.4
2012-09-06 13:21 ` Eric Dumazet
2012-09-06 13:31 ` Eric Dumazet
@ 2012-09-07 4:00 ` Shawn Bohrer
2012-09-07 6:08 ` Eric Dumazet
1 sibling, 1 reply; 10+ messages in thread
From: Shawn Bohrer @ 2012-09-07 4:00 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
On Thu, Sep 06, 2012 at 03:21:07PM +0200, Eric Dumazet wrote:
> kfree_skb() can free a list of skb, and we use a generic function to do
> so, without forwarding the drop/notdrop status. So its unfortunate, but
> adding extra parameters just for the sake of drop_monitor is not worth
> it. skb_drop_fraglist() doesnt know if the parent skb is dropped or
> only freed, so it calls kfree_skb(), not consume_skb() or kfree_skb()
I understand that this means that dropwatch or the skb:kfree_skb
tracepoint won't know if the packet was really dropped, but do we
know in this case from the context of the stack trace? I'm assuming
since we didn't receive an error that the packet was delivered and
these aren't real drops.
> Are you receiving fragmented UDP frames ?
I looked at the sending application and, yes, it is possible that it
is sending fragmented frames.
> I ask this because with latest kernels (linux-3.5), we should no longer
> build a list of skb, but a single skb with page fragments.
>
> commit 3cc4949269e01f39443d0fcfffb5bc6b47878d45
> Author: Eric Dumazet <edumazet@google.com>
> Date: Sat May 19 03:02:20 2012 +0000
>
> ipv4: use skb coalescing in defragmentation
>
> ip_frag_reasm() can use skb_try_coalesce() to build optimized skb,
> reducing memory used by them (truesize), and reducing number of cache
> line misses and overhead for the consumer.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
I'll have to give 3.5 a try tomorrow and see if it has the same
problem. After backporting all of your patches to convert kfree_skb()
to consume_skb() to 3.4 I actually don't have that many different
places hitting the skb:kfree_skb tracepoint at the time of the drop.
Here are some of the ones I have left that might be relevant.
<idle>-0 [001] 11933.738797: kfree_skb: skbaddr=0xffff8805ebcf9500 protocol=2048 location=0xffffffff81404e33
<idle>-0 [001] 11933.738801: kernel_stack: <stack trace>
=> ip_rcv (ffffffff81404e33)
=> __netif_receive_skb (ffffffff813ce123)
=> netif_receive_skb (ffffffff813d0da1)
=> process_responses (ffffffffa018486c)
=> napi_rx_handler (ffffffffa0185606)
=> net_rx_action (ffffffff813d2449)
=> __do_softirq (ffffffff8103bfd0)
=> call_softirq (ffffffff8148a14c)
=> do_softirq (ffffffff81003e85)
=> irq_exit (ffffffff8103c3a5)
=> do_IRQ (ffffffff8148a693)
=> ret_from_intr (ffffffff814814a7)
=> cpu_idle (ffffffff8100ac16)
=> start_secondary (ffffffff81af5e66)
My IPSTATS_MIB_INHDRERRORS, IPSTATS_MIB_INDISCARDS, and
IPSTATS_MIB_INTRUNCATEDPKTS counters are all 0 so maybe this is from
skb->pkt_type == PACKET_OTHERHOST?
<idle>-0 [001] 11933.937378: kfree_skb: skbaddr=0xffff8805ebcf8c00 protocol=2048 location=0xffffffff81404660
<idle>-0 [001] 11933.937385: kernel_stack: <stack trace>
=> ip_rcv_finish (ffffffff81404660)
=> ip_rcv (ffffffff81404f61)
=> __netif_receive_skb (ffffffff813ce123)
=> netif_receive_skb (ffffffff813d0da1)
=> process_responses (ffffffffa018486c)
=> napi_rx_handler (ffffffffa0185606)
=> net_rx_action (ffffffff813d2449)
=> __do_softirq (ffffffff8103bfd0)
=> call_softirq (ffffffff8148a14c)
=> do_softirq (ffffffff81003e85)
=> irq_exit (ffffffff8103c3a5)
=> do_IRQ (ffffffff8148a693)
=> ret_from_intr (ffffffff814814a7)
=> cpu_idle (ffffffff8100ac16)
=> start_secondary (ffffffff81af5e66)
I see two places here that I might be hitting that don't increment any
counters. I can try instrumenting these to see which one I hit.
<idle>-0 [003] 11932.454375: kfree_skb: skbaddr=0xffff880584843700 protocol=4 location=0xffffffffa00492d4
<idle>-0 [003] 11932.454382: kernel_stack: <stack trace>
=> llc_rcv (ffffffffa00492d4)
=> __netif_receive_skb (ffffffff813ce123)
=> netif_receive_skb (ffffffff813d0da1)
=> process_responses (ffffffffa018486c)
=> napi_rx_handler (ffffffffa0185606)
=> net_rx_action (ffffffff813d2449)
=> __do_softirq (ffffffff8103bfd0)
=> call_softirq (ffffffff8148a14c)
=> do_softirq (ffffffff81003e85)
=> irq_exit (ffffffff8103c3a5)
=> do_IRQ (ffffffff8148a693)
=> ret_from_intr (ffffffff814814a7)
=> cpu_idle (ffffffff8100ac16)
=> start_secondary (ffffffff81af5e66)
This is protocol=4, so I don't know if it is really relevant, but then
again I don't know what this is.
<idle>-0 [003] 11914.266635: kfree_skb: skbaddr=0xffff880584843b00 protocol=2048 location=0xffffffff8143bfa8
<idle>-0 [003] 11914.266641: kernel_stack: <stack trace>
=> igmp_rcv (ffffffff8143bfa8)
=> ip_local_deliver_finish (ffffffff814049ed)
=> ip_local_deliver (ffffffff81404d1a)
=> ip_rcv_finish (ffffffff814046ad)
=> ip_rcv (ffffffff81404f61)
=> __netif_receive_skb (ffffffff813ce123)
=> netif_receive_skb (ffffffff813d0da1)
=> mlx4_en_process_rx_cq (ffffffffa010a4fe)
=> mlx4_en_poll_rx_cq (ffffffffa010a9ef)
=> net_rx_action (ffffffff813d2449)
=> __do_softirq (ffffffff8103bfd0)
=> call_softirq (ffffffff8148a14c)
=> do_softirq (ffffffff81003e85)
=> irq_exit (ffffffff8103c3a5)
=> do_IRQ (ffffffff8148a693)
=> ret_from_intr (ffffffff814814a7)
=> cpu_idle (ffffffff8100ac16)
=> start_secondary (ffffffff81af5e66)
I also don't know if this one is relevant. It looks like an IGMP
packet, so probably not my drop, but I am receiving multicast packets
in this case, so maybe it is somehow related.
If any of these spark any ideas let me know; otherwise I'll keep
digging and try 3.5/3.6 tomorrow.
Thanks,
Shawn
* Re: Increased multicast packet drops in 3.4
2012-09-07 4:00 ` Shawn Bohrer
@ 2012-09-07 6:08 ` Eric Dumazet
2012-09-07 22:38 ` Shawn Bohrer
0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2012-09-07 6:08 UTC (permalink / raw)
To: Shawn Bohrer; +Cc: netdev
On Thu, 2012-09-06 at 23:00 -0500, Shawn Bohrer wrote:
> On Thu, Sep 06, 2012 at 03:21:07PM +0200, Eric Dumazet wrote:
> > kfree_skb() can free a list of skb, and we use a generic function to do
> > so, without forwarding the drop/notdrop status. So its unfortunate, but
> > adding extra parameters just for the sake of drop_monitor is not worth
> > it. skb_drop_fraglist() doesnt know if the parent skb is dropped or
> > only freed, so it calls kfree_skb(), not consume_skb() or kfree_skb()
>
> I understand that this means that dropwatch or the skb:kfree_skb
> tracepoint won't know if the packet was really dropped, but do we
> know in this case from the context of the stack trace? I'm assuming
> since we didn't receive an error that the packet was delivered and
> these aren't real drops.
I am starting to believe this is an application error.
This application uses recvmmsg() to fetch a lot of messages in one
syscall, and it might well be that it throws out a batch of 50+
messages because of an application bug. Yes, this starts with 3.4, but
it could be triggered by a timing difference or something else that is
not a proper kernel bug...
>
> > Are you receiving fragmented UDP frames ?
>
> I looked at the sending application and it yes it is possible it is
> sending fragmented frames.
>
> > I ask this because with latest kernels (linux-3.5), we should no longer
> > build a list of skb, but a single skb with page fragments.
> >
> > commit 3cc4949269e01f39443d0fcfffb5bc6b47878d45
> > Author: Eric Dumazet <edumazet@google.com>
> > Date: Sat May 19 03:02:20 2012 +0000
> >
> > ipv4: use skb coalescing in defragmentation
> >
> > ip_frag_reasm() can use skb_try_coalesce() to build optimized skb,
> > reducing memory used by them (truesize), and reducing number of cache
> > line misses and overhead for the consumer.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> > Signed-off-by: David S. Miller <davem@davemloft.net>
>
> I'll have to give 3.5 a try tomorrow and see if it has the same
> problem. After backporting all of your patches to convert kfree_skb()
> to consume_skb() to 3.4 I actually don't have that many different
> places hitting the skb:kfree_skb tracepoint at the time of the drop.
> Here are some of the ones I have left that might be relevant.
>
> <idle>-0 [001] 11933.738797: kfree_skb: skbaddr=0xffff8805ebcf9500 protocol=2048 location=0xffffffff81404e33
> <idle>-0 [001] 11933.738801: kernel_stack: <stack trace>
> => ip_rcv (ffffffff81404e33)
> => __netif_receive_skb (ffffffff813ce123)
> => netif_receive_skb (ffffffff813d0da1)
> => process_responses (ffffffffa018486c)
> => napi_rx_handler (ffffffffa0185606)
> => net_rx_action (ffffffff813d2449)
> => __do_softirq (ffffffff8103bfd0)
> => call_softirq (ffffffff8148a14c)
> => do_softirq (ffffffff81003e85)
> => irq_exit (ffffffff8103c3a5)
> => do_IRQ (ffffffff8148a693)
> => ret_from_intr (ffffffff814814a7)
> => cpu_idle (ffffffff8100ac16)
> => start_secondary (ffffffff81af5e66)
>
> My IPSTATS_MIB_INHDRERRORS, IPSTATS_MIB_INDISCARDS, and
> IPSTATS_MIB_INTRUNCATEDPKTS counters are all 0 so maybe this is from
> skb->pkt_type == PACKET_OTHERHOST?
>
> <idle>-0 [001] 11933.937378: kfree_skb: skbaddr=0xffff8805ebcf8c00 protocol=2048 location=0xffffffff81404660
> <idle>-0 [001] 11933.937385: kernel_stack: <stack trace>
> => ip_rcv_finish (ffffffff81404660)
> => ip_rcv (ffffffff81404f61)
> => __netif_receive_skb (ffffffff813ce123)
> => netif_receive_skb (ffffffff813d0da1)
> => process_responses (ffffffffa018486c)
> => napi_rx_handler (ffffffffa0185606)
> => net_rx_action (ffffffff813d2449)
> => __do_softirq (ffffffff8103bfd0)
> => call_softirq (ffffffff8148a14c)
> => do_softirq (ffffffff81003e85)
> => irq_exit (ffffffff8103c3a5)
> => do_IRQ (ffffffff8148a693)
> => ret_from_intr (ffffffff814814a7)
> => cpu_idle (ffffffff8100ac16)
> => start_secondary (ffffffff81af5e66)
>
> I see two places here that I might be hitting that don't increment any
> counters. I can try instrumenting these to see which one I hit.
>
> <idle>-0 [003] 11932.454375: kfree_skb: skbaddr=0xffff880584843700 protocol=4 location=0xffffffffa00492d4
> <idle>-0 [003] 11932.454382: kernel_stack: <stack trace>
> => llc_rcv (ffffffffa00492d4)
> => __netif_receive_skb (ffffffff813ce123)
> => netif_receive_skb (ffffffff813d0da1)
> => process_responses (ffffffffa018486c)
> => napi_rx_handler (ffffffffa0185606)
> => net_rx_action (ffffffff813d2449)
> => __do_softirq (ffffffff8103bfd0)
> => call_softirq (ffffffff8148a14c)
> => do_softirq (ffffffff81003e85)
> => irq_exit (ffffffff8103c3a5)
> => do_IRQ (ffffffff8148a693)
> => ret_from_intr (ffffffff814814a7)
> => cpu_idle (ffffffff8100ac16)
> => start_secondary (ffffffff81af5e66)
>
> This is protocol=4 so I don't know if it is really relevant but then
> again I don't know what this is.
You can ignore this
>
> <idle>-0 [003] 11914.266635: kfree_skb: skbaddr=0xffff880584843b00 protocol=2048 location=0xffffffff8143bfa8
> <idle>-0 [003] 11914.266641: kernel_stack: <stack trace>
> => igmp_rcv (ffffffff8143bfa8)
> => ip_local_deliver_finish (ffffffff814049ed)
> => ip_local_deliver (ffffffff81404d1a)
> => ip_rcv_finish (ffffffff814046ad)
> => ip_rcv (ffffffff81404f61)
> => __netif_receive_skb (ffffffff813ce123)
> => netif_receive_skb (ffffffff813d0da1)
> => mlx4_en_process_rx_cq (ffffffffa010a4fe)
> => mlx4_en_poll_rx_cq (ffffffffa010a9ef)
> => net_rx_action (ffffffff813d2449)
> => __do_softirq (ffffffff8103bfd0)
> => call_softirq (ffffffff8148a14c)
> => do_softirq (ffffffff81003e85)
> => irq_exit (ffffffff8103c3a5)
> => do_IRQ (ffffffff8148a693)
> => ret_from_intr (ffffffff814814a7)
> => cpu_idle (ffffffff8100ac16)
> => start_secondary (ffffffff81af5e66)
>
> Also don't know if this one is relevant. This looks like an igmp
> packet so probably not my drop, but I am receiving multicast packets
> in this case so maybe it is somehow related.
Yes, we need to change igmp to call consume_skb() for all frames that
were properly handled.
So you can ignore this as well.
* Re: Increased multicast packet drops in 3.4
2012-09-07 6:08 ` Eric Dumazet
@ 2012-09-07 22:38 ` Shawn Bohrer
0 siblings, 0 replies; 10+ messages in thread
From: Shawn Bohrer @ 2012-09-07 22:38 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
On Fri, Sep 07, 2012 at 08:08:45AM +0200, Eric Dumazet wrote:
> On Thu, 2012-09-06 at 23:00 -0500, Shawn Bohrer wrote:
> > On Thu, Sep 06, 2012 at 03:21:07PM +0200, Eric Dumazet wrote:
> > > kfree_skb() can free a list of skb, and we use a generic function to do
> > > so, without forwarding the drop/notdrop status. So it's unfortunate, but
> > > adding extra parameters just for the sake of drop_monitor is not worth
> > > it. skb_drop_fraglist() doesn't know if the parent skb is dropped or
> > > only freed, so it always calls kfree_skb(), not consume_skb()
> >
> > I understand that this means that dropwatch or the skb:kfree_skb
> > tracepoint won't know if the packet was really dropped, but do we
> > know in this case from the context of the stack trace? I'm assuming
> > since we didn't receive an error that the packet was delivered and
> > these aren't real drops.
>
> I am starting to believe this is an application error.
>
> This application uses recvmmsg() to fetch a lot of messages in one
> syscall, and it might well be it throws out a batch of 50+ messages
> because of an application bug. Yes, this starts with 3.4, but it can be
> triggered by a timing difference or something that is not a proper
> kernel bug...
Eric, you are absolutely correct. There is at least one bug in the
application. The code that re-orders out of order packets would give
up around the 50+ point and assume the packets in between were
dropped. I did prove that if we keep reading from the socket we do
receive those packets. So no packets are being dropped in the kernel.
I also proved this is happening on 3.1 as well, but 3.4 does trigger
it more often.
I'm still debugging the application because it appears I'm getting
very large batches of packets out of order. It isn't clear to me if
this is another application bug, the kernel, the switch or something
else.
Thanks for all of your help looking into this (non)-issue. If I have
further questions about the kernel side I'll let you know.
Thanks,
Shawn
Thread overview: 10+ messages
2012-09-06 0:11 Increased multicast packet drops in 3.4 Shawn Bohrer
2012-09-06 6:07 ` Eric Dumazet
2012-09-06 6:22 ` Eric Dumazet
2012-09-06 13:03 ` Shawn Bohrer
2012-09-06 13:21 ` Eric Dumazet
2012-09-06 13:31 ` Eric Dumazet
2012-09-07 4:00 ` Shawn Bohrer
2012-09-07 6:08 ` Eric Dumazet
2012-09-07 22:38 ` Shawn Bohrer
2012-09-06 6:26 ` Eric Dumazet