From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Dumazet Subject: Re: Increased multicast packet drops in 3.4 Date: Fri, 07 Sep 2012 08:08:45 +0200 Message-ID: <1346998125.2484.220.camel@edumazet-glaptop> References: <20120906001108.GA6035@BohrerMBP.rgmadvisors.com> <1346912560.13121.175.camel@edumazet-glaptop> <20120906130316.GA2310@BohrerMBP.gateway.2wire.net> <1346937667.2484.33.camel@edumazet-glaptop> <20120907040043.GA2714@BohrerMBP.rgmadvisors.com> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit Cc: netdev@vger.kernel.org To: Shawn Bohrer Return-path: Received: from mail-wi0-f170.google.com ([209.85.212.170]:62670 "EHLO mail-wi0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759127Ab2IGGIt (ORCPT ); Fri, 7 Sep 2012 02:08:49 -0400 Received: by wibhq12 with SMTP id hq12so208066wib.1 for ; Thu, 06 Sep 2012 23:08:48 -0700 (PDT) In-Reply-To: <20120907040043.GA2714@BohrerMBP.rgmadvisors.com> Sender: netdev-owner@vger.kernel.org List-ID: On Thu, 2012-09-06 at 23:00 -0500, Shawn Bohrer wrote: > On Thu, Sep 06, 2012 at 03:21:07PM +0200, Eric Dumazet wrote: > > kfree_skb() can free a list of skb, and we use a generic function to do > > so, without forwarding the drop/notdrop status. So its unfortunate, but > > adding extra parameters just for the sake of drop_monitor is not worth > > it. skb_drop_fraglist() doesnt know if the parent skb is dropped or > > only freed, so it calls kfree_skb(), not consume_skb() or kfree_skb() > > I understand that this means that dropwatch or the skb:kfree_skb > tracepoint won't know if the packet was really dropped, but do we > know in this case from the context of the stack trace? I'm assuming > since we didn't receive an error that the packet was delivered and > these aren't real drops. I am starting to believe this is an application error. This application uses recvmmsg() to fetch a lot of messages in one syscall, and it might well be it throws out a batch of 50+ messages because of an application bug. Yes, this starts with 3.4, but it can b triggered by a timing difference or something that is not a proper kernel bug... > > > Are you receiving fragmented UDP frames ? > > I looked at the sending application and it yes it is possible it is > sending fragmented frames. > > > I ask this because with latest kernels (linux-3.5), we should no longer > > build a list of skb, but a single skb with page fragments. > > > > commit 3cc4949269e01f39443d0fcfffb5bc6b47878d45 > > Author: Eric Dumazet > > Date: Sat May 19 03:02:20 2012 +0000 > > > > ipv4: use skb coalescing in defragmentation > > > > ip_frag_reasm() can use skb_try_coalesce() to build optimized skb, > > reducing memory used by them (truesize), and reducing number of cache > > line misses and overhead for the consumer. > > > > Signed-off-by: Eric Dumazet > > Cc: Alexander Duyck > > Signed-off-by: David S. Miller > > I'll have to give 3.5 a try tomorrow and see if it has the same > problem. After backporting all of your patches to convert kfree_skb() > to consume_skb() to 3.4 I actually don't have that many different > places hitting the skb:kfree_skb tracepoint at the time of the drop. > Here are some of the ones I have left that might be relevant. > > -0 [001] 11933.738797: kfree_skb: skbaddr=0xffff8805ebcf9500 protocol=2048 location=0xffffffff81404e33 > -0 [001] 11933.738801: kernel_stack: > => ip_rcv (ffffffff81404e33) > => __netif_receive_skb (ffffffff813ce123) > => netif_receive_skb (ffffffff813d0da1) > => process_responses (ffffffffa018486c) > => napi_rx_handler (ffffffffa0185606) > => net_rx_action (ffffffff813d2449) > => __do_softirq (ffffffff8103bfd0) > => call_softirq (ffffffff8148a14c) > => do_softirq (ffffffff81003e85) > => irq_exit (ffffffff8103c3a5) > => do_IRQ (ffffffff8148a693) > => ret_from_intr (ffffffff814814a7) > => cpu_idle (ffffffff8100ac16) > => start_secondary (ffffffff81af5e66) > > My IPSTATS_MIB_INHDRERRORS, IPSTATS_MIB_INDISCARDS, and > IPSTATS_MIB_INTRUNCATEDPKTS counters are all 0 so maybe this is from > skb->pkt_type == PACKET_OTHERHOST? > > -0 [001] 11933.937378: kfree_skb: skbaddr=0xffff8805ebcf8c00 protocol=2048 location=0xffffffff81404660 > -0 [001] 11933.937385: kernel_stack: > => ip_rcv_finish (ffffffff81404660) > => ip_rcv (ffffffff81404f61) > => __netif_receive_skb (ffffffff813ce123) > => netif_receive_skb (ffffffff813d0da1) > => process_responses (ffffffffa018486c) > => napi_rx_handler (ffffffffa0185606) > => net_rx_action (ffffffff813d2449) > => __do_softirq (ffffffff8103bfd0) > => call_softirq (ffffffff8148a14c) > => do_softirq (ffffffff81003e85) > => irq_exit (ffffffff8103c3a5) > => do_IRQ (ffffffff8148a693) > => ret_from_intr (ffffffff814814a7) > => cpu_idle (ffffffff8100ac16) > => start_secondary (ffffffff81af5e66) > > I see two places here that I might be hitting that don't increment any > counters. I can try instrumenting these to see which one I hit. > > -0 [003] 11932.454375: kfree_skb: skbaddr=0xffff880584843700 protocol=4 location=0xffffffffa00492d4 > -0 [003] 11932.454382: kernel_stack: > => llc_rcv (ffffffffa00492d4) > => __netif_receive_skb (ffffffff813ce123) > => netif_receive_skb (ffffffff813d0da1) > => process_responses (ffffffffa018486c) > => napi_rx_handler (ffffffffa0185606) > => net_rx_action (ffffffff813d2449) > => __do_softirq (ffffffff8103bfd0) > => call_softirq (ffffffff8148a14c) > => do_softirq (ffffffff81003e85) > => irq_exit (ffffffff8103c3a5) > => do_IRQ (ffffffff8148a693) > => ret_from_intr (ffffffff814814a7) > => cpu_idle (ffffffff8100ac16) > => start_secondary (ffffffff81af5e66) > > This is protocol=4 so I don't know if it is really relevant but then > again I don't know what this is. You can ignore this > > -0 [003] 11914.266635: kfree_skb: skbaddr=0xffff880584843b00 protocol=2048 location=0xffffffff8143bfa8 > -0 [003] 11914.266641: kernel_stack: > => igmp_rcv (ffffffff8143bfa8) > => ip_local_deliver_finish (ffffffff814049ed) > => ip_local_deliver (ffffffff81404d1a) > => ip_rcv_finish (ffffffff814046ad) > => ip_rcv (ffffffff81404f61) > => __netif_receive_skb (ffffffff813ce123) > => netif_receive_skb (ffffffff813d0da1) > => mlx4_en_process_rx_cq (ffffffffa010a4fe) > => mlx4_en_poll_rx_cq (ffffffffa010a9ef) > => net_rx_action (ffffffff813d2449) > => __do_softirq (ffffffff8103bfd0) > => call_softirq (ffffffff8148a14c) > => do_softirq (ffffffff81003e85) > => irq_exit (ffffffff8103c3a5) > => do_IRQ (ffffffff8148a693) > => ret_from_intr (ffffffff814814a7) > => cpu_idle (ffffffff8100ac16) > => start_secondary (ffffffff81af5e66) > > Also don't know if this one is relevant. This looks like an igmp > packet so probably not my drop, but I am receiving multicast packets > in this case so maybe it is somehow related. Yes, we need to change igmp to call consume_skb() for all frames that were properly handled. So you can ignore this as well.