From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A981DC43381 for ; Sat, 9 Mar 2019 21:07:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5125E20657 for ; Sat, 9 Mar 2019 21:07:15 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="JI7c9xFX" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726451AbfCIVHO (ORCPT ); Sat, 9 Mar 2019 16:07:14 -0500 Received: from mail-wr1-f66.google.com ([209.85.221.66]:42273 "EHLO mail-wr1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726409AbfCIVHN (ORCPT ); Sat, 9 Mar 2019 16:07:13 -0500 Received: by mail-wr1-f66.google.com with SMTP id o9so1028336wrv.9 for ; Sat, 09 Mar 2019 13:07:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=snsjYG3PuSrP/nNo5ffylZBYq9PFuQQOf5uHWNm0Dkk=; b=JI7c9xFXE32eyNwBaLDnrd/nl7UnRXOn+RQDOOcspoVUo0cNxLRkGCIi00/rn1/RAF WnkK3fjXkG1yb7sfOxxy+2pTdwzelTls10yPN24mNYH5JNiyvVa9iLdl36qGzYk1AgYU /VUJQiihVrP8hYbWVyPZwmg4uD3PbNlLTgQv1Q/yBc6kt+rzwCSdtDGrXLlK3B0fNQEU nnc0nDvpDj+CbRg44gqCK/DyP75kWANnlC6Qo/SJQmNYr7mvY5xOf6++hzF43gwme+7f AUm9xqvhybufbZPASZRLL2OES8sXX5D53tpy4+l4WUidtKwPvaee57qw8A4w4SpAtPkR 7qPA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=snsjYG3PuSrP/nNo5ffylZBYq9PFuQQOf5uHWNm0Dkk=; b=XnattPqBtkT+KU30vDpsJOiJUvip3pkNB5Mzo2pzKBKcbzUuSMNiwKLjYgmX/4VGy1 2zhMryBsHv3wHqAa5cnOE198/pxa/Geh9eNCt6b4WprMHNHobclb7tR4d69CkMAzYGbk nhtAMp6wMqr+ztQDXxsP9rfGJa0/udMIBJC6CnHCJj5XQ6FGKUP/7kJRJt+dIR5GFOux iEMR+NzMHVxSCEx298kqSDA/XfsH0B7c/FdJcC1VBWsyfpR2PGNJpFyBjBPGysKAR5tT CdnvNa/jf7t7//CDTKfjuIxvamWbtNS7/3U59PuFYzmieok5N+LrvTciqhjDlfj9GeuW fupw== X-Gm-Message-State: APjAAAWsurpA86QHuM7jDZVPJE3v+nwUFM6LYSt9NaNUSkrhlStO0sx7 TSLuLnTDtBryOa/CBQLna03CMBAetekj9hZOV0w= X-Google-Smtp-Source: APXvYqz5it5zXcpmT/wKWPZcVD8/kTaixu++h5AJHDXePHvn0uWP5FHBClfuy9SnqRtQGV7IWQe5ydyuZKSJg+WZauY= X-Received: by 2002:a5d:52ca:: with SMTP id r10mr15964960wrv.187.1552165630890; Sat, 09 Mar 2019 13:07:10 -0800 (PST) MIME-Version: 1.0 References: <20190228185758.6f37i2sfnr2mdrce@localhost> In-Reply-To: From: Paul Thomas Date: Sat, 9 Mar 2019 16:06:58 -0500 Message-ID: Subject: Re: [Linuxptp-devel] strangeness To: Harini Katakam Cc: "linuxptp-devel@lists.sourceforge.net" , netdev@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Hi Harini, On Sat, Mar 9, 2019 at 12:38 AM Harini Katakam wrote: > > Hi Paul, > On Sat, Mar 9, 2019 at 3:13 AM Paul Thomas wrote: > > > > On Fri, Mar 8, 2019 at 1:07 PM Paul Thomas wrote: > > > > > > Hi Harini, > > > > > > On Fri, Mar 8, 2019 at 1:08 AM Harini Katakam wrote: > > > > > > > > Hi Paul, > > > > On Fri, Mar 8, 2019 at 12:33 AM Paul Thomas wrote: > > > > > > > > > > On Thu, Mar 7, 2019 at 12:32 AM Harini Katakam wrote: > > > > > > > > > > > > Hi Paul, > > > > > > > > > > > OK, I think things are becoming more clear. After just doing ioctl(fd, > > > > > SIOCSHWTSTAMP, &ifreq) from userspace (tx_bd_control = > > > > > TSTAMP_ALL_FRAMES in macb_ptp.c) then with the nc experiment some udp > > > > > transmits do not make it to macb_start_xmit() until receive traffic on > > > > > the nc connection comes in (one-to-one, one new rx packet means one > > > > > old tx packet goes out). > > > > > > > > Could you please share any wireshark log or dump for what is being > > > > received here? > > > > > > Here are two wireshark captures, the thing to note in the bad one is > > > that packets No. 5, 7, 9 from .102 to .103 were actually sent just > > > after packet No. 2 but they don't show up on the wire until the > > > packets the other way (one for one). > > > > > > > > > > > > > > > > > Working setup: > > > > > Before the tx_bd_control = TSTAMP_ALL_FRAMES. > > > > > Every time I hit "sN Enter" from nc I see a macb_start_xmit > > > > > print_hex_dump() and I see the packet on the nc client side: > > > > > # nc -l -u -p 9999 > > > > > ... > > > > > s11 > > > > > [ 347.517080] macb_start_xmit data: 00000000: 20 b0 f7 04 0a 29 20 b0 > > > > > f7 04 0a 26 08 00 45 00 ....) ....&..E. > > > > > s12 > > > > > [ 348.964369] macb_start_xmit data: 00000000: 20 b0 f7 04 0a 29 20 b0 > > > > > f7 04 0a 26 08 00 45 00 ....) ....&..E. > > > > > ... > > > > > > > > > > Broken setup: > > > > > After the tx_bd_control = TSTAMP_ALL_FRAMES. > > > > > Not the first nc packet, but many of the subsequent ones never make it > > > > > to macb_start_xmit() > > > > > # nc -l -u -p 9999 > > > > > ... > > > > > s3 > > > > > s4 > > > > > s5 > > > > > ... > > > > > Eventually after I send data from the client nc I do see the > > > > > macb_start_xmit() lines. > > > > > > > > Thanks for this debug. If macb_start_xmit is never called, one of > > > > the preceeding checks (such as if skb is present or if the TX queues > > > > are off etc) > > > > should fail. I'm still tracing this but I'm not sure under what > > > > circumstances only > > > > some UDP packets will be prevented from being transmitted. > > > In this specific test the first tx packets always goes through, and > > > the subsequent ones don't until rx packets. So it's not random when > > > they go through, I could have been clearer about that. > > > > > > > Just to be sure, could you please confirm you are not seeing any > > > > "buffer exhausted" messaged from TX error tasks? > > > Correct, I'm not seeing any "buffer exhausted" errors. > > > > > > thanks, > > > Paul > > > > And one more piece that may be helpful. I think I narrowed down what's > > happening in the receive that finally flushes out a pending tx packet. > > It seems to be the netif_receive_skb(skb); line in gem_rx() (line > > 1067). I tested with an mdelay before and after this call: > > mdelay(1000);//mdelay here is slow to flush the pending tx > > packet (as seen by nc client) > > netif_receive_skb(skb); > > //mdelay(1000);//mdelay here is fast to flush the pending tx > > packet (as seen by nc client) > > This seems very strange to me, I quickly glanced at what > > netif_receive_skb() is doing and didn't see anything connected with > > the TX path, but those are the symptoms. > > Thanks for the logs and debug. > I'm afraid I can't think of how this receive affects TX path. > Even if the IP somehow has any dependency between TX and RX path > (which is doesn't, on ZynqMP, to my knowledge), it wouldn't explain > why packets dont reach _xmit function at all. > Let me debug a little more. I looked into this a little bit more and maybe see something. Just to get started I did a backtrace from macb_start_xmit(), which shows the following (I'm sure sure this code path is is clear to lots of folks, but it was helpful for me): [ 106.758748] dump_backtrace (/home/share/kernels/linux-5.0-rc8/arch/arm64/kernel/traps.c:103) [ 106.762400] show_stack (/home/share/kernels/linux-5.0-rc8/arch/arm64/kernel/traps.c:155) [ 106.765700] dump_stack (/home/share/kernels/linux-5.0-rc8/lib/dump_stack.c:115) [ 106.769006] macb_start_xmit (/home/share/kernels/linux-5.0-rc8/drivers/net/ethernet/cadence/macb_main.c:1809) [ 106.772912] dev_hard_start_xmit (/home/share/kernels/linux-5.0-rc8/./include/linux/netdevice.h:4395 /home/share/kernels/linux-5.0-rc8/net/core/dev.c:3278 /home/share/kernels/linux-5.0-rc8/net/core/dev.c:3294) [ 106.777078] sch_direct_xmit (/home/share/kernels/linux-5.0-rc8/net/sched/sch_generic.c:327) [ 106.780984] __qdisc_run (/home/share/kernels/linux-5.0-rc8/net/sched/sch_generic.c:398) [ 106.784543] __dev_queue_xmit (/home/share/kernels/linux-5.0-rc8/./include/linux/seqlock.h:234 /home/share/kernels/linux-5.0-rc8/./include/linux/seqlock.h:388 /home/share/kernels/linux-5.0-rc8/./include/net/sch_generic.h:160 /home/share/kernels/linux-5.0-rc8/./include/net/pkt_sched.h:122 /home/share/kernels/linux-5.0-rc8/net/core/dev.c:3473 /home/share/kernels/linux-5.0-rc8/net/core/dev.c:3832) [ 106.788535] dev_queue_xmit (/home/share/kernels/linux-5.0-rc8/net/core/dev.c:3898) [ 106.792182] ip_finish_output2 (/home/share/kernels/linux-5.0-rc8/./include/net/neighbour.h:498 /home/share/kernels/linux-5.0-rc8/./include/net/neighbour.h:506 /home/share/kernels/linux-5.0-rc8/net/ipv4/ip_output.c:229) [ 106.796261] ip_finish_output (/home/share/kernels/linux-5.0-rc8/net/ipv4/ip_output.c:317) [ 106.800253] ip_output (/home/share/kernels/linux-5.0-rc8/net/ipv4/ip_output.c:409) [ 106.803639] ip_local_out (/home/share/kernels/linux-5.0-rc8/net/ipv4/ip_output.c:127) [ 106.807110] ip_send_skb (/home/share/kernels/linux-5.0-rc8/net/ipv4/ip_output.c:1466) [ 106.810496] udp_send_skb.isra.12 (/home/share/kernels/linux-5.0-rc8/net/ipv4/udp.c:902) [ 106.814836] udp_sendmsg (/home/share/kernels/linux-5.0-rc8/net/ipv4/udp.c:1188) [ 106.818395] inet_sendmsg (/home/share/kernels/linux-5.0-rc8/net/ipv4/af_inet.c:799) [ 106.821868] sock_sendmsg (/home/share/kernels/linux-5.0-rc8/net/socket.c:622 /home/share/kernels/linux-5.0-rc8/net/socket.c:631) [ 106.825348] sock_write_iter (/home/share/kernels/linux-5.0-rc8/net/socket.c:901) [ 106.829080] __vfs_write (/home/share/kernels/linux-5.0-rc8/fs/read_write.c:475 /home/share/kernels/linux-5.0-rc8/fs/read_write.c:487) [ 106.832638] vfs_write (/home/share/kernels/linux-5.0-rc8/fs/read_write.c:549) [ 106.835937] ksys_write (/home/share/kernels/linux-5.0-rc8/fs/read_write.c:598) [ 106.839236] __arm64_sys_write (/home/share/kernels/linux-5.0-rc8/fs/read_write.c:607) [ 106.843143] el0_svc_common (/home/share/kernels/linux-5.0-rc8/arch/arm64/kernel/syscall.c:52 /home/share/kernels/linux-5.0-rc8/arch/arm64/kernel/syscall.c:83) [ 106.846875] el0_svc_handler (/home/share/kernels/linux-5.0-rc8/arch/arm64/kernel/syscall.c:130) [ 106.850607] el0_svc (/home/share/kernels/linux-5.0-rc8/arch/arm64/kernel/entry.S:915) I thought I could put some prints in to see where things are getting hung up, but I traced this back all the way to sock_write_iter() and this doesn't happen until right when the packets go out. So next I turned to strace to actually see what nc was doing. In the normal condition nc has a pselect6() that it is waiting on for the reads, but in the broken case it seems to get through the pselect6() and is blocking on the read(). Hopefully, for someone who understands how pselect is handled from the networking stack and driver, this will be very clear what is going on and what needs to be done to fix this. -Paul > > Regards, > Harini