All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Rod Webster <rod@vmn.com.au>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	linux-rt-users@vger.kernel.org
Subject: Re: Excessive network latency when using Realtek R8168/R8111 et al NIC
Date: Tue, 23 May 2023 20:04:17 -0300	[thread overview]
Message-ID: <ZG1GcdTf5q9ObM5N@tpad> (raw)
In-Reply-To: <CANV1gkcuPVZz6=BrNxWBd9SqB4ydNmJOge6Tir2xG9C050oH1w@mail.gmail.com>

On Tue, May 23, 2023 at 06:02:13AM +1000, Rod Webster wrote:
> This stuff is hard! I just realised that rtapi_app is a red herring!
> rtapi_app is Linuxcnc and there is nothing wrong with it. Its thread
> is on a 1000us cycle so it seems it gets all its jobs done in 200us
> and then sleeps for 800us which makes perfect sense!
> 
> The issue we have is deeper than that. I think we should be looking at
> the NIC interrupt (but don't trust the novice!).
> The network communication is consuming more than the 800us slack from
> time to time. When that happens, our hardware sees the timing overrun
> and increments an internal packet error count. If too many of these
> happen in succession, the hardware decides the RT environment can't be
> relied on, disables further communication and returns an "error
> finishing read" to Linuxcnc to say it's given up.
> 
> Marcelo, we didn't resort to C. We were able to use a bash script and
> use a linuxcnc tool called halcmd to query the hardware as shown here.
> #!/usr/bin/bash
> stat=0
> while (($stat < 1))
> do
> stat=`halcmd getp hm2_7i96s.0.packet-error-total`
> done
> trace-cmd stop

You can also write a value to the trace before stopping it:

echo "pin high, stopping tracing!" > /sys/kernel/debug/tracing/trace_marker

Just to make sure you are seeing the trace up to the correct event.

> I think we need to increase the stat threshold  so we get more samples
> in our trace before stopping it. The current trace will only have one
> instance.
> Thanks for letting me see the issue more clearly.
> 
> 
> Rod Webster
> 
> 
> 
> Rod Webster
> 
> VMN®
> 
> www.vmn.com.au
> 
> Ph: 1300 896 832
> 
> Mob: +61 435 765 611
> 
> 
> 
> 
> On Tue, 23 May 2023 at 03:13, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >
> > On Mon, May 22, 2023 at 08:06:39PM +1000, Rod Webster wrote:
> > > Yes, I won't be using a DKMS driver and will test with the in-tree driver.
> > > I rolled a PC back to the Debian 6.1 RT kernel and the default in-tree
> > > R8169 driver yesterday with isolcpus=2,3.
> > >
> > > I had a shot at it last night with the napi thread but was confused by
> > > the trace results so will try again.
> > > I could not see our PacketErrorTest script appearing in the trace but
> > > it stopped the trace when it should.
> >
> > Rod,
> >
> > Can you describe the PacketErrorTest script and how do you know, from
> > it, that latency has crossed the threshold? What was confusing about
> > the traces?
> >
> > Earlier you wrote:
> > I've had some feedback from a hardware supplier.
> > The Linuxcnc RT ethernet driver has a pin that reports ethernet packet
> > error count.
> > This increments when the ethernet response exceeds our real time servo
> > thread period (typically 1 ms)
> > We should be able to monitor this pin and stop the trace if it is >  0
> > without too much effort.
> >
> > So you are seeing that pin increase its value?
> >
> > When you do, you should write to the "tracing_on" file (a zero):
> >
> >         /*
> >          * if we're not tracing and the tracing_on fd is not open,
> >          * open the tracing_on file so that we can stop the trace
> >          * if we hit a breaktrace threshold
> >          */
> >         if (trace_fd < 0) {
> >                 sprintf(path, "%s/%s", fileprefix, "tracing_on");
> >                 if ((trace_fd = open(path, O_WRONLY)) < 0)
> >                         warn("unable to open tracing_on file: %s\n", path);
> >         }
> >
> >         /* now stop any trace */
> >         write(trace_fd, "0\n", 2);
> >
> > The trace file (/sys/kernel/debug/tracing/trace) should now contain
> > the trace up to the write to tracing_on file.
> >
> > Sebastian: didnt see anything weird in the traces as well:
> >
> > "Follows the delta between a list of timestamped events,
> > where each event is: "TIMESTAMP: rtapi is scheduled IN CPU-3"
> > (there is at one time only a single process scheduled in cpu-3).
> >
> > [191.050245, 191.050278]: 0.000033
> > [191.050278, 191.050293]: 0.000015
> > [191.050293, 191.050307]: 0.000014
> > [191.050307, 191.050322]: 0.000015
> > [191.050322, 191.050336]: 0.000014
> >
> > rtapi_app goes to sleep for approximately 15us, and is awakened
> > by a timer.
> >
> > [191.050336, 191.050350]: 0.000014
> > [191.050350, 191.050365]: 0.000015
> > [191.050365, 191.050379]: 0.000014
> > [191.050379, 191.050393]: 0.000014
> > [191.050393, 191.050408]: 0.000015
> > [191.050408, 191.051243]: 0.000835
> > [191.051243, 191.051272]: 0.000029
> >
> > It eventually sleeps for 800us (and this pattern repeats)".
> >
> > Perhaps tracing the system call which sends out the packet would be
> > useful as well.
> >
> > > Also our RT process did not appear to be on an isolated thread which
> > > seemed odd. I did not get as far as setting the affinity for napi.
> > >
> > > I've also installed  Bookworm RC3 on another PC with the default RT
> > > kernel on a Gigabyte Brix N3160 with an R8111/R8168 NIC and the
> > > in-tree R8169 driver.
> > > This PC historically had max latency of about 130000 ns  back with
> > > kernel 4.x and now its reporting 23000 ns with the 6.1 kernel over 24
> > > hours testing with the linuxcnc tools. Quite amazing what you guys
> > > have achieved! It will be a few days before I can lug my hardware to
> > > work to test on it as well for another data point.
> > >
> > > I really appreciate your willingness to help. This remains a
> > > frustrating issue for many Linuxcnc users who are using my compiled
> > > 6.3 kernel to get out of jail.
> > > Bear with me as I bumble my way through!
> > >
> > > Rod Webster
> > >
> > >
> > >
> > >
> > >
> > > Rod Webster
> > >
> > > VMN®
> > >
> > > www.vmn.com.au
> > >
> > > Ph: 1300 896 832
> > >
> > > Mob: +61 435 765 611
> > >
> > >
> > >
> > >
> > > On Mon, 22 May 2023 at 19:32, Sebastian Andrzej Siewior
> > > <bigeasy@linutronix.de> wrote:
> > > >
> > > > On 2023-05-19 21:41:30 [+1000], Rod Webster wrote:
> > > > > I will do some testing tomorrow now I know what's going on but I
> > > > > wonder how relevant it will be given this turnaround?
> > > >
> > > > I'm not going to look at the dkms/OOT driver. If there is an actual
> > > > problem or something odd with mainline and RT and I will look into it.
> > > > I managed to find an older box which is using the r8169 and I'm going to
> > > > boot and see if there the driver is doing something odd what I didn't
> > > > see earlier…
> > > >
> > > > Please make sure that you are using the in-tree driver. If you
> > > > experience high latencies then I'm willing to help investigate.
> > > >
> > > > > Rod Webster
> > > >
> > > > Sebastian
> > >
> > >
> >
> 
> 


  parent reply	other threads:[~2023-05-23 23:06 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-18  3:23 Excessive network latency when using Realtek R8168/R8111 et al NIC Rod Webster
2023-04-28  8:51 ` Sebastian Andrzej Siewior
     [not found]   ` <CANV1gkcr4jBUY-iH-iJJPdrVMp+1Nq1YrNPONferMC1AutJgkg@mail.gmail.com>
2023-04-28 13:12     ` Sebastian Andrzej Siewior
2023-04-28 21:37       ` Rod Webster
2023-04-29  1:00         ` Rod Webster
2023-05-16 10:59         ` Sebastian Andrzej Siewior
     [not found]           ` <CANV1gkftrZvhUhXV-mJ-mYmsue3ER33cXCNmVD1bGAc6TmTHuA@mail.gmail.com>
     [not found]             ` <CANV1gkfsAfDt76=STFrekQA4M6sfVKyq7bujA=Tu+S6k+EGYcg@mail.gmail.com>
2023-05-19  8:37               ` Sebastian Andrzej Siewior
2023-05-19 11:41                 ` Rod Webster
2023-05-22  9:32                   ` Sebastian Andrzej Siewior
2023-05-22 10:06                     ` Rod Webster
2023-05-22 14:45                       ` Marcelo Tosatti
2023-05-22 20:02                         ` Rod Webster
2023-05-22 20:37                           ` Peter Wallace
2023-05-22 20:50                             ` Rod Webster
2023-05-23 23:21                             ` Marcelo Tosatti
2023-05-24 14:09                               ` Peter Wallace
2023-05-23 23:04                           ` Marcelo Tosatti [this message]
2023-05-24  9:37                             ` Stephane ANCELOT

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZG1GcdTf5q9ObM5N@tpad \
    --to=mtosatti@redhat.com \
    --cc=bigeasy@linutronix.de \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=rod@vmn.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.