From: Baruch Even <baruch@ev-en.org>
To: hadi@cyberus.ca
Cc: Stephen Hemminger <shemminger@osdl.org>,
John Heffner <jheffner@psc.edu>,
"David S. Miller" <davem@davemloft.net>,
rhee@eos.ncsu.edu, Yee-Ting.Li@nuim.ie, netdev@oss.sgi.com
Subject: Re: netif_rx packet dumping
Date: Tue, 08 Mar 2005 15:56:04 +0000
Message-ID: <422DCB14.1040805@ev-en.org>
In-Reply-To: <1110203711.1088.1393.camel@jzny.localdomain>
jamal wrote:
> On Fri, 2005-03-04 at 03:47, Baruch Even wrote:
>
>>jamal wrote:
>
>
>>>Can you explain a little more? Why does the throttling cause any
>>>bad behavior that's any different from the queue being full? In both
>>>cases, packets arriving during that transient will be dropped.
>>
>>If you have 300 packets in the queue and the throttling kicks in, you now
>>drop ALL packets until the queue is empty. This normally takes some time,
>>and during all of it you are dropping every ACK that comes in. You lose
>>SACK information, and potentially you leave no packets in flight, so the
>>next packet is sent only when the retransmit timer fires, at which point
>>your congestion control algorithm restarts from cwnd=1.
>>
>>You can look at the report http://hamilton.ie/net/LinuxHighSpeed.pdf for
>>some graphs of the effects.
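(For reference, the behaviour I mean is roughly the following. This is a
simplified user-space sketch of the 2.6-era netif_rx() throttle logic as I
understand it, not verbatim kernel code.)

#include <stdio.h>

#define MAX_BACKLOG 300

struct backlog {
	int qlen;
	int throttle;
};

static int netif_rx_sketch(struct backlog *q)
{
	if (q->qlen <= MAX_BACKLOG) {
		if (q->qlen == 0)
			q->throttle = 0;	/* queue drained: stop throttling */
		if (!q->throttle) {
			q->qlen++;		/* enqueue the packet */
			return 0;
		}
	} else if (!q->throttle) {
		q->throttle = 1;		/* overflow: start throttling */
	}
	return 1;				/* drop */
}

int main(void)
{
	/* The point: with the queue only half drained (150 of 300 slots used)
	 * but the throttle flag still set, every arriving packet is dropped. */
	struct backlog q = { .qlen = 150, .throttle = 1 };

	printf("packet is %s\n", netif_rx_sketch(&q) ? "dropped" : "queued");
	return 0;
}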
>
> Were the processors tied to NICs?
No. These are single CPU machines (with HT).
> Your experiment is more than likely a single flow, correct?
Yes.
> In other words the whole queue was in fact dedicated just to your one
> flow - that's why you can call this queue a transient burst queue.
Indeed. For a router or a web server handling several thousand flows it
might be different, but I don't expect such a system to spend one ms (or
more) on a single packet, as happens with the current end-system ACK
handling code.
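To put a rough number on that (a back-of-the-envelope illustration; the
1 Gbit/s rate, 1500-byte frames and one ACK per two segments are my
assumptions, not measurements):

#include <stdio.h>

int main(void)
{
	double link_bps   = 1e9;		/* assumed 1 Gbit/s link */
	double frame_bits = 1500 * 8.0;		/* assumed MTU-sized frames */
	double pkts_per_s = link_bps / frame_bits;	/* ~83,000 */
	double acks_per_s = pkts_per_s / 2.0;		/* delayed ACKs: ~41,000 */

	printf("per-ACK budget: %.1f us\n", 1e6 / acks_per_s);	/* ~24 us */
	printf("at 1 ms per ACK the receiver keeps up with only %.0f ACKs/s\n",
	       1e6 / 1000.0);
	return 0;
}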
> Do you still have the data that shows how many packets were dropped
> during this period? Do you still have the experimental data? I am
> particularly interested in seeing the softnet stats as well as the TCP
> netstats.
No, these tests were not run by me. I'll probably rerun similar tests to
base my work on; send me a private note on how to get those stats from
the kernel and I'll add them to my test scripts.
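For my own notes, the softnet counters live in /proc/net/softnet_stat, one
line of hex fields per CPU. A quick dump, assuming the first four columns
are total, dropped, time_squeeze and throttled as in current 2.6 kernels
(if I read the proc format right), could look like this:

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/net/softnet_stat", "r");
	char line[256];
	int cpu = 0;

	if (!f) {
		perror("softnet_stat");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		unsigned int total, dropped, squeezed, throttled;

		if (sscanf(line, "%x %x %x %x",
			   &total, &dropped, &squeezed, &throttled) == 4)
			printf("cpu%d: total=%u dropped=%u time_squeeze=%u throttled=%u\n",
			       cpu, total, dropped, squeezed, throttled);
		cpu++;
	}
	fclose(f);
	return 0;
}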
> I think your main problem was the huge amount of SACK on the write queue
> and the resultant processing, i.e. section 1.1 and how you resolved that.
That is my main guess as well. The original work was done rather
quickly; we are now reorganizing our thoughts and redoing the tests in a
more orderly fashion.
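The cost I have in mind is roughly this pattern: for every SACK block in
every incoming ACK the retransmit queue is walked from the head, so with a
large window each ACK costs O(window) and a window's worth of ACKs costs
O(window^2). A schematic fragment (my simplification, not the actual
tcp_sacktag_write_queue() code):

struct seg {
	unsigned int start_seq;
	unsigned int end_seq;
	int sacked;
};

/* For each SACK block carried by an ACK, scan the whole retransmit queue
 * and tag the segments it covers.  With N packets in flight that is O(N)
 * work per ACK - which is where the milliseconds per packet come from. */
void sacktag_write_queue(struct seg *queue, int qlen,
			 unsigned int sack_start, unsigned int sack_end)
{
	int i;

	for (i = 0; i < qlen; i++)
		if (queue[i].start_seq >= sack_start &&
		    queue[i].end_seq <= sack_end)
			queue[i].sacked = 1;
}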
> I don't see any issue in dropping ACKs, even many of them for such large
> windows as you have - TCP's ACKs are cumulative. It is true that if you
> drop "large" enough amounts of ACKs you will end up in timeouts - but
> large enough in your case must be at least 1000 packets. And to say you
> dropped 1000 packets while processing 300 means you were taking too
> long processing the 300.
With the current code SACK processing takes a long time, so it is
possible that we happened to drop more than a thousand packets while
handling 300. I think that after fixing the SACK code the rest might
work without much ever accumulating in the ingress queue, but that might
change again when we go to even higher speeds.
> Then what would be really interesting is to see the performance you get
> from multiple flows, with and without congestion.
We'd need to get a very high-speed link for multiple high-speed flows.
> I am not against the benchmarky nature of the single flow and tuning
> for that, but we should also look at the effect in a wider scope before
> you handwave based on the result of one test case.
I can't say I didn't handwave, but then there has been little
experimentation to check whether the other claims are correct and
whether AFQ is really needed so early in the packet receive stage. There
are also voices saying that AFQ sucks and causes more harm than good,
though I don't remember the details at the moment.
> So if I were you I would repeat 1.2 with the fix from 1.1, as well as
> tying the NIC to one CPU. And it would be a good idea to present more
> detailed results - not just the TCP windows fluctuating (you may not
> need the other parameters for the paper, but they would be useful to
> see for debugging purposes).
I'd be happy to hear what other benchmarks you would like to see. I
currently intend to add some ACK processing time analysis and oprofile
information, and possibly show the size of the ingress queue as a
measure as well.
Making it as thorough as possible is one of my goals. Input is always
welcome.
Baruch