From: Ingo Molnar <mingo@elte.hu>
To: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
	David Miller <davem@davemloft.net>,
	wenji@fnal.gov, akpm@osdl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [patch 1/4] - Potential performance bottleneck for Linux TCP
Date: Thu, 30 Nov 2006 11:32:40 +0100
Message-ID: <20061130103240.GA25733@elte.hu>
In-Reply-To: <20061130102205.GA20654@2ka.mipt.ru>


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > David's line of thinking for a solution sounds better to me. This 
> > patch does not prevent the process from being preempted (for 
> > potentially a long time), by any means.
> 
> It steals timeslices from other processes to complete the 
> tcp_recvmsg() task, and only when it has done so for too long will 
> it be preempted. Processing the backlog queue on behalf of 
> need_resched() will break fairness too - the processing itself can 
> take a lot of time, so the process can be scheduled away in that 
> part as well.

correct - it's just the wrong thing to do. The '10% performance win' 
that was measured was against _9 other tasks that contended for the 
same CPU resource_. I.e. it's /not/ an absolute 'performance win' 
AFAICS, it's simply a shift of CPU cycles away from the other 9 tasks 
and towards the task that does TCP receive.

Note that even without the change the TCP receiving task is already 
getting a disproportionate share of cycles due to softirq processing! 
Under a load of 10.0 its throughput went from 500 mbit (unloaded) to 
74 mbit, while the 'fair' share would be 500/10 = 50 mbit. So the TCP 
receiver /already/ has an unfair advantage. The patch only deepens 
that unfairness.

The solution is really simple and needs no kernel change at all: if you 
want the TCP receiver to get a larger share of timeslices then either 
renice it to -20 or renice the other tasks to +19.
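
For completeness, here is roughly what that looks like from userspace 
- a trivial, untested sketch using setpriority(2), which is all that 
renice does anyway (the PID argument and the -20 value are just 
placeholders):

/* Untested sketch: give the TCP-receiving process the most favourable
 * nice value from userspace, equivalent to `renice -n -20 -p <pid>`.
 * Setting a nice value below 0 requires root (CAP_SYS_NICE). */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/resource.h>

int main(int argc, char **argv)
{
	pid_t pid;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}
	pid = (pid_t)atoi(argv[1]);

	/* -20 is the most favourable nice value, +19 the least. */
	if (setpriority(PRIO_PROCESS, pid, -20) != 0) {
		perror("setpriority");
		return 1;
	}
	return 0;
}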

The other disadvantage, even ignoring that it's the wrong thing to do, 
is the crudeness of preempt_disable() that I mentioned in the other 
post:

---------->

Independently of the issue at hand: in general, the explicit use of 
preempt_disable() in non-infrastructure code is quite a heavy tool. Its 
effects are heavy and global: it disables /all/ preemption (even on 
PREEMPT_RT). Furthermore, when preempt_disable() is used to protect 
per-CPU data structures, then [unlike with, say, a spin-lock] the 
connection between the 'data' and the 'lock' is not explicit - causing 
all kinds of grief when trying to convert such code to a different 
preemption model (such as PREEMPT_RT :-).
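
To illustrate the pattern I mean - a made-up example, the my_stats 
structure and update_stats() are hypothetical and not from the patch 
in question:

#include <linux/percpu.h>
#include <linux/preempt.h>

struct my_stats {
	unsigned long bytes;
};

static DEFINE_PER_CPU(struct my_stats, my_stats);

/* Open-coded pattern: preempt_disable() is what makes the per-CPU
 * access safe, but nothing in the code says that it protects
 * 'my_stats' - the connection exists only in the programmer's head. */
static void update_stats(unsigned int bytes)
{
	struct my_stats *stats;

	preempt_disable();
	stats = &__get_cpu_var(my_stats);
	stats->bytes += bytes;
	preempt_enable();
}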

So my plan is to remove all "open-coded" use of preempt_disable() [and 
raw use of local_irq_save/restore] from the kernel and replace it with 
some facility that connects the data and the lock. (Note that this 
will not result in any actual changes at the instruction level, 
because internally every such facility still maps to preempt_disable() 
on non-PREEMPT_RT kernels - such code will compile to exactly the same 
thing as before.)
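
The existing get_cpu_var()/put_cpu_var() helpers already point in that 
direction - the per-CPU variable is named at the place where 
preemption gets disabled, so the data/'lock' relationship is visible 
to readers and tools. Again just a sketch, reusing the hypothetical 
example above:

/* Same update as above, but the per-CPU variable is named at the
 * point where preemption is disabled.  On non-PREEMPT_RT kernels this
 * expands to the same preempt_disable()/preempt_enable() sequence. */
static void update_stats(unsigned int bytes)
{
	struct my_stats *stats = &get_cpu_var(my_stats);

	stats->bytes += bytes;
	put_cpu_var(my_stats);
}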

	Ingo

Thread overview: 36+ messages
2006-11-30  1:56 [patch 1/4] - Potential performance bottleneck for Linux TCP Wenji Wu
2006-11-30  2:19 ` David Miller
2006-11-30  6:17   ` Ingo Molnar
2006-11-30  6:30     ` David Miller
2006-11-30  6:47       ` Ingo Molnar
2006-11-30  7:12         ` David Miller
2006-11-30  7:35           ` Ingo Molnar
2006-11-30  9:52             ` Evgeniy Polyakov
2006-11-30 10:07               ` Nick Piggin
2006-11-30 10:22                 ` Evgeniy Polyakov
2006-11-30 10:32                   ` Ingo Molnar [this message]
2006-11-30 17:04                     ` Wenji Wu
2006-11-30 20:20                       ` Ingo Molnar
2006-11-30 20:58                         ` Wenji Wu
2006-11-30 20:22                     ` David Miller
2006-11-30 20:30                       ` Ingo Molnar
2006-11-30 20:38                         ` David Miller
2006-11-30 20:49                           ` Ingo Molnar
2006-11-30 20:54                             ` Ingo Molnar
2006-11-30 20:55                             ` David Miller
2006-11-30 20:14                   ` David Miller
2006-11-30 20:42                     ` Wenji Wu
2006-12-01  9:53                     ` Evgeniy Polyakov
2006-12-01 23:18                       ` David Miller
2006-11-30  6:56       ` Ingo Molnar
2006-11-30 16:08   ` Wenji Wu
2006-11-30 20:06     ` David Miller
2006-11-30  9:33 ` Christoph Hellwig
2006-11-30 16:51   ` Lee Revell
  -- strict thread matches above, loose matches on Subject: below --
2006-11-30  2:02 Wenji Wu
2006-11-30  6:19 ` Ingo Molnar
2006-11-29 23:27 [Changelog] " Wenji Wu
2006-11-29 23:28 ` [patch 1/4] " Wenji Wu
2006-11-30  0:53   ` David Miller
2006-11-30  1:08     ` Andrew Morton
2006-11-30  1:13       ` David Miller
2006-11-30  6:04       ` Mike Galbraith
