public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* irq load balancing
@ 2007-09-11 23:18 Venkat Subbiah
  2007-09-12 11:51 ` Stephen Hemminger
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Venkat Subbiah @ 2007-09-11 23:18 UTC (permalink / raw)
  To: linux-kernel

Most of the load in my system is triggered by a single ethernet IRQ. Essentially the IRQ schedules a tasklet and most of the work is done in the tasklet, which is scheduled in the IRQ. From what I read, it looks like the tasklet is executed on the same CPU on which it was scheduled. So this means even in an SMP system it will be one processor that is overloaded.
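
The imbalance is visible directly in /proc/interrupts, which shows per-CPU counts for each IRQ line. A small sketch (sample output is inlined so it is self-contained; the device name eth0, the IRQ number, and the counts are made up):

```shell
# Pull the per-CPU interrupt counts for one NIC out of /proc/interrupts.
# Sample line inlined; on a real box use: grep eth0 /proc/interrupts
sample=' 16:    1200345         12   IO-APIC-level  eth0'
echo "$sample" | awk '{ printf "CPU0=%s CPU1=%s\n", $2, $3 }'
# -> CPU0=1200345 CPU1=12
```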

So will using the user-space IRQ load balancer really help? What I am doubtful about is that the user-space load balancer comes along and changes the affinity once in a while. But really what I need is for every interrupt to go to a different CPU in a round-robin fashion.

It looks like the APIC can distribute IRQs dynamically. Is this supported in the kernel, and is there any config or proc interface to turn this on/off?


Thx,
Venkat


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: irq load balancing
  2007-09-11 23:18 irq load balancing Venkat Subbiah
@ 2007-09-12 11:51 ` Stephen Hemminger
  2007-09-12 11:55 ` kalash nainwal
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Stephen Hemminger @ 2007-09-12 11:51 UTC (permalink / raw)
  To: linux-kernel

On Tue, 11 Sep 2007 16:18:15 -0700
"Venkat Subbiah" <venkats@cranite.com> wrote:

> Most of the load in my system is triggered by a single ethernet IRQ. Essentially the IRQ schedules a tasklet and most of the work is done in the tasklet, which is scheduled in the IRQ. From what I read, it looks like the tasklet is executed on the same CPU on which it was scheduled. So this means even in an SMP system it will be one processor that is overloaded.

The network device should use NAPI and process many packets per IRQ.
What device driver is it?

> So will using the user-space IRQ load balancer really help? What I am doubtful about is that the user-space load balancer comes along and changes the affinity once in a while. But really what I need is for every interrupt to go to a different CPU in a round-robin fashion.

Userspace IRQ balancer detects network devices and intentionally does
not balance them. See: http://irqbalance.org/documentation.php

> It looks like the APIC can distribute IRQs dynamically. Is this supported in the kernel, and is there any config or proc interface to turn this on/off?

Distributing network device IRQs across CPUs is usually bad, because it
causes cache thrashing.
> 
> Thx,
> Venkat
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: irq load balancing
  2007-09-11 23:18 irq load balancing Venkat Subbiah
  2007-09-12 11:51 ` Stephen Hemminger
@ 2007-09-12 11:55 ` kalash nainwal
  2007-09-12 14:47 ` Arjan van de Ven
  2007-09-12 21:44 ` Chris Snook
  3 siblings, 0 replies; 9+ messages in thread
From: kalash nainwal @ 2007-09-12 11:55 UTC (permalink / raw)
  To: Venkat Subbiah; +Cc: linux-kernel

On 9/12/07, Venkat Subbiah <venkats@cranite.com> wrote:
> Most of the load in my system is triggered by a single ethernet IRQ. Essentially the IRQ schedules a tasklet and most of the work is done in the tasklet, which is scheduled in the IRQ. From what I read, it looks like the tasklet is executed on the same CPU on which it was scheduled. So this means even in an SMP system it will be one processor that is overloaded.
>
> So will using the user-space IRQ load balancer really help? What I am doubtful about is that the user-space load balancer comes along and changes the affinity once in a while. But really what I need is for every interrupt to go to a different CPU in a round-robin fashion.
>
> It looks like the APIC can distribute IRQs dynamically. Is this supported in the kernel, and is there any config or proc interface to turn this on/off?
>

/proc/irq/<irq#>/smp_affinity. But this is generally not suggested for
performance reasons (cache issues etc.).
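
A minimal sketch of how that interface is used; the IRQ number 16 is a placeholder, and the actual write needs root on a real system, so it is left commented out:

```shell
# smp_affinity is a hex bitmask, one bit per CPU.
# Build the mask for CPUs 0 and 2:
mask=$(printf '%x' $(( (1 << 0) | (1 << 2) )))
echo "$mask"    # -> 5
# Then, as root (IRQ 16 is a placeholder):
#   echo "$mask" > /proc/irq/16/smp_affinity
```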

>
> Thx,
> Venkat
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: irq load balancing
  2007-09-11 23:18 irq load balancing Venkat Subbiah
  2007-09-12 11:51 ` Stephen Hemminger
  2007-09-12 11:55 ` kalash nainwal
@ 2007-09-12 14:47 ` Arjan van de Ven
  2007-09-12 21:44 ` Chris Snook
  3 siblings, 0 replies; 9+ messages in thread
From: Arjan van de Ven @ 2007-09-12 14:47 UTC (permalink / raw)
  To: Venkat Subbiah; +Cc: linux-kernel

On Tue, 11 Sep 2007 16:18:15 -0700
"Venkat Subbiah" <venkats@cranite.com> wrote:

> Most of the load in my system is triggered by a single ethernet IRQ.
> Essentially the IRQ schedules a tasklet and most of the work is done
> in the tasklet, which is scheduled in the IRQ. From what I read, it
> looks like the tasklet is executed on the same CPU on which it was
> scheduled. So this means even in an SMP system it will be one
> processor that is overloaded.
> 
> So will using the user-space IRQ load balancer really help? What I am
> doubtful about is that the user-space load balancer comes along and
> changes the affinity once in a while. But really what I need is for
> every interrupt to go to a different CPU in a round-robin fashion.

If you round-robin network interrupts, your performance will be really,
really bad....

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: irq load balancing
  2007-09-11 23:18 irq load balancing Venkat Subbiah
                   ` (2 preceding siblings ...)
  2007-09-12 14:47 ` Arjan van de Ven
@ 2007-09-12 21:44 ` Chris Snook
  2007-09-13 20:31   ` Venkat Subbiah
  3 siblings, 1 reply; 9+ messages in thread
From: Chris Snook @ 2007-09-12 21:44 UTC (permalink / raw)
  To: Venkat Subbiah; +Cc: linux-kernel

Venkat Subbiah wrote:
> Most of the load in my system is triggered by a single ethernet IRQ.
> Essentially the IRQ schedules a tasklet and most of the work is done in
> the tasklet, which is scheduled in the IRQ. From what I read, it looks
> like the tasklet is executed on the same CPU on which it was scheduled.
> So this means even in an SMP system it will be one processor that is
> overloaded.
> 
> So will using the user-space IRQ load balancer really help?

A little bit.  It'll keep other IRQs on different CPUs, which will prevent other 
interrupts from causing cache and TLB evictions that could slow down the 
interrupt handler for the NIC.

> What I am doubtful
> about is that the user-space load balancer comes along and changes the
> affinity once in a while. But really what I need is for every interrupt
> to go to a different CPU in a round-robin fashion.

Doing it in a round-robin fashion will be disastrous for performance.  Your 
cache miss rate will go through the roof and you'll hit the slow paths in the 
network stack most of the time.

> It looks like the APIC can distribute IRQs dynamically. Is this
> supported in the kernel, and is there any config or proc interface to
> turn this on/off?

/proc/irq/$FOO/smp_affinity is a bitmask.  You can mask an irq to multiple 
processors.  Of course, this will absolutely kill your performance.  That's why 
irqbalance never does this.
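
For reference, such a mask can be decoded back into a CPU list with plain shell arithmetic; a small sketch (the mask value 0xa is just an example):

```shell
# Decode an smp_affinity-style hex mask into the CPUs it covers.
mask=0xa    # example mask: bits 1 and 3 set
cpus=""
cpu=0
while [ "$cpu" -lt 8 ]; do
  if [ $(( (mask >> cpu) & 1 )) -eq 1 ]; then
    cpus="$cpus $cpu"
  fi
  cpu=$((cpu + 1))
done
echo "CPUs:$cpus"    # -> CPUs: 1 3
```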

	-- Chris

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: irq load balancing
  2007-09-12 21:44 ` Chris Snook
@ 2007-09-13 20:31   ` Venkat Subbiah
  2007-09-13 20:44     ` Lennart Sorensen
  0 siblings, 1 reply; 9+ messages in thread
From: Venkat Subbiah @ 2007-09-13 20:31 UTC (permalink / raw)
  To: Chris Snook; +Cc: linux-kernel

Doing it in a round-robin fashion will be disastrous for performance.
Your cache miss rate will go through the roof and you'll hit the slow
paths in the network stack most of the time.
> Most of the work in my system is spent in encrypting/decrypting traffic.
Right now all this is done in a tasklet within softirqd and hence
all landing on the same CPU.
On the receive side it's a packet handler that handles the traffic. On the
tx side it's done within the transmit path of the packet. So would
re-architecting this to move the rx packet handler to one kernel
thread (with SMP affinity to one CPU) and tx to a different kernel
thread (with SMP affinity to a different CPU) be advisable?
What's the impact on cache misses and the slowpath/fastpath in the network stack?

Thx,
-Venkat

-----Original Message-----
From: Chris Snook [mailto:csnook@redhat.com] 
Sent: Wednesday, September 12, 2007 2:45 PM
To: Venkat Subbiah
Cc: linux-kernel@vger.kernel.org
Subject: Re: irq load balancing

Venkat Subbiah wrote:
> Most of the load in my system is triggered by a single ethernet IRQ.
> Essentially the IRQ schedules a tasklet and most of the work is done in
> the tasklet, which is scheduled in the IRQ. From what I read, it looks
> like the tasklet is executed on the same CPU on which it was scheduled.
> So this means even in an SMP system it will be one processor that is
> overloaded.
> 
> So will using the user-space IRQ load balancer really help?

A little bit.  It'll keep other IRQs on different CPUs, which will
prevent other interrupts from causing cache and TLB evictions that could
slow down the interrupt handler for the NIC.

> What I am doubtful
> about is that the user-space load balancer comes along and changes the
> affinity once in a while. But really what I need is for every interrupt
> to go to a different CPU in a round-robin fashion.

Doing it in a round-robin fashion will be disastrous for performance.
Your cache miss rate will go through the roof and you'll hit the slow
paths in the network stack most of the time.

> It looks like the APIC can distribute IRQs dynamically. Is this
> supported in the kernel, and is there any config or proc interface to
> turn this on/off?

/proc/irq/$FOO/smp_affinity is a bitmask.  You can mask an irq to
multiple processors.  Of course, this will absolutely kill your
performance.  That's why irqbalance never does this.

	-- Chris

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: irq load balancing
  2007-09-13 20:31   ` Venkat Subbiah
@ 2007-09-13 20:44     ` Lennart Sorensen
  2007-09-13 21:02       ` Venkat Subbiah
  0 siblings, 1 reply; 9+ messages in thread
From: Lennart Sorensen @ 2007-09-13 20:44 UTC (permalink / raw)
  To: Venkat Subbiah; +Cc: Chris Snook, linux-kernel

On Thu, Sep 13, 2007 at 01:31:39PM -0700, Venkat Subbiah wrote:
> Doing it in a round-robin fashion will be disastrous for performance.
> Your cache miss rate will go through the roof and you'll hit the slow
> paths in the network stack most of the time.
> > Most of the work in my system is spent in encrypting/decrypting traffic.
> Right now all this is done in a tasklet within softirqd and hence
> all landing on the same CPU.
> On the receive side it's a packet handler that handles the traffic. On the
> tx side it's done within the transmit path of the packet. So would
> re-architecting this to move the rx packet handler to one kernel
> thread (with SMP affinity to one CPU) and tx to a different kernel
> thread (with SMP affinity to a different CPU) be advisable?
> What's the impact on cache misses and the slowpath/fastpath in the network stack?

Since most network devices have a single status register for both
receive and transmit (and errors and the like), which needs a lock to
protect access, you will likely end up with serious thrashing from
moving the lock between CPUs.

--
Len Sorensen

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: irq load balancing
  2007-09-13 20:44     ` Lennart Sorensen
@ 2007-09-13 21:02       ` Venkat Subbiah
  2007-09-13 21:30         ` Chris Snook
  0 siblings, 1 reply; 9+ messages in thread
From: Venkat Subbiah @ 2007-09-13 21:02 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Chris Snook, linux-kernel

Since most network devices have a single status register for both
receive and transmit (and errors and the like), which needs a lock to
protect access, you will likely end up with serious thrashing from
moving the lock between CPUs.
> Any ways to measure the thrashing of locks?

Since most network devices have a single status register for both
receive and transmit (and errors and the like)
> These register accesses will be mostly within the irq handler, which I
plan on keeping on the same processor. The network driver is actually
tg3. Will look more closely into the driver.

Thx,
Venkat


-----Original Message-----
From: Lennart Sorensen [mailto:lsorense@csclub.uwaterloo.ca] 
Sent: Thursday, September 13, 2007 1:45 PM
To: Venkat Subbiah
Cc: Chris Snook; linux-kernel@vger.kernel.org
Subject: Re: irq load balancing

On Thu, Sep 13, 2007 at 01:31:39PM -0700, Venkat Subbiah wrote:
> Doing it in a round-robin fashion will be disastrous for performance.
> Your cache miss rate will go through the roof and you'll hit the slow
> paths in the network stack most of the time.
> > Most of the work in my system is spent in encrypting/decrypting traffic.
> Right now all this is done in a tasklet within softirqd and hence
> all landing on the same CPU.
> On the receive side it's a packet handler that handles the traffic. On
> the tx side it's done within the transmit path of the packet. So would
> re-architecting this to move the rx packet handler to one kernel
> thread (with SMP affinity to one CPU) and tx to a different kernel
> thread (with SMP affinity to a different CPU) be advisable?
> What's the impact on cache misses and the slowpath/fastpath in the
> network stack?

Since most network devices have a single status register for both
receive and transmit (and errors and the like), which needs a lock to
protect access, you will likely end up with serious thrashing from
moving the lock between CPUs.

--
Len Sorensen

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: irq load balancing
  2007-09-13 21:02       ` Venkat Subbiah
@ 2007-09-13 21:30         ` Chris Snook
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Snook @ 2007-09-13 21:30 UTC (permalink / raw)
  To: Venkat Subbiah; +Cc: Lennart Sorensen, linux-kernel

Venkat Subbiah wrote:
> Since most network devices have a single status register for both
> receive and transmit (and errors and the like), which needs a lock to
> protect access, you will likely end up with serious thrashing from
> moving the lock between CPUs.
>> Any ways to measure the thrashing of locks?
> 
> Since most network devices have a single status register for both
> receive and transmit (and errors and the like)
>> These register accesses will be mostly within the irq handler, which I
> plan on keeping on the same processor. The network driver is actually
> tg3. Will look more closely into the driver.

Why are you trying to do this, anyway?  This is a classic example of fairness 
hurting both performance and efficiency.  Unbalanced distribution of a single 
IRQ gives superior performance.  There are cases when this is a worthwhile 
tradeoff, but the network stack is not one of them.  In the HPC world, people 
generally want to squeeze maximum performance out of CPU/cache/RAM so they just 
accept the imbalance because it performs better than balancing it, and 
irqbalance can keep things fair over longer intervals if that's important.  In 
the realtime world, people generally bind everything they can to one or two 
CPUs, and bind their realtime applications to the remaining ones to minimize 
contention.
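
The realtime-style binding described above can be sketched with smp_affinity plus taskset; the IRQ numbers and the application name below are placeholders:

```shell
# Keep device IRQs on CPU 0 (root; IRQ numbers are placeholders):
#   for irq in 16 17; do echo 1 > /proc/irq/$irq/smp_affinity; done
# Run the latency-sensitive application on the remaining CPUs, e.g. CPU 1:
#   taskset -c 1 ./app
# taskset itself can be sanity-checked on any box (CPU 0 always exists):
taskset -c 0 echo ok    # -> ok
```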

Distributing your network interrupts in a round-robin fashion will make your 
computer do exactly one thing faster: heat up the room.

	-- Chris

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2007-09-13 21:30 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-09-11 23:18 irq load balancing Venkat Subbiah
2007-09-12 11:51 ` Stephen Hemminger
2007-09-12 11:55 ` kalash nainwal
2007-09-12 14:47 ` Arjan van de Ven
2007-09-12 21:44 ` Chris Snook
2007-09-13 20:31   ` Venkat Subbiah
2007-09-13 20:44     ` Lennart Sorensen
2007-09-13 21:02       ` Venkat Subbiah
2007-09-13 21:30         ` Chris Snook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox