Netdev List
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Yuehai Xu <yuehaixu@gmail.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, yhxu@wayne.edu
Subject: Re: Why the number of /proc/interrupts doesn't change when nic is under heavy workload?
Date: Sun, 15 Jan 2012 23:09:27 +0100
Message-ID: <1326665367.5287.97.camel@edumazet-laptop>
In-Reply-To: <CAEc1PS0doeaH59621ia7n2GT=WgRSsCit_EzGrCV93QH95XZqw@mail.gmail.com>

On Sunday, 15 January 2012 at 15:53 -0500, Yuehai Xu wrote:
> Hi All,
> 
> My server's NIC is an Intel Corporation 80003ES2LAN Gigabit Ethernet
> Controller, the driver is e1000e, and my Linux version is 3.1.4. I
> have a Memcached server running on this 8-core box. The weird thing is
> that when my server is under heavy workload, the counts in
> /proc/interrupts don't change at all. Below are some details:
> =======
> cat /proc/interrupts | grep eth0
> 68:     330887     330861     331432     330544     330346     330227     330830     330575   PCI-MSI-edge      eth0
> =======
> cat /proc/irq/68/smp_affinity
> ff
> 
> I know that when the network is under heavy load, NAPI disables the NIC
> interrupt and polls the NIC's ring buffer. My question is: when is the
> NIC interrupt enabled again? It seems that it is never re-enabled while
> the heavy workload continues, because the counts shown by
> /proc/interrupts don't change at all. In my case, one core is
> saturated by ksoftirqd, because lots of softirqs are pending on that
> core. I just want to distribute these softirqs to other cores. Even
> with RPS enabled, that core is still occupied by ksoftirqd, nearly 100%.
> 
> I dug into the code and found these statements:
> __napi_schedule ==>
>    local_irq_save(flags);
>    ____napi_schedule(&__get_cpu_var(softnet_data), n);
>    local_irq_restore(flags);
> 
> Here "local_irq_save" actually invokes "cli", which disables interrupts
> on the local core. Is this what NAPI uses to disable the NIC
> interrupt? Personally I don't think so, because it only affects the
> local cpu.
> 
> I also found "enable_irq/disable_irq/e1000_irq_enable/e1000_irq_disable"
> under drivers/net/e1000e. Are these what NAPI uses to disable the NIC
> interrupt? I could not find any sign that they are called in the NAPI
> code path.

This is done in the device driver itself, not in generic NAPI code.

When the driver's NAPI poll() gets fewer packets than the budget, it
re-enables chip interrupts.
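
To illustrate why the counters in /proc/interrupts can stay flat under
sustained load, here is a toy model of that rule. It is only a sketch, not
kernel or e1000e code: the function name napi_poll_model, the per-round
arrival counts and the budget value are all hypothetical. Each argument
after the budget is the number of packets arriving in one polling round.

```shell
#!/bin/sh
# Toy model (NOT kernel code): count how many hardware interrupts fire
# for a sequence of per-round packet arrivals, given a poll budget.
# Usage: napi_poll_model BUDGET ARRIVALS...
napi_poll_model() {
    budget=$1
    shift
    backlog=0       # packets waiting in the (modelled) rx ring
    irq_enabled=1   # 1 = chip interrupts unmasked
    irqs=0          # hardware interrupts taken so far
    for arrived in "$@"; do
        backlog=$((backlog + arrived))
        # An interrupt fires only while chip interrupts are unmasked.
        if [ "$irq_enabled" -eq 1 ] && [ "$backlog" -gt 0 ]; then
            irqs=$((irqs + 1))
            irq_enabled=0   # handler masks chip interrupts, schedules poll
        fi
        if [ "$irq_enabled" -eq 0 ]; then
            # poll(): handle at most "budget" packets in this round.
            if [ "$backlog" -lt "$budget" ]; then
                work=$backlog
            else
                work=$budget
            fi
            backlog=$((backlog - work))
            # Fewer packets than the budget: re-enable chip interrupts.
            if [ "$work" -lt "$budget" ]; then
                irq_enabled=1
            fi
        fi
    done
    echo "$irqs"
}
```

In this model, "napi_poll_model 64 100 100 100 100" takes a single
interrupt because every poll uses its full budget, while
"napi_poll_model 64 10 10 10 10" takes one interrupt per round.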


> 
> My current situation is that the other 7 cores are idle almost 60% of
> the time, while the one core occupied by ksoftirqd is 100% busy.
> 

You could post some info, like "cat /proc/net/softnet_stat"
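
The counters in that file are hexadecimal, one line per cpu, so a small
helper can make them easier to read. This is a sketch under assumptions:
the function name decode_softnet_stat is made up here, and the column
meanings (1 = processed, 2 = dropped, 3 = time_squeeze, 10 = received_rps)
are what kernels of this era are believed to print; check net/core/dev.c
for your version before relying on them.

```shell
#!/bin/sh
# Decode the hex counters of softnet_stat into decimal, one line per cpu.
# Assumed columns: 1 = processed, 2 = dropped, 3 = time_squeeze,
# 10 = received_rps (treated as 0 if the kernel does not print it).
# Usage: decode_softnet_stat [FILE]   (default: /proc/net/softnet_stat)
decode_softnet_stat() {
    file=${1:-/proc/net/softnet_stat}
    cpu=0
    while read -r processed dropped squeeze c4 c5 c6 c7 c8 collision rps rest; do
        printf 'cpu%d processed=%d dropped=%d time_squeeze=%d received_rps=%d\n' \
            "$cpu" \
            "$((0x$processed))" \
            "$((0x$dropped))" \
            "$((0x$squeeze))" \
            "$((0x${rps:-0}))"
        cpu=$((cpu + 1))
    done < "$file"
}
```

Calling decode_softnet_stat with no argument reads the live file. A
non-zero dropped or time_squeeze count on one cpu would suggest that cpu
is not keeping up.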

If you use RPS under a very high workload on a single-queue NIC, it is
best to dedicate one cpu, for example cpu0, to packet dispatching and
leave the other cpus for IP/UDP handling.

echo 01 >/proc/irq/68/smp_affinity
echo fe >/sys/class/net/eth0/queues/rx-0/rps_cpus
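
Both values are hexadecimal cpu bitmasks, where bit N stands for cpu N:
01 selects cpu0 only and fe selects cpu1 to cpu7. As a sketch, a
hypothetical helper (the name cpu_mask is not a standard tool) can build
such a mask from a list of cpu numbers:

```shell
#!/bin/sh
# Build a hex cpu bitmask (bit N = cpu N) from a list of cpu numbers.
# Usage: cpu_mask CPU...
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%02x\n' "$mask"
}
```

For example, "cpu_mask 0" prints 01 and "cpu_mask 1 2 3 4 5 6 7" prints fe.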

Please keep in mind that if your memcached uses a single UDP socket, you
will probably hit a lot of contention on the socket spinlock and various
counters. So it may be better to _reduce_ the number of cpus handling
network load, to reduce false sharing.

echo 0e >/sys/class/net/eth0/queues/rx-0/rps_cpus

Really, if you have a single UDP queue, it would be best not to use RPS
at all and only have:

echo 01 >/proc/irq/68/smp_affinity

Then you could post the result of "perf top -C 0" so that we can spot
obvious problems on the hot path for this particular cpu.
