From: Eric Dumazet <eric.dumazet@gmail.com>
To: Yuehai Xu <yuehaixu@gmail.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, yhxu@wayne.edu
Subject: Re: Why the number of /proc/interrupts doesn't change when nic is under heavy workload?
Date: Sun, 15 Jan 2012 23:09:27 +0100
Message-ID: <1326665367.5287.97.camel@edumazet-laptop>
In-Reply-To: <CAEc1PS0doeaH59621ia7n2GT=WgRSsCit_EzGrCV93QH95XZqw@mail.gmail.com>
On Sunday, 15 January 2012 at 15:53 -0500, Yuehai Xu wrote:
> Hi All,
>
> My nic of server is Intel Corporation 80003ES2LAN Gigabit Ethernet
> Controller, the driver is e1000e, and my Linux version is 3.1.4. I
> have a Memcached server running on this 8 core box, the weird thing is
> that when my server is under heavy workload, the number of
> /proc/interrupts doesn't change at all. Below are some details:
> =======
> cat /proc/interrupts | grep eth0
> 68: 330887 330861 331432 330544 330346 330227
> 330830 330575 PCI-MSI-edge eth0
> =======
> cat /proc/irq/68/smp_affinity
> ff
>
> I know when network is under heavy load, NAPI will disable nic
> interrupt and poll ring buffer in nic. My question is, when is nic
> interrupt enabled again? It seems that it will never be enabled if the
> heavy workload doesn't stop, simply because the number showed by
> /proc/interrupts doesn't change at all. In my case, one of core is
> saturated by ksoftirqd, because lots of softirqs are pending to that
> core. I just want to distribute these softirqs to other cores. Even
> RPS is enabled, that core is still occupied by ksoftirq, nearly 100%.
>
> I dive into the codes and find these statements:
> __napi_schedule ==>
> local_irq_save(flags);
> ____napi_schedule(&__get_cpu_var(softnet_data), n);
> local_irq_restore(flags);
>
> here "local_irq_save" actually invokes "cli" which disable interrupt
> for the local core, is this the one that used in NAPI to disable nic
> interrupt? Personally I don't think it is because it just disables
> local cpu.
>
> I also find "enable_irq/disable_irq/e1000_irq_enable/e1000_irq_disable"
> under drivers/net/e1000e, are these used in NAPI to disable nic
> interrupt, but I fail to get any clue that they are used in the code
> path of NAPI?
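This is done in the device driver itself, not in the generic NAPI code.
When the driver's NAPI poll() handles fewer packets than its budget, it
re-enables the chip interrupts. Under sustained load every poll uses its
whole budget, so the chip interrupt stays disabled and the counter in
/proc/interrupts does not move.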
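To make the budget rule concrete, here is a toy shell model. It is not
kernel code: the function name napi_sim and its arguments are made up
for illustration. The first argument is the poll budget, and each
following argument is the number of packets that arrive before one poll
round. It prints the number of interrupts taken, then the number of
poll rounds.

```shell
# Toy model of the NAPI interrupt/poll cycle (illustration only).
napi_sim() {
    budget=$1; shift
    irq_enabled=1 interrupts=0 polls=0 backlog=0
    for arrived in "$@"; do
        backlog=$((backlog + arrived))
        if [ "$irq_enabled" -eq 1 ]; then
            [ "$backlog" -gt 0 ] || continue  # nothing arrived, no interrupt
            interrupts=$((interrupts + 1))    # chip raises its interrupt
            irq_enabled=0                     # driver masks it and schedules NAPI
        fi
        # poll(): handle at most "budget" packets from the backlog
        if [ "$backlog" -gt "$budget" ]; then
            handled=$budget
        else
            handled=$backlog
        fi
        backlog=$((backlog - handled))
        polls=$((polls + 1))
        # fewer packets than the budget: polling ends, interrupts come back
        if [ "$handled" -lt "$budget" ]; then
            irq_enabled=1
        fi
    done
    echo "$interrupts $polls"
}
```

With a budget of 64, "napi_sim 64 100 100 100 100" prints "1 4": one
interrupt, then polling only. "napi_sim 64 10 10 10 10" prints "4 4",
because every poll drains the backlog and the interrupt is re-enabled
each time.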
>
> My current situation is that, almost 60% of time of other 7 cores are
> idle, while only one core which is occupied by ksoftirq is 100% busy.
>
You could post some more information, such as the output of
"cat /proc/net/softnet_stat".
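In case the raw file is hard to read: each line holds hexadecimal
counters for one CPU, and the first three columns are packets processed,
packets dropped, and time_squeeze (the softirq ran out of budget or time
with work still pending). Here is a small sketch to decode those three
columns, assuming a POSIX shell. The name softnet_summary is just one I
picked, and rows are numbered in file order, which should match CPU
numbers only when all CPUs are online.

```shell
# Decode the first three hex columns of /proc/net/softnet_stat (read from stdin).
softnet_summary() {
    row=0
    while read -r total dropped squeezed rest; do
        # $((0x...)) converts a hexadecimal field to decimal
        echo "row$row processed=$((0x$total)) dropped=$((0x$dropped)) time_squeeze=$((0x$squeezed))"
        row=$((row + 1))
    done
}
```

Usage would be "softnet_summary < /proc/net/softnet_stat". A time_squeeze
value that keeps growing on one row suggests that CPU cannot keep up.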
If you use RPS under a very high workload on a single-queue NIC, it is
best to dedicate one CPU (for example cpu0) to packet dispatching and
leave the other CPUs for IP/UDP handling:
echo 01 >/proc/irq/68/smp_affinity
echo fe >/sys/class/net/eth0/queues/rx-0/rps_cpus
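Both files take a hexadecimal CPU bitmask, where bit N stands for CPU N.
If it helps, a mask can be built from a list of CPU numbers like this.
The helper name cpu_mask is hypothetical, and the sketch is only meant
for a small box like yours: larger machines write the mask as
comma-separated 32-bit groups, which it does not handle.

```shell
# Build a hex CPU bitmask from a list of CPU numbers (bit N = CPU N).
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%02x\n' "$mask"
}
```

For example, "cpu_mask 0" prints 01, "cpu_mask 1 2 3 4 5 6 7" prints fe,
and "cpu_mask 1 2 3" prints 0e.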
Please keep in mind that if your memcached uses a single UDP socket, you
will probably hit a lot of contention on the socket spinlock and various
counters. So it may be better to _reduce_ the number of CPUs handling
network load, to reduce false sharing:
echo 0e >/sys/class/net/eth0/queues/rx-0/rps_cpus
Really, if you have a single UDP queue, the best option would be not to
use RPS at all and only have:
echo 01 >/proc/irq/68/smp_affinity
Then you could post the result of "perf top -C 0" so that we can spot
obvious problems on the hot path for this particular cpu.