All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexey Dobriyan <adobriyan@gmail.com>
To: Waiman Long <longman@redhat.com>
Cc: linux-kernel@vger.kernel.org, rdunlap@infradead.org,
	akpm@linux-foundation.org
Subject: Re: [PATCH] proc/stat: Separate out individual irq counts into /proc/stat_irqs
Date: Thu, 19 Apr 2018 23:39:49 +0300	[thread overview]
Message-ID: <20180419203949.GA4555@avx2> (raw)
In-Reply-To: <eb7c1569-e445-5cbb-6d10-2694b625232a@redhat.com>

On Thu, Apr 19, 2018 at 04:21:14PM -0400, Waiman Long wrote:
> On 04/19/2018 03:55 PM, Alexey Dobriyan wrote:
> > On Thu, Apr 19, 2018 at 03:28:40PM -0400, Waiman Long wrote:
> >> On 04/19/2018 03:08 PM, Alexey Dobriyan wrote:
> >>>> Therefore, application performance can be impacted if the application
> >>>> reads /proc/stat rather frequently.
> >>> [nods]
> >>> Text interfaces can be designed in a very stupid way.
> >>>
> >>>> For example, reading /proc/stat in a certain 2-socket Skylake server
> >>>> took about 4.6ms because it had over 5k irqs.
> >>> Is this top(1)? What is this application doing?
> >>> If it needs percpu usage stats, then maybe /proc/stat should be
> >>> converted away from single_open() so that core seq_file code doesn't
> >>> generate everything at once.
> >> The application is actually a database benchmarking tool used by a
> >> customer.
> > So it probably needs lines before "intr" line.
> >
> >> The reading of /proc/stat is an artifact of the benchmarking
> >> tool that can actually be turned off. Without doing that, about 20% of
> >> CPU time were spent reading /proc/stat and the trashing of cachelines
> >> slowed the benchmark number quite significantly. However, I was also
> >> told that there are legitimate cases where reading /proc/stat was
> >> necessary in some of their applications.
> >>
> >>>> -
> >>>> -	/* sum again ? it could be updated? */
> >>>> -	for_each_irq_nr(j)
> >>>> -		seq_put_decimal_ull(p, " ", kstat_irqs_usr(j));
> >>>> -
> >>> This is direct userspace breakage.
> >> Yes, I am aware of that. That is the cost of improving the performance
> >> of applications that read /proc/stat, but don't need the individual irq
> >> counts.
> > Yeah, but all it takes is one script which cares.
> >
> > I have an idea.
> >
> > Maintain "maximum registered irq #", it should be much smaller than
> > "nr_irqs":
> >
> > intr 4245359 151 0 0 0 0 0 0 0 38 0 0 0 0 0 0 0 0 0 64 0 0 0 0 0 0 0 0 0 0 44330 182364 57741 0 0 0 0 0 0 0 0 85 89124 0 0 0 0 0 323360 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 
> Yes, that can probably help.
> 
> This is the data from the problematic skylake server:
> 
> model name    : Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
> 56 sosreport-carevalo.02076935-20180413085327/proc/stat
> Interrupts: 5370
> Interrupts without "0" entries: 1011
> 
> There are still quite a large number of non-zero entries, though.
> 
> > Or maintain array of registered irqs and iterate over them only.
> 
> Right, we can allocate a bitmap of used irqs to do that.
> 
> >
> > I have another idea.
> >
> > perf record shows mutex_lock/mutex_unlock at the top.
> > Most of them are irq mutex not seqfile mutex as there are many more
> > interrupts than reads. Take it once.
> >
> How many cpus are in your test system? In that skylake server, it was
> the per-cpu summing operation of the irq counts that was consuming most
> of the time for reading /proc/stat. I think we can certainly try to
> optimize the lock taking.

It's 16x(NR_IRQS: 4352, nr_irqs: 960, preallocated irqs: 16)
Given that irq registering is rare operation, maintaining sorted array
of irq should be the best option.

> For the time being, I think I am going to have a clone /proc/stat2 as
> suggested in my earlier email. Alternatively, I can put that somewhere
> in sysfs if you have a good idea of where I can put it.

sysfs is strictly one-value-per-file.

> I will also look into ways to optimize the current per-IRQ stats
> handling, but it will come later.

There is always a time-honored way of ioctl(2) switching irq info off
/proc supports that.

There are many options.

  parent reply	other threads:[~2018-04-19 20:39 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-19 19:08 [PATCH] proc/stat: Separate out individual irq counts into /proc/stat_irqs Alexey Dobriyan
2018-04-19 19:28 ` Waiman Long
2018-04-19 19:55   ` Alexey Dobriyan
     [not found]     ` <eb7c1569-e445-5cbb-6d10-2694b625232a@redhat.com>
2018-04-19 20:39       ` Alexey Dobriyan [this message]
2018-04-19 20:58         ` Waiman Long
2018-04-19 23:23           ` Joel Fernandes (Google)
2018-04-21 20:34             ` Alexey Dobriyan
2018-04-21 20:36               ` Alexey Dobriyan
2018-04-24  5:54                 ` David Rientjes
2018-04-24  6:18                   ` Alexey Dobriyan
2018-05-02  0:02                     ` Andrew Morton
2018-04-19 21:05         ` Waiman Long
  -- strict thread matches above, loose matches on Subject: below --
2018-04-19 17:09 Waiman Long
2018-04-19 17:38 ` Randy Dunlap
2018-04-19 18:44   ` Waiman Long
2018-04-19 19:43 ` Andrew Morton
2018-04-19 19:57   ` Waiman Long
2018-04-19 20:02   ` Alexey Dobriyan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180419203949.GA4555@avx2 \
    --to=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=rdunlap@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.