From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Vishal Chourasia <vishalc@linux.vnet.ibm.com>
Cc: ritesh.list@gmail.com, vschneid@redhat.com,
vincent.guittot@linaro.org, srikar@linux.vnet.ibm.com,
Peter Zijlstra <peterz@infradead.org>,
aneesh.kumar@linux.ibm.com, linux-kernel@vger.kernel.org,
sshegde@linux.ibm.com, mingo@redhat.com,
linuxppc-dev@lists.ozlabs.org
Subject: Re: sched/debug: CPU hotplug operation suffers in a large cpu systems
Date: Tue, 18 Oct 2022 13:04:40 +0200
Message-ID: <Y06ISBWhJflnV+NI@kroah.com>
In-Reply-To: <Y06B0pr8hpwzxEzI@li-05afa54c-330e-11b2-a85c-e3f3aa0db1e9.ibm.com>

On Tue, Oct 18, 2022 at 04:07:06PM +0530, Vishal Chourasia wrote:
> On Mon, Oct 17, 2022 at 04:54:11PM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Oct 17, 2022 at 04:19:31PM +0200, Peter Zijlstra wrote:
> > >
> > > +GregKH who actually knows about debugfs.
> > >
> > > On Mon, Oct 17, 2022 at 06:40:49PM +0530, Vishal Chourasia wrote:
> > > > smt=off operation on system with 1920 CPUs is taking approx 59 mins on v5.14
> > > > versus 29 mins on v5.11 measured using:
> > > > # time ppc64_cpu --smt=off
> > > >
> > > >
> > > > |--------------------------------+----------------+--------------|
> > > > | method                         | sysctl         | debugfs      |
> > > > |--------------------------------+----------------+--------------|
> > > > | unregister_sysctl_table        | 0.020050 s     | NA           |
> > > > | build_sched_domains            | 3.090563 s     | 3.119130 s   |
> > > > | register_sched_domain_sysctl   | 0.065487 s     | NA           |
> > > > | update_sched_domain_debugfs    | NA             | 2.791232 s   |
> > > > | partition_sched_domains_locked | 3.195958 s     | 5.933254 s   |
> > > > |--------------------------------+----------------+--------------|
> > > >
> > > > Note: partition_sched_domains_locked internally calls build_sched_domains
> > > > and calls other functions respective to what's being currently used to
> > > > export information i.e. sysctl or debugfs
> > > >
> > > > Above numbers are quoted from the case where we tried offlining 1 cpu in system
> > > > with 1920 online cpus.
> > > >
> > > > From the above table, register_sched_domain_sysctl and
> > > > unregister_sysctl_table collectively took ~0.085 secs, whereas
> > > > update_sched_domain_debugfs took ~2.79 secs.
> > > >
> > > > Root cause:
> > > >
> > > > The observed regression stems from the way these two pseudo-filesystems handle
> > > > creation and deletion of files and directories internally.
> >
> > Yes, debugfs is not optimized for speed or memory usage at all. This
> > happens to be the first code path I have seen that cares about this for
> > debugfs files.
> >
> > You can either work on not creating so many debugfs files (do you really
> > really need all of them all the time?) Or you can work on moving
> > debugfs to use kernfs as the backend logic, which will save you both
> > speed and memory usage overall as kernfs is used to being used on
> > semi-fast paths.
> >
> > Maybe do both?
> >
> > hope this helps,
> >
> > greg k-h
>
> Yes, we need to create 7-8 files per domain per CPU, eventually ending up
> creating a lot of files.

Why do you need to?  What tools require these debugfs files to be
present?

And if you only have 7-8 files per CPU, that does not seem like a lot of
files overall (14000-16000)?  If you only offline 1 cpu, how is removing
7 or 8 files a bottleneck?  Do you really offline 1999 cpus for a 2k
system?

> Is there a possibility of reverting back to /proc/sys/kernel/sched_domain/?

No, these are debugging-only things, they do not belong in /proc/.

If you rely on them for real functionality, that's a different story,
but I want to know what tool uses them and for what functionality as
debugfs should never be relied on for normal operation of a system.

thanks,

greg k-h