From: Michal Hocko <mhocko@suse.com>
To: David Hildenbrand <david@redhat.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>,
"Gerald Schaefer" <gerald.schaefer@linux.ibm.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"Greg KH" <gregkh@linuxfoundation.org>,
"Jan Höppner" <hoeppner@linux.ibm.com>,
"Heiko Carstens" <hca@linux.ibm.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
linux-api@vger.kernel.org,
"Dave Hansen" <dave.hansen@linux.intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
Date: Fri, 11 Sep 2020 11:12:52 +0200
Message-ID: <20200911091252.GD7986@dhcp22.suse.cz>
In-Reply-To: <02cdbf90-b29f-a9ec-c83d-49f2548e3e91@redhat.com>

On Fri 11-09-20 10:09:07, David Hildenbrand wrote:
[...]
> Consider two cases:
>
> 1. Hot(un)plugging huge DIMMs: many (not all!) use cases want to
> online/offline the whole thing. HW can effectively only plug/unplug the
> whole thing. It makes sense in some (most?) setups to represent one DIMM
> as one memory block device.

Yes, for physical hotplug it doesn't make much sense to me to offline
portions that the HW cannot hot-remove.
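
For illustration, a minimal sketch of the interface under discussion,
assuming nothing beyond the stock sysfs layout of a hotplug-enabled
kernel (nothing here comes from this thread), that lists the memory
block devices and their state:

  #!/usr/bin/env python3
  # Minimal sketch: list each memory block device and its state.
  # block_size_bytes is printed by the kernel in hex without a 0x prefix.
  import os

  BASE = "/sys/devices/system/memory"

  with open(os.path.join(BASE, "block_size_bytes")) as f:
      block_size = int(f.read(), 16)
  print("memory block size: %d MiB" % (block_size >> 20))

  for entry in sorted(os.listdir(BASE)):
      if entry.startswith("memory"):
          with open(os.path.join(BASE, entry, "state")) as f:
              print("%s: %s" % (entry, f.read().strip()))

Representing a DIMM as a single block would then make offlining it one
write of "offline" to one state file rather than one write per block.
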
> 2. Hot(un)plugging small memory increments. This is mostly the case in
> virtualized environments - especially hyper-v balloon, xen balloon,
> virtio-mem and (drumroll) ppc dlpar and s390x standby memory. On PPC,
> you want at least all (16MB!) memory block devices that can get
> unplugged again individually ("LMBs") as separate memory blocks. Same on
> s390x on memory increment size (currently effectively the memory block
> size).

Yes, I do recognize those use cases, though I will not pretend I don't
consider them questionable. E.g. any hotplug with a smaller granularity
than the memory model in Linux allows is simply dubious. We cannot
implement that without a lot of waste, and then the question is what
the real point is.
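
To put rough numbers on the waste argument (assuming 4 KiB base pages,
a 64-byte struct page and x86-64's 128 MiB SPARSEMEM section; the
figures are illustrative, not from this thread):

  # memmap cost of a SPARSEMEM section vs. a 16 MiB increment
  PAGE = 4096               # assumed base page size
  STRUCT_PAGE = 64          # assumed sizeof(struct page) on 64-bit
  SECTION = 128 << 20       # x86-64 section size

  pages_per_section = SECTION // PAGE            # 32768
  memmap_per_section = pages_per_section * STRUCT_PAGE
  print("memmap per section: %d KiB" % (memmap_per_section >> 10))  # 2048 KiB
  # Hotplugging a 16 MiB increment would still need the whole section's
  # memmap, so 7/8 of that bookkeeping describes memory that is not there.
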
> In summary, larger memory block devices mostly only make sense with
> DIMMs (and for boot memory in some cases). We will still end up with
> many memory block devices in other configurations.

And that is fine, because boot-time memory is still likely the primary
source of memory. And reducing the number of memory devices for that is
already a huge improvement (just think of a multi-TB system with
gazillions of pointless memory devices).
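
A back-of-the-envelope sketch of what that means in device counts
(128 MiB is the common x86-64 default block size and 2 GiB the usual
large-system choice; both are assumptions for illustration):

  TIB = 1 << 40
  MIB = 1 << 20

  for ram in (1 * TIB, 4 * TIB, 16 * TIB):
      for block in (128 * MIB, 2048 * MIB):
          print("%2d TiB RAM, %4d MiB blocks: %6d memory block devices"
                % (ram >> 40, block >> 20, ram // block))

At 16 TiB with 128 MiB blocks that is 131072 sysfs directories, each of
which has to be created and announced at boot.
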
> I do agree that a "disable sysfs" option is interesting - even with
> memory hotplug (we mostly need a way to configure it and a way to notify
> kexec-tools about memory hot(un)plug events). I am currently (once
> again) looking into improving auto-onlining support in the kernel.
>
> Having that said, I much rather want to see smaller improvements (that
> can be fine-tuned individually - like allowing variable-sized memory
> blocks) than doing a switch to "new shiny" and figuring out after a
> while that we need "new shiny2".

There is only one certainty: providing a long-term interface with
ever-growing (ab)users is a hard target. And shinyN might be needed in
the end, who knows. My main point is that the existing interface is
hitting a wall on use cases which _do_not_care_ about memory hotplug.
And that is something we should be looking at.
> I consider removing "phys_device" as one of these tunables. The question
> would be how to make such sysfs changes easy to configure
> ("-phys_device", "+variable_sized_blocks" ...)

I am with you on that. There are more candidates in the memory block
directories which have dubious value. The deprecation process is a PITA,
and that's why I thought it would make sense to focus on something we
can mis^Wdesign with existing and emerging use cases in mind, something
that would get rid of all the cruft that we know doesn't work
("removable" would be another one).
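
Both attributes are trivially inspectable; a sketch, again assuming
only the stock sysfs layout (on most architectures phys_device reads
as 0, since only s390 ever gave it a meaning):

  import os

  BASE = "/sys/devices/system/memory"

  def read_attr(block, attr):
      with open(os.path.join(BASE, block, attr)) as f:
          return f.read().strip()

  for entry in sorted(os.listdir(BASE)):
      if entry.startswith("memory"):
          print("%s: phys_device=%s removable=%s"
                % (entry, read_attr(entry, "phys_device"),
                   read_attr(entry, "removable")))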

I am definitely not going to insist, and the fact that you are trying
to clean this up is highly appreciated, of course.
--
Michal Hocko
SUSE Labs