From: Michal Hocko <mhocko@kernel.org>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH v1] drivers/base/memory.c: Don't store end_section_nr in memory blocks
Date: Wed, 31 Jul 2019 15:25:34 +0200
Message-ID: <20190731132534.GQ9330@dhcp22.suse.cz>
In-Reply-To: <f0894c30-105a-2241-a505-7436bc15b864@redhat.com>

On Wed 31-07-19 15:12:12, David Hildenbrand wrote:
> On 31.07.19 14:43, Michal Hocko wrote:
> > On Wed 31-07-19 14:22:13, David Hildenbrand wrote:
> >> Each memory block spans the same amount of sections/pages/bytes. The size
> >> is determined before the first memory block is created. No need to store
> >> what we can easily calculate - and the calculations even look simpler now.
> > 
> > While this cleanup helps a bit, I am not sure this is really worth
> > bothering. I guess we can agree when I say that the memblock interface
> > is suboptimal (to put it mildly).  Shouldn't we strive for making it
> > a real hotplug API in the future? What do I mean by that? Why should
> > any memblock be fixed in size? Shouldn't we use hotpluggable units
> > instead (aka pfn ranges that userspace can work with sensibly)? Do we
> > know of any existing userspace that would depend on the current single
> > section resp. 2GB sized memblocks?
> 
> Short story: It is already ABI (e.g.,
> /sys/devices/system/memory/block_size_bytes) - around since 2005 (!) -
> since we had memory block devices.
> 
> I suspect that it is mainly manually used. But I might be wrong.

Any pointer to the real userspace depending on it? Most usecases I am
aware of rely on udev events and either onlining or offlining the memory
in the handler.

I know we have documented this as an ABI and it is really _sad_ that
this ABI didn't get the normal scrutiny that any user-visible interface
should go through, but these are sins of the past...
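
For context, this is roughly what a userspace consumer of that ABI has to
do. It is a minimal sketch, assuming only that block_size_bytes holds a
hexadecimal number without a "0x" prefix; the helper names
parse_block_size() and read_block_size() are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse the contents of block_size_bytes: a hex number without "0x". */
static unsigned long long parse_block_size(const char *text)
{
	return strtoull(text, NULL, 16);
}

/* Read the block size from sysfs; returns 0 on failure. */
static unsigned long long read_block_size(void)
{
	char buf[64];
	unsigned long long size = 0;
	FILE *f = fopen("/sys/devices/system/memory/block_size_bytes", "r");

	if (!f)
		return 0;
	if (fgets(buf, sizeof(buf), f))
		size = parse_block_size(buf);
	fclose(f);
	return size;
}
```

Any tool that hard-codes the result of such a read, rather than re-reading
it, is the kind of user that would break with variable-sized blocks.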

> Long story:
> 
> How would you want to number memory blocks? At least no longer by phys
> index. For now, memory blocks are ordered and numbered by their block id.

memory_${mem_section_nr_of_start_pfn}
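
A minimal sketch of that naming, assuming x86-64 values (4 KiB pages and
128 MiB sections); memblock_name() is a hypothetical helper, not an
existing kernel function:

```c
#include <stdio.h>

/* Assumed x86-64 values; other architectures differ. */
#define PAGE_SHIFT        12	/* 4 KiB pages */
#define SECTION_SIZE_BITS 27	/* 128 MiB sections */
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)

/* Section number that contains the given page frame number. */
static unsigned long pfn_to_section_nr(unsigned long pfn)
{
	return pfn >> PFN_SECTION_SHIFT;
}

/*
 * Hypothetical helper: format "memory_<section nr of start pfn>" into buf.
 * Returns the length snprintf() reports.
 */
static int memblock_name(char *buf, size_t len, unsigned long start_pfn)
{
	return snprintf(buf, len, "memory_%lu", pfn_to_section_nr(start_pfn));
}
```

With these assumptions a block starting at the 2 GiB boundary (pfn 0x80000)
would be named memory_16, regardless of how large the block is.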

> Admins might want to online parts of a DIMM MOVABLE/NORMAL, to more
> reliably use huge pages but still have enough space for kernel memory
> (e.g., page tables). They might like that a DIMM is actually a set of
> memory blocks instead of one big chunk.

They might. Do they though? There are many theoretical usecases but
let's face it, the current state comes at a cost. E.g. the
number of memblock directories is already quite large on machines with a
lot of memory even though they use large blocks. That has negative
implications already (e.g. the number of events you get, any iteration
over /sys, etc.). Also 2G memblocks are quite arbitrary and they
already limit the above usecase somewhat, right?

> IOW: You can consider it a restriction to add e.g., DIMMs only in one
> bigger chunk.
> 
> > 
> > All that being said, I do not oppose the patch but can we start
> > thinking about the underlying memblock limitations rather than micro
> > cleanups?
> 
> I am pro cleaning up what we have right now, not expecting it to eventually
> change sometime in the future. (btw, I highly doubt it will change)

I do agree, but baking in a fixed memblock size doesn't really go along
with a variable memblock size if we ever go there. But as I've said I am
not really against the patch.
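
To make the dependency concrete, the calculation the cleanup relies on can
be sketched as below. This is an illustration only, assuming 128 MiB
sections as on x86-64; the function names are made up here and are not
taken from the patch.

```c
/* Assumed x86-64 value: 128 MiB sections. */
#define SECTION_SIZE_BITS 27

/* Sections per block; the same for every block once the size is fixed. */
static unsigned long sections_per_block(unsigned long long block_size_bytes)
{
	return (unsigned long)(block_size_bytes >> SECTION_SIZE_BITS);
}

/* Last section of a block, derived from the start instead of stored. */
static unsigned long end_section_nr(unsigned long start_section_nr,
				    unsigned long long block_size_bytes)
{
	return start_section_nr + sections_per_block(block_size_bytes) - 1;
}
```

The derivation only works because block_size_bytes is one global value;
with per-block sizes the end (or the size) would have to be stored again.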
-- 
Michal Hocko
SUSE Labs