From: David Hildenbrand <david@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH v1] drivers/base/memory.c: Don't store end_section_nr in memory blocks
Date: Thu, 1 Aug 2019 09:00:45 +0200
Message-ID: <f8767e9a-034d-dca6-05e6-dc6bbcb4d005@redhat.com>
In-Reply-To: <20190801061344.GA11627@dhcp22.suse.cz>

On 01.08.19 08:13, Michal Hocko wrote:
> On Wed 31-07-19 16:43:58, David Hildenbrand wrote:
>> On 31.07.19 16:37, Michal Hocko wrote:
>>> On Wed 31-07-19 16:21:46, David Hildenbrand wrote:
>>> [...]
>>>>> Thinking about it some more, I believe that we can reasonably provide
>>>>> both APIs controllable by a command line parameter for backwards
>>>>> compatibility. It is the hotplug code to control sysfs APIs.  E.g.
>>>>> create one sysfs entry per add_memory_resource for the new semantic.
>>>>
>>>> Yeah, but the real question is: who needs it. I can only think about
>>>> some DIMM scenarios (some, not all). I would be interested in more use
>>>> cases. Of course, to provide and maintain two APIs we need a good reason.
>>>
>>> Well, my 3TB machine that has 7 movable nodes could really go with less
>>> than
>>> $ find /sys/devices/system/memory -name "memory*" | wc -l
>>> 1729
>>
>> The question is if it would be sufficient to increase the memory block
>> size even further for these kinds of systems (e.g., via a boot parameter
>> - I think we have that on uv systems) instead of having blocks of
>> different sizes. Say, 128GB blocks because you're not going to hotplug
>> 128MB DIMMs into such a system - at least that's my guess ;)
> 
> The system has
> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x10000000000-0x17fffffffff]
> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x80000000000-0x87fffffffff]
> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0x90000000000-0x97fffffffff]
> [    0.000000] ACPI: SRAT: Node 4 PXM 4 [mem 0x100000000000-0x107fffffffff]
> [    0.000000] ACPI: SRAT: Node 5 PXM 5 [mem 0x110000000000-0x117fffffffff]
> [    0.000000] ACPI: SRAT: Node 6 PXM 6 [mem 0x180000000000-0x183fffffffff]
> [    0.000000] ACPI: SRAT: Node 7 PXM 7 [mem 0x190000000000-0x191fffffffff]
> 
> hotpluggable memory. I would love to have those 7 memory blocks to work
> with. Any smaller grained split is just not helping as the platform will
> not be able to hotremove it anyway.
> 

So the smallest granularity in your system is indeed 128GB (btw, nice
system, I wish I had something like that), the biggest one 512GB.
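
As a quick sanity check of those two numbers, here is a minimal sketch that
derives the size of each range from the SRAT lines quoted above. The regular
expression and the helper name range_sizes_gib are illustrative assumptions,
not anything from the kernel; the end address in each line is treated as
inclusive.

```python
import re

# SRAT lines copied from the quoted boot log (timestamps dropped).
SRAT_LOG = """\
ACPI: SRAT: Node 1 PXM 1 [mem 0x10000000000-0x17fffffffff]
ACPI: SRAT: Node 2 PXM 2 [mem 0x80000000000-0x87fffffffff]
ACPI: SRAT: Node 3 PXM 3 [mem 0x90000000000-0x97fffffffff]
ACPI: SRAT: Node 4 PXM 4 [mem 0x100000000000-0x107fffffffff]
ACPI: SRAT: Node 5 PXM 5 [mem 0x110000000000-0x117fffffffff]
ACPI: SRAT: Node 6 PXM 6 [mem 0x180000000000-0x183fffffffff]
ACPI: SRAT: Node 7 PXM 7 [mem 0x190000000000-0x191fffffffff]
"""

GIB = 1 << 30
# Hypothetical pattern for this log format: node id, start, inclusive end.
RANGE_RE = re.compile(r"Node (\d+) PXM \d+ \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\]")


def range_sizes_gib(log):
    """Return {node: size in GiB} for every SRAT range found in the log."""
    sizes = {}
    for node, start, end in RANGE_RE.findall(log):
        # The end address is inclusive, hence the +1.
        sizes[int(node)] = (int(end, 16) - int(start, 16) + 1) // GIB
    return sizes


sizes = range_sizes_gib(SRAT_LOG)
for node, size in sorted(sizes.items()):
    print(f"node {node}: {size} GiB")
print("smallest:", min(sizes.values()), "GiB, biggest:", max(sizes.values()), "GiB")
```

Nodes 1-5 come out at 512 GiB each, node 6 at 256 GiB and node 7 at 128 GiB.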

A memory block size of 128GB would mean 24 memory blocks on a 3TB
system, which is tolerable IMHO. In particular, there shouldn't be a
real performance difference compared to 7 blocks. Hotunplug triggered
via ACPI will take care of offlining the right DIMMs.
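
The block count is plain arithmetic; a small sketch, assuming binary units
(1 TB = 1024 GB) and using the per-node sizes implied by the SRAT ranges
quoted above. The helper name block_count is made up for illustration.

```python
GIB = 1 << 30
TIB = 1 << 40


def block_count(total_bytes, block_bytes):
    """Number of memory blocks needed to cover total_bytes (rounded up)."""
    return -(-total_bytes // block_bytes)


BLOCK = 128 * GIB

# Whole system: 3 TiB of memory split into 128 GiB blocks.
total_blocks = block_count(3 * TIB, BLOCK)
print(total_blocks)  # 24

# Sizes (in GiB) of the seven hotpluggable ranges from the quoted SRAT log.
hotplug_sizes_gib = [512, 512, 512, 512, 512, 256, 128]
per_range = [block_count(size * GIB, BLOCK) for size in hotplug_sizes_gib]
print(per_range, sum(per_range))  # [4, 4, 4, 4, 4, 2, 1] 23
```

So the seven hotpluggable ranges alone would account for 23 of the 24 blocks,
and no range would need to share a block with another one.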

Of course, 7 blocks would be nicer, but as discussed, not possible with
the current ABI.

What we could do right now is finally make "cat
/sys/devices/system/memory/memory99/phys_device" indicate on x86-64 to
which DIMM an added memory range belongs (if applicable). For now, it's
only used on s390x. We could store for each memory block the
"phys_index" - a.k.a. section number of the lowest memory block of an
add_memory() range.

This would at least allow user space to identify all memory blocks that
logically belong together (DIMM) without ABI changes.
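
From user space, that grouping could look roughly like the sketch below. It
assumes the proposal above is implemented, i.e. that phys_device carries the
same value for all memory blocks of one add_memory() range; today that is not
the case on x86-64. The function name and the "root" parameter (there only so
the code can be pointed at a fake tree) are hypothetical.

```python
import glob
import os
from collections import defaultdict


def group_blocks_by_phys_device(root="/sys/devices/system/memory"):
    """Map each phys_device value to the memory blocks that report it."""
    groups = defaultdict(list)
    # Only memoryN directories; other entries of the sysfs directory are skipped.
    for path in glob.glob(os.path.join(root, "memory[0-9]*", "phys_device")):
        block = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            # The value is kept as the raw string sysfs reports.
            groups[f.read().strip()].append(block)
    # Sort blocks numerically so memory2 comes before memory10.
    return {
        dev: sorted(blocks, key=lambda name: int(name[len("memory"):]))
        for dev, blocks in groups.items()
    }


if __name__ == "__main__":
    for dev, blocks in sorted(group_blocks_by_phys_device().items()):
        print(f"phys_device {dev}: {len(blocks)} block(s), first {blocks[0]}")
```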

-- 

Thanks,

David / dhildenb

