* [PATCH 0/2] mm/pstore: Reserve named unspecified memory across boots
@ 2024-06-03 23:33 Steven Rostedt
2024-06-03 23:33 ` [PATCH 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up Steven Rostedt
2024-06-03 23:33 ` [PATCH 2/2] pstore/ramoops: Add ramoops.mem_name= command line option Steven Rostedt
0 siblings, 2 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-06-03 23:33 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes, linux-mm,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Peter Zijlstra, Kees Cook, Tony Luck,
Guilherme G. Piccoli, linux-hardening, Guenter Roeck,
Ross Zwisler, wklin, Vineeth Remanan Pillai, Joel Fernandes,
Suleiman Souhlal, Linus Torvalds, Catalin Marinas, Will Deacon
Reserve unspecified location of physical memory from kernel command line
Background:
In ChromeOS, we have 1 MB of pstore ramoops reserved so that we can extract
dmesg output and some other information when a crash happens in the field.
(This is only done when the user selects "Allow Google to collect data for
improving the system"). But there are cases when there's a bug that
requires more data to be retrieved to figure out what is happening. We would
like to increase the pstore size, either temporarily, or maybe even
permanently. The pstore on these devices is at a fixed location in RAM (as
the RAM is not cleared on soft reboots nor crashes). The location is chosen
by the BIOS (coreboot) and passed to the kernel via ACPI tables on x86.
There's a driver that queries for this to initialize the pstore for
ChromeOS:
See drivers/platform/chrome/chromeos_pstore.c
Problem:
The problem is that, even though there's a process to change the kernel on
these systems, and it is done regularly to install updates, the firmware is
updated much less frequently. Choosing the place in RAM also takes special
care, and may be in a different address for different boards. Updating the
size via firmware is a large effort and not something that many are willing
to do for a temporary pstore size change.
Requirement:
Need a way to reserve memory that will be at a consistent location for
every boot, if the kernel and system are the same. Does not need to work
if rebooting to a different kernel, or if the system can change the
memory layout between boots.
The reserved memory cannot be a hard-coded address, as the same kernel /
command line needs to run on several different machines. The picked memory
reservation just needs to be the same for a given machine, but may be
different for different machines.
Solution:
The solution I have come up with is to introduce a new "reserve_mem=" kernel
command line. This parameter takes the following format:
reserve_mem=nn:align:label
Where nn is the size of memory to reserve, the align is the alignment of
that memory, and label is the way for other sub-systems to find that memory.
This way the kernel command line could have:
reserve_mem=12M:4096:oops ramoops.mem_name=oops
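To make the format concrete, here is a minimal userspace sketch of how a value such as "12M:4096:oops" could be split into its three fields. It is not the code from the patch: parse_size() is a hypothetical stand-in for the kernel's memparse(), struct reserve_req and parse_reserve_mem() are names invented for this example, and unlike the patch it also rejects an empty align field.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define LABEL_SIZE 16	/* mirrors the 16 byte label of the prototype */

/* Hypothetical container for one parsed request (not a kernel type). */
struct reserve_req {
	unsigned long long size;
	unsigned long long align;
	char label[LABEL_SIZE];
};

/* Stand-in for memparse(): a number with an optional K/M/G suffix. */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long val = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return val;
}

/* Returns 0 on success, or -EINVAL if the string is not nn:align:label. */
static int parse_reserve_mem(const char *arg, struct reserve_req *req)
{
	const char *align_str;
	char *p;

	req->size = parse_size(arg, &p);
	if (p == arg || *p != ':')
		return -EINVAL;

	align_str = p + 1;
	req->align = parse_size(align_str, &p);
	if (p == align_str || *p != ':')
		return -EINVAL;

	p++;	/* skip the second ':' so p points at the label */
	if (*p == '\0' || strlen(p) >= LABEL_SIZE)
		return -EINVAL;

	strcpy(req->label, p);	/* length was checked above */
	return 0;
}
```

With the example above, parse_reserve_mem("12M:4096:oops", &req) would fill in a size of 12 << 20 bytes, an alignment of 4096 and the label "oops".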
At boot up, the kernel will search for 12 megabytes in usable memory regions
with an alignment of 4096. It will start at the highest regions and work its
way down (for those old devices that want access to lower address DMA). When
it finds a region, it will save it off in a small table and mark it with the
"oops" label. Then the pstore ramoops sub-system could ask for that memory
and location, and it will map itself there.
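The top-down search described above can be illustrated with a small userspace sketch. This is only a model under simplifying assumptions: struct mem_region and find_top_down() are hypothetical names, the region list is assumed sorted and free of existing reservations, and the alignment is assumed to be a power of two. The real allocator in memblock has to deal with much more than this.

```c
#include <stdint.h>

/* Hypothetical description of one usable memory range. */
struct mem_region {
	uint64_t base;
	uint64_t size;
};

/* Round addr down to a multiple of align (align must be a power of two). */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
	return addr & ~(align - 1);
}

/*
 * Walk the regions from the highest to the lowest and return the highest
 * aligned start address where 'size' bytes fit, or 0 if nothing fits.
 * Assumes regions[] is sorted by ascending base and does not overlap.
 */
static uint64_t find_top_down(const struct mem_region *regions, int nr,
			      uint64_t size, uint64_t align)
{
	for (int i = nr - 1; i >= 0; i--) {
		uint64_t end = regions[i].base + regions[i].size;
		uint64_t start;

		if (size == 0 || regions[i].size < size)
			continue;

		/* Place the block as high as the alignment allows. */
		start = align_down(end - size, align);
		if (start >= regions[i].base)
			return start;
	}
	return 0;
}
```

Because the search is deterministic for a given region list, size and alignment, the same machine with the same memory layout would get the same address on every boot, which is the property the requirement asks for.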
This prototype allows for 8 different mappings (which may be overkill, 4 is
probably plenty) with 16 byte size to store the label.
I have tested this and it works for us to solve the above problem. We can
update the kernel and command line and increase the size of pstore without
needing to update the firmware, or knowing every memory layout of each
board. I only tested this locally; it has not been tested in the field.
Changes since the POC: https://lore.kernel.org/all/20240409210254.660888920@goodmis.org/
- Used Mike Rapoport's suggestion to use the later call to
memblock_phys_alloc() instead of messing with the e820 tables.
- No longer uses the "memmap" kernel command line and instead uses
"reserve_mem". This also removes the issue of booting a kernel without it
crashing due to "memmap" defaulting to using only the specified memory
when it doesn't know what the extra option is.
- No longer keeping the table as __initdata so that pstore can use it via
a module.
- This is no longer a proof of concept patch series.
Steven Rostedt (Google) (2):
mm/memblock: Add "reserve_mem" to reserved named memory at boot up
pstore/ramoops: Add ramoops.mem_name= command line option
----
fs/pstore/ram.c | 15 +++++++++
include/linux/mm.h | 2 ++
mm/memblock.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 114 insertions(+)
* [PATCH 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
2024-06-03 23:33 [PATCH 0/2] mm/pstore: Reserve named unspecified memory across boots Steven Rostedt
@ 2024-06-03 23:33 ` Steven Rostedt
2024-06-04 5:52 ` Kees Cook
2024-06-04 6:03 ` Ard Biesheuvel
2024-06-03 23:33 ` [PATCH 2/2] pstore/ramoops: Add ramoops.mem_name= command line option Steven Rostedt
1 sibling, 2 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-06-03 23:33 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes, linux-mm,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Peter Zijlstra, Kees Cook, Tony Luck,
Guilherme G. Piccoli, linux-hardening, Guenter Roeck,
Ross Zwisler, wklin, Vineeth Remanan Pillai, Joel Fernandes,
Suleiman Souhlal, Linus Torvalds, Catalin Marinas, Will Deacon,
Mike Rapoport
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
In order to allow for requesting a memory region that can be used for
things like pstore on multiple machines where the memory layout is not the
same, add a new option to the kernel command line called "reserve_mem".
The format is: reserve_mem=nn:align:name
Where it will find nn amount of memory at the given alignment of align.
The name field is to allow another subsystem to retrieve where the memory
was found. For example:
reserve_mem=12M:4096:oops ramoops.mem_name=oops
Where ramoops.mem_name will tell ramoops that memory was reserved for it
via the reserve_mem option and it can find it by calling:
if (reserve_mem_find_by_name("oops", &start, &size)) {
// start holds the start address and size holds the size given
}
Link: https://lore.kernel.org/all/ZjJVnZUX3NZiGW6q@kernel.org/
Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
include/linux/mm.h | 2 +
mm/memblock.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 99 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9849dfda44d4..b4455cc02f2c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4263,4 +4263,6 @@ static inline bool pfn_is_unaccepted_memory(unsigned long pfn)
void vma_pgtable_walk_begin(struct vm_area_struct *vma);
void vma_pgtable_walk_end(struct vm_area_struct *vma);
+int reserve_mem_find_by_name(const char *name, unsigned long *start, unsigned long *size);
+
#endif /* _LINUX_MM_H */
diff --git a/mm/memblock.c b/mm/memblock.c
index d09136e040d3..a8bf0ee9e2b4 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2244,6 +2244,103 @@ void __init memblock_free_all(void)
totalram_pages_add(pages);
}
+/* Keep a table to reserve named memory */
+#define RESERVE_MEM_MAX_ENTRIES 8
+#define RESERVE_MEM_NAME_SIZE 16
+struct reserve_mem_table {
+ char name[RESERVE_MEM_NAME_SIZE];
+ unsigned long start;
+ unsigned long size;
+};
+static struct reserve_mem_table reserved_mem_table[RESERVE_MEM_MAX_ENTRIES];
+static int reserved_mem_count;
+
+/* Add wildcard region with a lookup name */
+static int __init reserved_mem_add(unsigned long start, unsigned long size,
+ const char *name)
+{
+ struct reserve_mem_table *map;
+
+ if (!name || !name[0] || strlen(name) >= RESERVE_MEM_NAME_SIZE)
+ return -EINVAL;
+
+ if (reserved_mem_count >= RESERVE_MEM_MAX_ENTRIES)
+ return -1;
+
+ map = &reserved_mem_table[reserved_mem_count++];
+ map->start = start;
+ map->size = size;
+ strscpy(map->name, name);
+ return 0;
+}
+
+/**
+ * reserve_mem_find_by_name - Find reserved memory region with a given name
+ * @name: The name that is attached to a reserved memory region
+ * @start: If found, holds the start address
+ * @size: If found, holds the size of the address.
+ *
+ * Returns: 1 if found or 0 if not found.
+ */
+int reserve_mem_find_by_name(const char *name, unsigned long *start, unsigned long *size)
+{
+ struct reserve_mem_table *map;
+ int i;
+
+ for (i = 0; i < reserved_mem_count; i++) {
+ map = &reserved_mem_table[i];
+ if (!map->size)
+ continue;
+ if (strcmp(name, map->name) == 0) {
+ *start = map->start;
+ *size = map->size;
+ return 1;
+ }
+ }
+ return 0;
+}
+
+/*
+ * Parse reserve_mem=nn:align:name
+ */
+static int __init reserve_mem(char *p)
+{
+ phys_addr_t start, size, align;
+ char *oldp;
+ int err;
+
+ if (!p)
+ return -EINVAL;
+
+ oldp = p;
+ size = memparse(p, &p);
+ if (p == oldp)
+ return -EINVAL;
+
+ if (*p != ':')
+ return -EINVAL;
+
+ align = memparse(p+1, &p);
+ if (*p != ':')
+ return -EINVAL;
+
+ start = memblock_phys_alloc(size, align);
+ if (!start)
+ return -ENOMEM;
+
+ p++;
+ err = reserved_mem_add(start, size, p);
+ if (err) {
+ memblock_phys_free(start, size);
+ return err;
+ }
+
+ p += strlen(p);
+
+ return *p == '\0' ? 0: -EINVAL;
+}
+__setup("reserve_mem=", reserve_mem);
+
#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
static const char * const flagname[] = {
[ilog2(MEMBLOCK_HOTPLUG)] = "HOTPLUG",
--
2.43.0
* [PATCH 2/2] pstore/ramoops: Add ramoops.mem_name= command line option
2024-06-03 23:33 [PATCH 0/2] mm/pstore: Reserve named unspecified memory across boots Steven Rostedt
2024-06-03 23:33 ` [PATCH 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up Steven Rostedt
@ 2024-06-03 23:33 ` Steven Rostedt
1 sibling, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-06-03 23:33 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes, linux-mm,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Peter Zijlstra, Kees Cook, Tony Luck,
Guilherme G. Piccoli, linux-hardening, Guenter Roeck,
Ross Zwisler, wklin, Vineeth Remanan Pillai, Joel Fernandes,
Suleiman Souhlal, Linus Torvalds, Catalin Marinas, Will Deacon
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Add a method to find a region specified by reserve_mem=nn:align:name for
ramoops. Adding a kernel command line parameter:
reserve_mem=12M:4096:oops ramoops.mem_name=oops
Will use the size and location defined by the reserve_mem parameter where it
finds the memory and labels it "oops". The "oops" in the ramoops option
is used to search for it.
This allows for arbitrary RAM to be used for ramoops if it is known that
the memory is not cleared on kernel crashes or soft reboots.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
fs/pstore/ram.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index b1a455f42e93..bae8486045eb 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -50,6 +50,11 @@ module_param_hw(mem_address, ullong, other, 0400);
MODULE_PARM_DESC(mem_address,
"start of reserved RAM used to store oops/panic logs");
+static char *mem_name;
+module_param_named(mem_name, mem_name, charp, 0400);
+MODULE_PARM_DESC(mem_name,
+ "name of kernel param that holds addr (builtin only)");
+
static ulong mem_size;
module_param(mem_size, ulong, 0400);
MODULE_PARM_DESC(mem_size,
@@ -914,6 +919,16 @@ static void __init ramoops_register_dummy(void)
{
struct ramoops_platform_data pdata;
+ if (mem_name) {
+ unsigned long start;
+ unsigned long size;
+
+ if (reserve_mem_find_by_name(mem_name, &start, &size)) {
+ mem_address = start;
+ mem_size = size;
+ }
+ }
+
/*
* Prepare a dummy platform data structure to carry the module
* parameters. If mem_size isn't set, then there are no module
--
2.43.0
* Re: [PATCH 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
2024-06-03 23:33 ` [PATCH 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up Steven Rostedt
@ 2024-06-04 5:52 ` Kees Cook
2024-06-04 10:57 ` Steven Rostedt
2024-06-04 6:03 ` Ard Biesheuvel
1 sibling, 1 reply; 9+ messages in thread
From: Kees Cook @ 2024-06-04 5:52 UTC (permalink / raw)
To: Steven Rostedt, linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes, linux-mm,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Peter Zijlstra, Kees Cook, Tony Luck,
Guilherme G. Piccoli, linux-hardening, Guenter Roeck,
Ross Zwisler, wklin, Vineeth Remanan Pillai, Joel Fernandes,
Suleiman Souhlal, Linus Torvalds, Catalin Marinas, Will Deacon,
Mike Rapoport, ardb
On June 3, 2024 4:33:31 PM PDT, Steven Rostedt <rostedt@goodmis.org> wrote:
>From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
>
>In order to allow for requesting a memory region that can be used for
>things like pstore on multiple machines where the memory layout is not the
>same, add a new option to the kernel command line called "reserve_mem".
>
>The format is: reserve_mem=nn:align:name
>
>Where it will find nn amount of memory at the given alignment of align.
>The name field is to allow another subsystem to retrieve where the memory
>was found. For example:
>
> reserve_mem=12M:4096:oops ramoops.mem_name=oops
How does this interact with KASLR? It has chosen its physical location before this parsing happens, so I'd expect this to fail once in a while, unless the size/alignment is lucky enough that KASLR never uses that portion of the physical memory...
-Kees
--
Kees Cook
* Re: [PATCH 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
2024-06-03 23:33 ` [PATCH 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up Steven Rostedt
2024-06-04 5:52 ` Kees Cook
@ 2024-06-04 6:03 ` Ard Biesheuvel
2024-06-04 11:08 ` Steven Rostedt
1 sibling, 1 reply; 9+ messages in thread
From: Ard Biesheuvel @ 2024-06-04 6:03 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, Liam R. Howlett,
Vlastimil Babka, Lorenzo Stoakes, linux-mm, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Peter Zijlstra, Kees Cook, Tony Luck, Guilherme G. Piccoli,
linux-hardening, Guenter Roeck, Ross Zwisler, wklin,
Vineeth Remanan Pillai, Joel Fernandes, Suleiman Souhlal,
Linus Torvalds, Catalin Marinas, Will Deacon, Mike Rapoport
On Tue, 4 Jun 2024 at 01:35, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
>
> In order to allow for requesting a memory region that can be used for
> things like pstore on multiple machines where the memory layout is not the
> same, add a new option to the kernel command line called "reserve_mem".
>
> The format is: reserve_mem=nn:align:name
>
> Where it will find nn amount of memory at the given alignment of align.
> The name field is to allow another subsystem to retrieve where the memory
> was found. For example:
>
> reserve_mem=12M:4096:oops ramoops.mem_name=oops
>
> Where ramoops.mem_name will tell ramoops that memory was reserved for it
> via the reserve_mem option and it can find it by calling:
>
> if (reserve_mem_find_by_name("oops", &start, &size)) {
> // start holds the start address and size holds the size given
>
> Link: https://lore.kernel.org/all/ZjJVnZUX3NZiGW6q@kernel.org/
>
> Suggested-by: Mike Rapoport <rppt@kernel.org>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
You failed to point out in the commit message that the assumption here
is that this memory will retain its contents across a soft reboot. Or
am I misunderstanding this?
In any case, as I pointed out before, playing these games unilaterally
from the kernel side, i.e., without any awareness whatsoever from the
firmware and bootloader (which will not attempt to preserve RAM
contents), is likely to have a rather disappointing success ratio in
the general case. I understand this may be different for vertically
integrated software stacks like ChromeOS so perhaps it should live
there as a feature.
Then, as Kees points out, there is also the risk that the kernel
itself may be stepping on this memory before having realized that it
is reserved. At least ARM and x86 have decompressors with a
substantial amount of non-trivial placement logic that would need to
be made aware of this reservation. Note that EFI vs. non-EFI boot also
makes a difference here.
* Re: [PATCH 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
2024-06-04 5:52 ` Kees Cook
@ 2024-06-04 10:57 ` Steven Rostedt
0 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-06-04 10:57 UTC (permalink / raw)
To: Kees Cook
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, Liam R. Howlett,
Vlastimil Babka, Lorenzo Stoakes, linux-mm, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Peter Zijlstra, Kees Cook, Tony Luck, Guilherme G. Piccoli,
linux-hardening, Guenter Roeck, Ross Zwisler, wklin,
Vineeth Remanan Pillai, Joel Fernandes, Suleiman Souhlal,
Linus Torvalds, Catalin Marinas, Will Deacon, Mike Rapoport, ardb
On Mon, 03 Jun 2024 22:52:37 -0700
Kees Cook <kees@kernel.org> wrote:
> On June 3, 2024 4:33:31 PM PDT, Steven Rostedt <rostedt@goodmis.org> wrote:
> >From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> >
> >In order to allow for requesting a memory region that can be used for
> >things like pstore on multiple machines where the memory layout is not the
> >same, add a new option to the kernel command line called "reserve_mem".
> >
> >The format is: reserve_mem=nn:align:name
> >
> >Where it will find nn amount of memory at the given alignment of align.
> >The name field is to allow another subsystem to retrieve where the memory
> >was found. For example:
> >
> > reserve_mem=12M:4096:oops ramoops.mem_name=oops
>
> How does this interact with KASLR? It has chosen its physical location
> before this parsing happens, so I'd expect this to fail once in a while,
> unless the size/alignment is lucky enough that KASLR never uses that
> portion of the physical memory...
>
From looking at the KASLR code, it looks to me that it picks from 100
different locations. I could be wrong, but if you have sufficient memory,
I'm thinking that it should not conflict. But if it does, yes, it will fail
to pick the same location.
-- Steve
* Re: [PATCH 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
2024-06-04 6:03 ` Ard Biesheuvel
@ 2024-06-04 11:08 ` Steven Rostedt
2024-06-04 16:05 ` Luck, Tony
0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2024-06-04 11:08 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, Liam R. Howlett,
Vlastimil Babka, Lorenzo Stoakes, linux-mm, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Peter Zijlstra, Kees Cook, Tony Luck, Guilherme G. Piccoli,
linux-hardening, Guenter Roeck, Ross Zwisler, wklin,
Vineeth Remanan Pillai, Joel Fernandes, Suleiman Souhlal,
Linus Torvalds, Catalin Marinas, Will Deacon, Mike Rapoport
On Tue, 4 Jun 2024 08:03:54 +0200
Ard Biesheuvel <ardb@kernel.org> wrote:
> On Tue, 4 Jun 2024 at 01:35, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> >
> > In order to allow for requesting a memory region that can be used for
> > things like pstore on multiple machines where the memory layout is not the
> > same, add a new option to the kernel command line called "reserve_mem".
> >
> > The format is: reserve_mem=nn:align:name
> >
> > Where it will find nn amount of memory at the given alignment of align.
> > The name field is to allow another subsystem to retrieve where the memory
> > was found. For example:
> >
> > reserve_mem=12M:4096:oops ramoops.mem_name=oops
> >
> > Where ramoops.mem_name will tell ramoops that memory was reserved for it
> > via the reserve_mem option and it can find it by calling:
> >
> > if (reserve_mem_find_by_name("oops", &start, &size)) {
> > // start holds the start address and size holds the size given
> >
> > Link: https://lore.kernel.org/all/ZjJVnZUX3NZiGW6q@kernel.org/
> >
> > Suggested-by: Mike Rapoport <rppt@kernel.org>
> > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
>
> You failed to point out in the commit message that the assumption here
> is that this memory will retain its contents across a soft reboot. Or
> am I misunderstanding this?
Yes that is the intention. I should update the commit message.
>
> In any case, as I pointed out before, playing these games unilaterally
> from the kernel side, i.e., without any awareness whatsoever from the
> firmware and bootloader (which will not attempt to preserve RAM
> contents), is likely to have a rather disappointing success ratio in
> the general case. I understand this may be different for vertically
> integrated software stacks like ChromeOS so perhaps it should live
> there as a feature.
I have been using this on two different test machines, as well as a
chromebook, and it appears to work on all of them. As well as for VMs. I
plan on adding this to my workstation and server too (they use EFI).
>
> Then, as Kees points out, there is also the risk that the kernel
> itself may be stepping on this memory before having realized that it
> is reserved. At least ARM and x86 have decompressors with a
> substantial amount of non-trivial placement logic that would need to
> be made aware of this reservation. Note that EFI vs. non-EFI boot also
> makes a difference here.
Agreed. Note, it should definitely state that this is not 100% reliable,
and depending on the setup it may not be reliable at all. Whatever uses it
should add something to confirm that the memory is the same.
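One possible shape for such a confirmation is sketched below in userspace C, purely as an illustration: a small header holding a magic value, the payload length and a checksum sits at the start of the region, and a reader only trusts the contents if all three agree. The names region_header, region_store() and region_is_valid(), the REGION_MAGIC value and the toy checksum are all assumptions made for this example. None of them come from the patch or from pstore.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REGION_MAGIC 0x6f6f7073u	/* arbitrary value chosen for this sketch */

/* Hypothetical header kept at the very start of the reserved region. */
struct region_header {
	uint32_t magic;
	uint32_t payload_len;
	uint32_t checksum;
};

/* Toy checksum; a real user would likely pick a CRC instead. */
static uint32_t payload_sum(const uint8_t *data, uint32_t len)
{
	uint32_t sum = 0;

	for (uint32_t i = 0; i < len; i++)
		sum = sum * 31 + data[i];
	return sum;
}

/* Write the payload followed by a header describing it. */
static bool region_store(uint8_t *region, size_t region_size,
			 const void *payload, uint32_t len)
{
	struct region_header hdr;

	if (region_size < sizeof(hdr) || len > region_size - sizeof(hdr))
		return false;

	memcpy(region + sizeof(hdr), payload, len);
	hdr.magic = REGION_MAGIC;
	hdr.payload_len = len;
	hdr.checksum = payload_sum(region + sizeof(hdr), len);
	memcpy(region, &hdr, sizeof(hdr));
	return true;
}

/* After a reboot: only trust the region if the header still matches. */
static bool region_is_valid(const uint8_t *region, size_t region_size)
{
	struct region_header hdr;

	if (region_size < sizeof(hdr))
		return false;

	memcpy(&hdr, region, sizeof(hdr));
	if (hdr.magic != REGION_MAGIC)
		return false;
	if (hdr.payload_len > region_size - sizeof(hdr))
		return false;

	return hdr.checksum == payload_sum(region + sizeof(hdr), hdr.payload_len);
}
```

A region that was cleared by firmware, or that ended up at a different address, would most likely fail the magic or checksum test and could simply be treated as empty.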
If corner cases become an issue, this could be extended to work with them.
We could update KASLR to be aware of this allocation. The documentation
update to kernel-parameters.txt on this usage should definitely stress that
this can be unreliable, and use should be tested to see if it works. And
also stress that if it does work, it may not work all the time. The best
usage for this is for statistical debugging. For instance, in our use case,
we have 1000s of crashes with no known cause. If this worked only 10%
of the time, the data retrieved from 100 of those crashes would be very
valuable.
-- Steve
* RE: [PATCH 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
2024-06-04 11:08 ` Steven Rostedt
@ 2024-06-04 16:05 ` Luck, Tony
2024-06-06 14:50 ` Steven Rostedt
0 siblings, 1 reply; 9+ messages in thread
From: Luck, Tony @ 2024-06-04 16:05 UTC (permalink / raw)
To: Steven Rostedt, Ard Biesheuvel
Cc: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
linux-mm@kvack.org, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86@kernel.org, H. Peter Anvin, Peter Zijlstra,
Kees Cook, Guilherme G. Piccoli, linux-hardening@vger.kernel.org,
Guenter Roeck, Ross Zwisler, wklin@google.com,
Vineeth Remanan Pillai, Joel Fernandes, Suleiman Souhlal,
Linus Torvalds, Catalin Marinas, Will Deacon, Mike Rapoport
> I have been using this on two different test machines, as well as a
> chromebook, and it appears to work on all of them. As well as for VMs. I
> plan on adding this to my workstation and server too (they use EFI).
I think that BIOS on Intel servers with ECC memory will stomp on all
memory (to ensure that ECC bits are all set to good values). There
might be a "fast boot" BIOS option to skip this (but using it leaves you
vulnerable after a crash due to ECC fail to hit the same error again).
-Tony
* Re: [PATCH 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
2024-06-04 16:05 ` Luck, Tony
@ 2024-06-06 14:50 ` Steven Rostedt
0 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-06-06 14:50 UTC (permalink / raw)
To: Luck, Tony
Cc: Ard Biesheuvel, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, Masami Hiramatsu,
Mark Rutland, Mathieu Desnoyers, Andrew Morton, Liam R. Howlett,
Vlastimil Babka, Lorenzo Stoakes, linux-mm@kvack.org,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
x86@kernel.org, H. Peter Anvin, Peter Zijlstra, Kees Cook,
Guilherme G. Piccoli, linux-hardening@vger.kernel.org,
Guenter Roeck, Ross Zwisler, wklin@google.com,
Vineeth Remanan Pillai, Joel Fernandes, Suleiman Souhlal,
Linus Torvalds, Catalin Marinas, Will Deacon, Mike Rapoport
On Tue, 4 Jun 2024 16:05:04 +0000
"Luck, Tony" <tony.luck@intel.com> wrote:
> > I have been using this on two different test machines, as well as a
> > chromebook, and it appears to work on all of them. As well as for VMs. I
> > plan on adding this to my workstation and server too (they use EFI).
>
> I think that BIOS on Intel servers with ECC memory will stomp on all
> memory (to ensure that ECC bits are all set to good values). There
> might be a "fast boot" BIOS option to skip this (but using it leaves you
> vulnerable after a crash due to ECC fail to hit the same error again).
>
Talking with some people that are interested in this, they told me that
those servers (the ones that take several minutes to boot up) usually
use kexec to reboot. Even after a crash (with or without kdump). In
those cases, they said this would likely work for them.
Again, this isn't foolproof nor guaranteed. It's a best effort
approach that, at least for my use case, works most of the time.
-- Steve