linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named  memory at boot up
@ 2024-06-13 15:55 Steven Rostedt
  2024-06-13 15:55 ` [PATCH v6 1/2] " Steven Rostedt
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Steven Rostedt @ 2024-06-13 15:55 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
	Baoquan He, Borislav Petkov, Paul E. McKenney, David Howells,
	Mike Rapoport

Reserve unspecified location of physical memory from kernel command line

Background:

In ChromeOS, we have 1 MB of pstore ramoops reserved so that we can extract
dmesg output and some other information when a crash happens in the field.
(This is only done when the user selects "Allow Google to collect data for
 improving the system"). But there are cases when there's a bug that
requires more data to be retrieved to figure out what is happening. We would
like to increase the pstore size, either temporarily, or maybe even
permanently. The pstore on these devices are at a fixed location in RAM (as
the RAM is not cleared on soft reboots nor crashes). The location is chosen
by the BIOS (coreboot) and passed to the kernel via ACPI tables on x86.
There's a driver that queries for this to initialize the pstore for
ChromeOS:

  See drivers/platform/chrome/chromeos_pstore.c

Problem:

The problem is that, even though there's a process to change the kernel on
these systems, and is done regularly to install updates, the firmware is
updated much less frequently. Choosing the place in RAM also takes special
care, and may be in a different address for different boards. Updating the
size via firmware is a large effort and not something that many are willing
to do for a temporary pstore size change.

Requirement:

Need a way to reserve memory that will be at a consistent location for
every boot, if the kernel and system are the same. Does not need to work
if rebooting to a different kernel, or if the system can change the
memory layout between boots.

The reserved memory can not be an hard coded address, as the same kernel /
command line needs to run on several different machines. The picked memory
reservation just needs to be the same for a given machine, but may be
different for different machines.

Solution:

The solution I have come up with is to introduce a new "reserve_mem=" kernel
command line. This parameter takes the following format:

  reserve_mem=nn:align:label

Where nn is the size of memory to reserve, the align is the alignment of
that memory, and label is the way for other sub-systems to find that memory.
This way the kernel command line could have:

  reserve_mem=12M:4096:oops   ramoops.mem_name=oops

At boot up, the kernel will search for 12 megabytes in usable memory regions
with an alignment of 4096. It will start at the highest regions and work its
way down (for those old devices that want access to lower address DMA). When
it finds a region, it will save it off in a small table and mark it with the
"oops" label. Then the pstore ramoops sub-system could ask for that memory
and location, and it will map itself there.

This prototype allows for 8 different mappings (which may be overkill, 4 is
probably plenty) with 16 byte size to store the label.

I have tested this and it works for us to solve the above problem. We can
update the kernel and command line and increase the size of pstore without
needing to update the firmware, or knowing every memory layout of each
board. I only tested this locally, it has not been tested in the field.

Changes since v5: https://lore.kernel.org/all/20240613003435.401549779@goodmis.org/

[ patch at bottom showing differences ]

- Stressed more that this is a best effort use case

- Updated ramoops.rst to document this new feature

- Used a new variable "tmp" to use in reserve_mem_find_by_name() instead
  of using "size" and possibly corrupting it.

Changes since v4: https://lore.kernel.org/all/20240611215610.548854415@goodmis.org/

- Add all checks about reserve_mem before allocation.
  This means reserved_mem_add() is now a void function.

- Check for name duplications.

- Fix compare of align to SMP_CACHE_BYTES ("<" instead of "<=")

Changes since v3: https://lore.kernel.org/all/20240611144911.327227285@goodmis.org/

- Changed table type of start and size from unsigned long to phys_addr_t
  (as well as the parameters to the functions that use them)

- Changed old reference to "early_reserve_mem" to "reserve_mem"

- Check before reservering memory:
  o Size is non-zero
  o name has text in it

- If align is less than SMP_CACHE_BYTES, make it SMP_CACHE_BYTES

- Remove the silly check of testing *p == '\0' after a p += strlen(p)

Changes since v2: https://lore.kernel.org/all/20240606150143.876469296@goodmis.org/

- Fixed typo of "reserver"

- Added EXPORT_SYMBOL_GPL() for reserve_mem_find_by_name()

- Removed "built-in" from module description that was changed from v1.


Changes since v1: https://lore.kernel.org/all/20240603233330.801075898@goodmis.org/

- Updated the change log of the first patch as well as added an entry
  into kernel-parameters.txt about how reserve_mem is for soft reboots
  and may not be reliable. 


Steven Rostedt (Google) (2):
      mm/memblock: Add "reserve_mem" to reserved named memory at boot up
      pstore/ramoops: Add ramoops.mem_name= command line option

----
 Documentation/admin-guide/kernel-parameters.txt |  22 +++++
 Documentation/admin-guide/ramoops.rst           |  13 +++
 fs/pstore/ram.c                                 |  14 +++
 include/linux/mm.h                              |   2 +
 mm/memblock.c                                   | 117 ++++++++++++++++++++++++
 5 files changed, 168 insertions(+)


diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ce7de8136f2f..56e18b1a520d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5717,9 +5717,11 @@
 			used for systems that do not wipe the RAM, and this command
 			line will try to reserve the same physical memory on
 			soft reboots. Note, it is not guaranteed to be the same
-			location. For example, if KASLR places the kernel at the
-			location of where the RAM reservation was from a previous
-			boot, the new reservation will be at a different location.
+			location. For example, if anything about the system changes
+			or if booting a different kernel. It can also fail if KASLR
+			places the kernel at the location of where the RAM reservation
+			was from a previous boot, the new reservation will be at a
+			different location.
 			Any subsystem using this feature must add a way to verify
 			that the contents of the physical memory is from a previous
 			boot, as there may be cases where the memory will not be
diff --git a/Documentation/admin-guide/ramoops.rst b/Documentation/admin-guide/ramoops.rst
index e9f85142182d..6f534a707b2a 100644
--- a/Documentation/admin-guide/ramoops.rst
+++ b/Documentation/admin-guide/ramoops.rst
@@ -23,6 +23,8 @@ and type of the memory area are set using three variables:
   * ``mem_size`` for the size. The memory size will be rounded down to a
     power of two.
   * ``mem_type`` to specify if the memory type (default is pgprot_writecombine).
+  * ``mem_name`` to specify a memory region defined by ``reserve_mem`` command
+    line parameter.
 
 Typically the default value of ``mem_type=0`` should be used as that sets the pstore
 mapping to pgprot_writecombine. Setting ``mem_type=1`` attempts to use
@@ -118,6 +120,17 @@ Setting the ramoops parameters can be done in several different manners:
 	return ret;
   }
 
+ D. Using a region of memory reserved via ``reserve_mem`` command line
+    parameter. The address and size will be defined by the ``reserve_mem``
+    parameter. Note, that ``reserve_mem`` may not always allocate memory
+    in the same location, and cannot be relied upon. Testing will need
+    to be done, and it may not work on every machine, nor every kernel.
+    Consider this a "best effort" approach. The ``reserve_mem`` option
+    takes a size, alignment and name as arguments. The name is used
+    to map the memory to a label that can be retrieved by ramoops.
+
+	reserver_mem=2M:4096:oops  ramoops.mem_name=oops
+
 You can specify either RAM memory or peripheral devices' memory. However, when
 specifying RAM, be sure to reserve the memory by issuing memblock_reserve()
 very early in the architecture code, e.g.::
diff --git a/mm/memblock.c b/mm/memblock.c
index 739d106a9165..b7b0e8c3868d 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2301,7 +2301,7 @@ EXPORT_SYMBOL_GPL(reserve_mem_find_by_name);
  */
 static int __init reserve_mem(char *p)
 {
-	phys_addr_t start, size, align;
+	phys_addr_t start, size, align, tmp;
 	char *name;
 	char *oldp;
 	int len;
@@ -2347,8 +2347,8 @@ static int __init reserve_mem(char *p)
 	if (!*p)
 		return -EINVAL;
 
-	/* Make sure the name is not already used (size is only updated if found) */
-	if (reserve_mem_find_by_name(name, &start, &size))
+	/* Make sure the name is not already used */
+	if (reserve_mem_find_by_name(name, &start, &tmp))
 		return -EBUSY;
 
 	start = memblock_phys_alloc(size, align);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v6 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  2024-06-13 15:55 [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up Steven Rostedt
@ 2024-06-13 15:55 ` Steven Rostedt
  2024-06-18 12:55   ` Zhengyejian
  2024-06-13 15:55 ` [PATCH v6 2/2] pstore/ramoops: Add ramoops.mem_name= command line option Steven Rostedt
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2024-06-13 15:55 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
	Baoquan He, Borislav Petkov, Paul E. McKenney, David Howells,
	Mike Rapoport, Guilherme G. Piccoli

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

In order to allow for requesting a memory region that can be used for
things like pstore on multiple machines where the memory layout is not the
same, add a new option to the kernel command line called "reserve_mem".

The format is:  reserve_mem=nn:align:name

Where it will find nn amount of memory at the given alignment of align.
The name field is to allow another subsystem to retrieve where the memory
was found. For example:

  reserve_mem=12M:4096:oops ramoops.mem_name=oops

Where ramoops.mem_name will tell ramoops that memory was reserved for it
via the reserve_mem option and it can find it by calling:

  if (reserve_mem_find_by_name("oops", &start, &size)) {
	// start holds the start address and size holds the size given

This is typically used for systems that do not wipe the RAM, and this
command line will try to reserve the same physical memory on soft reboots.
Note, it is not guaranteed to be the same location. For example, if KASLR
places the kernel at the location of where the RAM reservation was from a
previous boot, the new reservation will be at a different location.  Any
subsystem using this feature must add a way to verify that the contents of
the physical memory is from a previous boot, as there may be cases where
the memory will not be located at the same location.

Not all systems may work either. There could be bit flips if the reboot
goes through the BIOS. Using kexec to reboot the machine is likely to
have better results in such cases.

Link: https://lore.kernel.org/all/ZjJVnZUX3NZiGW6q@kernel.org/

Suggested-by: Mike Rapoport <rppt@kernel.org>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 .../admin-guide/kernel-parameters.txt         |  22 ++++
 include/linux/mm.h                            |   2 +
 mm/memblock.c                                 | 117 ++++++++++++++++++
 3 files changed, 141 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b600df82669d..56e18b1a520d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5710,6 +5710,28 @@
 			them.  If <base> is less than 0x10000, the region
 			is assumed to be I/O ports; otherwise it is memory.
 
+	reserve_mem=	[RAM]
+			Format: nn[KNG]:<align>:<label>
+			Reserve physical memory and label it with a name that
+			other subsystems can use to access it. This is typically
+			used for systems that do not wipe the RAM, and this command
+			line will try to reserve the same physical memory on
+			soft reboots. Note, it is not guaranteed to be the same
+			location. For example, if anything about the system changes
+			or if booting a different kernel. It can also fail if KASLR
+			places the kernel at the location of where the RAM reservation
+			was from a previous boot, the new reservation will be at a
+			different location.
+			Any subsystem using this feature must add a way to verify
+			that the contents of the physical memory is from a previous
+			boot, as there may be cases where the memory will not be
+			located at the same location.
+
+			The format is size:align:label for example, to request
+			12 megabytes of 4096 alignment for ramoops:
+
+			reserve_mem=12M:4096:oops ramoops.mem_name=oops
+
 	reservetop=	[X86-32,EARLY]
 			Format: nn[KMG]
 			Reserves a hole at the top of the kernel virtual
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9849dfda44d4..077fb589b88a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4263,4 +4263,6 @@ static inline bool pfn_is_unaccepted_memory(unsigned long pfn)
 void vma_pgtable_walk_begin(struct vm_area_struct *vma);
 void vma_pgtable_walk_end(struct vm_area_struct *vma);
 
+int reserve_mem_find_by_name(const char *name, phys_addr_t *start, phys_addr_t *size);
+
 #endif /* _LINUX_MM_H */
diff --git a/mm/memblock.c b/mm/memblock.c
index d09136e040d3..b7b0e8c3868d 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2244,6 +2244,123 @@ void __init memblock_free_all(void)
 	totalram_pages_add(pages);
 }
 
+/* Keep a table to reserve named memory */
+#define RESERVE_MEM_MAX_ENTRIES		8
+#define RESERVE_MEM_NAME_SIZE		16
+struct reserve_mem_table {
+	char			name[RESERVE_MEM_NAME_SIZE];
+	phys_addr_t		start;
+	phys_addr_t		size;
+};
+static struct reserve_mem_table reserved_mem_table[RESERVE_MEM_MAX_ENTRIES];
+static int reserved_mem_count;
+
+/* Add wildcard region with a lookup name */
+static void __init reserved_mem_add(phys_addr_t start, phys_addr_t size,
+				   const char *name)
+{
+	struct reserve_mem_table *map;
+
+	map = &reserved_mem_table[reserved_mem_count++];
+	map->start = start;
+	map->size = size;
+	strscpy(map->name, name);
+}
+
+/**
+ * reserve_mem_find_by_name - Find reserved memory region with a given name
+ * @name: The name that is attached to a reserved memory region
+ * @start: If found, holds the start address
+ * @size: If found, holds the size of the address.
+ *
+ * @start and @size are only updated if @name is found.
+ *
+ * Returns: 1 if found or 0 if not found.
+ */
+int reserve_mem_find_by_name(const char *name, phys_addr_t *start, phys_addr_t *size)
+{
+	struct reserve_mem_table *map;
+	int i;
+
+	for (i = 0; i < reserved_mem_count; i++) {
+		map = &reserved_mem_table[i];
+		if (!map->size)
+			continue;
+		if (strcmp(name, map->name) == 0) {
+			*start = map->start;
+			*size = map->size;
+			return 1;
+		}
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(reserve_mem_find_by_name);
+
+/*
+ * Parse reserve_mem=nn:align:name
+ */
+static int __init reserve_mem(char *p)
+{
+	phys_addr_t start, size, align, tmp;
+	char *name;
+	char *oldp;
+	int len;
+
+	if (!p)
+		return -EINVAL;
+
+	/* Check if there's room for more reserved memory */
+	if (reserved_mem_count >= RESERVE_MEM_MAX_ENTRIES)
+		return -EBUSY;
+
+	oldp = p;
+	size = memparse(p, &p);
+	if (!size || p == oldp)
+		return -EINVAL;
+
+	if (*p != ':')
+		return -EINVAL;
+
+	align = memparse(p+1, &p);
+	if (*p != ':')
+		return -EINVAL;
+
+	/*
+	 * memblock_phys_alloc() doesn't like a zero size align,
+	 * but it is OK for this command to have it.
+	 */
+	if (align < SMP_CACHE_BYTES)
+		align = SMP_CACHE_BYTES;
+
+	name = p + 1;
+	len = strlen(name);
+
+	/* name needs to have length but not too big */
+	if (!len || len >= RESERVE_MEM_NAME_SIZE)
+		return -EINVAL;
+
+	/* Make sure that name has text */
+	for (p = name; *p; p++) {
+		if (!isspace(*p))
+			break;
+	}
+	if (!*p)
+		return -EINVAL;
+
+	/* Make sure the name is not already used */
+	if (reserve_mem_find_by_name(name, &start, &tmp))
+		return -EBUSY;
+
+	start = memblock_phys_alloc(size, align);
+	if (!start)
+		return -ENOMEM;
+
+	reserved_mem_add(start, size, name);
+
+	return 0;
+}
+__setup("reserve_mem=", reserve_mem);
+
 #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
 static const char * const flagname[] = {
 	[ilog2(MEMBLOCK_HOTPLUG)] = "HOTPLUG",
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v6 2/2] pstore/ramoops: Add ramoops.mem_name= command line option
  2024-06-13 15:55 [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up Steven Rostedt
  2024-06-13 15:55 ` [PATCH v6 1/2] " Steven Rostedt
@ 2024-06-13 15:55 ` Steven Rostedt
  2024-06-13 18:19   ` Guilherme G. Piccoli
  2024-06-13 16:54 ` [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up Alexander Graf
  2024-06-19 21:53 ` Mike Rapoport
  3 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2024-06-13 15:55 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
	Baoquan He, Borislav Petkov, Paul E. McKenney, David Howells,
	Mike Rapoport, Guilherme G. Piccoli

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Add a method to find a region specified by reserve_mem=nn:align:name for
ramoops. Adding a kernel command line parameter:

  reserve_mem=12M:4096:oops ramoops.mem_name=oops

Will use the size and location defined by the memmap parameter where it
finds the memory and labels it "oops". The "oops" in the ramoops option
is used to search for it.

This allows for arbitrary RAM to be used for ramoops if it is known that
the memory is not cleared on kernel crashes or soft reboots.

Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 Documentation/admin-guide/ramoops.rst | 13 +++++++++++++
 fs/pstore/ram.c                       | 14 ++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/Documentation/admin-guide/ramoops.rst b/Documentation/admin-guide/ramoops.rst
index e9f85142182d..6f534a707b2a 100644
--- a/Documentation/admin-guide/ramoops.rst
+++ b/Documentation/admin-guide/ramoops.rst
@@ -23,6 +23,8 @@ and type of the memory area are set using three variables:
   * ``mem_size`` for the size. The memory size will be rounded down to a
     power of two.
   * ``mem_type`` to specify if the memory type (default is pgprot_writecombine).
+  * ``mem_name`` to specify a memory region defined by ``reserve_mem`` command
+    line parameter.
 
 Typically the default value of ``mem_type=0`` should be used as that sets the pstore
 mapping to pgprot_writecombine. Setting ``mem_type=1`` attempts to use
@@ -118,6 +120,17 @@ Setting the ramoops parameters can be done in several different manners:
 	return ret;
   }
 
+ D. Using a region of memory reserved via ``reserve_mem`` command line
+    parameter. The address and size will be defined by the ``reserve_mem``
+    parameter. Note, that ``reserve_mem`` may not always allocate memory
+    in the same location, and cannot be relied upon. Testing will need
+    to be done, and it may not work on every machine, nor every kernel.
+    Consider this a "best effort" approach. The ``reserve_mem`` option
+    takes a size, alignment and name as arguments. The name is used
+    to map the memory to a label that can be retrieved by ramoops.
+
+	reserver_mem=2M:4096:oops  ramoops.mem_name=oops
+
 You can specify either RAM memory or peripheral devices' memory. However, when
 specifying RAM, be sure to reserve the memory by issuing memblock_reserve()
 very early in the architecture code, e.g.::
diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index b1a455f42e93..4311fcbc84f2 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -50,6 +50,10 @@ module_param_hw(mem_address, ullong, other, 0400);
 MODULE_PARM_DESC(mem_address,
 		"start of reserved RAM used to store oops/panic logs");
 
+static char *mem_name;
+module_param_named(mem_name, mem_name, charp, 0400);
+MODULE_PARM_DESC(mem_name, "name of kernel param that holds addr");
+
 static ulong mem_size;
 module_param(mem_size, ulong, 0400);
 MODULE_PARM_DESC(mem_size,
@@ -914,6 +918,16 @@ static void __init ramoops_register_dummy(void)
 {
 	struct ramoops_platform_data pdata;
 
+	if (mem_name) {
+		phys_addr_t start;
+		phys_addr_t size;
+
+		if (reserve_mem_find_by_name(mem_name, &start, &size)) {
+			mem_address = start;
+			mem_size = size;
+		}
+	}
+
 	/*
 	 * Prepare a dummy platform data structure to carry the module
 	 * parameters. If mem_size isn't set, then there are no module
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  2024-06-13 15:55 [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up Steven Rostedt
  2024-06-13 15:55 ` [PATCH v6 1/2] " Steven Rostedt
  2024-06-13 15:55 ` [PATCH v6 2/2] pstore/ramoops: Add ramoops.mem_name= command line option Steven Rostedt
@ 2024-06-13 16:54 ` Alexander Graf
  2024-06-13 17:12   ` Steven Rostedt
  2024-06-19 21:53 ` Mike Rapoport
  3 siblings, 1 reply; 20+ messages in thread
From: Alexander Graf @ 2024-06-13 16:54 UTC (permalink / raw)
  To: Steven Rostedt, linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Baoquan He,
	Borislav Petkov, Paul E. McKenney, David Howells, Mike Rapoport

Hey Steve,

On 13.06.24 17:55, Steven Rostedt wrote:
> Reserve unspecified location of physical memory from kernel command line
>
> Background:
>
> In ChromeOS, we have 1 MB of pstore ramoops reserved so that we can extract
> dmesg output and some other information when a crash happens in the field.
> (This is only done when the user selects "Allow Google to collect data for
>   improving the system"). But there are cases when there's a bug that
> requires more data to be retrieved to figure out what is happening. We would
> like to increase the pstore size, either temporarily, or maybe even
> permanently. The pstore on these devices are at a fixed location in RAM (as
> the RAM is not cleared on soft reboots nor crashes). The location is chosen
> by the BIOS (coreboot) and passed to the kernel via ACPI tables on x86.
> There's a driver that queries for this to initialize the pstore for
> ChromeOS:
>
>    See drivers/platform/chrome/chromeos_pstore.c
>
> Problem:
>
> The problem is that, even though there's a process to change the kernel on
> these systems, and is done regularly to install updates, the firmware is
> updated much less frequently. Choosing the place in RAM also takes special
> care, and may be in a different address for different boards. Updating the
> size via firmware is a large effort and not something that many are willing
> to do for a temporary pstore size change.


(sorry for not commenting on earlier versions, I didn't see v1-v5 in my 
inbox)

Do you have a "real" pstore on these systems that you could store 
non-volatile variables in, such as persistent UEFI variables? If so, you 
could create an actually persistent mapping for your trace pstore even 
across kernel version updates as a general mechanism to create reserved 
memblocks at fixed offsets.


> Requirement:
>
> Need a way to reserve memory that will be at a consistent location for
> every boot, if the kernel and system are the same. Does not need to work
> if rebooting to a different kernel, or if the system can change the
> memory layout between boots.
>
> The reserved memory can not be an hard coded address, as the same kernel /
> command line needs to run on several different machines. The picked memory
> reservation just needs to be the same for a given machine, but may be


With KASLR is enabled, doesn't this approach break too often to be 
reliable enough for the data you want to extract?

Picking up the idea above, with a persistent variable we could even make 
KASLR avoid that reserved pstore region in its search for a viable KASLR 
offset.


Alex




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  2024-06-13 16:54 ` [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up Alexander Graf
@ 2024-06-13 17:12   ` Steven Rostedt
  2024-06-17  7:07     ` Alexander Graf
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2024-06-13 17:12 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Vincent Donnefort,
	Joel Fernandes, Daniel Bristot de Oliveira, Ingo Molnar,
	Peter Zijlstra, suleiman, Thomas Gleixner, Vineeth Pillai,
	Youssef Esmat, Beau Belgrave, Baoquan He, Borislav Petkov,
	Paul E. McKenney, David Howells, Mike Rapoport

On Thu, 13 Jun 2024 18:54:12 +0200
Alexander Graf <graf@amazon.com> wrote:

> 
> Do you have a "real" pstore on these systems that you could store 
> non-volatile variables in, such as persistent UEFI variables? If so, you 
> could create an actually persistent mapping for your trace pstore even 
> across kernel version updates as a general mechanism to create reserved 
> memblocks at fixed offsets.

After implementing all this, I don't think I can use pstore for my
purpose. pstore is a generic interface for persistent storage, and
requires an interface to access it. From what I understand, it's not
the place to just ask for an area of RAM.

For this, I have a single patch that allows the tracing instance to use
an area reserved by reserve_mem.

  reserve_mem=12M:4096:trace trace_instance=boot_mapped@trace

I've already tested this on qemu and a couple of chromebooks. It works
well.

> 
> 
> > Requirement:
> >
> > Need a way to reserve memory that will be at a consistent location for
> > every boot, if the kernel and system are the same. Does not need to work
> > if rebooting to a different kernel, or if the system can change the
> > memory layout between boots.
> >
> > The reserved memory can not be an hard coded address, as the same kernel /
> > command line needs to run on several different machines. The picked memory
> > reservation just needs to be the same for a given machine, but may be  
> 
> 
> With KASLR is enabled, doesn't this approach break too often to be 
> reliable enough for the data you want to extract?
> 
> Picking up the idea above, with a persistent variable we could even make 
> KASLR avoid that reserved pstore region in its search for a viable KASLR 
> offset.

I think I was hit by it once in all my testing. For our use case, the
few times it fails to map is not going to affect what we need this for
at all.

-- Steve

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 2/2] pstore/ramoops: Add ramoops.mem_name= command line option
  2024-06-13 15:55 ` [PATCH v6 2/2] pstore/ramoops: Add ramoops.mem_name= command line option Steven Rostedt
@ 2024-06-13 18:19   ` Guilherme G. Piccoli
  2024-06-13 18:42     ` Steven Rostedt
  0 siblings, 1 reply; 20+ messages in thread
From: Guilherme G. Piccoli @ 2024-06-13 18:19 UTC (permalink / raw)
  To: Steven Rostedt, Mike Rapoport
  Cc: Masami Hiramatsu, linux-trace-kernel, linux-kernel, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Vincent Donnefort,
	Joel Fernandes, Daniel Bristot de Oliveira, Ingo Molnar,
	Peter Zijlstra, suleiman, Thomas Gleixner, Vineeth Pillai,
	Youssef Esmat, Beau Belgrave, Alexander Graf, Baoquan He,
	Borislav Petkov, Paul E. McKenney, David Howells

Thanks Steve, re-tested in my VM and it's working fine.
Just a minor below...


On 13/06/2024 12:55, Steven Rostedt wrote:
> [...]
> + D. Using a region of memory reserved via ``reserve_mem`` command line
> +    parameter. The address and size will be defined by the ``reserve_mem``
> +    parameter. Note, that ``reserve_mem`` may not always allocate memory
> +    in the same location, and cannot be relied upon. Testing will need
> +    to be done, and it may not work on every machine, nor every kernel.
> +    Consider this a "best effort" approach. The ``reserve_mem`` option
> +    takes a size, alignment and name as arguments. The name is used
> +    to map the memory to a label that can be retrieved by ramoops.
> +
> +	reserver_mem=2M:4096:oops  ramoops.mem_name=oops
> +

Likely this could be fixed on merge, to avoid another version, but...

s/reserver_mem/reserve_mem


Cheers!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 2/2] pstore/ramoops: Add ramoops.mem_name= command line option
  2024-06-13 18:19   ` Guilherme G. Piccoli
@ 2024-06-13 18:42     ` Steven Rostedt
  2024-06-13 18:51       ` Guilherme G. Piccoli
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2024-06-13 18:42 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Mike Rapoport, Masami Hiramatsu, linux-trace-kernel, linux-kernel,
	Mark Rutland, Mathieu Desnoyers, Andrew Morton, Vincent Donnefort,
	Joel Fernandes, Daniel Bristot de Oliveira, Ingo Molnar,
	Peter Zijlstra, suleiman, Thomas Gleixner, Vineeth Pillai,
	Youssef Esmat, Beau Belgrave, Alexander Graf, Baoquan He,
	Borislav Petkov, Paul E. McKenney, David Howells

On Thu, 13 Jun 2024 15:19:50 -0300
"Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:

> > +
> > +	reserver_mem=2M:4096:oops  ramoops.mem_name=oops
> > +  
> 
> Likely this could be fixed on merge, to avoid another version, but...
> 
> s/reserver_mem/reserve_mem

That 'r' is my nemesis! Almost every time I type "reserve" I write it
as "reserver" and have to go back and delete it. :-p

Just to make patchwork work nicely, I'll send a v7. But I'll wait for
other comments or acks and reviews.

-- Steve

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 2/2] pstore/ramoops: Add ramoops.mem_name= command line option
  2024-06-13 18:42     ` Steven Rostedt
@ 2024-06-13 18:51       ` Guilherme G. Piccoli
  0 siblings, 0 replies; 20+ messages in thread
From: Guilherme G. Piccoli @ 2024-06-13 18:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mike Rapoport, Masami Hiramatsu, linux-trace-kernel, linux-kernel,
	Mark Rutland, Mathieu Desnoyers, Andrew Morton, Vincent Donnefort,
	Joel Fernandes, Daniel Bristot de Oliveira, Ingo Molnar,
	Peter Zijlstra, suleiman, Thomas Gleixner, Vineeth Pillai,
	Youssef Esmat, Beau Belgrave, Alexander Graf, Baoquan He,
	Borislav Petkov, Paul E. McKenney, David Howells

On 13/06/2024 15:42, Steven Rostedt wrote:
> On Thu, 13 Jun 2024 15:19:50 -0300
> "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:
> 
>>> +
>>> +	reserver_mem=2M:4096:oops  ramoops.mem_name=oops
>>> +  
>>
>> Likely this could be fixed on merge, to avoid another version, but...
>>
>> s/reserver_mem/reserve_mem
> 
> That 'r' is my nemesis! Almost every time I type "reserve" I write it
> as "reserver" and have to go back and delete it. :-p

LOL, I thought the same! And I even wondered if you're not a vim user
that makes use of 'r' for replacing text and double typed that - I do
that all the time, with r and i.

Anyway, thanks for fixing that.
Cheers!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  2024-06-13 17:12   ` Steven Rostedt
@ 2024-06-17  7:07     ` Alexander Graf
  2024-06-17 20:40       ` Steven Rostedt
  0 siblings, 1 reply; 20+ messages in thread
From: Alexander Graf @ 2024-06-17  7:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Vincent Donnefort,
	Joel Fernandes, Daniel Bristot de Oliveira, Ingo Molnar,
	Peter Zijlstra, suleiman, Thomas Gleixner, Vineeth Pillai,
	Youssef Esmat, Beau Belgrave, Baoquan He, Borislav Petkov,
	Paul E. McKenney, David Howells, Mike Rapoport, Ard Biesheuvel

Hey Steve,

On 13.06.24 19:12, Steven Rostedt wrote:
> On Thu, 13 Jun 2024 18:54:12 +0200
> Alexander Graf <graf@amazon.com> wrote:
>
>> Do you have a "real" pstore on these systems that you could store
>> non-volatile variables in, such as persistent UEFI variables? If so, you
>> could create an actually persistent mapping for your trace pstore even
>> across kernel version updates as a general mechanism to create reserved
>> memblocks at fixed offsets.
> After implementing all this, I don't think I can use pstore for my
> purpose. pstore is a generic interface for persistent storage, and
> requires an interface to access it. From what I understand, it's not
> the place to just ask for an area of RAM.
>
> For this, I have a single patch that allows the tracing instance to use
> an area reserved by reserve_mem.
>
>    reserve_mem=12M:4096:trace trace_instance=boot_mapped@trace
>
> I've already tested this on qemu and a couple of chromebooks. It works
> well.


I believe we're talking about 2 different things :). Let me rephrase a 
bit and make a concrete example.

Imagine you have passed the "reserve_mem=12M:4096:trace" kernel command 
line option. The kernel now comes up and allocates a random chunk of 
memory that - by (admittedly good) chance - may be at the same physical 
location as before. Let's assume it deemed 0x1000000 as a good offset.

Let's now assume you're running on a UEFI system. There, you always have 
non-volatile storage available to you even in the pre-boot phase. That 
means the kernel could create a UEFI variable that says "12M:4096:trace 
-> 0x1000000". The pre-boot phase takes all these UEFI variables and 
marks them as reserved. When you finally reach your command line parsing 
logic for reserve_mem=, you can flip all reservations that were not on 
the command line back to normal memory.

That way you have pretty much guaranteed persistent memory regions, even 
with KASLR changing your memory layout across boots.

The nice thing is that the above is an extension of what you've already 
built: Systems with UEFI simply get better guarantees that their regions 
persist.


>
>>
>>> Requirement:
>>>
>>> Need a way to reserve memory that will be at a consistent location for
>>> every boot, if the kernel and system are the same. Does not need to work
>>> if rebooting to a different kernel, or if the system can change the
>>> memory layout between boots.
>>>
>>> The reserved memory can not be an hard coded address, as the same kernel /
>>> command line needs to run on several different machines. The picked memory
>>> reservation just needs to be the same for a given machine, but may be
>>
>> With KASLR is enabled, doesn't this approach break too often to be
>> reliable enough for the data you want to extract?
>>
>> Picking up the idea above, with a persistent variable we could even make
>> KASLR avoid that reserved pstore region in its search for a viable KASLR
>> offset.
> I think I was hit by it once in all my testing. For our use case, the
> few times it fails to map is not going to affect what we need this for
> at all.


Once is pretty good. Do you know why? Also once out of how many runs? Is 
the randomness source not as random as it should be or are the number of 
bits for KASLR maybe so few on your target architecture that the odds of 
hitting anything become low? Do these same constraints hold true outside 
of your testing environment?


Alex




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  2024-06-17  7:07     ` Alexander Graf
@ 2024-06-17 20:40       ` Steven Rostedt
  2024-06-17 21:01         ` Alexander Graf
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2024-06-17 20:40 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Vincent Donnefort,
	Joel Fernandes, Daniel Bristot de Oliveira, Ingo Molnar,
	Peter Zijlstra, suleiman, Thomas Gleixner, Vineeth Pillai,
	Youssef Esmat, Beau Belgrave, Baoquan He, Borislav Petkov,
	Paul E. McKenney, David Howells, Mike Rapoport, Ard Biesheuvel

[-- Attachment #1: Type: text/plain, Size: 4250 bytes --]

On Mon, 17 Jun 2024 09:07:29 +0200
Alexander Graf <graf@amazon.com> wrote:

> Hey Steve,
> 
> 
> I believe we're talking about 2 different things :). Let me rephrase a 
> bit and make a concrete example.
> 
> Imagine you have passed the "reserve_mem=12M:4096:trace" kernel command 
> line option. The kernel now comes up and allocates a random chunk of 
> memory that - by (admittedly good) chance - may be at the same physical 
> location as before. Let's assume it deemed 0x1000000 as a good offset.

Note, it's not random. It picks from the top of available memory every
time. But things can mess with it (see below).

> 
> Let's now assume you're running on a UEFI system. There, you always have 
> non-volatile storage available to you even in the pre-boot phase. That 
> means the kernel could create a UEFI variable that says "12M:4096:trace 
> -> 0x1000000". The pre-boot phase takes all these UEFI variables and   
> marks them as reserved. When you finally reach your command line parsing 
> logic for reserve_mem=, you can flip all reservations that were not on 
> the command line back to normal memory.
> 
> That way you have pretty much guaranteed persistent memory regions, even 
> with KASLR changing your memory layout across boots.
> 
> The nice thing is that the above is an extension of what you've already 
> built: Systems with UEFI simply get better guarantees that their regions 
> persist.

This could be an added feature, but it is very architecture specific,
and would likely need architecture specific updates.

> 
> 
> >  
> >>  
> >>> Requirement:
> >>>
> >>> Need a way to reserve memory that will be at a consistent location for
> >>> every boot, if the kernel and system are the same. Does not need to work
> >>> if rebooting to a different kernel, or if the system can change the
> >>> memory layout between boots.
> >>>
> >>> The reserved memory can not be an hard coded address, as the same kernel /
> >>> command line needs to run on several different machines. The picked memory
> >>> reservation just needs to be the same for a given machine, but may be  
> >>
> >> With KASLR is enabled, doesn't this approach break too often to be
> >> reliable enough for the data you want to extract?
> >>
> >> Picking up the idea above, with a persistent variable we could even make
> >> KASLR avoid that reserved pstore region in its search for a viable KASLR
> >> offset.  
> > I think I was hit by it once in all my testing. For our use case, the
> > few times it fails to map is not going to affect what we need this for
> > at all.  
> 
> 
> Once is pretty good. Do you know why? Also once out of how many runs? Is 
> the randomness source not as random as it should be or are the number of 
> bits for KASLR maybe so few on your target architecture that the odds of 
> hitting anything become low? Do these same constraints hold true outside 
> of your testing environment?

So I just ran it a hundred times in a loop. I added a patch to print
the location of "_text". The loop was this:

  for i in `seq 100`; do
	ssh root@debiantesting-x86-64 "dmesg | grep -e 'text starts' -e 'mapped boot'  >> text; grub-reboot '1>0'; sleep 0.5; reboot"
	sleep 25
  done

It searches dmesg for my added printk as well as the print of were the
ring buffer was loaded in physical memory.

It takes about 15 seconds to reboot, so I waited 25. The results are
attached. I found that it was consistent 76 times, which means 1 out of
4 it's not. Funny enough, it broke whenever it loaded the kernel below
0x100000000. And then it would be off by a little.

It was consistently at:

  0x27d000000

And when it failed, it was at 0x27ce00000.

Note, when I used the e820 tables to do this, I never saw a failure. My
assumption is that when it is below 0x100000000, something else gets
allocated causing this to get pushed down.

As this code relies on memblock_phys_alloc() being consistent, if
something gets allocated before it differently depending on where the
kernel is, it can also move the location. A plugin to UEFI would mean
that it would need to reserve the memory, and the code here will need
to know where it is. We could always make the function reserve_mem()
global and weak so that architectures can override it.

-- Steve

[-- Attachment #2: text --]
[-- Type: application/octet-stream, Size: 15200 bytes --]

[    0.000000] text starts at 0000000261a00000
[    0.279601] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000264a00000
[    0.258160] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000016d800000
[    0.274498] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 00000001e3200000
[    0.321133] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000185200000
[    0.289880] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000121a00000
[    0.335898] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000003ca00000
[    0.299969] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 0000000254e00000
[    0.257330] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 00000001ca400000
[    0.361895] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 00000001a3a00000
[    0.264609] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000221200000
[    0.302778] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000223400000
[    0.281307] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 00000001e5600000
[    0.263459] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000022dc00000
[    0.293013] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000010dc00000
[    0.255996] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 00000001b0c00000
[    0.260787] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000052400000
[    0.257456] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 000000015ca00000
[    0.271065] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000101800000
[    0.320951] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000252400000
[    0.307885] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000227800000
[    0.280646] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000007800000
[    0.252212] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 0000000227c00000
[    0.262189] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000025fe00000
[    0.343060] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000223a00000
[    0.265265] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 00000001afa00000
[    0.324491] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000151400000
[    0.275907] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000136e00000
[    0.296053] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 00000001a3200000
[    0.256108] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000278400000
[    0.254019] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000061800000
[    0.253513] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 00000001b5000000
[    0.283077] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000029600000
[    0.253972] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 000000005d600000
[    0.258085] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 00000001ade00000
[    0.263642] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 00000001d0400000
[    0.254616] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000222200000
[    0.275135] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000002f200000
[    0.257120] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 000000014b200000
[    0.253358] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000107200000
[    0.277631] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000185000000
[    0.277916] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 00000001f5600000
[    0.267432] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000006fa00000
[    0.254557] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 0000000118000000
[    0.259598] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000018b200000
[    0.277552] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 00000001b1400000
[    0.248468] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000056400000
[    0.253134] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 000000001fe00000
[    0.309512] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 00000001e6400000
[    0.262389] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000016e600000
[    0.261098] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000184600000
[    0.246911] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000275a00000
[    0.245947] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000023e200000
[    0.243659] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000103800000
[    0.245082] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000004e000000
[    0.311655] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 000000022dc00000
[    0.303337] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000226800000
[    0.248418] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 00000001e0800000
[    0.248551] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000115c00000
[    0.254711] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 00000001bba00000
[    0.304443] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000010c200000
[    0.250813] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000142a00000
[    0.250088] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000012fa00000
[    0.268560] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000140000000
[    0.252298] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 00000001adc00000
[    0.251261] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000012200000
[    0.248004] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 000000010ce00000
[    0.250743] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000005b400000
[    0.313805] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 00000001d9200000
[    0.000000] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000029400000
[    0.242129] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 0000000060c00000
[    0.303373] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 000000000c200000
[    0.290952] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 0000000250200000
[    0.310198] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000003d600000
[    0.246804] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 000000019fe00000
[    0.265477] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000175000000
[    0.255366] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000133600000
[    0.270127] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000011d600000
[    0.281918] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000024a00000
[    0.256186] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 0000000165a00000
[    0.257470] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000003dc00000
[    0.263671] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 0000000102600000
[    0.254288] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 00000001ce200000
[    0.335659] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000018ac00000
[    0.252010] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000074a00000
[    0.254337] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 00000001ece00000
[    0.311124] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000004d400000
[    0.260534] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 0000000013c00000
[    0.295823] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 000000005ec00000
[    0.271064] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 0000000166c00000
[    0.255484] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000025fa00000
[    0.257275] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000021cc00000
[    0.254915] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000239400000
[    0.283687] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000018f000000
[    0.258722] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000275800000
[    0.281587] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000157400000
[    0.256042] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000215c00000
[    0.000000] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 0000000115400000
[    0.243701] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000
[    0.000000] text starts at 000000004f400000
[    0.319405] Tracing: mapped boot instance boot_mapped at physical memory 0x27ce00000 of size 0xc00000
[    0.000000] text starts at 0000000267800000
[    0.307464] Tracing: mapped boot instance boot_mapped at physical memory 0x27d000000 of size 0xc00000

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  2024-06-17 20:40       ` Steven Rostedt
@ 2024-06-17 21:01         ` Alexander Graf
  2024-06-17 21:18           ` Steven Rostedt
  2024-06-18 10:21           ` Ard Biesheuvel
  0 siblings, 2 replies; 20+ messages in thread
From: Alexander Graf @ 2024-06-17 21:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Vincent Donnefort,
	Joel Fernandes, Daniel Bristot de Oliveira, Ingo Molnar,
	Peter Zijlstra, suleiman, Thomas Gleixner, Vineeth Pillai,
	Youssef Esmat, Beau Belgrave, Baoquan He, Borislav Petkov,
	Paul E. McKenney, David Howells, Mike Rapoport, Ard Biesheuvel

[resend because Thunderbird decided to send the previous version as HTML :(]


On 17.06.24 22:40, Steven Rostedt wrote:
> On Mon, 17 Jun 2024 09:07:29 +0200
> Alexander Graf<graf@amazon.com>  wrote:
>
>> Hey Steve,
>>
>>
>> I believe we're talking about 2 different things :). Let me rephrase a
>> bit and make a concrete example.
>>
>> Imagine you have passed the "reserve_mem=12M:4096:trace" kernel command
>> line option. The kernel now comes up and allocates a random chunk of
>> memory that - by (admittedly good) chance - may be at the same physical
>> location as before. Let's assume it deemed 0x1000000 as a good offset.
> Note, it's not random. It picks from the top of available memory every
> time. But things can mess with it (see below).
>
>> Let's now assume you're running on a UEFI system. There, you always have
>> non-volatile storage available to you even in the pre-boot phase. That
>> means the kernel could create a UEFI variable that says "12M:4096:trace
>> -> 0x1000000". The pre-boot phase takes all these UEFI variables and
>> marks them as reserved. When you finally reach your command line parsing
>> logic for reserve_mem=, you can flip all reservations that were not on
>> the command line back to normal memory.
>>
>> That way you have pretty much guaranteed persistent memory regions, even
>> with KASLR changing your memory layout across boots.
>>
>> The nice thing is that the above is an extension of what you've already
>> built: Systems with UEFI simply get better guarantees that their regions
>> persist.
> This could be an added feature, but it is very architecture specific,
> and would likely need architecture specific updates.


It definitely would be an added feature, yes. But one that allows you to 
ensure persistence a lot more safely :).


>>>>> Requirement:
>>>>>
>>>>> Need a way to reserve memory that will be at a consistent location for
>>>>> every boot, if the kernel and system are the same. Does not need to work
>>>>> if rebooting to a different kernel, or if the system can change the
>>>>> memory layout between boots.
>>>>>
>>>>> The reserved memory can not be an hard coded address, as the same kernel /
>>>>> command line needs to run on several different machines. The picked memory
>>>>> reservation just needs to be the same for a given machine, but may be
>>>> With KASLR is enabled, doesn't this approach break too often to be
>>>> reliable enough for the data you want to extract?
>>>>
>>>> Picking up the idea above, with a persistent variable we could even make
>>>> KASLR avoid that reserved pstore region in its search for a viable KASLR
>>>> offset.
>>> I think I was hit by it once in all my testing. For our use case, the
>>> few times it fails to map is not going to affect what we need this for
>>> at all.
>> Once is pretty good. Do you know why? Also once out of how many runs? Is
>> the randomness source not as random as it should be or are the number of
>> bits for KASLR maybe so few on your target architecture that the odds of
>> hitting anything become low? Do these same constraints hold true outside
>> of your testing environment?
> So I just ran it a hundred times in a loop. I added a patch to print
> the location of "_text". The loop was this:
>
>    for i in `seq 100`; do
>          ssh root@debiantesting-x86-64 "dmesg | grep -e 'text starts' -e 'mapped boot'  >> text; grub-reboot '1>0'; sleep 0.5; reboot"
>          sleep 25
>    done
>
> It searches dmesg for my added printk as well as the print of were the
> ring buffer was loaded in physical memory.
>
> It takes about 15 seconds to reboot, so I waited 25. The results are
> attached. I found that it was consistent 76 times, which means 1 out of
> 4 it's not. Funny enough, it broke whenever it loaded the kernel below
> 0x100000000. And then it would be off by a little.
>
> It was consistently at:
>
>    0x27d000000
>
> And when it failed, it was at 0x27ce00000.
>
> Note, when I used the e820 tables to do this, I never saw a failure. My
> assumption is that when it is below 0x100000000, something else gets
> allocated causing this to get pushed down.


Thinking about it again: What if you run the allocation super early (see 
arch/x86/boot/compressed/kaslr.c:handle_mem_options())? If you stick to 
allocating only from top, you're effectively kernel version independent 
for your allocations because none of the kernel code ran yet and 
definitely KASLR independent because you're running deterministically 
before KASLR even gets allocated.

> As this code relies on memblock_phys_alloc() being consistent, if
> something gets allocated before it differently depending on where the
> kernel is, it can also move the location. A plugin to UEFI would mean
> that it would need to reserve the memory, and the code here will need
> to know where it is. We could always make the function reserve_mem()
> global and weak so that architectures can override it.


Yes, the in-kernel UEFI loader (efi-stub) could simply populate a new 
type of memblock with the respective reservations and you later call 
memblock_find_in_range_node() instead of memblock_phys_alloc() to pass 
in flags that you want to allocate only from the new 
MEMBLOCK_RESERVE_MEM type. The same model would work for BIOS boots 
through the handle_mem_options() path above. In fact, if the BIOS way 
works fine, we don't even need UEFI variables: The same way allocations 
will be identical during BIOS execution, they should stay identical 
across UEFI launches.

As cherry on top, kexec also works seamlessly with the special memblock 
approach because kexec (at least on x86) hands memblocks as is to the 
next kernel. So the new kernel will also automatically use the same 
ranges for its allocations.


Alex




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  2024-06-17 21:01         ` Alexander Graf
@ 2024-06-17 21:18           ` Steven Rostedt
  2024-06-18 10:21           ` Ard Biesheuvel
  1 sibling, 0 replies; 20+ messages in thread
From: Steven Rostedt @ 2024-06-17 21:18 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Vincent Donnefort,
	Joel Fernandes, Daniel Bristot de Oliveira, Ingo Molnar,
	Peter Zijlstra, suleiman, Thomas Gleixner, Vineeth Pillai,
	Youssef Esmat, Beau Belgrave, Baoquan He, Borislav Petkov,
	Paul E. McKenney, David Howells, Mike Rapoport, Ard Biesheuvel

On Mon, 17 Jun 2024 23:01:12 +0200
Alexander Graf <graf@amazon.com> wrote:
> > This could be an added feature, but it is very architecture specific,
> > and would likely need architecture specific updates.  
> 
> 
> It definitely would be an added feature, yes. But one that allows you to 
> ensure persistence a lot more safely :).

Sure.

> 
> Thinking about it again: What if you run the allocation super early (see 
> arch/x86/boot/compressed/kaslr.c:handle_mem_options())? If you stick to 
> allocating only from top, you're effectively kernel version independent 
> for your allocations because none of the kernel code ran yet and 
> definitely KASLR independent because you're running deterministically 
> before KASLR even gets allocated.
> 
> > As this code relies on memblock_phys_alloc() being consistent, if
> > something gets allocated before it differently depending on where the
> > kernel is, it can also move the location. A plugin to UEFI would mean
> > that it would need to reserve the memory, and the code here will need
> > to know where it is. We could always make the function reserve_mem()
> > global and weak so that architectures can override it.  
> 
> 
> Yes, the in-kernel UEFI loader (efi-stub) could simply populate a new 
> type of memblock with the respective reservations and you later call 
> memblock_find_in_range_node() instead of memblock_phys_alloc() to pass 
> in flags that you want to allocate only from the new 
> MEMBLOCK_RESERVE_MEM type. The same model would work for BIOS boots 
> through the handle_mem_options() path above. In fact, if the BIOS way 
> works fine, we don't even need UEFI variables: The same way allocations 
> will be identical during BIOS execution, they should stay identical 
> across UEFI launches.
> 
> As cherry on top, kexec also works seamlessly with the special memblock 
> approach because kexec (at least on x86) hands memblocks as is to the 
> next kernel. So the new kernel will also automatically use the same 
> ranges for its allocations.

I'm all for expanding this. But I would just want to get this in for
now as is. It theoretically works on all architectures. If someone
wants to make in more robust and accurate on a specific architecture,
I'm all for it. Like I said, we could make the reserver_mem() function
global and weak, and then if an architecture has a better way to handle
this, it could use that.

Hmm, x86 could do this with the e820 code like I did in my first
versions. Like I said, it didn't fail at all with that.

And we can have an UEFI version as well.

-- Steve

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  2024-06-17 21:01         ` Alexander Graf
  2024-06-17 21:18           ` Steven Rostedt
@ 2024-06-18 10:21           ` Ard Biesheuvel
  2024-06-18 11:47             ` Alexander Graf
  2024-06-19  8:02             ` Mike Rapoport
  1 sibling, 2 replies; 20+ messages in thread
From: Ard Biesheuvel @ 2024-06-18 10:21 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Steven Rostedt, linux-kernel, linux-trace-kernel,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Baoquan He,
	Borislav Petkov, Paul E. McKenney, David Howells, Mike Rapoport

On Mon, 17 Jun 2024 at 23:01, Alexander Graf <graf@amazon.com> wrote:
>
> On 17.06.24 22:40, Steven Rostedt wrote:
> > On Mon, 17 Jun 2024 09:07:29 +0200
> > Alexander Graf<graf@amazon.com>  wrote:
> >
> >> Hey Steve,
> >>
> >>
> >> I believe we're talking about 2 different things :). Let me rephrase a
> >> bit and make a concrete example.
> >>
> >> Imagine you have passed the "reserve_mem=12M:4096:trace" kernel command
> >> line option. The kernel now comes up and allocates a random chunk of
> >> memory that - by (admittedly good) chance - may be at the same physical
> >> location as before. Let's assume it deemed 0x1000000 as a good offset.
> > Note, it's not random. It picks from the top of available memory every
> > time. But things can mess with it (see below).
> >
> >> Let's now assume you're running on a UEFI system. There, you always have
> >> non-volatile storage available to you even in the pre-boot phase. That
> >> means the kernel could create a UEFI variable that says "12M:4096:trace
> >> -> 0x1000000". The pre-boot phase takes all these UEFI variables and
> >> marks them as reserved. When you finally reach your command line parsing
> >> logic for reserve_mem=, you can flip all reservations that were not on
> >> the command line back to normal memory.
> >>
> >> That way you have pretty much guaranteed persistent memory regions, even
> >> with KASLR changing your memory layout across boots.
> >>
> >> The nice thing is that the above is an extension of what you've already
> >> built: Systems with UEFI simply get better guarantees that their regions
> >> persist.
> > This could be an added feature, but it is very architecture specific,
> > and would likely need architecture specific updates.
>
>
> It definitely would be an added feature, yes. But one that allows you to
> ensure persistence a lot more safely :).
>
>
> >>>>> Requirement:
> >>>>>
> >>>>> Need a way to reserve memory that will be at a consistent location for
> >>>>> every boot, if the kernel and system are the same. Does not need to work
> >>>>> if rebooting to a different kernel, or if the system can change the
> >>>>> memory layout between boots.
> >>>>>
> >>>>> The reserved memory can not be an hard coded address, as the same kernel /
> >>>>> command line needs to run on several different machines. The picked memory
> >>>>> reservation just needs to be the same for a given machine, but may be
> >>>> With KASLR is enabled, doesn't this approach break too often to be
> >>>> reliable enough for the data you want to extract?
> >>>>
> >>>> Picking up the idea above, with a persistent variable we could even make
> >>>> KASLR avoid that reserved pstore region in its search for a viable KASLR
> >>>> offset.
> >>> I think I was hit by it once in all my testing. For our use case, the
> >>> few times it fails to map is not going to affect what we need this for
> >>> at all.
> >> Once is pretty good. Do you know why? Also once out of how many runs? Is
> >> the randomness source not as random as it should be or are the number of
> >> bits for KASLR maybe so few on your target architecture that the odds of
> >> hitting anything become low? Do these same constraints hold true outside
> >> of your testing environment?
> > So I just ran it a hundred times in a loop. I added a patch to print
> > the location of "_text". The loop was this:
> >
> >    for i in `seq 100`; do
> >          ssh root@debiantesting-x86-64 "dmesg | grep -e 'text starts' -e 'mapped boot'  >> text; grub-reboot '1>0'; sleep 0.5; reboot"
> >          sleep 25
> >    done
> >
> > It searches dmesg for my added printk as well as the print of were the
> > ring buffer was loaded in physical memory.
> >
> > It takes about 15 seconds to reboot, so I waited 25. The results are
> > attached. I found that it was consistent 76 times, which means 1 out of
> > 4 it's not. Funny enough, it broke whenever it loaded the kernel below
> > 0x100000000. And then it would be off by a little.
> >
> > It was consistently at:
> >
> >    0x27d000000
> >
> > And when it failed, it was at 0x27ce00000.
> >
> > Note, when I used the e820 tables to do this, I never saw a failure. My
> > assumption is that when it is below 0x100000000, something else gets
> > allocated causing this to get pushed down.
>
>
> Thinking about it again: What if you run the allocation super early (see
> arch/x86/boot/compressed/kaslr.c:handle_mem_options())?

That code is not used by EFI boot anymore.

In general, I would recommend (and have recommended) against these
kinds of hacks in mainline, because -especially on x86- there is
always someone that turns up with some kind of convoluted use case
that gets broken if we try to change anything in the boot code.

I spent considerable time over the past year making the EFI/x86 boot
code compatible with the new MS imposed requirements on PC boot
firmware (related to secure boot and NX restrictions on memory
mappings). This involved some radical refactoring of the boot
sequence, including the KASLR logic. Adding fragile code there that
will result in regressions observable to end users when it gets broken
is really not what I'd like to see.

So I would personally prefer for this code not to go in at all. But if
it does go in (and Steven has already agreed to this), it needs a
giant disclaimer that it is best effort and may get broken
inadvertently by changes that may seem unrelated.

> If you stick to
> allocating only from top, you're effectively kernel version independent
> for your allocations because none of the kernel code ran yet and
> definitely KASLR independent because you're running deterministically
> before KASLR even gets allocated.
>

Allocating top down under EFI is almost guaranteed to result in
problems, because that is how the EFI page allocator works as well.
This means that allocations will move around depending on, e.g.,
whether some USB stick was inserted on the first boot and removed on
the second, or whether your external display was on or off.

> > As this code relies on memblock_phys_alloc() being consistent, if
> > something gets allocated before it differently depending on where the
> > kernel is, it can also move the location. A plugin to UEFI would mean
> > that it would need to reserve the memory, and the code here will need
> > to know where it is. We could always make the function reserve_mem()
> > global and weak so that architectures can override it.
>
>
> Yes, the in-kernel UEFI loader (efi-stub) could simply populate a new
> type of memblock with the respective reservations and you later call
> memblock_find_in_range_node() instead of memblock_phys_alloc() to pass
> in flags that you want to allocate only from the new
> MEMBLOCK_RESERVE_MEM type.

This would require the introduction of a new kind of memory map for
the handover between EFI stub and core kernel. Currently, x86 relies
on E820 and other architectures pass the EFI memory map unmodified.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  2024-06-18 10:21           ` Ard Biesheuvel
@ 2024-06-18 11:47             ` Alexander Graf
  2024-06-18 15:43               ` Steven Rostedt
  2024-06-19  8:02             ` Mike Rapoport
  1 sibling, 1 reply; 20+ messages in thread
From: Alexander Graf @ 2024-06-18 11:47 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Steven Rostedt, linux-kernel, linux-trace-kernel,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Baoquan He,
	Borislav Petkov, Paul E. McKenney, David Howells, Mike Rapoport


On 18.06.24 12:21, Ard Biesheuvel wrote:
> On Mon, 17 Jun 2024 at 23:01, Alexander Graf <graf@amazon.com> wrote:
>> On 17.06.24 22:40, Steven Rostedt wrote:
>>> On Mon, 17 Jun 2024 09:07:29 +0200
>>> Alexander Graf<graf@amazon.com>  wrote:
>>>
>>>> Hey Steve,
>>>>
>>>>
>>>> I believe we're talking about 2 different things :). Let me rephrase a
>>>> bit and make a concrete example.
>>>>
>>>> Imagine you have passed the "reserve_mem=12M:4096:trace" kernel command
>>>> line option. The kernel now comes up and allocates a random chunk of
>>>> memory that - by (admittedly good) chance - may be at the same physical
>>>> location as before. Let's assume it deemed 0x1000000 as a good offset.
>>> Note, it's not random. It picks from the top of available memory every
>>> time. But things can mess with it (see below).
>>>
>>>> Let's now assume you're running on a UEFI system. There, you always have
>>>> non-volatile storage available to you even in the pre-boot phase. That
>>>> means the kernel could create a UEFI variable that says "12M:4096:trace
>>>> -> 0x1000000". The pre-boot phase takes all these UEFI variables and
>>>> marks them as reserved. When you finally reach your command line parsing
>>>> logic for reserve_mem=, you can flip all reservations that were not on
>>>> the command line back to normal memory.
>>>>
>>>> That way you have pretty much guaranteed persistent memory regions, even
>>>> with KASLR changing your memory layout across boots.
>>>>
>>>> The nice thing is that the above is an extension of what you've already
>>>> built: Systems with UEFI simply get better guarantees that their regions
>>>> persist.
>>> This could be an added feature, but it is very architecture specific,
>>> and would likely need architecture specific updates.
>>
>> It definitely would be an added feature, yes. But one that allows you to
>> ensure persistence a lot more safely :).
>>
>>
>>>>>>> Requirement:
>>>>>>>
>>>>>>> Need a way to reserve memory that will be at a consistent location for
>>>>>>> every boot, if the kernel and system are the same. Does not need to work
>>>>>>> if rebooting to a different kernel, or if the system can change the
>>>>>>> memory layout between boots.
>>>>>>>
>>>>>>> The reserved memory can not be an hard coded address, as the same kernel /
>>>>>>> command line needs to run on several different machines. The picked memory
>>>>>>> reservation just needs to be the same for a given machine, but may be
>>>>>> With KASLR is enabled, doesn't this approach break too often to be
>>>>>> reliable enough for the data you want to extract?
>>>>>>
>>>>>> Picking up the idea above, with a persistent variable we could even make
>>>>>> KASLR avoid that reserved pstore region in its search for a viable KASLR
>>>>>> offset.
>>>>> I think I was hit by it once in all my testing. For our use case, the
>>>>> few times it fails to map is not going to affect what we need this for
>>>>> at all.
>>>> Once is pretty good. Do you know why? Also once out of how many runs? Is
>>>> the randomness source not as random as it should be or are the number of
>>>> bits for KASLR maybe so few on your target architecture that the odds of
>>>> hitting anything become low? Do these same constraints hold true outside
>>>> of your testing environment?
>>> So I just ran it a hundred times in a loop. I added a patch to print
>>> the location of "_text". The loop was this:
>>>
>>>     for i in `seq 100`; do
>>>           ssh root@debiantesting-x86-64 "dmesg | grep -e 'text starts' -e 'mapped boot'  >> text; grub-reboot '1>0'; sleep 0.5; reboot"
>>>           sleep 25
>>>     done
>>>
>>> It searches dmesg for my added printk as well as the print of were the
>>> ring buffer was loaded in physical memory.
>>>
>>> It takes about 15 seconds to reboot, so I waited 25. The results are
>>> attached. I found that it was consistent 76 times, which means 1 out of
>>> 4 it's not. Funny enough, it broke whenever it loaded the kernel below
>>> 0x100000000. And then it would be off by a little.
>>>
>>> It was consistently at:
>>>
>>>     0x27d000000
>>>
>>> And when it failed, it was at 0x27ce00000.
>>>
>>> Note, when I used the e820 tables to do this, I never saw a failure. My
>>> assumption is that when it is below 0x100000000, something else gets
>>> allocated causing this to get pushed down.
>>
>> Thinking about it again: What if you run the allocation super early (see
>> arch/x86/boot/compressed/kaslr.c:handle_mem_options())?
> That code is not used by EFI boot anymore.
>
> In general, I would recommend (and have recommended) against these
> kinds of hacks in mainline, because -especially on x86- there is
> always someone that turns up with some kind of convoluted use case
> that gets broken if we try to change anything in the boot code.
>
> I spent considerable time over the past year making the EFI/x86 boot
> code compatible with the new MS imposed requirements on PC boot
> firmware (related to secure boot and NX restrictions on memory
> mappings). This involved some radical refactoring of the boot
> sequence, including the KASLR logic. Adding fragile code there that
> will result in regressions observable to end users when it gets broken
> is really not what I'd like to see.
>
> So I would personally prefer for this code not to go in at all. But if
> it does go in (and Steven has already agreed to this), it needs a
> giant disclaimer that it is best effort and may get broken
> inadvertently by changes that may seem unrelated.


Alright, happy to rest my case about making it more reliable for now 
then :).

IMHO the big fat disclaimer should be in the argument name. 
"reserve_mem" to me sounds like it actually guarantees a reservation - 
which it doesn't. Can we name it more along the lines of "debug" (to 
indicate it's not for production data) or "phoenix" (usually gets reborn 
out of ashes, but you can never know for sure): "debug_mem", / 
"phoenix_mem"?


>
>> If you stick to
>> allocating only from top, you're effectively kernel version independent
>> for your allocations because none of the kernel code ran yet and
>> definitely KASLR independent because you're running deterministically
>> before KASLR even gets allocated.
>>
> Allocating top down under EFI is almost guaranteed to result in
> problems, because that is how the EFI page allocator works as well.
> This means that allocations will move around depending on, e.g.,
> whether some USB stick was inserted on the first boot and removed on
> the second, or whether your external display was on or off.


I believe most UEFI implementations only allocate top down in the lower 
32bits. But yes, it's fragile, I hear you. Let's embrace the flaky 
nature of the beast then :).


Alex




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  2024-06-13 15:55 ` [PATCH v6 1/2] " Steven Rostedt
@ 2024-06-18 12:55   ` Zhengyejian
  2024-06-18 16:06     ` Steven Rostedt
  0 siblings, 1 reply; 20+ messages in thread
From: Zhengyejian @ 2024-06-18 12:55 UTC (permalink / raw)
  To: Steven Rostedt, linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
	Baoquan He, Borislav Petkov, Paul E. McKenney, David Howells,
	Mike Rapoport, Guilherme G. Piccoli

On 2024/6/13 23:55, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> In order to allow for requesting a memory region that can be used for
> things like pstore on multiple machines where the memory layout is not the
> same, add a new option to the kernel command line called "reserve_mem".
> 
> The format is:  reserve_mem=nn:align:name
> 
> Where it will find nn amount of memory at the given alignment of align.
> The name field is to allow another subsystem to retrieve where the memory
> was found. For example:
> 
>    reserve_mem=12M:4096:oops ramoops.mem_name=oops
> 
> Where ramoops.mem_name will tell ramoops that memory was reserved for it
> via the reserve_mem option and it can find it by calling:
> 
>    if (reserve_mem_find_by_name("oops", &start, &size)) {
> 	// start holds the start address and size holds the size given
> 
> This is typically used for systems that do not wipe the RAM, and this
> command line will try to reserve the same physical memory on soft reboots.
> Note, it is not guaranteed to be the same location. For example, if KASLR
> places the kernel at the location of where the RAM reservation was from a
> previous boot, the new reservation will be at a different location.  Any
> subsystem using this feature must add a way to verify that the contents of
> the physical memory is from a previous boot, as there may be cases where
> the memory will not be located at the same location.
> 
> Not all systems may work either. There could be bit flips if the reboot
> goes through the BIOS. Using kexec to reboot the machine is likely to
> have better results in such cases.
> 
> Link: https://lore.kernel.org/all/ZjJVnZUX3NZiGW6q@kernel.org/
> 
> Suggested-by: Mike Rapoport <rppt@kernel.org>
> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>   .../admin-guide/kernel-parameters.txt         |  22 ++++
>   include/linux/mm.h                            |   2 +
>   mm/memblock.c                                 | 117 ++++++++++++++++++
>   3 files changed, 141 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index b600df82669d..56e18b1a520d 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -5710,6 +5710,28 @@
>   			them.  If <base> is less than 0x10000, the region
>   			is assumed to be I/O ports; otherwise it is memory.
>   
> +	reserve_mem=	[RAM]
> +			Format: nn[KNG]:<align>:<label>
> +			Reserve physical memory and label it with a name that
> +			other subsystems can use to access it. This is typically
> +			used for systems that do not wipe the RAM, and this command
> +			line will try to reserve the same physical memory on
> +			soft reboots. Note, it is not guaranteed to be the same
> +			location. For example, if anything about the system changes
> +			or if booting a different kernel. It can also fail if KASLR
> +			places the kernel at the location of where the RAM reservation
> +			was from a previous boot, the new reservation will be at a
> +			different location.
> +			Any subsystem using this feature must add a way to verify
> +			that the contents of the physical memory is from a previous
> +			boot, as there may be cases where the memory will not be
> +			located at the same location.
> +
> +			The format is size:align:label for example, to request
> +			12 megabytes of 4096 alignment for ramoops:
> +
> +			reserve_mem=12M:4096:oops ramoops.mem_name=oops
> +
>   	reservetop=	[X86-32,EARLY]
>   			Format: nn[KMG]
>   			Reserves a hole at the top of the kernel virtual
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 9849dfda44d4..077fb589b88a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4263,4 +4263,6 @@ static inline bool pfn_is_unaccepted_memory(unsigned long pfn)
>   void vma_pgtable_walk_begin(struct vm_area_struct *vma);
>   void vma_pgtable_walk_end(struct vm_area_struct *vma);
>   
> +int reserve_mem_find_by_name(const char *name, phys_addr_t *start, phys_addr_t *size);
> +
>   #endif /* _LINUX_MM_H */
> diff --git a/mm/memblock.c b/mm/memblock.c
> index d09136e040d3..b7b0e8c3868d 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -2244,6 +2244,123 @@ void __init memblock_free_all(void)
>   	totalram_pages_add(pages);
>   }
>   
> +/* Keep a table to reserve named memory */
> +#define RESERVE_MEM_MAX_ENTRIES		8
> +#define RESERVE_MEM_NAME_SIZE		16
> +struct reserve_mem_table {
> +	char			name[RESERVE_MEM_NAME_SIZE];
> +	phys_addr_t		start;
> +	phys_addr_t		size;
> +};
> +static struct reserve_mem_table reserved_mem_table[RESERVE_MEM_MAX_ENTRIES];
> +static int reserved_mem_count;
> +
> +/* Add wildcard region with a lookup name */
> +static void __init reserved_mem_add(phys_addr_t start, phys_addr_t size,
> +				   const char *name)
> +{
> +	struct reserve_mem_table *map;
> +
> +	map = &reserved_mem_table[reserved_mem_count++];
> +	map->start = start;
> +	map->size = size;
> +	strscpy(map->name, name);
> +}
> +
> +/**
> + * reserve_mem_find_by_name - Find reserved memory region with a given name
> + * @name: The name that is attached to a reserved memory region
> + * @start: If found, holds the start address
> + * @size: If found, holds the size of the address.
> + *
> + * @start and @size are only updated if @name is found.
> + *
> + * Returns: 1 if found or 0 if not found.
> + */
> +int reserve_mem_find_by_name(const char *name, phys_addr_t *start, phys_addr_t *size)
> +{
> +	struct reserve_mem_table *map;
> +	int i;
> +
> +	for (i = 0; i < reserved_mem_count; i++) {
> +		map = &reserved_mem_table[i];
> +		if (!map->size)
> +			continue;
> +		if (strcmp(name, map->name) == 0) {
> +			*start = map->start;
> +			*size = map->size;
> +			return 1;
> +		}
> +	}
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(reserve_mem_find_by_name);
> +
> +/*
> + * Parse reserve_mem=nn:align:name
> + */
> +static int __init reserve_mem(char *p)
> +{
> +	phys_addr_t start, size, align, tmp;
> +	char *name;
> +	char *oldp;
> +	int len;
> +
> +	if (!p)
> +		return -EINVAL;
> +
> +	/* Check if there's room for more reserved memory */
> +	if (reserved_mem_count >= RESERVE_MEM_MAX_ENTRIES)
> +		return -EBUSY;
> +
> +	oldp = p;
> +	size = memparse(p, &p);
> +	if (!size || p == oldp)
> +		return -EINVAL;
> +
> +	if (*p != ':')
> +		return -EINVAL;
> +
> +	align = memparse(p+1, &p);
> +	if (*p != ':')
> +		return -EINVAL;
> +
> +	/*
> +	 * memblock_phys_alloc() doesn't like a zero size align,
> +	 * but it is OK for this command to have it.
> +	 */
> +	if (align < SMP_CACHE_BYTES)
> +		align = SMP_CACHE_BYTES;
> +
> +	name = p + 1;
> +	len = strlen(name);
> +
> +	/* name needs to have length but not too big */
> +	if (!len || len >= RESERVE_MEM_NAME_SIZE)
> +		return -EINVAL;
> +
> +	/* Make sure that name has text */
> +	for (p = name; *p; p++) {
> +		if (!isspace(*p))
> +			break;
> +	}
> +	if (!*p)
> +		return -EINVAL;
> +
> +	/* Make sure the name is not already used */
> +	if (reserve_mem_find_by_name(name, &start, &tmp))
> +		return -EBUSY;
> +
> +	start = memblock_phys_alloc(size, align);
> +	if (!start)
> +		return -ENOMEM;
> +
> +	reserved_mem_add(start, size, name);
> +
> +	return 0;

Hi, steve, should here return 1 ? Other __setup functions
return 1 when it success.

> +}
> +__setup("reserve_mem=", reserve_mem);
> +
>   #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
>   static const char * const flagname[] = {
>   	[ilog2(MEMBLOCK_HOTPLUG)] = "HOTPLUG",


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  2024-06-18 11:47             ` Alexander Graf
@ 2024-06-18 15:43               ` Steven Rostedt
  0 siblings, 0 replies; 20+ messages in thread
From: Steven Rostedt @ 2024-06-18 15:43 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Ard Biesheuvel, linux-kernel, linux-trace-kernel,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Baoquan He,
	Borislav Petkov, Paul E. McKenney, David Howells, Mike Rapoport

On Tue, 18 Jun 2024 13:47:49 +0200
Alexander Graf <graf@amazon.com> wrote:

> IMHO the big fat disclaimer should be in the argument name. 
> "reserve_mem" to me sounds like it actually guarantees a reservation - 
> which it doesn't. Can we name it more along the lines of "debug" (to 
> indicate it's not for production data) or "phoenix" (usually gets reborn 
> out of ashes, but you can never know for sure): "debug_mem", / 
> "phoenix_mem"?

I don't see any reason it will not reserve memory. That doesn't seem to
be the issue.  What is not guaranteed is that it will be in the same
location as last time. So the name is correct. It's not
"reserve_consistent_memory" ;-)

-- Steve

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  2024-06-18 12:55   ` Zhengyejian
@ 2024-06-18 16:06     ` Steven Rostedt
  2024-06-19  6:42       ` Mike Rapoport
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2024-06-18 16:06 UTC (permalink / raw)
  To: Zhengyejian
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Vincent Donnefort,
	Joel Fernandes, Daniel Bristot de Oliveira, Ingo Molnar,
	Peter Zijlstra, suleiman, Thomas Gleixner, Vineeth Pillai,
	Youssef Esmat, Beau Belgrave, Alexander Graf, Baoquan He,
	Borislav Petkov, Paul E. McKenney, David Howells, Mike Rapoport,
	Guilherme G. Piccoli

On Tue, 18 Jun 2024 20:55:17 +0800
Zhengyejian <zhengyejian@huaweicloud.com> wrote:

> > +	start = memblock_phys_alloc(size, align);
> > +	if (!start)
> > +		return -ENOMEM;
> > +
> > +	reserved_mem_add(start, size, name);
> > +
> > +	return 0;  
> 
> Hi, steve, should here return 1 ? Other __setup functions
> return 1 when it success.

Ug, you're correct.

Mike, want me to send a v7?

-- Steve

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  2024-06-18 16:06     ` Steven Rostedt
@ 2024-06-19  6:42       ` Mike Rapoport
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Rapoport @ 2024-06-19  6:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Zhengyejian, linux-kernel, linux-trace-kernel, Masami Hiramatsu,
	Mark Rutland, Mathieu Desnoyers, Andrew Morton, Vincent Donnefort,
	Joel Fernandes, Daniel Bristot de Oliveira, Ingo Molnar,
	Peter Zijlstra, suleiman, Thomas Gleixner, Vineeth Pillai,
	Youssef Esmat, Beau Belgrave, Alexander Graf, Baoquan He,
	Borislav Petkov, Paul E. McKenney, David Howells,
	Guilherme G. Piccoli

On Tue, Jun 18, 2024 at 12:06:27PM -0400, Steven Rostedt wrote:
> On Tue, 18 Jun 2024 20:55:17 +0800
> Zhengyejian <zhengyejian@huaweicloud.com> wrote:
> 
> > > +	start = memblock_phys_alloc(size, align);
> > > +	if (!start)
> > > +		return -ENOMEM;
> > > +
> > > +	reserved_mem_add(start, size, name);
> > > +
> > > +	return 0;  
> > 
> > Hi, steve, should here return 1 ? Other __setup functions
> > return 1 when it success.
> 
> Ug, you're correct.
> 
> Mike, want me to send a v7?

I can fix it up while applying.
 
> -- Steve

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  2024-06-18 10:21           ` Ard Biesheuvel
  2024-06-18 11:47             ` Alexander Graf
@ 2024-06-19  8:02             ` Mike Rapoport
  1 sibling, 0 replies; 20+ messages in thread
From: Mike Rapoport @ 2024-06-19  8:02 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Alexander Graf, Steven Rostedt, linux-kernel, linux-trace-kernel,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Baoquan He,
	Borislav Petkov, Paul E. McKenney, David Howells

On Tue, Jun 18, 2024 at 12:21:19PM +0200, Ard Biesheuvel wrote:
> On Mon, 17 Jun 2024 at 23:01, Alexander Graf <graf@amazon.com> wrote:
>
> So I would personally prefer for this code not to go in at all. But if
> it does go in (and Steven has already agreed to this), it needs a
> giant disclaimer that it is best effort and may get broken
> inadvertently by changes that may seem unrelated.

I think that reserve_mem= is way less intrusive than memmap= we anyway
have.
It will reserve memory and the documentation adequately warns that the
location of that memory might be at different physical addresses.

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  2024-06-13 15:55 [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up Steven Rostedt
                   ` (2 preceding siblings ...)
  2024-06-13 16:54 ` [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up Alexander Graf
@ 2024-06-19 21:53 ` Mike Rapoport
  3 siblings, 0 replies; 20+ messages in thread
From: Mike Rapoport @ 2024-06-19 21:53 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, Steven Rostedt
  Cc: Mike Rapoport, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Andrew Morton, Vincent Donnefort, Joel Fernandes,
	Daniel Bristot de Oliveira, Ingo Molnar, Peter Zijlstra, suleiman,
	Thomas Gleixner, Vineeth Pillai, Youssef Esmat, Beau Belgrave,
	Alexander Graf, Baoquan He, Borislav Petkov, Paul E. McKenney,
	David Howells

From: Mike Rapoport (IBM) <rppt@kernel.org>

On Thu, 13 Jun 2024 11:55:06 -0400, Steven Rostedt wrote:
> Reserve unspecified location of physical memory from kernel command line
> 
> Background:
> 
> In ChromeOS, we have 1 MB of pstore ramoops reserved so that we can extract
> dmesg output and some other information when a crash happens in the field.
> (This is only done when the user selects "Allow Google to collect data for
>  improving the system"). But there are cases when there's a bug that
> requires more data to be retrieved to figure out what is happening. We would
> like to increase the pstore size, either temporarily, or maybe even
> permanently. The pstore on these devices are at a fixed location in RAM (as
> the RAM is not cleared on soft reboots nor crashes). The location is chosen
> by the BIOS (coreboot) and passed to the kernel via ACPI tables on x86.
> There's a driver that queries for this to initialize the pstore for
> ChromeOS:
> 
> [...]

Applied to for-next branch of memblock.git tree, thanks!

[0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
      commit: 1e4c64b71c9bf230b25fde12cbcceacfdc8b3332
[1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up
      commit: 1e4c64b71c9bf230b25fde12cbcceacfdc8b3332
[2/2] pstore/ramoops: Add ramoops.mem_name= command line option
      commit: d9d814eebb1ae9742e7fd7f39730653b16326bd4

tree: https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
branch: for-next

--
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-06-19 21:53 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-13 15:55 [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up Steven Rostedt
2024-06-13 15:55 ` [PATCH v6 1/2] " Steven Rostedt
2024-06-18 12:55   ` Zhengyejian
2024-06-18 16:06     ` Steven Rostedt
2024-06-19  6:42       ` Mike Rapoport
2024-06-13 15:55 ` [PATCH v6 2/2] pstore/ramoops: Add ramoops.mem_name= command line option Steven Rostedt
2024-06-13 18:19   ` Guilherme G. Piccoli
2024-06-13 18:42     ` Steven Rostedt
2024-06-13 18:51       ` Guilherme G. Piccoli
2024-06-13 16:54 ` [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up Alexander Graf
2024-06-13 17:12   ` Steven Rostedt
2024-06-17  7:07     ` Alexander Graf
2024-06-17 20:40       ` Steven Rostedt
2024-06-17 21:01         ` Alexander Graf
2024-06-17 21:18           ` Steven Rostedt
2024-06-18 10:21           ` Ard Biesheuvel
2024-06-18 11:47             ` Alexander Graf
2024-06-18 15:43               ` Steven Rostedt
2024-06-19  8:02             ` Mike Rapoport
2024-06-19 21:53 ` Mike Rapoport

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).