[RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion

* [RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion
@ 2025-03-14  5:04 Stephen Eta Zhou
  2025-03-17  7:21 ` David Disseldorp
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Eta Zhou @ 2025-03-14  5:04 UTC
  To: jsperbeck@google.com, akpm@linux-foundation.org, ddiss@suse.de
  Cc: gregkh@linuxfoundation.org, lukas@wunner.de,
	wufan@linux.microsoft.com, linux-kernel@vger.kernel.org

From 3499daeb5caf934f08a485027b5411f9ef82d6be Mon Sep 17 00:00:00 2001
From: Stephen Eta Zhou <stephen.eta.zhou@outlook.com>
Date: Fri, 14 Mar 2025 12:32:59 +0800
Subject: [PATCH] initramfs: Add size validation to prevent tmpfs exhaustion

When initramfs is loaded into a small memory environment, if its size
exceeds the tmpfs max blocks limit, the loading will fail. Additionally,
if the required blocks are close to the tmpfs max blocks boundary,
subsequent drivers or subsystems using tmpfs may fail to initialize.

To prevent this, the size limit is set to half of tmpfs max blocks.
This ensures that initramfs can complete its mission without exhausting
tmpfs resources, as user-space programs may also rely on tmpfs after boot.

This patch adds a validation mechanism to check the decompressed size
of initramfs based on its compression type and ratio. If the required
blocks exceed half of the tmpfs max blocks limit, the loading will be
aborted with an appropriate error message, exposing the issue early
and preventing further escalation.

Signed-off-by: Stephen Eta Zhou <stephen.eta.zhou@outlook.com>
---
 init/initramfs.c | 162 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 162 insertions(+)

diff --git a/init/initramfs.c b/init/initramfs.c
index b2f7583bb1f5..dadda0a42b48 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -497,6 +497,157 @@ static unsigned long my_inptr __initdata; /* index of next byte to be processed
 
 #include <linux/decompress/generic.h>
 
+#ifdef CONFIG_TMPFS
+/*
+ * struct compress_info - Describes a compression method.
+ *
+ * @magic: Magic numbers to identify the compression method (e.g., GZIP, XZ, etc.).
+ *         Each magic number is a byte array of maximum length 256.
+ *         The first dimension (2) represents the number of possible magic numbers.
+ * @rate: Compression ratio, calculated as R = (compressed size / original size) * 100.
+ *        The value is in percentage (0-100).
+ * @mark: Name of the compression scheme (e.g., "GZIP", "XZ").
+ * @len: Length of each magic byte array. Used for comparison with memcmp.
+ *       The first dimension (2) corresponds to the number of magic numbers.
+ * @magic_max: Maximum number of magic numbers supported (used when multiple magics are possible).
+ */
+struct compress_info {
+     unsigned char magic[2][256];
+     unsigned long rate;
+     char *mark;
+     size_t len[2];
+     size_t magic_max;
+};
+
+static struct compress_info cfm[] __initdata = {
+     {
+           .mark = "Gzip",
+           .magic = { { 0x1F, 0x8B } },
+           .len = { 2 },
+           .rate = 43,
+           .magic_max = 1,
+     },
+     {
+           .mark = "Bzip2",
+           .magic = { { 0x42, 0x5A, 0x68 } },
+           .len = { 3 },
+           .rate = 22,
+           .magic_max = 1,
+     },
+     {
+           .mark = "LZMA",
+           .magic = { { 0x5D, 0x00, 0x00 }, { 0xFF, 0x5D, 0x00 } },
+           .len = { 3, 3 },
+           .rate = 5,
+           .magic_max = 2,
+     },
+     {
+           .mark = "XZ",
+           .magic = { { 0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00 } },
+           .len = { 6 },
+           .rate = 7,
+           .magic_max = 1,
+     },
+     {
+           .mark = "LZO",
+           .magic = { { 0x89, 0x4C, 0x5A, 0x4F, 0x00, 0x0D, 0x0A, 0x1A, 0x0A } },
+           .len = { 9 },
+           .rate = 47,
+           .magic_max = 1,
+     },
+     {
+           .mark = "LZ4",
+           .magic = {
+                             { 0x04, 0x22, 0x4D, 0x18 },
+                             { 0x02, 0x21, 0x4C, 0x18 }
+                        },
+           .len = { 4 },
+           .rate = 52,
+           .magic_max = 2,
+     },
+     {
+           .mark = "ZSTD",
+           .magic = { { 0x28, 0xB5, 0x2F, 0xFD } },
+           .len = { 4 },
+           .rate = 7,
+           .magic_max = 1,
+     },
+     {
+           .mark = "None",
+           .magic = {
+                             { 0x30, 0x37, 0x30, 0x37, 0x30, 0x31 },
+                             { 0x30, 0x37, 0x30, 0x37, 0x30, 0x32 }
+                        },
+           .len = { 6, 6 },
+           .rate = 0,
+           .magic_max = 2,
+     },
+};
+
+static int __init validate_rootfs_size(char *buf, unsigned long len)
+{
+     unsigned long i, j, result, quotient, half_tmpfs_blocks;
+
+     /*
+      * Calculate how many blocks are needed to decompress
+      * and check if they are within a reasonable range.
+      */
+     for (i = 0; i < ARRAY_SIZE(cfm); ++i) {
+           for (j = 0; j < cfm[i].magic_max; ++j) {
+                 if (memcmp(buf, cfm[i].magic[j], cfm[i].len[j]) == 0) {
+                       pr_debug("Compression method: %\n", cfm[i].mark);
+                       /*
+                        * The calculation is divided into three steps:
+                        * 1. Calculate the decompressed size based on the ratio.
+                        * 2. Check for potential overflow risks and ensure that
+                        *    the temporary decompressed
+                        *    initramfs does not exceed the maximum range of 2^(32/64),
+                        *    ensuring that the initramfs size does not approach the
+                        *    memory addressing limit (this cannot be fully guaranteed).
+                        * 3. Determine whether the required page size exceeds 1/4 of
+                        *    the total memory pages, restricting it from using excessively
+                        *    large amounts of memory pages.
+                        *
+                        * Note1: Here, `len` cannot be directly multiplied by 100,
+                        *        as it may cause overflow.
+                        *        Dividing by `rate` first and then multiplying by 100
+                        *        can effectively reduce the risk of overflow.
+                        *
+                        * Note2: Due to integer division and rounding,
+                        *        the calculated size may deviate by a few MB.
+                        */
+                       quotient = len / cfm[i].rate;
+
+                       if (quotient > ULONG_MAX / 100)
+                             goto err_overflow;
+                       else
+                             result = (quotient * 100) / PAGE_SIZE;
+
+                       /*
+                        * totalram_pages() / 2 = tmpfs max blocks
+                        */
+                       half_tmpfs_blocks = (totalram_pages() / 2) / 2;
+                       if (result > half_tmpfs_blocks)
+                             goto err_nomem;
+
+                       return 0;
+                 }
+           }
+     }
+
+     pr_err("This compression format is not supported.\n");
+     return -EOPNOTSUPP;
+
+err_overflow:
+     pr_err("Decompressed size overflow!\n");
+     return -ERANGE;
+err_nomem:
+     pr_err("Decompressed size exceeds tmpfs max blocks limit!\n");
+     return -ENOMEM;
+
+}
+#endif
+
 static char * __init unpack_to_rootfs(char *buf, unsigned long len)
 {
      long written;
@@ -504,6 +655,17 @@ static char * __init unpack_to_rootfs(char *buf, unsigned long len)
      const char *compress_name;
      static __initdata char msg_buf[64];
 
+#ifdef CONFIG_TMPFS
+     int ret = validate_rootfs_size(buf, len);
+
+     if (ret) {
+           snprintf(msg_buf, sizeof(msg_buf),
+                       "Rootfs does not comply with the rules, error code: %d", ret);
+           message = msg_buf;
+           return message;
+     }
+#endif
+
      header_buf = kmalloc(110, GFP_KERNEL);
      symlink_buf = kmalloc(PATH_MAX + N_ALIGN(PATH_MAX) + 1, GFP_KERNEL);
      name_buf = kmalloc(N_ALIGN(PATH_MAX), GFP_KERNEL);
-- 
2.25.1
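
One detail of the magic table as posted is worth spelling out. The LZ4 entry sets `.magic_max = 2` but `.len = { 4 }`, so `len[1]` is zero-initialised, and `memcmp()` with a length of 0 always returns 0. The second LZ4 comparison therefore matches any buffer that reaches it, and the ZSTD and "None" entries after it are never considered. The user-space sketch below reproduces only the matching loop with a reduced table; `struct magic_entry`, `table` and `detect()` are names made up for this illustration and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced copy of the patch's table layout. */
struct magic_entry {
	const char *mark;
	unsigned char magic[2][8];
	size_t len[2];
	size_t magic_max;
};

static const struct magic_entry table[] = {
	{ .mark = "Gzip", .magic = { { 0x1F, 0x8B } },
	  .len = { 2 }, .magic_max = 1 },
	/* As posted: two magics, but only len[0] is set, so len[1] == 0. */
	{ .mark = "LZ4",
	  .magic = { { 0x04, 0x22, 0x4D, 0x18 }, { 0x02, 0x21, 0x4C, 0x18 } },
	  .len = { 4 }, .magic_max = 2 },
	{ .mark = "ZSTD", .magic = { { 0x28, 0xB5, 0x2F, 0xFD } },
	  .len = { 4 }, .magic_max = 1 },
};

/* Same loop shape as validate_rootfs_size(): the first match wins. */
static const char *detect(const unsigned char *buf)
{
	size_t i, j;

	assert(buf != NULL);
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		for (j = 0; j < table[i].magic_max; j++)
			/* A zero-length memcmp() compares nothing and returns 0. */
			if (memcmp(buf, table[i].magic[j], table[i].len[j]) == 0)
				return table[i].mark;
	return NULL;
}
```

With this table, a buffer starting with the ZSTD magic is reported as "LZ4". Writing the LZ4 lengths as `.len = { 4, 4 }` would restore the intended comparison.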



* Re: [RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion
  2025-03-14  5:04 [RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion Stephen Eta Zhou
@ 2025-03-17  7:21 ` David Disseldorp
  2025-03-17  9:41   ` Stephen Eta Zhou
  2025-03-18 11:55   ` Krzysztof Kozlowski
  0 siblings, 2 replies; 12+ messages in thread
From: David Disseldorp @ 2025-03-17  7:21 UTC
  To: Stephen Eta Zhou
  Cc: jsperbeck@google.com, akpm@linux-foundation.org,
	gregkh@linuxfoundation.org, lukas@wunner.de,
	wufan@linux.microsoft.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org

[cc'ing fsdevel]

Hi,

On Fri, 14 Mar 2025 05:04:58 +0000, Stephen Eta Zhou wrote:

> From 3499daeb5caf934f08a485027b5411f9ef82d6be Mon Sep 17 00:00:00 2001
> From: Stephen Eta Zhou <stephen.eta.zhou@outlook.com>
> Date: Fri, 14 Mar 2025 12:32:59 +0800
> Subject: [PATCH] initramfs: Add size validation to prevent tmpfs exhaustion
> 
> When initramfs is loaded into a small memory environment, if its size
> exceeds the tmpfs max blocks limit, the loading will fail. Additionally,
> if the required blocks are close to the tmpfs max blocks boundary,
> subsequent drivers or subsystems using tmpfs may fail to initialize.
> 
> To prevent this, the size limit is set to half of tmpfs max blocks.
> This ensures that initramfs can complete its mission without exhausting
> tmpfs resources, as user-space programs may also rely on tmpfs after boot.
>
> This patch adds a validation mechanism to check the decompressed size
> of initramfs based on its compression type and ratio. If the required
> blocks exceed half of the tmpfs max blocks limit, the loading will be
> aborted with an appropriate error message, exposing the issue early
> and preventing further escalation.

This behaviour appears fragile and quite arbitrary. I don't think
initramfs should be responsible for making any of these decisions.

Why can't the init binary make the decision of whether or not the amount
of free memory remaining is sufficient for user-space, instead of this
magic 50% limit?

What are you trying to achieve by failing in this way before initramfs
extraction instead of during / after?

> Signed-off-by: Stephen Eta Zhou <stephen.eta.zhou@outlook.com>
> ---
>  init/initramfs.c | 162 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 162 insertions(+)
> 
> diff --git a/init/initramfs.c b/init/initramfs.c
> index b2f7583bb1f5..dadda0a42b48 100644
> --- a/init/initramfs.c
> +++ b/init/initramfs.c
> @@ -497,6 +497,157 @@ static unsigned long my_inptr __initdata; /* index of next byte to be processed
>  
>  #include <linux/decompress/generic.h>
>  
> +#ifdef CONFIG_TMPFS
> +/*
> + * struct compress_info - Describes a compression method.
> + *
> + * @magic: Magic numbers to identify the compression method (e.g., GZIP, XZ, etc.).
> + *         Each magic number is a byte array of maximum length 256.
> + *         The first dimension (2) represents the number of possible magic numbers.
> + * @rate: Compression ratio, calculated as R = (compressed size / original size) * 100.
> + *        The value is in percentage (0-100).
> + * @mark: Name of the compression scheme (e.g., "GZIP", "XZ").
> + * @len: Length of each magic byte array. Used for comparison with memcmp.
> + *       The first dimension (2) corresponds to the number of magic numbers.
> + * @magic_max: Maximum number of magic numbers supported (used when multiple magics are possible).
> + */
> +struct compress_info {
> +     unsigned char magic[2][256];
> +     unsigned long rate;
> +     char *mark;
> +     size_t len[2];
> +     size_t magic_max;
> +};

initramfs doesn't have much knowledge of underlying compression details.
This seems like a pretty significant layering violation...

> +
> +static struct compress_info cfm[] __initdata = {
> +     {
> +           .mark = "Gzip",
> +           .magic = { { 0x1F, 0x8B } },
> +           .len = { 2 },
> +           .rate = 43,
> +           .magic_max = 1,
> +     },
> +     {
> +           .mark = "Bzip2",
> +           .magic = { { 0x42, 0x5A, 0x68 } },
> +           .len = { 3 },
> +           .rate = 22,
> +           .magic_max = 1,
> +     },
> +     {
> +           .mark = "LZMA",
> +           .magic = { { 0x5D, 0x00, 0x00 }, { 0xFF, 0x5D, 0x00 } },
> +           .len = { 3, 3 },
> +           .rate = 5,
> +           .magic_max = 2,
> +     },
> +     {
> +           .mark = "XZ",
> +           .magic = { { 0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00 } },
> +           .len = { 6 },
> +           .rate = 7,
> +           .magic_max = 1,
> +     },
> +     {
> +           .mark = "LZO",
> +           .magic = { { 0x89, 0x4C, 0x5A, 0x4F, 0x00, 0x0D, 0x0A, 0x1A, 0x0A } },
> +           .len = { 9 },
> +           .rate = 47,
> +           .magic_max = 1,
> +     },
> +     {
> +           .mark = "LZ4",
> +           .magic = {
> +                             { 0x04, 0x22, 0x4D, 0x18 },
> +                             { 0x02, 0x21, 0x4C, 0x18 }
> +                        },
> +           .len = { 4 },
> +           .rate = 52,
> +           .magic_max = 2,
> +     },
> +     {
> +           .mark = "ZSTD",
> +           .magic = { { 0x28, 0xB5, 0x2F, 0xFD } },
> +           .len = { 4 },
> +           .rate = 7,
> +           .magic_max = 1,
> +     },
> +     {
> +           .mark = "None",
> +           .magic = {
> +                             { 0x30, 0x37, 0x30, 0x37, 0x30, 0x31 },
> +                             { 0x30, 0x37, 0x30, 0x37, 0x30, 0x32 }
> +                        },
> +           .len = { 6, 6 },
> +           .rate = 0,

This will trigger a divide by zero below.

> +           .magic_max = 2,
> +     },
> +};
> +
> +static int __init validate_rootfs_size(char *buf, unsigned long len)
> +{
> +     unsigned long i, j, result, quotient, half_tmpfs_blocks;
> +
> +     /*
> +      * Calculate how many blocks are needed to decompress
> +      * and check if they are within a reasonable range.
> +      */
> +     for (i = 0; i < ARRAY_SIZE(cfm); ++i) {
> +           for (j = 0; j < cfm[i].magic_max; ++j) {
> +                 if (memcmp(buf, cfm[i].magic[j], cfm[i].len[j]) == 0) {
> +                       pr_debug("Compression method: %\n", cfm[i].mark);
> +                       /*
> +                        * The calculation is divided into three steps:
> +                        * 1. Calculate the decompressed size based on the ratio.
> +                        * 2. Check for potential overflow risks and ensure that
> +                        *    the temporary decompressed
> +                        *    initramfs does not exceed the maximum range of 2^(32/64),
> +                        *    ensuring that the initramfs size does not approach the
> +                        *    memory addressing limit (this cannot be fully guaranteed).
> +                        * 3. Determine whether the required page size exceeds 1/4 of
> +                        *    the total memory pages, restricting it from using excessively
> +                        *    large amounts of memory pages.
> +                        *
> +                        * Note1: Here, `len` cannot be directly multiplied by 100,
> +                        *        as it may cause overflow.
> +                        *        Dividing by `rate` first and then multiplying by 100
> +                        *        can effectively reduce the risk of overflow.
> +                        *
> +                        * Note2: Due to integer division and rounding,
> +                        *        the calculated size may deviate by a few MB.
> +                        */
> +                       quotient = len / cfm[i].rate;
> +
> +                       if (quotient > ULONG_MAX / 100)
> +                             goto err_overflow;
> +                       else
> +                             result = (quotient * 100) / PAGE_SIZE;
> +
> +                       /*
> +                        * totalram_pages() / 2 = tmpfs max blocks
> +                        */
> +                       half_tmpfs_blocks = (totalram_pages() / 2) / 2;
> +                       if (result > half_tmpfs_blocks)
> +                             goto err_nomem;

See Documentation/driver-api/early-userspace/buffer-format.rst .
Initramfs images can be made up of several concatenated cpio archives,
which would throw off these calculations.

> +
> +                       return 0;
> +                 }
> +           }
> +     }
> +
> +     pr_err("This compression format is not supported.\n");
> +     return -EOPNOTSUPP;
> +
> +err_overflow:
> +     pr_err("Decompressed size overflow!\n");
> +     return -ERANGE;
> +err_nomem:
> +     pr_err("Decompressed size exceeds tmpfs max blocks limit!\n");
> +     return -ENOMEM;
> +
> +}
> +#endif
> +
>  static char * __init unpack_to_rootfs(char *buf, unsigned long len)
>  {
>       long written;
> @@ -504,6 +655,17 @@ static char * __init unpack_to_rootfs(char *buf, unsigned long len)
>       const char *compress_name;
>       static __initdata char msg_buf[64];
>  
> +#ifdef CONFIG_TMPFS
> +     int ret = validate_rootfs_size(buf, len);
> +
> +     if (ret) {
> +           snprintf(msg_buf, sizeof(msg_buf),
> +                       "Rootfs does not comply with the rules, error code: %d", ret);
> +           message = msg_buf;
> +           return message;
> +     }
> +#endif
> +
>       header_buf = kmalloc(110, GFP_KERNEL);
>       symlink_buf = kmalloc(PATH_MAX + N_ALIGN(PATH_MAX) + 1, GFP_KERNEL);
>       name_buf = kmalloc(N_ALIGN(PATH_MAX), GFP_KERNEL);
> -- 
> 2.25.1
> 
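
The division flagged in this review is `len / cfm[i].rate`, where the "None" entry has `rate = 0`. The sketch below shows one way the estimate could be guarded. It is a user-space illustration under a stated assumption: a rate of 0 is taken to mean "not compressed", so the estimate is simply `len`. `estimate_unpacked_bytes()` is a hypothetical helper, not part of the patch, and it does nothing about the concatenated-archive problem raised above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Estimate the unpacked size in bytes from the packed length and a
 * compression rate given in percent (compressed / original * 100).
 * Returns false if the rate is out of range or the result would overflow.
 */
static bool estimate_unpacked_bytes(unsigned long len, unsigned long rate,
				    unsigned long *out)
{
	unsigned long quotient;

	assert(out != NULL);

	if (rate > 100)
		return false;

	/* Assumption: rate 0 marks an uncompressed image, so no scaling. */
	if (rate == 0) {
		*out = len;
		return true;
	}

	/* Divide first, as the patch does, then check before multiplying. */
	quotient = len / rate;
	if (quotient > ULONG_MAX / 100)
		return false;

	*out = quotient * 100;
	return true;
}
```

For a 16 MiB gzip image with the patch's rate of 43, this yields 16777216 / 43 = 390167 (integer division), times 100 = 39016700 bytes.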



* Re: [RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion
  2025-03-17  7:21 ` David Disseldorp
@ 2025-03-17  9:41   ` Stephen Eta Zhou
  2025-03-18  1:14     ` David Disseldorp
  2025-03-18 11:55   ` Krzysztof Kozlowski
  1 sibling, 1 reply; 12+ messages in thread
From: Stephen Eta Zhou @ 2025-03-17  9:41 UTC
  To: David Disseldorp
  Cc: jsperbeck@google.com, akpm@linux-foundation.org,
	gregkh@linuxfoundation.org, lukas@wunner.de,
	wufan@linux.microsoft.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org

Hi David,

> On Fri, 14 Mar 2025 05:04:58 +0000, Stephen Eta Zhou wrote:
>
> > From 3499daeb5caf934f08a485027b5411f9ef82d6be Mon Sep 17 00:00:00 2001
> > From: Stephen Eta Zhou <stephen.eta.zhou@outlook.com>
> > Date: Fri, 14 Mar 2025 12:32:59 +0800
> > Subject: [PATCH] initramfs: Add size validation to prevent tmpfs exhaustion
> >
> > When initramfs is loaded into a small memory environment, if its size
> > exceeds the tmpfs max blocks limit, the loading will fail. Additionally,
> > if the required blocks are close to the tmpfs max blocks boundary,
> > subsequent drivers or subsystems using tmpfs may fail to initialize.
> >
> > To prevent this, the size limit is set to half of tmpfs max blocks.
> > This ensures that initramfs can complete its mission without exhausting
> > tmpfs resources, as user-space programs may also rely on tmpfs after boot.
> >
> > This patch adds a validation mechanism to check the decompressed size
> > of initramfs based on its compression type and ratio. If the required
> > blocks exceed half of the tmpfs max blocks limit, the loading will be
> > aborted with an appropriate error message, exposing the issue early
> > and preventing further escalation.
>
> This behaviour appears fragile and quite arbitrary. I don't think
> initramfs should be responsible for making any of these decisions.
>
> Why can't the init binary make the decision of whether or not the amount
> of free memory remaining is sufficient for user-space, instead of this
> magic 50% limit?
>
> What are you trying to achieve by failing in this way before initramfs
> extraction instead of during / after?

Before the init process runs, the initramfs has to be unpacked into tmpfs, where it becomes the root file system (rootfs). If too little tmpfs space is left after unpacking, init may not be able to run at all, and the system crashes or panics.

Letting init decide whether the remaining space is sufficient means the initramfs must be unpacked first. By then tmpfs may already be full, which leaves the whole system unusable instead of going through a controlled error path.

The problem is most visible in extreme cases, for example:

1. After the initramfs is unpacked, only a small amount of tmpfs space remains, so early user-space tasks such as mount and udevadm fail and device initialization is affected.

2. On embedded devices tmpfs is usually configured small. The shortage is only discovered after unpacking and leads directly to boot failure.

The check runs before decompression so that the problem is exposed up front, rather than failing passively when space runs out after decompression.
Estimating the required tmpfs resources in advance avoids unnecessary I/O and gives a clearer error report, which helps users adjust the initramfs size or the tmpfs configuration.
My aim is to expose problems as early as possible. A failure at run time is harder to troubleshoot and may bring unnecessary risk.

As for reserving 50% of the available tmpfs space: the initramfs is only a temporary root file system during boot, and tmpfs still has to provide space for user-space processes such as udev device management, the console, serial port drivers, logs and other run-time data. If the initramfs is too large and takes up too much of tmpfs, further failures from tmpfs exhaustion are possible even after the system reaches user space. 50% is a conservative but reasonable upper limit that leaves enough tmpfs for early user space once the initramfs has been unpacked.

Of course, the 50% can be adjusted. This patch is meant for discussion. If the community is willing to accept it, a config option can be added to make the limit configurable.
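
The threshold discussed here comes down to a small piece of arithmetic, sketched below in user-space C. It follows the patch's own comment that the tmpfs block limit is `totalram_pages() / 2` and lets the initramfs use a percentage of that. `initramfs_fits()`, the `limit_percent` parameter and the fixed 4 KiB page size are assumptions for the illustration. No such config option exists in the posted patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed page size for the sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Return true if an initramfs estimated at est_bytes stays within
 * limit_percent of the tmpfs block limit (taken as half of RAM pages).
 */
static bool initramfs_fits(unsigned long est_bytes,
			   unsigned long total_ram_pages,
			   unsigned long limit_percent)
{
	unsigned long tmpfs_max_blocks = total_ram_pages / 2;
	unsigned long allowed, needed;

	assert(limit_percent <= 100);

	/* Multiplying first keeps 50% identical to the patch's "/ 2". */
	allowed = tmpfs_max_blocks * limit_percent / 100;
	/* Truncating division, as in the patch. */
	needed = est_bytes / SKETCH_PAGE_SIZE;

	return needed <= allowed;
}
```

As a worked case: an estimate of 39016700 bytes is 9525 pages. With 128 MiB of RAM (32768 pages) the tmpfs limit is 16384 blocks and 50% of it is 8192, so the image is rejected. With 256 MiB of RAM, or with the limit raised to 75%, it is accepted.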


> > Signed-off-by: Stephen Eta Zhou <stephen.eta.zhou@outlook.com>
> > ---
> >  init/initramfs.c | 162 +++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 162 insertions(+)
> >
> > diff --git a/init/initramfs.c b/init/initramfs.c
> > index b2f7583bb1f5..dadda0a42b48 100644
> > --- a/init/initramfs.c
> > +++ b/init/initramfs.c
> > @@ -497,6 +497,157 @@ static unsigned long my_inptr __initdata; /* index of next byte to be processed
> >  
> >  #include <linux/decompress/generic.h>
> >  
> > +#ifdef CONFIG_TMPFS
> > +/*
> > + * struct compress_info - Describes a compression method.
> > + *
> > + * @magic: Magic numbers to identify the compression method (e.g., GZIP, XZ, etc.).
> > + *         Each magic number is a byte array of maximum length 256.
> > + *         The first dimension (2) represents the number of possible magic numbers.
> > + * @rate: Compression ratio, calculated as R = (compressed size / original size) * 100.
> > + *        The value is in percentage (0-100).
> > + * @mark: Name of the compression scheme (e.g., "GZIP", "XZ").
> > + * @len: Length of each magic byte array. Used for comparison with memcmp.
> > + *       The first dimension (2) corresponds to the number of magic numbers.
> > + * @magic_max: Maximum number of magic numbers supported (used when multiple magics are possible).
> > + */
> > +struct compress_info {
> > +     unsigned char magic[2][256];
> > +     unsigned long rate;
> > +     char *mark;
> > +     size_t len[2];
> > +     size_t magic_max;
> > +};
>
> initramfs doesn't have much knowledge of underlying compression details.
> This seems like a pretty significant layering violation...

I wrote this part of the code only to demonstrate that the patch works. Later on, the information could come from the decompress code, or the decompressed size could be stored in a variable at build time so that initramfs reads it directly instead of estimating it.

> > +
> > +static struct compress_info cfm[] __initdata = {
> > +     {
> > +           .mark = "Gzip",
> > +           .magic = { { 0x1F, 0x8B } },
> > +           .len = { 2 },
> > +           .rate = 43,
> > +           .magic_max = 1,
> > +     },
> > +     {
> > +           .mark = "Bzip2",
> > +           .magic = { { 0x42, 0x5A, 0x68 } },
> > +           .len = { 3 },
> > +           .rate = 22,
> > +           .magic_max = 1,
> > +     },
> > +     {
> > +           .mark = "LZMA",
> > +           .magic = { { 0x5D, 0x00, 0x00 }, { 0xFF, 0x5D, 0x00 } },
> > +           .len = { 3, 3 },
> > +           .rate = 5,
> > +           .magic_max = 2,
> > +     },
> > +     {
> > +           .mark = "XZ",
> > +           .magic = { { 0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00 } },
> > +           .len = { 6 },
> > +           .rate = 7,
> > +           .magic_max = 1,
> > +     },
> > +     {
> > +           .mark = "LZO",
> > +           .magic = { { 0x89, 0x4C, 0x5A, 0x4F, 0x00, 0x0D, 0x0A, 0x1A, 0x0A } },
> > +           .len = { 9 },
> > +           .rate = 47,
> > +           .magic_max = 1,
> > +     },
> > +     {
> > +           .mark = "LZ4",
> > +           .magic = {
> > +                             { 0x04, 0x22, 0x4D, 0x18 },
> > +                             { 0x02, 0x21, 0x4C, 0x18 }
> > +                        },
> > +           .len = { 4 },
> > +           .rate = 52,
> > +           .magic_max = 2,
> > +     },
> > +     {
> > +           .mark = "ZSTD",
> > +           .magic = { { 0x28, 0xB5, 0x2F, 0xFD } },
> > +           .len = { 4 },
> > +           .rate = 7,
> > +           .magic_max = 1,
> > +     },
> > +     {
> > +           .mark = "None",
> > +           .magic = {
> > +                             { 0x30, 0x37, 0x30, 0x37, 0x30, 0x31 },
> > +                             { 0x30, 0x37, 0x30, 0x37, 0x30, 0x32 }
> > +                        },
> > +           .len = { 6, 6 },
> > +           .rate = 0,
>
> This will trigger a divide by zero below.

Thanks for the reminder. If this patch is worth continuing, I will fix it in v2.

> > +           .magic_max = 2,
> > +     },
> > +};
> > +
> > +static int __init validate_rootfs_size(char *buf, unsigned long len)
> > +{
> > +     unsigned long i, j, result, quotient, half_tmpfs_blocks;
> > +
> > +     /*
> > +      * Calculate how many blocks are needed to decompress
> > +      * and check if they are within a reasonable range.
> > +      */
> > +     for (i = 0; i < ARRAY_SIZE(cfm); ++i) {
> > +           for (j = 0; j < cfm[i].magic_max; ++j) {
> > +                 if (memcmp(buf, cfm[i].magic[j], cfm[i].len[j]) == 0) {
> > +                       pr_debug("Compression method: %\n", cfm[i].mark);
> > +                       /*
> > +                        * The calculation is divided into three steps:
> > +                        * 1. Calculate the decompressed size based on the ratio.
> > +                        * 2. Check for potential overflow risks and ensure that
> > +                        *    the temporary decompressed
> > +                        *    initramfs does not exceed the maximum range of 2^(32/64),
> > +                        *    ensuring that the initramfs size does not approach the
> > +                        *    memory addressing limit (this cannot be fully guaranteed).
> > +                        * 3. Determine whether the required page size exceeds 1/4 of
> > +                        *    the total memory pages, restricting it from using excessively
> > +                        *    large amounts of memory pages.
> > +                        *
> > +                        * Note1: Here, `len` cannot be directly multiplied by 100,
> > +                        *        as it may cause overflow.
> > +                        *        Dividing by `rate` first and then multiplying by 100
> > +                        *        can effectively reduce the risk of overflow.
> > +                        *
> > +                        * Note2: Due to integer division and rounding,
> > +                        *        the calculated size may deviate by a few MB.
> > +                        */
> > +                       quotient = len / cfm[i].rate;
> > +
> > +                       if (quotient > ULONG_MAX / 100)
> > +                             goto err_overflow;
> > +                       else
> > +                             result = (quotient * 100) / PAGE_SIZE;
> > +
> > +                       /*
> > +                        * totalram_pages() / 2 = tmpfs max blocks
> > +                        */
> > +                       half_tmpfs_blocks = (totalram_pages() / 2) / 2;
> > +                       if (result > half_tmpfs_blocks)
> > +                             goto err_nomem;
>
> See Documentation/driver-api/early-userspace/buffer-format.rst .
> Initramfs images can be made up of several concatenated cpio archives,
> which would throw off these calculations.

Thank you for the reminder. As I said above, if the community is willing to discuss and accept this patch, I will add scanning logic for this case in v2.


Thanks,
Stephen
________________________________________
From: David Disseldorp
Sent: Monday, March 17, 2025 15:21
To: Stephen Eta Zhou
Cc: jsperbeck@google.com; akpm@linux-foundation.org; gregkh@linuxfoundation.org; lukas@wunner.de; wufan@linux.microsoft.com; linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org
Subject: Re: [RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion


[cc'ing fsdevel]



Hi,



On Fri, 14 Mar 2025 05:04:58 +0000, Stephen Eta Zhou wrote:



> From 3499daeb5caf934f08a485027b5411f9ef82d6be Mon Sep 17 00:00:00 2001

> From: Stephen Eta Zhou <stephen.eta.zhou@outlook.com>

> Date: Fri, 14 Mar 2025 12:32:59 +0800

> Subject: [PATCH] initramfs: Add size validation to prevent tmpfs exhaustion

>

> When initramfs is loaded into a small memory environment, if its size

> exceeds the tmpfs max blocks limit, the loading will fail. Additionally,

> if the required blocks are close to the tmpfs max blocks boundary,

> subsequent drivers or subsystems using tmpfs may fail to initialize.

>

> To prevent this, the size limit is set to half of tmpfs max blocks.

> This ensures that initramfs can complete its mission without exhausting

> tmpfs resources, as user-space programs may also rely on tmpfs after boot.

>

> This patch adds a validation mechanism to check the decompressed size

> of initramfs based on its compression type and ratio. If the required

> blocks exceed half of the tmpfs max blocks limit, the loading will be

> aborted with an appropriate error message, exposing the issue early

> and preventing further escalation.



This behaviour appears fragile and quite arbitrary. I don't think

initramfs should be responsible for making any of these decisions.



Why can't the init binary make the decision of whether or not the amount

of free memory remaining is sufficient for user-space, instead of this

magic 50% limit?



What are you trying to achieve by failing in this way before initramfs

extraction instead of during / after?



> Signed-off-by: Stephen Eta Zhou <stephen.eta.zhou@outlook.com>
> ---
>  init/initramfs.c | 162 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 162 insertions(+)
>
> diff --git a/init/initramfs.c b/init/initramfs.c
> index b2f7583bb1f5..dadda0a42b48 100644
> --- a/init/initramfs.c
> +++ b/init/initramfs.c
> @@ -497,6 +497,157 @@ static unsigned long my_inptr __initdata; /* index of next byte to be processed
>  
>  #include <linux/decompress/generic.h>
>  
> +#ifdef CONFIG_TMPFS
> +/*
> + * struct compress_info - Describes a compression method.
> + *
> + * @magic: Magic numbers to identify the compression method (e.g., GZIP, XZ, etc.).
> + *         Each magic number is a byte array of maximum length 256.
> + *         The first dimension (2) represents the number of possible magic numbers.
> + * @rate: Compression ratio, calculated as R = (compressed size / original size) * 100.
> + *        The value is in percentage (0-100).
> + * @mark: Name of the compression scheme (e.g., "GZIP", "XZ").
> + * @len: Length of each magic byte array. Used for comparison with memcmp.
> + *       The first dimension (2) corresponds to the number of magic numbers.
> + * @magic_max: Maximum number of magic numbers supported (used when multiple magics are possible).
> + */
> +struct compress_info {
> +     unsigned char magic[2][256];
> +     unsigned long rate;
> +     char *mark;
> +     size_t len[2];
> +     size_t magic_max;
> +};

initramfs doesn't have much knowledge of underlying compression details.
This seems like a pretty significant layering violation...

> +
> +static struct compress_info cfm[] __initdata = {
> +     {
> +           .mark = "Gzip",
> +           .magic = { { 0x1F, 0x8B } },
> +           .len = { 2 },
> +           .rate = 43,
> +           .magic_max = 1,
> +     },
> +     {
> +           .mark = "Bzip2",
> +           .magic = { { 0x42, 0x5A, 0x68 } },
> +           .len = { 3 },
> +           .rate = 22,
> +           .magic_max = 1,
> +     },
> +     {
> +           .mark = "LZMA",
> +           .magic = { { 0x5D, 0x00, 0x00 }, { 0xFF, 0x5D, 0x00 } },
> +           .len = { 3, 3 },
> +           .rate = 5,
> +           .magic_max = 2,
> +     },
> +     {
> +           .mark = "XZ",
> +           .magic = { { 0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00 } },
> +           .len = { 6 },
> +           .rate = 7,
> +           .magic_max = 1,
> +     },
> +     {
> +           .mark = "LZO",
> +           .magic = { { 0x89, 0x4C, 0x5A, 0x4F, 0x00, 0x0D, 0x0A, 0x1A, 0x0A } },
> +           .len = { 9 },
> +           .rate = 47,
> +           .magic_max = 1,
> +     },
> +     {
> +           .mark = "LZ4",
> +           .magic = {
> +                             { 0x04, 0x22, 0x4D, 0x18 },
> +                             { 0x02, 0x21, 0x4C, 0x18 }
> +                        },
> +           .len = { 4, 4 },
> +           .rate = 52,
> +           .magic_max = 2,
> +     },
> +     {
> +           .mark = "ZSTD",
> +           .magic = { { 0x28, 0xB5, 0x2F, 0xFD } },
> +           .len = { 4 },
> +           .rate = 7,
> +           .magic_max = 1,
> +     },
> +     {
> +           .mark = "None",
> +           .magic = {
> +                             { 0x30, 0x37, 0x30, 0x37, 0x30, 0x31 },
> +                             { 0x30, 0x37, 0x30, 0x37, 0x30, 0x32 }
> +                        },
> +           .len = { 6, 6 },
> +           .rate = 0,

This will trigger a divide by zero below.
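
To make the failure concrete: with `.rate = 0` for the uncompressed ("None") entry, the later `len / cfm[i].rate` divides by zero. A minimal userspace sketch of a guarded estimate is shown here. The names `estimate_pages()` and `SKETCH_PAGE_SIZE` are hypothetical and not part of the patch, and treating a rate of 0 as "already uncompressed" is only an assumption about the intent.

```c
#include <limits.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed page size, for illustration only */

/*
 * Hypothetical helper: estimate the number of pages needed after
 * decompression from the compressed length and a ratio in percent
 * (compressed size / original size * 100), as in the patch.
 *
 * Returns 0 on success and -1 if the intermediate product would
 * overflow. A rate of 0 is treated as "not compressed", so the
 * division by rate is never reached with a zero divisor.
 */
static int estimate_pages(unsigned long len, unsigned long rate,
                          unsigned long *pages)
{
    unsigned long quotient;

    if (rate == 0) {
        *pages = len / SKETCH_PAGE_SIZE;
        return 0;
    }

    /* Divide first, as the patch does, to keep the product small. */
    quotient = len / rate;
    if (quotient > ULONG_MAX / 100)
        return -1;

    *pages = (quotient * 100) / SKETCH_PAGE_SIZE;
    return 0;
}
```

Calling `estimate_pages(len, cfm[i].rate, &result)` in place of the open-coded division would avoid the fault, but it does not address the broader objection that the ratios themselves are guesses.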



> +           .magic_max = 2,
> +     },
> +};
> +
> +static int __init validate_rootfs_size(char *buf, unsigned long len)
> +{
> +     unsigned long i, j, result, quotient, half_tmpfs_blocks;
> +
> +     /*
> +      * Calculate how many blocks are needed to decompress
> +      * and check if they are within a reasonable range.
> +      */
> +     for (i = 0; i < ARRAY_SIZE(cfm); ++i) {
> +           for (j = 0; j < cfm[i].magic_max; ++j) {
> +                 if (memcmp(buf, cfm[i].magic[j], cfm[i].len[j]) == 0) {
> +                       pr_debug("Compression method: %s\n", cfm[i].mark);
> +                       /*
> +                        * The calculation is divided into three steps:
> +                        * 1. Calculate the decompressed size based on the ratio.
> +                        * 2. Check for potential overflow risks and ensure that
> +                        *    the temporary decompressed
> +                        *    initramfs does not exceed the maximum range of 2^(32/64),
> +                        *    ensuring that the initramfs size does not approach the
> +                        *    memory addressing limit (this cannot be fully guaranteed).
> +                        * 3. Determine whether the required page size exceeds 1/4 of
> +                        *    the total memory pages, restricting it from using excessively
> +                        *    large amounts of memory pages.
> +                        *
> +                        * Note1: Here, `len` cannot be directly multiplied by 100,
> +                        *        as it may cause overflow.
> +                        *        Dividing by `rate` first and then multiplying by 100
> +                        *        can effectively reduce the risk of overflow.
> +                        *
> +                        * Note2: Due to integer division and rounding,
> +                        *        the calculated size may deviate by a few MB.
> +                        */
> +                       quotient = len / cfm[i].rate;
> +
> +                       if (quotient > ULONG_MAX / 100)
> +                             goto err_overflow;
> +                       else
> +                             result = (quotient * 100) / PAGE_SIZE;
> +
> +                       /*
> +                        * totalram_pages() / 2 = tmpfs max blocks
> +                        */
> +                       half_tmpfs_blocks = (totalram_pages() / 2) / 2;
> +                       if (result > half_tmpfs_blocks)
> +                             goto err_nomem;

See Documentation/driver-api/early-userspace/buffer-format.rst .
Initramfs images can be made up of several concatenated cpio archives,
which would throw off these calculations.
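
A small sketch of why a single magic check at offset 0 is not enough: when members are concatenated, each one can be compressed differently, so classifying only the start of the buffer says nothing about what follows. The helper `classify_member()` and the `member_kind` enum are hypothetical names for illustration. Real member boundaries can only be found by actually parsing or decompressing the preceding member.

```c
#include <stddef.h>
#include <string.h>

enum member_kind { MEMBER_UNKNOWN, MEMBER_CPIO_NEWC, MEMBER_GZIP };

/*
 * Hypothetical helper: classify whatever starts at buf by its magic
 * bytes. This mirrors the kind of check the patch performs once, at
 * the very start of the initramfs buffer.
 */
static enum member_kind classify_member(const unsigned char *buf, size_t len)
{
    static const unsigned char gzip_magic[] = { 0x1F, 0x8B };

    /* "070701" is the ASCII magic of an uncompressed newc cpio header. */
    if (len >= 6 && memcmp(buf, "070701", 6) == 0)
        return MEMBER_CPIO_NEWC;
    if (len >= sizeof(gzip_magic) &&
        memcmp(buf, gzip_magic, sizeof(gzip_magic)) == 0)
        return MEMBER_GZIP;
    return MEMBER_UNKNOWN;
}
```

For a buffer that begins with an uncompressed cpio member and continues with a gzip member, the check at offset 0 reports an uncompressed archive, so a ratio-based estimate for the whole buffer would be computed as if nothing in it were compressed.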



> +
> +                       return 0;
> +                 }
> +           }
> +     }
> +
> +     pr_err("This compression format is not supported.\n");
> +     return -EOPNOTSUPP;
> +
> +err_overflow:
> +     pr_err("Decompressed size overflow!\n");
> +     return -ERANGE;
> +err_nomem:
> +     pr_err("Decompressed size exceeds tmpfs max blocks limit!\n");
> +     return -ENOMEM;
> +
> +}
> +#endif
> +
>  static char * __init unpack_to_rootfs(char *buf, unsigned long len)
>  {
>       long written;
> @@ -504,6 +655,17 @@ static char * __init unpack_to_rootfs(char *buf, unsigned long len)
>       const char *compress_name;
>       static __initdata char msg_buf[64];
>  
> +#ifdef CONFIG_TMPFS
> +     int ret = validate_rootfs_size(buf, len);
> +
> +     if (ret) {
> +           snprintf(msg_buf, sizeof(msg_buf),
> +                       "Rootfs does not comply with the rules, error code: %d", ret);
> +           message = msg_buf;
> +           return message;
> +     }
> +#endif
> +
>       header_buf = kmalloc(110, GFP_KERNEL);
>       symlink_buf = kmalloc(PATH_MAX + N_ALIGN(PATH_MAX) + 1, GFP_KERNEL);
>       name_buf = kmalloc(N_ALIGN(PATH_MAX), GFP_KERNEL);
> -- 
> 2.25.1
>




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion
  2025-03-17  9:41   ` Stephen Eta Zhou
@ 2025-03-18  1:14     ` David Disseldorp
  2025-03-18  4:47       ` Stephen Eta Zhou
  2025-03-18  6:28       ` Stephen Eta Zhou
  0 siblings, 2 replies; 12+ messages in thread
From: David Disseldorp @ 2025-03-18  1:14 UTC (permalink / raw)
  To: Stephen Eta Zhou
  Cc: jsperbeck@google.com, akpm@linux-foundation.org,
	gregkh@linuxfoundation.org, lukas@wunner.de,
	wufan@linux.microsoft.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org

On Mon, 17 Mar 2025 09:41:35 +0000, Stephen Eta Zhou wrote:
...
> Before the init process runs, initramfs needs to be decompressed to tmpfs and become the root file system (rootfs). If there is insufficient tmpfs space after decompression, init may not be able to run at all, causing the system to crash or panic.
> 
> Letting the init process decide whether it is sufficient means that the initramfs must be decompressed first, which may have filled up tmpfs, making the entire system unusable, rather than a controllable error handling process.
> 
> This problem is more obvious in extreme cases, for example:
> 
> 1. After initramfs is decompressed, there is only a small amount of available space in tmpfs, causing early-user-space tasks such as mount and udevadm to fail, affecting device initialization.

It's still not clear to me why early-user-space can't determine this
before attempting to mount, etc. It's in a better position to know the
resource requirements of what it's going to run.
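
As an illustration of the kind of check early user space could perform itself, here is a minimal sketch using POSIX `statvfs()`. The helper names `free_bytes_at()` and `enough_space()` are hypothetical, and the sketch assumes the root filesystem is tmpfs so that the reported block counts are meaningful.

```c
#include <sys/statvfs.h>

/*
 * Hypothetical helper: report how many bytes are still available to
 * unprivileged writers on the filesystem containing path.
 * Returns 0 on success and -1 if statvfs() fails.
 */
static int free_bytes_at(const char *path, unsigned long long *avail)
{
    struct statvfs st;

    if (statvfs(path, &st) != 0)
        return -1;

    *avail = (unsigned long long)st.f_bavail *
             (unsigned long long)st.f_frsize;
    return 0;
}

/*
 * Hypothetical policy check: init knows what it is about to run, so it
 * can pick needed_bytes itself instead of relying on a fixed fraction.
 * Returns 1 if enough space is available, 0 otherwise.
 */
static int enough_space(const char *path, unsigned long long needed_bytes)
{
    unsigned long long avail;

    if (free_bytes_at(path, &avail) != 0)
        return 0;
    return avail >= needed_bytes;
}
```

An init program could call `enough_space("/", needed)` before mounting or starting udev and print a clear message if the answer is no. The threshold then lives with the code that knows its own requirements.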

> 2. On embedded devices, tmpfs is usually configured small, and insufficient space is found after decompression, which directly leads to boot failure.
> 
> The reason why the check is performed before decompression is to expose the problem in advance to avoid the passive failure mode of insufficient space after decompression.
> Calculating the theoretically required tmpfs resources and making judgments in advance can reduce unnecessary I/O operations and provide clearer error reports to help users adjust the initramfs size or tmpfs configuration.
> My idea is to expose problems as early as possible. If problems occur during operation, it may be more troublesome to troubleshoot or bring unnecessary risks.

There's room for improvement WRT how out-of-memory failures are reported
and handled during decompression and I/O. However, adding an extra pass
and some arbitrary free-space logic doesn't improve the situation IMO.

Cheers, David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion
  2025-03-18  1:14     ` David Disseldorp
@ 2025-03-18  4:47       ` Stephen Eta Zhou
  2025-03-18  6:28       ` Stephen Eta Zhou
  1 sibling, 0 replies; 12+ messages in thread
From: Stephen Eta Zhou @ 2025-03-18  4:47 UTC (permalink / raw)
  To: David Disseldorp
  Cc: jsperbeck@google.com, akpm@linux-foundation.org,
	gregkh@linuxfoundation.org, lukas@wunner.de,
	wufan@linux.microsoft.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org

> It's still not clear to me why early-user-space can't determine this
> before attempting to mount, etc. It's in a better position to know the
> resource requirements of what it's going to run.

Before init runs, the initramfs is unpacked into tmpfs. If tmpfs runs out of resources at that point, the system may already be in an unrecoverable state (crash or panic) before init has a chance to check.
That unrecoverable state can come from drivers or subsystems that cannot create files in tmpfs. For example, if the rootfs is created on a system with very limited RAM (e.g., an embedded device with 64MB RAM), unpacking a large initramfs may leave so little space that even init cannot allocate memory.
The kernel log may show OOM errors, but the underlying problem is that the initramfs is too large and not enough space is left to support subsequent programs.

So I think the kernel should provide protection during decompression, so that if tmpfs is insufficient, the kernel can handle the error gracefully and accurately instead of risking a panic or crash. When such a situation occurs, developers would at least see where the problem is at a glance, rather than spending a lot of energy troubleshooting step by step, and it would also strengthen the kernel's ability to catch problems early.

When we troubleshoot errors, we often check step by step and finally find that the problem occurred in the first link of the chain, or in some parameter configured at the very beginning of startup. In that case, why can't we expose the problem as soon as possible?

> There's room for improvement WRT how out-of-memory failures are reported
> and handled during decompression and I/O. However, adding an extra pass
> and some arbitrary free-space logic doesn't improve the situation IMO.

My intention is that pre-checking free space before unpacking ensures that failures are predictable and recoverable. This check can be a simple size comparison between the initramfs archive and the free tmpfs space, which has minimal overhead compared to a full unpack and subsequent failure.
Proactive checking prevents unnecessary I/O operations and reduces the complexity of debugging at boot time.
This is done to provide early assurance to the kernel.

Thanks,
Stephen

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion
  2025-03-18  1:14     ` David Disseldorp
  2025-03-18  4:47       ` Stephen Eta Zhou
@ 2025-03-18  6:28       ` Stephen Eta Zhou
  2025-03-18  9:51         ` David Disseldorp
  1 sibling, 1 reply; 12+ messages in thread
From: Stephen Eta Zhou @ 2025-03-18  6:28 UTC (permalink / raw)
  To: David Disseldorp
  Cc: jsperbeck@google.com, akpm@linux-foundation.org,
	gregkh@linuxfoundation.org, lukas@wunner.de,
	wufan@linux.microsoft.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org

> There's room for improvement WRT how out-of-memory failures are reported

I am currently trying to find a good optimization solution for this. Since initramfs is decompressed in the early stage of the kernel, if the decompression fails, it will call panic to put the kernel into a panic state. There is a contradiction: at this time, the console and serial port have not been initialized yet, which will cause the error message to fail to be output, resulting in a suspended state, and no valid output can be seen.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion
  2025-03-18  6:28       ` Stephen Eta Zhou
@ 2025-03-18  9:51         ` David Disseldorp
  2025-03-18 12:36           ` Stephen Eta Zhou
  0 siblings, 1 reply; 12+ messages in thread
From: David Disseldorp @ 2025-03-18  9:51 UTC (permalink / raw)
  To: Stephen Eta Zhou
  Cc: jsperbeck@google.com, akpm@linux-foundation.org,
	gregkh@linuxfoundation.org, lukas@wunner.de,
	wufan@linux.microsoft.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org

On Tue, 18 Mar 2025 06:28:53 +0000, Stephen Eta Zhou wrote:

> > There's room for improvement WRT how out-of-memory failures are reported  
> 
> I am currently trying to find a good optimization solution for this. Since initramfs is decompressed in the early stage of the kernel, if the decompression fails, it will call panic to put the kernel into a panic state.

Not always. The *built-in* initramfs unpack_to_rootfs() error path
panics, but external initramfs unpack_to_rootfs() failure won't panic
immediately...

> There is a contradiction: at this time, the console and serial port have not been initialized yet, which will cause the error message to fail to be output, resulting in a suspended state, and no valid output can be seen.

Are your console/serial drivers loaded as external modules? That sounds
like a configuration problem.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion
  2025-03-17  7:21 ` David Disseldorp
  2025-03-17  9:41   ` Stephen Eta Zhou
@ 2025-03-18 11:55   ` Krzysztof Kozlowski
  2025-03-18 12:46     ` Stephen Eta Zhou
  1 sibling, 1 reply; 12+ messages in thread
From: Krzysztof Kozlowski @ 2025-03-18 11:55 UTC (permalink / raw)
  To: David Disseldorp, Stephen Eta Zhou
  Cc: jsperbeck@google.com, akpm@linux-foundation.org,
	gregkh@linuxfoundation.org, lukas@wunner.de,
	wufan@linux.microsoft.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org

On 17/03/2025 08:21, David Disseldorp wrote:
>> From 3499daeb5caf934f08a485027b5411f9ef82d6be Mon Sep 17 00:00:00 2001
>> From: Stephen Eta Zhou <stephen.eta.zhou@outlook.com>
>> Date: Fri, 14 Mar 2025 12:32:59 +0800
>> Subject: [PATCH] initramfs: Add size validation to prevent tmpfs exhaustion
>>
>> When initramfs is loaded into a small memory environment, if its size
>> exceeds the tmpfs max blocks limit, the loading will fail. Additionally,
>> if the required blocks are close to the tmpfs max blocks boundary,
>> subsequent drivers or subsystems using tmpfs may fail to initialize.
>>
>> To prevent this, the size limit is set to half of tmpfs max blocks.
>> This ensures that initramfs can complete its mission without exhausting
>> tmpfs resources, as user-space programs may also rely on tmpfs after boot.
>>
>> This patch adds a validation mechanism to check the decompressed size
>> of initramfs based on its compression type and ratio. If the required
>> blocks exceed half of the tmpfs max blocks limit, the loading will be
>> aborted with an appropriate error message, exposing the issue early
>> and preventing further escalation.
> 
> This behaviour appears fragile and quite arbitrary. I don't think
> initramfs should be responsible for making any of these decisions.
> 
> Why can't the init binary make the decision of whether or not the amount
> of free memory remaining is sufficient for user-space, instead of this
> magic 50% limit?
> 
> What are you trying to achieve by failing in this way before initramfs
> extraction instead of during / after?

Just in case before anyone tries to actually apply it: the entire patch
has both corrupted header and actual patch is corrupted - all
indentation messed.

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion
  2025-03-18  9:51         ` David Disseldorp
@ 2025-03-18 12:36           ` Stephen Eta Zhou
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Eta Zhou @ 2025-03-18 12:36 UTC (permalink / raw)
  To: David Disseldorp
  Cc: jsperbeck@google.com, akpm@linux-foundation.org,
	gregkh@linuxfoundation.org, lukas@wunner.de,
	wufan@linux.microsoft.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org


> Not always. The *built-in* initramfs unpack_to_rootfs() error path
> panics, but external initramfs unpack_to_rootfs() failure won't panic
> immediately...

You're right that the behavior differs between built-in and external initramfs handling. My concern is primarily about the built-in case, where unpack_to_rootfs() will lead to a panic if decompression fails.

> Are your console/serial drivers loaded as external modules? That sounds
> like a configuration problem.

My console and serial port drivers are built into the kernel. I found that if the built-in initramfs fails to be decompressed, it will enter a suspended state when panic is called, and no logs will be output (my test environment at this time does not have the early serial port). After debugging, I found that console_flush_all did not seem to find an available console. I will debug this problem in depth next.

Thanks,
Stephen

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion
  2025-03-18 11:55   ` Krzysztof Kozlowski
@ 2025-03-18 12:46     ` Stephen Eta Zhou
  2025-03-18 19:01       ` Krzysztof Kozlowski
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Eta Zhou @ 2025-03-18 12:46 UTC (permalink / raw)
  To: Krzysztof Kozlowski, David Disseldorp
  Cc: jsperbeck@google.com, akpm@linux-foundation.org,
	gregkh@linuxfoundation.org, lukas@wunner.de,
	wufan@linux.microsoft.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org

Hi Krzysztof

> Just in case before anyone tries to actually apply it: the entire patch
> has both corrupted header and actual patch is corrupted - all
> indentation messed.

Sorry, this is my oversight.....
I checked it locally using checkpatch.pl and it was fine, but I just used it to import and found that there is a problem. Sorry for wasting your time. I will regenerate this patch and resend it in the correct format.

Krzysztof, thanks for pointing this out.

Thanks,
Stephen

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion
  2025-03-18 12:46     ` Stephen Eta Zhou
@ 2025-03-18 19:01       ` Krzysztof Kozlowski
  0 siblings, 0 replies; 12+ messages in thread
From: Krzysztof Kozlowski @ 2025-03-18 19:01 UTC (permalink / raw)
  To: Stephen Eta Zhou, David Disseldorp
  Cc: jsperbeck@google.com, akpm@linux-foundation.org,
	gregkh@linuxfoundation.org, lukas@wunner.de,
	wufan@linux.microsoft.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org

On 18/03/2025 13:46, Stephen Eta Zhou wrote:
> Hi Krzysztof
> 
>> Just in case before anyone tries to actually apply it: the entire patch
>> has both corrupted header and actual patch is corrupted - all
>> indentation messed.
> 
> Sorry, this is my oversight.....
> I checked it locally using checkpatch.pl and it was fine, but I just used it to import and found that there is a problem. Sorry for wasting your time. I will regenerate this patch and resend it in the correct format.
> 
> Krzysztof, thanks for pointing this out.
> 
> Thanks,
> Stephen
> ________________________________________
> From: Krzysztof Kozlowski <krzk@kernel.org>
> Sent: Tuesday, March 18, 2025 19:55
> To: David Disseldorp <ddiss@suse.de>; Stephen Eta Zhou <stephen.eta.zhou@outlook.com>
> Cc: jsperbeck@google.com <jsperbeck@google.com>; akpm@linux-foundation.org <akpm@linux-foundation.org>; gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; lukas@wunner.de <lukas@wunner.de>; wufan@linux.microsoft.com <wufan@linux.microsoft.com>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>; linux-fsdevel@vger.kernel.org <linux-fsdevel@vger.kernel.org>
> Subject: Re: [RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion
>  
What is this header doing this? Use standard mailing list response
style, not some copy-paste and then quote entire irrelevant context.

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion
@ 2025-03-19  7:59 Stephen Eta Zhou
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Eta Zhou @ 2025-03-19  7:59 UTC (permalink / raw)
  To: Krzysztof Kozlowski, David Disseldorp
  Cc: jsperbeck@google.com, akpm@linux-foundation.org,
	gregkh@linuxfoundation.org, lukas@wunner.de,
	wufan@linux.microsoft.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org

On Wednesday, March 19, 2025 03:01, Krzysztof Kozlowski  wrote:
> What is this header doing this? Use standard mailing list response
> style, not some copy-paste and then quote entire irrelevant context.
Oh... sorry, my email client adds this content automatically.
I will be more careful in the future. Apologies for the disruption to the
email thread.

Thanks,
Stephen

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-03-19  7:59 UTC | newest]

Thread overview: 12+ messages
2025-03-14  5:04 [RFC PATCH] initramfs: Add size validation to prevent tmpfs exhaustion Stephen Eta Zhou
2025-03-17  7:21 ` David Disseldorp
2025-03-17  9:41   ` Stephen Eta Zhou
2025-03-18  1:14     ` David Disseldorp
2025-03-18  4:47       ` Stephen Eta Zhou
2025-03-18  6:28       ` Stephen Eta Zhou
2025-03-18  9:51         ` David Disseldorp
2025-03-18 12:36           ` Stephen Eta Zhou
2025-03-18 11:55   ` Krzysztof Kozlowski
2025-03-18 12:46     ` Stephen Eta Zhou
2025-03-18 19:01       ` Krzysztof Kozlowski
  -- strict thread matches above, loose matches on Subject: below --
2025-03-19  7:59 Stephen Eta Zhou
