[PATCH] fs: Add 'rootfsflags' to set rootfs mount options

linux-kernel.vger.kernel.org archive mirror

* [PATCH] fs: Add 'rootfsflags' to set rootfs mount options
@ 2025-08-08  1:51 Lichen Liu
  2025-08-08  2:30 ` Dave Young
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Lichen Liu @ 2025-08-08  1:51 UTC
  To: viro, brauner, rob; +Cc: kexec, linux-kernel, weilongchen, Lichen Liu

When CONFIG_TMPFS is enabled, the initial root filesystem is a tmpfs.
By default, a tmpfs mount is limited to using 50% of the available RAM
for its content. This can be problematic in memory-constrained
environments, particularly during a kdump capture.

In a kdump scenario, the capture kernel boots with a limited amount of
memory specified by the 'crashkernel' parameter. If the initramfs is
large, it may fail to unpack into the tmpfs rootfs due to insufficient
space. This is because to get X MB of usable space in tmpfs, 2*X MB of
memory must be available for the mount. This leads to an OOM failure
during the early boot process, preventing a successful crash dump.

This patch introduces a new kernel command-line parameter, rootfsflags,
which allows passing specific mount options directly to the rootfs when
it is first mounted. This gives users control over the rootfs behavior.

For example, a user can now specify rootfsflags=size=75% to allow the
tmpfs to use up to 75% of the available memory. This can significantly
reduce the memory pressure for kdump.
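
A minimal userspace sketch of how a token such as rootfsflags=size=75% is
picked out of the command line. It assumes __setup()-style prefix
matching, where everything after "rootfsflags=" (including the second
'=') becomes the handler's argument. find_setup_value() is a hypothetical
helper, not a kernel API, and quoted values are ignored. Because
rootfs_flags_setup() just assigns the pointer, the sketch also lets a
later occurrence override an earlier one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of __setup("rootfsflags=", ...) matching.
 * Splits cmdline in place on spaces and returns a pointer to the text
 * that follows prefix in the last matching token, or NULL if no token
 * matches. Quoted values are not handled.
 */
static char *find_setup_value(char *cmdline, const char *prefix)
{
	size_t plen;
	char *result = NULL;
	char *tok;

	assert(cmdline != NULL && prefix != NULL);
	plen = strlen(prefix);

	for (tok = strtok(cmdline, " "); tok != NULL; tok = strtok(NULL, " ")) {
		/* Prefix match on the whole "name=" string, like __setup(). */
		if (strncmp(tok, prefix, plen) == 0)
			result = tok + plen;	/* last occurrence wins */
	}
	return result;
}
```

Note that "rootflags=noatime" does not match the "rootfsflags=" prefix,
so the two parameters cannot shadow each other.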

Consider a practical example:

To unpack a 48MB initramfs, the tmpfs needs 48MB of usable space. With
the default 50% limit, this requires a memory pool of 96MB to be
available for the tmpfs mount. The total memory requirement is therefore
approximately: 16MB (vmlinuz) + 48MB (loaded initramfs) + 48MB (unpacked
kernel) + 96MB (for tmpfs) + 12MB (runtime overhead) ≈ 220MB.

By using rootfsflags=size=75%, the memory pool required for the 48MB
tmpfs is reduced to 48MB / 0.75 = 64MB. This reduces the total memory
requirement by 32MB (96MB - 64MB), allowing the kdump to succeed with a
smaller crashkernel size, such as 192MB.
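
A small sketch that checks the arithmetic above. The fixed component
sizes (16MB vmlinuz, 48MB unpacked kernel, 12MB runtime overhead) are the
example's own figures. As in the example, the loaded initramfs is assumed
to be the same size as its unpacked contents. Both helper names are
hypothetical, and the pool size is rounded up to whole megabytes.

```c
#include <assert.h>

/*
 * Hypothetical helper: MB of memory the tmpfs limit must be computed
 * from so that size=<percent>% still leaves usable_mb of space.
 * Rounds up, since a partial megabyte would not satisfy the limit.
 */
static unsigned int tmpfs_pool_mb(unsigned int usable_mb, unsigned int percent)
{
	assert(percent > 0 && percent <= 100);
	return (usable_mb * 100 + percent - 1) / percent;
}

/* Capture-kernel memory estimate using the figures from the example. */
static unsigned int kdump_estimate_mb(unsigned int initramfs_mb,
				      unsigned int percent)
{
	const unsigned int vmlinuz_mb = 16;
	const unsigned int unpacked_kernel_mb = 48;
	const unsigned int runtime_overhead_mb = 12;

	/* Loaded initramfs and its unpacked contents are both initramfs_mb. */
	return vmlinuz_mb + initramfs_mb + unpacked_kernel_mb +
	       tmpfs_pool_mb(initramfs_mb, percent) + runtime_overhead_mb;
}
```

With these figures, kdump_estimate_mb(48, 50) gives 220 and
kdump_estimate_mb(48, 75) gives 188. The 32MB difference is what brings
the estimate under a 192MB crashkernel reservation.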

An alternative approach of reusing the existing rootflags parameter was
considered. However, a new, dedicated rootfsflags parameter was chosen
to avoid altering the current behavior of rootflags (which applies to
the final root filesystem) and to prevent any potential regressions.

This approach is inspired by prior discussions and patches on the topic.
Ref: https://www.lightofdawn.org/blog/?viewDetailed=00128
Ref: https://landley.net/notes-2015.html#01-01-2015
Ref: https://lkml.org/lkml/2021/6/29/783
Ref: https://www.kernel.org/doc/html/latest/filesystems/ramfs-rootfs-initramfs.html#what-is-rootfs

Signed-off-by: Lichen Liu <lichliu@redhat.com>
---
 fs/namespace.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ddfd4457d338..a450db31613e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -65,6 +65,15 @@ static int __init set_mphash_entries(char *str)
 }
 __setup("mphash_entries=", set_mphash_entries);

+static char * __initdata rootfs_flags;
+static int __init rootfs_flags_setup(char *str)
+{
+	rootfs_flags = str;
+	return 1;
+}
+
+__setup("rootfsflags=", rootfs_flags_setup);
+
 static u64 event;
 static DEFINE_XARRAY_FLAGS(mnt_id_xa, XA_FLAGS_ALLOC);
 static DEFINE_IDA(mnt_group_ida);
@@ -6086,7 +6095,7 @@ static void __init init_mount_tree(void)
 	struct mnt_namespace *ns;
 	struct path root;

-	mnt = vfs_kern_mount(&rootfs_fs_type, 0, "rootfs", NULL);
+	mnt = vfs_kern_mount(&rootfs_fs_type, 0, "rootfs", rootfs_flags);
 	if (IS_ERR(mnt))
 		panic("Can't create rootfs");

-- 
2.50.1


* Re: [PATCH] fs: Add 'rootfsflags' to set rootfs mount options
  2025-08-08  1:51 [PATCH] fs: Add 'rootfsflags' to set rootfs mount options Lichen Liu
@ 2025-08-08  2:30 ` Dave Young
  2025-08-08  2:47   ` Dave Young
  2025-08-08 14:38   ` Rob Landley
  2025-08-09 15:02 ` Rob Landley
  2025-08-14  8:13 ` Askar Safin
  2 siblings, 2 replies; 10+ messages in thread
From: Dave Young @ 2025-08-08  2:30 UTC
  To: Lichen Liu; +Cc: viro, brauner, rob, kexec, linux-kernel, weilongchen

Hi Lichen,

On Fri, 8 Aug 2025 at 09:55, Lichen Liu <lichliu@redhat.com> wrote:
>
> @@ -65,6 +65,15 @@ static int __init set_mphash_entries(char *str)
>  }
>  __setup("mphash_entries=", set_mphash_entries);
>
> +static char * __initdata rootfs_flags;
> +static int __init rootfs_flags_setup(char *str)
> +{
> +       rootfs_flags = str;

I do see there are a few similar usages in init/do_mounts.c, probably
it is old stuff and it just works.  But I think making rootfs_flags as
an array and copying str into it is the right way.

Thanks
Dave



* Re: [PATCH] fs: Add 'rootfsflags' to set rootfs mount options
  2025-08-08  2:30 ` Dave Young
@ 2025-08-08  2:47   ` Dave Young
  2025-08-08  3:36     ` Lichen Liu
  2025-08-08 17:59     ` Rob Landley
  2025-08-08 14:38   ` Rob Landley
  1 sibling, 2 replies; 10+ messages in thread
From: Dave Young @ 2025-08-08  2:47 UTC
  To: Lichen Liu; +Cc: viro, brauner, rob, kexec, linux-kernel, weilongchen

On Fri, 8 Aug 2025 at 10:30, Dave Young <dyoung@redhat.com> wrote:
>
> Hi Lichen,
>
> On Fri, 8 Aug 2025 at 09:55, Lichen Liu <lichliu@redhat.com> wrote:
> >
> > +static char * __initdata rootfs_flags;
> > +static int __init rootfs_flags_setup(char *str)
> > +{
> > +       rootfs_flags = str;
>
> I do see there are a few similar usages in init/do_mounts.c, probably
> it is old stuff and it just works.  But I think making rootfs_flags as
> an array and copying str into it is the right way.

Another question, may need fs people to clarify.  If the mount is
tmpfs and it is also rootfs,  could it use 100% of the memory by
default, and then no need for an extra param?    I feel that there is
no point to reserve memory if it is a fully memory based file system.




* Re: [PATCH] fs: Add 'rootfsflags' to set rootfs mount options
  2025-08-08  2:47   ` Dave Young
@ 2025-08-08  3:36     ` Lichen Liu
  2025-08-08 17:59     ` Rob Landley
  1 sibling, 0 replies; 10+ messages in thread
From: Lichen Liu @ 2025-08-08  3:36 UTC
  To: Dave Young; +Cc: viro, brauner, rob, kexec, linux-kernel, weilongchen

On Fri, Aug 8, 2025 at 10:46 AM Dave Young <dyoung@redhat.com> wrote:
>
> On Fri, 8 Aug 2025 at 10:30, Dave Young <dyoung@redhat.com> wrote:
> >
> > Hi Lichen,
> >
> > On Fri, 8 Aug 2025 at 09:55, Lichen Liu <lichliu@redhat.com> wrote:
> > > +static char * __initdata rootfs_flags;
> > > +static int __init rootfs_flags_setup(char *str)
> > > +{
> > > +       rootfs_flags = str;
> >
> > I do see there are a few similar usages in init/do_mounts.c, probably
> > it is old stuff and it just works.  But I think making rootfs_flags as
> > an array and copying str into it is the right way.
Hi Dave, thanks for your comments!

I will check how to make it better.

>
> Another question, may need fs people to clarify.  If the mount is
> tmpfs and it is also rootfs,  could it use 100% of the memory by
> default, and then no need for an extra param?    I feel that there is
> no point to reserve memory if it is a fully memory based file system.
>

I think rootfstype=ramfs will use 100% of the memory.
For kdump alone, there might not be much difference between using ramfs
and tmpfs with size=100%. But I think a dedicated rootfsflags= might
provide more flexibility, since rootfstype= and rootflags= are also used
together with root=.

https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html




* Re: [PATCH] fs: Add 'rootfsflags' to set rootfs mount options
  2025-08-08  2:47   ` Dave Young
  2025-08-08  3:36     ` Lichen Liu
@ 2025-08-08 17:59     ` Rob Landley
  2025-08-11  1:57       ` Lichen Liu
  1 sibling, 1 reply; 10+ messages in thread
From: Rob Landley @ 2025-08-08 17:59 UTC
  To: Dave Young, Lichen Liu; +Cc: viro, brauner, kexec, linux-kernel, weilongchen

On 8/7/25 21:47, Dave Young wrote:
> Another question, may need fs people to clarify.  If the mount is
> tmpfs and it is also rootfs,  could it use 100% of the memory by
> default,

If you want to softlock the system when rootfs fills up with log files 
or something, sure.

That was one of the original motivating reasons for using tmpfs instead 
of ramfs for a persistent initramfs you don't pivot off of. Plus things 
like ramfs always reporting zero free space (it doesn't track) so you
can't use things like "rpm install" to add more packages at runtime, and 
so on... I had a list of reasons I added initmpfs support back in 
2013.... looks like 
https://lkml.iu.edu/hypermail/linux/kernel/1306.3/04204.html

(Ok, the REAL reason I did it is A) I'd documented that was how it 
worked when I wrote ramfs-rootfs-initramfs back in 2005 because it 
seemed deeply silly NOT to support that, and when nobody had made the 
obvious fix 7 years later I got guilted into it by an employer who I'd 
explained initramfs to and they asked "how do we do the tmpfs version" 
so I whipped up a quick patch and they went "you need to upstream this 
before we'll use it" so I went through The Process...)

Note that right now initmpfs isn't _specifying_ 50%, it's inheriting the 
default value from tmpfs when no arguments are specified. If you're 
special casing 100% for rootfs you'd still be passing in an argument to 
the mount call to override the 50% default, just as a hardwired string 
instead of a user-provided one (and again it would be a terrible idea).

And if you DO change tmpfs itself to globally default to 100%, then
'yes > /dev/shm/blah.txt' could lock your system as a normal user if they
don't change their mount script to specify an explicit constraint. Which
seems a bit of a regression for existing systems.

This new patch is because sometimes people making embedded systems want 
to devote more than 50% of memory to rootfs while still having the other 
benefits of tmpfs. One of those benefits is not soft-locking the kernel 
if something writes too much data to the filesystem.

History time! (It's a hobby of mine. Plus I was here for this part.)

Tmpfs was originally called "swapfs" (because ramfs couldn't use swap as 
backing store):

https://lkml.iu.edu/0102.0/0203.html

It was submitted to linux-kernel in 2001 (Peter Anvin was "?!" aghast):

https://lkml.iu.edu/0102.0/0239.html

Tmpfs got added in 2.4.3.3 ala 
https://github.com/mpe/linux-fullhistory/commit/ca56c8ee6fa0

And almost immediately people noticed the softlock issue hadn't been fixed:

https://lkml.iu.edu/0103.3/0053.html

So the 50% default limit for tmpfs was introduced in 2001 (release 
2.4.7.5) with the description "saner tmpfs mount-time limits", ala:

https://github.com/mpe/linux-fullhistory/commit/80fa70c0ea28

Jeff Garzik wired it up as an alternative to initrd in November 2002:

https://lwn.net/Articles/14448/

Alas, the result was completely undocumented. I thought it sounded like 
a cool idea (it resizes automatically!) and reverse engineered how to 
use it (ok, mostly a lot of pestering people with questions in email) 
and wrote documentation encouraging people to use it in 2005:

https://lwn.net/Articles/157676/

When I converted rootfs to be able to use tmpfs in 2013 (link above) 
there was a rootflags= but not a rootfsflags= (ramfs was intentionally a 
simple demonstration of libfs that took no arguments) and I didn't add 
one because I didn't personally need it: the 50% default was fine for me 
and you can mount -o remount to change flags after the fact. (Although I 
dunno if you can change this limit after the fact or what would happen 
if you reduced it below what the filesystem currently contained, 
probably doesn't work.)

Although looking back at my blog entries from the time, it seems I 
mostly didn't want to deal with bikeshedding about the name 
https://landley.net/notes-2013.html#29-04-2013

A year later somebody asked me why rootflags= wasn't working for 
initmpfs (http://www.lightofdawn.org/blog/?viewDetailed=00128) and I 
basically went "fixing it's easy, getting a patch into linux-kernel 
requires far too much proctology for anyone on the inside to even see 
it", and here we are 10 years later with the issue still unaddressed. 
(Open source! Fixes everything right up immediately. So responsive. No 
problems left to tackle, hard to find stuff worth doing...)

> and then no need for an extra param?    I feel that there is
> no point to reserve memory if it is a fully memory based file system.

You're confusing ramdisk with ramfs (initrd vs initramfs). The 50% isn't 
a reservation, it's a constraint. Both ramfs and tmpfs are dynamic ram 
backed filesystems.

I wrote documentation about the four types of filesystem 
(block/pipe/ram/function backed) 20 years ago back on livejournal, I 
still have a copy somewhere...

https://landley.net/toybox/doc/mount.html

Linus invented ramfs by basically just mounting the page cache as a 
filesystem with no backing store, so when memory pressure does flush 
requests it goes "nope". When you write files it allocates memory, when 
you truncate/delete files it frees memory. That's why ramfs was just a 
couple hundred lines (at the time he was factoring out libfs so /proc 
could stop being the only synthetic filesystem everybody dumped every
control knob into, and I recall he mostly did ramfs as an example of how 
minimal you could get with the new plumbing). Then tmpfs added some 
basic guardrails and the ability to use swap space as backing store in 
case of memory pressure (if you have swap, which a lot of embedded 
systems don't; note that mmap()ed files have backing store, and 
executables are basically mmap(MAP_PRIVATE) with some bells and 
whistles, so you can still swap thrash under memory pressure even 
without swap by evicting and faulting back in executable pages).

The old ramdisk mechanism from the 1990s created a virtual block device 
(/dev/ram0 and friends I think) which you would then format and mount 
using a block backed filesystem driver like ext2. This was terrible for 
a bunch of reasons, unnecessarily copying all the data to use it and 
regularly having two copies of the data in RAM (the one faulted into the 
page cache and the one in the ram block device backing store). Heck, 
when you had a system running from initramfs, you could configure out 
the whole block layer and all the block backed filesystem drivers, which 
made the kernel way smaller both in flash and at runtime. Even before 
initramfs, ramdisks largely receded into the mists of history (except 
for initrd) when loopback mounting became a thing, because you can just 
dd if=/dev/zero of=blah.img bs=1m count=16 and then format that and 
loopback mount it, and you control the size naturally (no rebooting 
needed to change it) and it's got its own built-in backing store 
allowing memory usage of the virtual image to be dynamic (ok, you can 
mlock() it if you really want to but you could _also_ loopback a file 
out of ramfs or tmpfs to accomplish that)...

The point of the 50% constraint in tmpfs is to tell the system "when I 
ask how much free space there is, here's what the maximum should be". 
Since ramfs doesn't enforce any such constraint, it always reports both 
total and free space as 0, which tools like "df" use to indicate 
"synthetic filesystem" and thus not show by default when you ask about 
"disk free" space. Ramfs will let you keep writing as long as the 
kernel's internal malloc() doesn't fail to find the next page, and THAT 
is a problem because writes will fill up every last scrap of kernel 
memory and then the rest of the kernel loses its lunch when its 
allocations fail. (They added the OOM killer to try to cope with the 
fact that recognizing you've run out of memory comes not when you mmap() 
a range but when you asynchronously fault in pages by reading or 
dirtying them, which is at memory access time not a syscall with a 
return value. That's a WHOLE SAGA! There really _isn't_ a good answer 
but people will happily argue about least bad FOREVER. The younger 
generation seems to believe that Rust will do something other than add 
abstraction layers and transition boundaries to make this worse.)

Anyway, the perennial complaint about the 50% initmpfs limit was that if 
you have a small system that needs 16 megs of ram to run, but you have a
cpio.gz that expands to 48 megabytes, then 64 megs SHOULD be enough for 
the system... but it won't let you. You have to give it 96 megabytes of 
ram in order to be able to use 48 megs of root filesystem, or else 
extracting the cpio.gz will fail with an out of space error before 
launching init. (This was especially common since you're about to free 
the cpio.gz after extracting it, so by the time it launches PID 1 the 
kernel has MORE memory available. There's a high water mark of memory 
usage while the system is basically idle, but once you're past that extra
memory is just adding expense, draining your battery, producing heat...)

The embedded developers have been familiar with the problem for decades, 
and (as usual) have repeatedly fixed it locally ignoring linux-kernel 
politics. I first got asked to fix it over 10 years ago, I just find the 
kernel community unpleasant to interact with these days so mostly only 
wander in when cc'd.

The author of this patch asked me off list if I had a current version of 
the patch I'd given other people, which I hadn't updated in _years_. 
It's been fixed a bunch of times, https://lkml.org/lkml/2021/6/29/783 
was the most recent we could find, but the fixes stay out of tree 
because Linux dev is aggressively insular ever since the linux 
foundation drove away the last of the hobbyists back around 
https://lwn.net/Articles/563578/ and became corporate "certificate of 
authenticity" signed-off-by-in-triplicate land with a thousand line 
patch submission procedure document 
https://kernel.org/doc/Documentation/process/submitting-patches.rst and 
a 27 step checklist 
https://kernel.org/doc/Documentation/process/submit-checklist.rst

(Which will usually still get ignored even when you do that.)

Rob


* Re: [PATCH] fs: Add 'rootfsflags' to set rootfs mount options
  2025-08-08 17:59     ` Rob Landley
@ 2025-08-11  1:57       ` Lichen Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Lichen Liu @ 2025-08-11  1:57 UTC
  To: Rob Landley; +Cc: Dave Young, viro, brauner, kexec, linux-kernel, weilongchen

Hi Rob,

Thanks for your help with testing and for answering Dave's question in detail!

Your "history time" here is fantastic and explains the reasoning
perfectly. It is very interesting, and I believe it could be part of a
publication.

Based on this discussion, it seems the patch is heading in the right
direction. Please let me know if there are any other concerns.

Thanks!
Lichen


* Re: [PATCH] fs: Add 'rootfsflags' to set rootfs mount options
  2025-08-08  2:30 ` Dave Young
  2025-08-08  2:47   ` Dave Young
@ 2025-08-08 14:38   ` Rob Landley
  1 sibling, 0 replies; 10+ messages in thread
From: Rob Landley @ 2025-08-08 14:38 UTC
  To: Dave Young, Lichen Liu; +Cc: viro, brauner, kexec, linux-kernel, weilongchen

On 8/7/25 21:30, Dave Young wrote:
> I do see there are a few similar usages in init/do_mounts.c, probably
> it is old stuff and it just works.  But I think making rootfs_flags as
> an array and copying str into it is the right way.

The lifespan of the string ends before PID 1 gets launched. The copy 
would be unnecessary and either perform an allocation or impose a 
gratuitous length limit.
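
A userspace sketch of the two options under discussion. The first keeps a
pointer to the command-line text, which is what the patch does. The
second copies it into a fixed array, as Dave suggested. All names are
hypothetical, ROOTFS_FLAGS_MAX is an arbitrary assumed limit, and
snprintf() stands in for the kernel's string-copy helpers. The copy only
adds a failure mode for long option strings. The pointer is valid for as
long as the command-line buffer it points into, and that buffer must
outlive init_mount_tree().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ROOTFS_FLAGS_MAX 64	/* hypothetical limit for the copy variant */

/* Variant 1, as in the patch: keep a pointer into the command line. */
static const char *rootfs_flags_ptr;

static int setup_by_pointer(const char *str)
{
	rootfs_flags_ptr = str;	/* no copy, no length limit */
	return 1;
}

/* Variant 2, the array-and-copy suggestion. */
static char rootfs_flags_buf[ROOTFS_FLAGS_MAX];

/*
 * Returns 0 if str fit, -1 if it was truncated. These return values are
 * the sketch's own, not __setup() handler semantics.
 */
static int setup_by_copy(const char *str)
{
	int n;

	assert(str != NULL);
	n = snprintf(rootfs_flags_buf, sizeof(rootfs_flags_buf), "%s", str);
	assert(n >= 0);
	return (size_t)n < sizeof(rootfs_flags_buf) ? 0 : -1;
}
```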

Rob


* Re: [PATCH] fs: Add 'rootfsflags' to set rootfs mount options
  2025-08-08  1:51 [PATCH] fs: Add 'rootfsflags' to set rootfs mount options Lichen Liu
  2025-08-08  2:30 ` Dave Young
@ 2025-08-09 15:02 ` Rob Landley
  2025-08-14  8:13 ` Askar Safin
  2 siblings, 0 replies; 10+ messages in thread
From: Rob Landley @ 2025-08-09 15:02 UTC
  To: Lichen Liu, viro, brauner; +Cc: kexec, linux-kernel, weilongchen

On 8/7/25 20:51, Lichen Liu wrote:
> This patch introduces a new kernel command-line parameter, rootfsflags,
> which allows passing specific mount options directly to the rootfs when
> it is first mounted. This gives users control over the rootfs behavior.

Works for me. In an i486 mkroot build against stock 6.16 with this patch:

$ root/i486/run-qemu.sh
...
# grep rootfs /proc/mounts
rootfs / rootfs rw,size=125728k,nr_inodes=31432 0 0
# df
Filesystem     1K-blocks Used Available Use% Mounted on
rootfs            125728  764    124964   1% /
dev               125728    0    125728   0% /dev


$ KARGS="rootfsflags=size=1m" root/i486/run-qemu.sh
...
# grep rootfs /proc/mounts
rootfs / rootfs rw,size=1024k,nr_inodes=31432 0 0
# df
Filesystem     1K-blocks Used Available Use% Mounted on
rootfs              1024  764       260  75% /
dev               125728    0    125728   0% /dev
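
A hedged model of how tmpfs turns a size= value into the limits shown
above. A trailing '%' means a percentage of total RAM. Otherwise the
number takes an optional binary suffix; the kernel uses memparse(), which
accepts more suffixes than this sketch. The result is rounded up to whole
pages. The 4 KiB page size and the helper name are assumptions, and the
251456 KiB RAM figure is only inferred from the default 50% limit of
125728k.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_KIB 4ULL	/* assumed 4 KiB pages */

/*
 * Hypothetical model of tmpfs size= parsing. Returns the limit in KiB,
 * rounded up to whole pages, or 0 for an unrecognised value.
 * ram_kib is the total RAM that a percentage is relative to.
 */
static unsigned long long tmpfs_size_kib(const char *val,
					 unsigned long long ram_kib)
{
	char *end;
	unsigned long long n, kib;

	assert(val != NULL);
	n = strtoull(val, &end, 10);
	if (end == val)
		return 0;			/* no digits */

	switch (*end) {
	case '%':
		kib = ram_kib * n / 100;	/* percentage of total RAM */
		break;
	case 'k': case 'K':
		kib = n;
		break;
	case 'm': case 'M':
		kib = n << 10;
		break;
	case 'g': case 'G':
		kib = n << 20;
		break;
	case '\0':
		kib = (n + 1023) / 1024;	/* plain bytes, rounded up */
		break;
	default:
		return 0;
	}
	if (*end != '\0' && end[1] != '\0')
		return 0;			/* trailing garbage after suffix */

	/* Round up to a whole number of pages. */
	return (kib + PAGE_KIB - 1) / PAGE_KIB * PAGE_KIB;
}
```

Under those assumptions, tmpfs_size_kib("1m", 251456) gives 1024 and
tmpfs_size_kib("50%", 251456) gives 125728. Both match the size= values
in the two /proc/mounts lines above.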

Tested-by: Rob Landley <rob@landley.net>

Rob


* Re: [PATCH] fs: Add 'rootfsflags' to set rootfs mount options
  2025-08-08  1:51 [PATCH] fs: Add 'rootfsflags' to set rootfs mount options Lichen Liu
  2025-08-08  2:30 ` Dave Young
  2025-08-09 15:02 ` Rob Landley
@ 2025-08-14  8:13 ` Askar Safin
  2025-08-14 10:25   ` Lichen Liu
  2 siblings, 1 reply; 10+ messages in thread
From: Askar Safin @ 2025-08-14  8:13 UTC (permalink / raw)
  To: lichliu
  Cc: brauner, kexec, linux-kernel, rob, viro, weilongchen, cyphar,
	linux-fsdevel, linux-api, initramfs, Mimi Zohar, Stefan Berger

Lichen Liu <lichliu@redhat.com>:
> When CONFIG_TMPFS is enabled, the initial root filesystem is a tmpfs.
> By default, a tmpfs mount is limited to using 50% of the available RAM
> for its content. This can be problematic in memory-constrained
> environments, particularly during a kdump capture.
> 
> In a kdump scenario, the capture kernel boots with a limited amount of
> memory specified by the 'crashkernel' parameter. If the initramfs is
> large, it may fail to unpack into the tmpfs rootfs due to insufficient
> space. This is because to get X MB of usable space in tmpfs, 2*X MB of
> memory must be available for the mount. This leads to an OOM failure
> during the early boot process, preventing a successful crash dump.
> 
> This patch introduces a new kernel command-line parameter, rootfsflags,
> which allows passing specific mount options directly to the rootfs when
> it is first mounted. This gives users control over the rootfs behavior.
> 
> For example, a user can now specify rootfsflags=size=75% to allow the
> tmpfs to use up to 75% of the available memory. This can significantly
> reduce the memory pressure for kdump.
> 
> Consider a practical example:
> 
> To unpack a 48MB initramfs, the tmpfs needs 48MB of usable space. With
> the default 50% limit, this requires a memory pool of 96MB to be
> available for the tmpfs mount. The total memory requirement is therefore
> approximately: 16MB (vmlinuz) + 48MB (loaded initramfs) + 48MB (unpacked
> kernel) + 96MB (for tmpfs) + 12MB (runtime overhead) ≈ 220MB.
> 
> By using rootfsflags=size=75%, the memory pool required for the 48MB
> tmpfs is reduced to 48MB / 0.75 = 64MB. This reduces the total memory
> requirement by 32MB (96MB - 64MB), allowing the kdump to succeed with a
> smaller crashkernel size, such as 192MB.
> 
> An alternative approach of reusing the existing rootflags parameter was
> considered. However, a new, dedicated rootfsflags parameter was chosen
> to avoid altering the current behavior of rootflags (which applies to
> the final root filesystem) and to prevent any potential regressions.
> 
> This approach is inspired by prior discussions and patches on the topic.
> Ref: https://www.lightofdawn.org/blog/?viewDetailed=00128
> Ref: https://landley.net/notes-2015.html#01-01-2015
> Ref: https://lkml.org/lkml/2021/6/29/783
> Ref: https://www.kernel.org/doc/html/latest/filesystems/ramfs-rootfs-initramfs.html#what-is-rootfs
> 
> Signed-off-by: Lichen Liu <lichliu@redhat.com>
> ---
>  fs/namespace.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index ddfd4457d338..a450db31613e 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -65,6 +65,15 @@ static int __init set_mphash_entries(char *str)
>  }
>  __setup("mphash_entries=", set_mphash_entries);
>  
> +static char * __initdata rootfs_flags;
> +static int __init rootfs_flags_setup(char *str)
> +{
> +	rootfs_flags = str;
> +	return 1;
> +}
> +
> +__setup("rootfsflags=", rootfs_flags_setup);
> +
>  static u64 event;
>  static DEFINE_XARRAY_FLAGS(mnt_id_xa, XA_FLAGS_ALLOC);
>  static DEFINE_IDA(mnt_group_ida);
> @@ -6086,7 +6095,7 @@ static void __init init_mount_tree(void)
>  	struct mnt_namespace *ns;
>  	struct path root;
>  
> -	mnt = vfs_kern_mount(&rootfs_fs_type, 0, "rootfs", NULL);
> +	mnt = vfs_kern_mount(&rootfs_fs_type, 0, "rootfs", rootfs_flags);
>  	if (IS_ERR(mnt))
>  		panic("Can't create rootfs");
>  
> -- 
> 2.50.1

Thank you for this patch!

I suggest periodically checking linux-next to see whether the patch has landed there.

If it is not applied within a reasonable time, resend it.
This time, please clearly specify which tree should accept it.
I think the most appropriate tree here is the VFS tree.
So, when resending, please add linux-fsdevel@vger.kernel.org to CC and state in the first
paragraph of your mail that the patch is for the VFS tree.

--
Askar Safin

* Re: [PATCH] fs: Add 'rootfsflags' to set rootfs mount options
  2025-08-14  8:13 ` Askar Safin
@ 2025-08-14 10:25   ` Lichen Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Lichen Liu @ 2025-08-14 10:25 UTC (permalink / raw)
  To: Askar Safin
  Cc: brauner, kexec, linux-kernel, rob, viro, weilongchen, cyphar,
	linux-fsdevel, linux-api, initramfs, Mimi Zohar, Stefan Berger

On Thu, Aug 14, 2025 at 4:15 PM Askar Safin <safinaskar@zohomail.com> wrote:
>
> Lichen Liu <lichliu@redhat.com>:
>
> Thank you for this patch!
>
> I suggest periodically checking linux-next to see whether the patch has landed there.
>
> If it is not applied within a reasonable time, resend it.
> This time, please clearly specify which tree should accept it.
> I think the most appropriate tree here is the VFS tree.
> So, when resending, please add linux-fsdevel@vger.kernel.org to CC and state in the first
> paragraph of your mail that the patch is for the VFS tree.
Thank you!

I checked linux-next and the patch has not been applied yet. I will resend
this patch and CC linux-fsdevel@vger.kernel.org.

>
> --
> Askar Safin
>


Thread overview: 10+ messages
2025-08-08  1:51 [PATCH] fs: Add 'rootfsflags' to set rootfs mount options Lichen Liu
2025-08-08  2:30 ` Dave Young
2025-08-08  2:47   ` Dave Young
2025-08-08  3:36     ` Lichen Liu
2025-08-08 17:59     ` Rob Landley
2025-08-11  1:57       ` Lichen Liu
2025-08-08 14:38   ` Rob Landley
2025-08-09 15:02 ` Rob Landley
2025-08-14  8:13 ` Askar Safin
2025-08-14 10:25   ` Lichen Liu
