public inbox for linux-kernel@vger.kernel.org
* [PATCH] initramfs: Support initrd that is bigger than 2G.
@ 2014-06-20  2:12 Yinghai Lu
  2014-06-20  4:29 ` H. Peter Anvin
  0 siblings, 1 reply; 5+ messages in thread
From: Yinghai Lu @ 2014-06-20  2:12 UTC (permalink / raw)
  To: Andrew Morton, H. Peter Anvin, Ingo Molnar
  Cc: Tetsuo Handa, Daniel M. Weeks, linux-kernel, Yinghai Lu

When an initrd (compressed or not) larger than 2G is used, the kernel
reports data corruption on /dev/ram0.

The root cause:
During initramfs checking, if the image is an initrd, it is transferred
to /initrd.image with sys_write.
sys_write can write at most 2G-4K bytes per call, so if the initrd is
larger than that, /initrd.image is not written completely.

Add a local sys_write_large that calls sys_write in a loop to work
around the problem.

Also need to use that in write_buffer path for cpio that have file is
more than file.

At the same time, we don't need to worry about sys_read/sys_write in
do_mounts_rd.c::crd_load: the decompressor uses fill/flush, which means
it allocates its own buffer, and that buffer is smaller than 2G.

Tested with an uncompressed initrd, and with initrds compressed with
gz, bz2, lzma, xz, and lzop.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 init/initramfs.c |   33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

Index: linux-2.6/init/initramfs.c
===================================================================
--- linux-2.6.orig/init/initramfs.c
+++ linux-2.6/init/initramfs.c
@@ -19,6 +19,26 @@
 #include <linux/syscalls.h>
 #include <linux/utime.h>
 
+static long __init sys_write_large(unsigned int fd, char *p,
+				   size_t count)
+{
+	ssize_t left = count;
+	long written = 0;
+
+	/* sys_write can write at most MAX_RW_COUNT (2G-4K) bytes per call */
+	while (left > 0) {
+		written = sys_write(fd, p, left);
+
+		if (written <= 0)
+			break;
+
+		left -= written;
+		p += written;
+	}
+
+	return (written < 0) ? written : (long)(count - left);
+}
+
 static __initdata char *message;
 static void __init error(char *x)
 {
@@ -346,7 +366,7 @@ static int __init do_name(void)
 static int __init do_copy(void)
 {
 	if (count >= body_len) {
-		sys_write(wfd, victim, body_len);
+		sys_write_large(wfd, victim, body_len);
 		sys_close(wfd);
 		do_utime(vcollected, mtime);
 		kfree(vcollected);
@@ -354,7 +374,7 @@ static int __init do_copy(void)
 		state = SkipIt;
 		return 0;
 	} else {
-		sys_write(wfd, victim, count);
+		sys_write_large(wfd, victim, count);
 		body_len -= count;
 		eat(count);
 		return 1;
@@ -604,8 +624,13 @@ static int __init populate_rootfs(void)
 		fd = sys_open("/initrd.image",
 			      O_WRONLY|O_CREAT, 0700);
 		if (fd >= 0) {
-			sys_write(fd, (char *)initrd_start,
-					initrd_end - initrd_start);
+			long written = sys_write_large(fd, (char *)initrd_start,
+						initrd_end - initrd_start);
+
+			if (written != initrd_end - initrd_start)
+				pr_err("/initrd.image: incomplete write (%ld != %ld)\n",
+				       written, initrd_end - initrd_start);
+
 			sys_close(fd);
 			free_initrd();
 		}
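
For reference, the 2G-4K figure in the description is INT_MAX rounded down to a page boundary, which is what MAX_RW_COUNT works out to with 4K pages. The following is a minimal userspace sketch of that arithmetic, not kernel code; the helper names max_rw_count() and calls_needed() are made up for illustration, and calls_needed() assumes every call transfers the maximum.

```c
#include <limits.h>

/*
 * Largest byte count a single write may transfer: INT_MAX rounded down
 * to a page boundary. page_size must be a power of two (assumption).
 */
static unsigned long long max_rw_count(unsigned long long page_size)
{
	return (unsigned long long)INT_MAX & ~(page_size - 1);
}

/*
 * Minimum number of write calls needed to transfer `size` bytes,
 * assuming each call writes the full per-call maximum.
 */
static unsigned long long calls_needed(unsigned long long size,
				       unsigned long long page_size)
{
	unsigned long long max = max_rw_count(page_size);

	return (size + max - 1) / max;	/* round up */
}
```

With 4K pages the limit is 0x7ffff000 bytes, so a 3G image needs at least two calls, which is why a single sys_write leaves /initrd.image incomplete.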


* Re: [PATCH] initramfs: Support initrd that is bigger than 2G.
  2014-06-20  2:12 [PATCH] initramfs: Support initrd that is bigger than 2G Yinghai Lu
@ 2014-06-20  4:29 ` H. Peter Anvin
  2014-06-20  5:02   ` Yinghai Lu
  0 siblings, 1 reply; 5+ messages in thread
From: H. Peter Anvin @ 2014-06-20  4:29 UTC (permalink / raw)
  To: Yinghai Lu, Andrew Morton, Ingo Molnar
  Cc: Tetsuo Handa, Daniel M. Weeks, linux-kernel

On 06/19/2014 07:12 PM, Yinghai Lu wrote:
> When initrd (compressed or not) is used, kernel report data corrupted
> with /dev/ram0.
> 
> The root cause:
> During initramfs checking, if it is initrd, it will be transferred to
> /initrd.image with sys_write.
> sys_write only support 2G-4K write, so if the initrd ram is more than
> that, /initrd.image will not complete at all.
> 
> Add local sys_write_large to loop calling sys_write to workaround the
> problem.
> 
> Also need to use that in write_buffer path for cpio that have file is
> more than file.

That sentence doesn't make sense.

> At the same time, we don't need to worry about sys_read/sys_write in
> do_mounts_rd.c::crd_load. As decompressor will have fill/flush that
> means it will allocate buffer and buffer is smaller than 2G.
> 
> Test with uncompressed initrd, and compressed with gz, bz2, lzma,xz,
> lzop.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>

I would call this function xwrite(), which is usually called in userspace.
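
As an illustration of the userspace idiom, a helper of this kind typically loops until every byte is written and copes with short writes. The sketch below is one possible shape, not the code from this patch; the name write_all() and the exact error policy (retry on EINTR, return the partial count if some data was already written) are assumptions.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write all `count` bytes from `p` to `fd`, retrying after short writes.
 * Returns the number of bytes written, or -1 if an error occurred
 * before anything could be written.
 */
static ssize_t write_all(int fd, const char *p, size_t count)
{
	size_t done = 0;

	while (done < count) {
		ssize_t rv = write(fd, p + done, count - done);

		if (rv < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return done ? (ssize_t)done : -1;
		}
		if (rv == 0)
			break;		/* no progress, give up */

		done += (size_t)rv;
	}

	return (ssize_t)done;
}
```

A caller compares the return value with the requested count to detect an incomplete write.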

To support very large initrd/initramfs images, it would be nice to
free the memory as it becomes available instead of requiring two copies
of the data in memory at the same time.

Otherwise,

Acked-by: H. Peter Anvin <hpa@zytor.com>

	-hpa




* Re: [PATCH] initramfs: Support initrd that is bigger than 2G.
  2014-06-20  4:29 ` H. Peter Anvin
@ 2014-06-20  5:02   ` Yinghai Lu
  2014-06-20  5:07     ` H. Peter Anvin
  0 siblings, 1 reply; 5+ messages in thread
From: Yinghai Lu @ 2014-06-20  5:02 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, Ingo Molnar, Tetsuo Handa, Daniel M. Weeks,
	Linux Kernel Mailing List

On Thu, Jun 19, 2014 at 9:29 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 06/19/2014 07:12 PM, Yinghai Lu wrote:
>>
>> Also need to use that in write_buffer path for cpio that have file is
>> more than file.
>
> That sentence doesn't make sense.

I mean this path:
   unpack_to_rootfs  ===> write_buffer ===> actions[].../do_copy
where the image is an uncompressed cpio and that cpio contains one big file (>2G).


>
>
> I would call this function xwrite(), which is usually called in userspace.

Good, will change that.

>
> It would be nice in order to support very large initrd/initramfs, to
> free the memory as it becomes available instead of requiring two copies
> of the data in memory at the same time.

For initramfs, the data goes from ramdisk_image/ramdisk_size to tmpfs
directly, and then ramdisk_image/ramdisk_size gets freed.

For initrd, it is first transferred to /initrd.image in tmpfs and
ramdisk_image/ramdisk_size gets freed; finally /initrd.image is
decompressed/copied to /dev/ram0 and removed from tmpfs.

So what do you mean by "free the memory"?

Thanks

Yinghai


* Re: [PATCH] initramfs: Support initrd that is bigger than 2G.
  2014-06-20  5:02   ` Yinghai Lu
@ 2014-06-20  5:07     ` H. Peter Anvin
  2014-06-20 16:03       ` Yinghai Lu
  0 siblings, 1 reply; 5+ messages in thread
From: H. Peter Anvin @ 2014-06-20  5:07 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andrew Morton, Ingo Molnar, Tetsuo Handa, Daniel M. Weeks,
	Linux Kernel Mailing List

On 06/19/2014 10:02 PM, Yinghai Lu wrote:
> On Thu, Jun 19, 2014 at 9:29 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 06/19/2014 07:12 PM, Yinghai Lu wrote:
>>>
>>> Also need to use that in write_buffer path for cpio that have file is
>>> more than file.
>>
>> That sentence doesn't make sense.
> 
> I mean this path:
>    unpack_to_rootfs  ===> write_buffer ===> actions[].../do_copy
> and image is uncompressed cpio, and there is one big file (>2G) in that cpio.

Don't tell me; make the description clear so that someone can understand
it 10 years from now.
>>
>> It would be nice in order to support very large initrd/initramfs, to
>> free the memory as it becomes available instead of requiring two copies
>> of the data in memory at the same time.
> 
> for initramfs, it is from ramdisk_image/ramdisk_size to tmpfs directly.
> and ramdisk_image/ramdisk_size get freed.
> 
> for initrd, it is transferred to /initrd.image in tmpfs at first, and
> ramdisk_image/ramdisk_size
> get freed,  at last /initrd.image is decompressed/copied to /dev/ram0
> and get removed
> from tempfs.
> 
> So what do you mean "free the memory"?
> 

For each of those transfers, we don't free the source memory until the
very end.  We could free that memory as we process the input, requiring
less total memory.
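
To make the idea concrete, here is a rough userspace analogy rather than a proposal for the kernel code: if the source is held as separately allocated chunks, each chunk can be released as soon as it has been copied, so only one chunk is ever duplicated. The struct chunked_src layout and the function copy_and_release() are hypothetical names used only for this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical source image held as separately allocated chunks. */
struct chunked_src {
	char **chunks;		/* chunks[i] becomes NULL once released */
	size_t nchunks;
	size_t chunk_size;	/* every chunk has this size */
};

/*
 * Copy each chunk into dst and free it right away, so at most one
 * source chunk coexists with its copy at any time.
 * dst must have room for nchunks * chunk_size bytes.
 * Returns the number of bytes copied.
 */
static size_t copy_and_release(struct chunked_src *src, char *dst)
{
	size_t copied = 0;

	for (size_t i = 0; i < src->nchunks; i++) {
		memcpy(dst + copied, src->chunks[i], src->chunk_size);
		copied += src->chunk_size;

		free(src->chunks[i]);	/* release the source as we go */
		src->chunks[i] = NULL;
	}

	return copied;
}
```

Compared with copying everything first and freeing the source at the very end, the peak overlap drops from the whole image to a single chunk.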

	-hpa




* Re: [PATCH] initramfs: Support initrd that is bigger than 2G.
  2014-06-20  5:07     ` H. Peter Anvin
@ 2014-06-20 16:03       ` Yinghai Lu
  0 siblings, 0 replies; 5+ messages in thread
From: Yinghai Lu @ 2014-06-20 16:03 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, Ingo Molnar, Tetsuo Handa, Daniel M. Weeks,
	Linux Kernel Mailing List

On Thu, Jun 19, 2014 at 10:07 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> For each of those transfers, we don't free the source memory until the
> very end.  We could free that memory as we process the input, requiring
> less total memory.

Yes, that would be a nice enhancement.

Yinghai

