From: Marco Stornelli <marco.stornelli@gmail.com>
To: Vladimir Davydov <vdavydov@parallels.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, criu@openvz.org, devel@openvz.org,
	xemul@parallels.com
Subject: Re: [PATCH RFC] pram: persistent over-kexec memory file system
Date: Sun, 28 Jul 2013 13:02:12 +0200
Message-ID: <51F4FA34.2070509@gmail.com>
In-Reply-To: <51F4ECF2.6040408@parallels.com>

On 28/07/2013 12:05, Vladimir Davydov wrote:
> On 07/27/2013 09:37 PM, Marco Stornelli wrote:
>> On 27/07/2013 19:35, Vladimir Davydov wrote:
>>> On 07/27/2013 07:41 PM, Marco Stornelli wrote:
>>>> On 26/07/2013 14:29, Vladimir Davydov wrote:
>>>>> Hi,
>>>>>
>>>>> We want to propose a way to upgrade a kernel on a machine without
>>>>> restarting all the user-space services. This is to be done with the
>>>>> CRIU project, but we need help from the kernel to preserve some data
>>>>> in memory while doing kexec.
>>>>>
>>>>> The key point of our implementation is leaving process memory in place
>>>>> during reboot. This should eliminate most I/O operations the services
>>>>> would produce during initialization. To achieve this, we have
>>>>> implemented a pseudo file system that preserves its content during
>>>>> kexec. We propose saving CRIU dump files to this file system,
>>>>> kexec'ing, and then restoring the processes in the newly booted kernel.
>>>>>
>>>>
>>>> http://pramfs.sourceforge.net/
>>>
>>> AFAIU it's a slightly different thing: PRAMFS, like pstore, which has
>>> already been merged, requires hardware support for persistence across
>>> reboots, so-called non-volatile RAM, i.e. RAM which is not directly
>>> accessible and so is not used by the kernel. In contrast, what we'd
>>> like is to preserve ordinary RAM across kexec. This is possible because
>>> RAM is not reset during kexec. It would allow leaving the applications'
>>> working set as well as filesystem caches in place, speeding up the
>>> reboot process as a whole and reducing the downtime significantly.
>>>
>>> Thanks.
>>
>> Actually not. You can use normal system RAM reserved at boot with the mem
>> parameter, without any kernel change. Until a hard reset happens, that
>> area will be "persistent".
>
> Thank you, we'll look at PRAMFS more closely, but right now, after trying
> it, I have a couple of concerns I'd appreciate if you could clarify:
>
> 1) As you advised, I tried to reserve a range of memory (passing
> memmap=4G$4G at boot) and mounted PRAMFS using the following options:
>
> # mount -t pramfs -o physaddr=0x100000000,init=4G,bs=4096 none /mnt/pramfs
>
> And it turned out that PRAMFS is very slow as compared to ramfs:
>
> # dd if=/dev/zero of=/mnt/pramfs/dummy bs=4096 count=$[100*1024]
> 102400+0 records in
> 102400+0 records out
> 419430400 bytes (419 MB) copied, 9.23498 s, 45.4 MB/s
> # dd if=/dev/zero of=/mnt/pramfs/dummy bs=4096 count=$[100*1024] conv=notrunc
> 102400+0 records in
> 102400+0 records out
> 419430400 bytes (419 MB) copied, 3.04692 s, 138 MB/s
>
> We need it to be as fast as ordinary RAM, because otherwise its benefit
> over an HDD disappears. So before diving into the code, I'd like to ask
> whether this is intrinsic to PRAMFS or whether it can be fixed. Or perhaps
> I used the wrong mount/boot/config options (btw, I enabled only
> CONFIG_PRAMFS)?
>

On x86 you probably have write protection enabled. Turn it off, or
mount with the noprotect option.
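
For illustration, here is a minimal sketch of what the second suggestion
might look like, assuming the reserved region and options from the mount
command quoted above. The noprotect option name comes from this message;
appending it to that option list is an assumption. The script only prints
the command rather than running it, since a real mount needs root and a
kernel built with PRAMFS.

```shell
#!/bin/sh
# Sketch only: build and print the mount command, do not execute it.

# Start of the reserved region, taken from memmap=4G$4G
# (4 GiB of RAM reserved at physical offset 4 GiB).
start_gib=4
physaddr=$(printf '0x%x' $((start_gib * 1024 * 1024 * 1024)))

# Same options as the quoted mount command, plus "noprotect".
# Assumption: noprotect can simply be appended to the option list.
opts="physaddr=${physaddr},init=4G,bs=4096,noprotect"
mount_cmd="mount -t pramfs -o ${opts} none /mnt/pramfs"

echo "$mount_cmd"
```

After mounting this way, rerunning the same dd test should show whether
write protection accounts for the slowdown.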

> 2) To enable saving application dump files in memory using PRAMFS, one
> has to reserve half of RAM for it. That's too expensive. With ramfs, by
> contrast, once the SPLICE_F_MOVE flag is implemented, one could move
> anonymous memory pages to the ramfs page cache and move them back after
> kexec, so that almost no extra memory would be required. Of course,
> SPLICE_F_MOVE is yet to be implemented, but with PRAMFS significant
> memory costs are inevitable... or am I wrong?
>
> Thanks.

From this point of view you are right. Pramfs (or any other solution like
it) sits outside the page cache, so you can't do any memory transfer. It's
like having a disk that is actually a separate piece of RAM. We could
talk about it again once this kind of implementation has been done.

Marco
