From: Dave Hansen <dave@linux.vnet.ibm.com>
To: Oren Laadan <orenl@cs.columbia.edu>
Cc: arnd@arndb.de, jeremy@goop.org, linux-kernel@vger.kernel.org,
containers@lists.linux-foundation.org
Subject: Re: [RFC v2][PATCH 4/9] Memory management - dump state
Date: Wed, 27 Aug 2008 08:41:36 -0700 [thread overview]
Message-ID: <1219851696.8680.67.camel@nimitz> (raw)
In-Reply-To: <48B49C61.1040003@cs.columbia.edu>
On Tue, 2008-08-26 at 20:14 -0400, Oren Laadan wrote:
> >>>> +
> >>>> + while (addr < end) {
> >>>> + struct page *page;
> >>>> +
> >>>> + /* simplified version of get_user_pages(): already have vma,
> >>>> + * only need FOLL_TOUCH, and (for now) ignore fault stats */
> >>>> +
> >>>> + cond_resched();
> >>>> + while (!(page = follow_page(vma, addr, FOLL_TOUCH))) {
> >>>> + ret = handle_mm_fault(vma->vm_mm, vma, addr, 0);
> >>>> + if (ret & VM_FAULT_ERROR) {
> >>>> + if (ret & VM_FAULT_OOM)
> >>>> + ret = -ENOMEM;
> >>>> + else if (ret & VM_FAULT_SIGBUS)
> >>>> + ret = -EFAULT;
> >>>> + else
> >>>> + BUG();
> >>>> + break;
> >>>> + }
> >>>> + cond_resched();
> >>>> + }
> >>> At best this needs to get folded back into its own function. This is
> >> This is almost identical to the original - see the preceding comment.
> >
> > Exactly. The code is copy-and-pasted. If there's a bug in the
> > original, who will change this one? Better to simply consolidate the
> > code into one copy.
> >
> >>> pretty hard to read as-is. Also, are there truly no in-kernel functions
> >>> that can be used for this?
> >> Can you suggest one ?
> >
> > This is where the mentality has to shift. Instead of thinking "there is
> > no exact in-kernel match for what I want here" we need to consider that
> > we can modify the in-kernel ones. They can be modified to suit both
> > existing and the new needs that we have.
>
> I agree.
>
> However, my main goal now is not to make this patch perfect, but to provide
> a viable proof-of-concept that demonstrates how we want to do things. Unless
> you feel we are near ready to send these for inclusion soon (...), I intend
> to prioritize design and functionality.
So, you currently have buy-in on this basic approach from the OpenVZ
guys, and probably Ingo. If you get too far along in the "design and
functionality" and a community member comes up with a really good
objection to some basic part of the "design and functionality" you're
going to have a lot of code to rewrite.
I'm trying to get minimized the quantity of "design and functionality"
down to the bare minimum set where we can get this merged. So, yes, I
do think these should be sent for inclusion soon.
I believe that we're on course to create another large out-of-tree patch
set that does in-kernel checkpoint/restart. We have no shortage of
those. It's been done many times before.
This one will *HOPEFULLY* be different from all the rest when it gets
merged.
> >>>> + for (pgarr = ctx->pgarr; pgarr; pgarr = pgarr->next) {
> >>>> + struct page **pages = pgarr->pages;
> >>>> + int nr = pgarr->nused;
> >>>> + void *ptr;
> >>>> +
> >>>> + while (nr--) {
> >>>> + ptr = kmap(*pages);
> >>>> + ret = cr_kwrite(ctx, ptr, PAGE_SIZE);
> >>>> + kunmap(*pages);
> >>> Why not use kmap_atomic() here?
> >> It is my understanding that the code must not sleep between kmap_atomic()
> >> and kunmap_atomic().
> >
> > Yes, but you're going to absolutely kill performance on systems which
> > have expensive global TLB flushes. Frequent kmaps()s should be avoided
> > at all costs.
> >
> > The best way around this would be to change the interface to cr_kwrite()
> > so that it didn't have to use *mapped* kernel memory. Maybe an
> > interface that takes 'struct page'. Or, one where we kmap_atomic() the
> > buffer, kunmap_atomic(), copy to a temporary buffer, then cr_kwrite().
> >
> >>>> +static int cr_write_vma(struct cr_ctx *ctx, struct vm_area_struct *vma)
> >>>> +{
> >>>> + struct cr_hdr h;
> >>>> + struct cr_hdr_vma *hh = ctx->hbuf;
> >>>> + char *fname = NULL;
> >>>> + int flen = 0, how, nr, ret;
> >>>> +
> >>>> + h.type = CR_HDR_VMA;
> >>>> + h.len = sizeof(*hh);
> >>>> + h.ptag = 0;
> >>>> +
> >>>> + hh->vm_start = vma->vm_start;
> >>>> + hh->vm_end = vma->vm_end;
> >>>> + hh->vm_page_prot = vma->vm_page_prot.pgprot;
> >>>> + hh->vm_flags = vma->vm_flags;
> >>>> + hh->vm_pgoff = vma->vm_pgoff;
> >>>> +
> >>>> + if (vma->vm_flags & (VM_SHARED | VM_IO | VM_HUGETLB | VM_NONLINEAR)) {
> >>>> + pr_warning("CR: unknown VMA %#lx\n", vma->vm_flags);
> >>>> + return -ETXTBSY;
> >>>> + }
> >>> Hmmm. Interesting error code for VM_HUGETLB. :)
> >> :) well, the usual EINVAL didn't seem suitable. Any better suggestions ?
> >
> > -ENOSUP might be clearest for now. "Your system call tried to do
> > something unsupported."
>
> Are you suggesting adding a new error code ?
Dang. What's the thing we return from system calls that are
unsupported? Did I just dream that up?
-- Dave
next prev parent reply other threads:[~2008-08-27 15:41 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-08-21 2:58 [RFC v2][PATCH 1/9] kernel based checkpoint-restart Oren Laadan
2008-08-21 3:03 ` [RFC v2][PATCH 1/9] Create trivial sys_checkpoint/sys_restart syscalls Oren Laadan
2008-08-21 5:17 ` Oren Laadan
2008-08-22 19:32 ` Dave Hansen
2008-08-22 20:11 ` Dave Hansen
2008-08-22 21:20 ` Oren Laadan
2008-08-21 3:04 ` [RFC v2][PATCH 2/9] General infrastructure for checkpoint restart Oren Laadan
2008-08-21 9:35 ` Louis Rilling
2008-08-24 5:58 ` Oren Laadan
2008-08-22 20:01 ` Dave Hansen
2008-08-25 2:47 ` Oren Laadan
2008-08-26 16:42 ` Dave Hansen
2008-08-27 0:38 ` Oren Laadan
2008-08-21 3:04 ` [RFC v2][PATCH 3/9] x86 support for checkpoint/restart Oren Laadan
2008-08-22 20:17 ` Dave Hansen
2008-08-21 3:05 ` [RFC v2][PATCH 4/9] Memory management - dump state Oren Laadan
2008-08-21 7:30 ` Ingo Molnar
2008-08-21 8:01 ` Justin P. Mattock
2008-08-21 10:28 ` Balbir Singh
2008-08-21 11:59 ` Ingo Molnar
2008-08-21 9:53 ` Louis Rilling
2008-08-22 21:21 ` Oren Laadan
2008-08-22 20:37 ` Dave Hansen
2008-08-24 5:40 ` Oren Laadan
2008-08-26 16:33 ` Dave Hansen
2008-08-27 0:14 ` Oren Laadan
2008-08-27 15:41 ` Dave Hansen [this message]
2008-08-27 15:57 ` Louis Rilling
2008-08-27 16:12 ` Dave Hansen
2008-08-27 16:19 ` Jeremy Fitzhardinge
2008-08-27 20:34 ` Serge E. Hallyn
2008-08-27 20:38 ` Dave Hansen
2008-08-27 20:48 ` Serge E. Hallyn
2008-08-27 20:56 ` Dave Hansen
2008-08-31 7:16 ` Oren Laadan
2008-08-31 17:34 ` Cedric Le Goater
2008-09-03 11:43 ` [Devel] " Andrey Mirkin
2008-09-03 12:15 ` Cedric Le Goater
2008-09-03 13:29 ` Andrey Mirkin
2008-09-02 15:32 ` Serge E. Hallyn
2008-08-21 3:05 ` [RFC v2][PATCH 5/9] Memory managemnet - restore state Oren Laadan
2008-08-21 10:07 ` Louis Rilling
2008-08-21 3:06 ` [RFC v2][PATCH 6/9] Checkpoint/restart: initial documentation Oren Laadan
2008-08-21 3:06 ` [RFC v2][PATCH 7/9] Infrastructure for shared objects Oren Laadan
2008-08-21 10:40 ` Louis Rilling
2008-08-26 17:01 ` Dave Hansen
2008-08-27 8:26 ` Louis Rilling
2008-08-21 3:07 ` [RFC v2][PATCH 8/9] File descriprtors - dump state Oren Laadan
2008-08-21 11:06 ` Louis Rilling
2008-08-25 3:28 ` Oren Laadan
2008-08-25 10:30 ` Louis Rilling
2008-08-21 3:07 ` [RFC v2][PATCH 9/9] File descriprtors (restore) Oren Laadan
2008-08-21 5:26 ` Oren Laadan
2008-08-21 5:15 ` [RFC v2][PATCH 1/9] kernel based checkpoint-restart Oren Laadan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1219851696.8680.67.camel@nimitz \
--to=dave@linux.vnet.ibm.com \
--cc=arnd@arndb.de \
--cc=containers@lists.linux-foundation.org \
--cc=jeremy@goop.org \
--cc=linux-kernel@vger.kernel.org \
--cc=orenl@cs.columbia.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox