Linux Container Development
 help / color / mirror / Atom feed
From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
To: Louis.Rilling-aw0BnHfMbSpBDgjK7y7TUQ@public.gmane.org
Cc: Linux Containers
	<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
Subject: Re: [RFC][PATCH 2/2] CR: handle a single task with private memory maps
Date: Thu, 31 Jul 2008 10:44:47 -0400	[thread overview]
Message-ID: <4891CFDF.8060108@cs.columbia.edu> (raw)
In-Reply-To: <20080731140844.GE22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>



Louis Rilling wrote:
> On Wed, Jul 30, 2008 at 02:27:52PM -0400, Oren Laadan wrote:
>>
>> Louis Rilling wrote:
>>>> +/**
>>>> + * cr_vma_fill_pgarr - fill a page-array with addr/page tuples for a vma
>>>> + * @ctx - checkpoint context
>>>> + * @pgarr - page-array to fill
>>>> + * @vma - vma to scan
>>>> + * @start - start address (updated)
>>>> + */
>>>> +static int cr_vma_fill_pgarr(struct cr_ctx *ctx, struct cr_pgarr *pgarr,
>>>> +			     struct vm_area_struct *vma, unsigned long *start)
>>>> +{
>>>> +	unsigned long end = vma->vm_end;
>>>> +	unsigned long addr = *start;
>>>> +	struct page **pagep;
>>>> +	unsigned long *addrp;
>>>> +	int cow, nr, ret = 0;
>>>> +
>>>> +	nr = pgarr->nleft;
>>>> +	pagep = &pgarr->pages[pgarr->nused];
>>>> +	addrp = &pgarr->addrs[pgarr->nused];
>>>> +	cow = !!vma->vm_file;
>>>> +
>>>> +	while (addr < end) {
>>>> +		struct page *page;
>>>> +
>>>> +		/* simplified version of get_user_pages(): already have vma,
>>>> +		* only need FOLL_TOUCH, and (for now) ignore fault stats */
>>>> +
>>>> +		cond_resched();
>>>> +		while (!(page = follow_page(vma, addr, FOLL_TOUCH))) {
>>>> +			ret = handle_mm_fault(vma->vm_mm, vma, addr, 0);
>>>> +			if (ret & VM_FAULT_ERROR) {
>>>> +				if (ret & VM_FAULT_OOM)
>>>> +					ret = -ENOMEM;
>>>> +				else if (ret & VM_FAULT_SIGBUS)
>>>> +					ret = -EFAULT;
>>>> +				else
>>>> +					BUG();
>>>> +				break;
>>>> +			}
>>>> +			cond_resched();
>>>> +		}
>>> I guess that 'ret' should be checked somewhere after this loop.
>> yes; this is where a "break(2)" construct in C would come handy :)
> 
> Alternatively, putting the inner loop in a separate function often helps to
> handle errors in a cleaner way.

Also true. I opted to keep it that way to keep the code as similar as
possible to get_user_pages().

Note that the logic can be optimized by, instead of traversing the page
table once for each page, we could aggregate a few pages in each round.
I wanted to keep the code simple.

> 
>>>> +
>>>> +		if (IS_ERR(page)) {
>>>> +			ret = PTR_ERR(page);
>>>> +			break;
>>>> +		}
>>>> +
>>>> +		if (page == ZERO_PAGE(0))
>>>> +			page = NULL;	/* zero page: ignore */
>>>> +		else if (cow && page_mapping(page) != NULL)
>>>> +			page = NULL;	/* clean cow: ignore */
>>>> +		else {
>>>> +			get_page(page);
>>>> +			*(addrp++) = addr;
>>>> +			*(pagep++) = page;
>>>> +			if (--nr == 0) {
>>>> +				addr += PAGE_SIZE;
>>>> +				break;
>>>> +			}
>>>> +		}
>>>> +
>>>> +		addr += PAGE_SIZE;
>>>> +	}
>>>> +
>>>> +	if (unlikely(ret < 0)) {
>>>> +		nr = pgarr->nleft - nr;
>>>> +		while (nr--)
>>>> +			page_cache_release(*(--pagep));
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	*start = addr;
>>>> +	return (pgarr->nleft - nr);
>>>> +}
>>>> +
> 
> 
>>>> +int cr_write_mm(struct cr_ctx *ctx, struct task_struct *t)
>>>> +{
>>>> +	struct cr_hdr h;
>>>> +	struct cr_hdr_mm *hh = ctx->tbuf;
>>>> +	struct mm_struct *mm;
>>>> +	struct vm_area_struct *vma;
>>>> +	int ret;
>>>> +
>>>> +	h.type = CR_HDR_MM;
>>>> +	h.len = sizeof(*hh);
>>>> +	h.id = ctx->pid;
>>>> +
>>>> +	mm = get_task_mm(t);
>>>> +
>>>> +	hh->tag = 1;	/* non-zero will mean first time encounter */
>>>> +
>>>> +	hh->start_code = mm->start_code;
>>>> +	hh->end_code = mm->end_code;
>>>> +	hh->start_data = mm->start_data;
>>>> +	hh->end_data = mm->end_data;
>>>> +	hh->start_brk = mm->start_brk;
>>>> +	hh->brk = mm->brk;
>>>> +	hh->start_stack = mm->start_stack;
>>>> +	hh->arg_start = mm->arg_start;
>>>> +	hh->arg_end = mm->arg_end;
>>>> +	hh->env_start = mm->env_start;
>>>> +	hh->env_end = mm->env_end;
>>>> +
>>>> +	hh->map_count = mm->map_count;
>>> Some fields above should also be protected with mmap_sem, like ->brk,
>>> ->map_count, and possibly others (I'm not a memory expert though).
>> true; keep in mind, though, that the container will be frozen during
>> this time, so nothing should change at all. The only exception would
>> be if, for instance, someone is killing the container while we save
>> its state.
> 
> Sure. So you think that taking mm->mmap_sem below is useless? I tend to believe
> so, since no other task should share this mm_struct at this time, and we could
> state that ptrace should not interfere during restart. However, I'm never
> confident when ptrace considerations come in...

Not quite.

Probing the value of mm->brk is always safe, although it may turn out to
yield incorrect value. Traversing the vma's isn't safe, because - if for
instance the target task dies in the middle, it may alter the vma list.
So the mmap_sem protects against the latter.

Anyway, it won't hurt to be extra safe and take the semaphore earlier.

Ptrace, btw, cannot come in because the container is (supposedly) frozen.

Oren.

>>>> +
>>>> +	/* FIX: need also mm->flags */
>>>> +
>>>> +	ret = cr_write_obj(ctx, &h, hh);
>>>> +	if (ret < 0)
>>>> +		goto out;
>>>> +
>>>> +	/* write the vma's */
>>>> +	down_read(&mm->mmap_sem);
>>>> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
>>>> +		if ((ret = cr_write_vma(ctx, vma)) < 0)
>>>> +			break;
>>>> +	}
>>>> +	up_read(&mm->mmap_sem);
>>>> +
>>>> +	if (ret < 0)
>>>> +		goto out;
>>>> +
>>>> +	ret = cr_write_mm_context(ctx, mm);
>>>> +
>>>> + out:
>>>> +	mmput(mm);
>>>> +	return ret;
>>>> +}
> 
> Thanks,
> 
> Louis
> 

  parent reply	other threads:[~2008-07-31 14:44 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-30  3:27 [RFC][PATCH 2/2] CR: handle a single task with private memory maps Oren Laadan
     [not found] ` <Pine.LNX.4.64.0807292325290.9868-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2008-07-30  4:51   ` KOSAKI Motohiro
     [not found]     ` <20080730132257.9DF2.KOSAKI.MOTOHIRO-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2008-07-30 18:22       ` Oren Laadan
2008-07-30 20:58   ` Dave Hansen
2008-07-30 22:07   ` Serge E. Hallyn
     [not found]     ` <20080730220752.GA3518-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-30 22:20       ` Oren Laadan
     [not found]         ` <4890E930.9090204-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 13:57           ` Louis Rilling
     [not found]             ` <20080731135703.GC22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 15:09               ` Oren Laadan
     [not found]                 ` <4891D5C2.8090000-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 15:58                   ` Louis Rilling
     [not found]                     ` <20080731155856.GH22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 16:28                       ` Oren Laadan
     [not found]                         ` <4891E849.1050701-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 17:50                           ` Louis Rilling
     [not found]                             ` <20080731175058.GI22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 19:12                               ` Oren Laadan
     [not found]                                 ` <48920EA0.1060608-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-01 10:26                                   ` Louis Rilling
     [not found]                                     ` <20080801102600.GJ22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-08-01 14:15                                       ` Oren Laadan
     [not found]                                         ` <48931A7E.1040302-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-01 18:00                                           ` Louis Rilling
     [not found]                                             ` <20080801180038.GL22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-08-01 18:51                                               ` Oren Laadan
     [not found]                                                 ` <48935B4D.7070302-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-04 10:16                                                   ` Louis Rilling
2008-08-05  2:37                                                     ` Oren Laadan
     [not found]                                                       ` <4897BCE0.1080508-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-05  3:51                                                         ` Joseph Ruscio
     [not found]                                                           ` <1FA56146-7C30-4C36-982D-A50AA8BC8392-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>
2008-08-05  9:19                                                             ` Louis Rilling
2008-08-05 16:20                                                               ` Oren Laadan
     [not found]                                                                 ` <48987DE7.3060408-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-06 15:41                                                                   ` Joseph Ruscio
     [not found]                                                                     ` <3A99F254-E9B3-484B-85B0-29023ADA04C4-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>
2008-08-07  9:25                                                                       ` Louis Rilling
2008-08-05 16:23                                                             ` Dave Hansen
2008-08-06 16:15                                                               ` Joseph Ruscio
     [not found]                                                                 ` <FE4D936E-06F1-45D2-8E7C-85D87149BDC0-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>
2008-08-07  9:29                                                                   ` Louis Rilling
2008-08-08 17:20                                                               ` Joseph Ruscio
     [not found]                                                                 ` <03CE5BD3-E84A-4617-93BC-722ECB846C63-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>
2008-08-08 17:24                                                                   ` Dave Hansen
2008-08-05  9:32                                                         ` Louis Rilling
2008-07-31 21:25           ` Serge E. Hallyn
     [not found] ` <20080730161535.GB22403@hawkmoon.kerlabs.com>
     [not found]   ` <20080730161535.GB22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-30 18:27     ` Oren Laadan
     [not found]       ` <4890B2A8.8010808-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 14:08         ` Louis Rilling
     [not found]           ` <20080731140844.GE22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 14:44             ` Oren Laadan [this message]
  -- strict thread matches above, loose matches on Subject: below --
2008-07-30 16:52 Serge E. Hallyn
     [not found] ` <20080730165249.GA23802-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-30 17:40   ` Dave Hansen
2008-07-31 13:59     ` Louis Rilling
     [not found]       ` <20080731135910.GD22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 14:14         ` Serge E. Hallyn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4891CFDF.8060108@cs.columbia.edu \
    --to=orenl-eqauephvms7envbuuze7ea@public.gmane.org \
    --cc=Louis.Rilling-aw0BnHfMbSpBDgjK7y7TUQ@public.gmane.org \
    --cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox