All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
To: Louis.Rilling-aw0BnHfMbSpBDgjK7y7TUQ@public.gmane.org
Cc: Linux Containers
	<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
Subject: Re: [RFC][PATCH 2/2] CR: handle a single task with private memory maps
Date: Thu, 31 Jul 2008 10:44:47 -0400	[thread overview]
Message-ID: <4891CFDF.8060108@cs.columbia.edu> (raw)
In-Reply-To: <20080731140844.GE22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>



Louis Rilling wrote:
> On Wed, Jul 30, 2008 at 02:27:52PM -0400, Oren Laadan wrote:
>>
>> Louis Rilling wrote:
>>>> +/**
>>>> + * cr_vma_fill_pgarr - fill a page-array with addr/page tuples for a vma
>>>> + * @ctx - checkpoint context
>>>> + * @pgarr - page-array to fill
>>>> + * @vma - vma to scan
>>>> + * @start - start address (updated)
>>>> + */
>>>> +static int cr_vma_fill_pgarr(struct cr_ctx *ctx, struct cr_pgarr *pgarr,
>>>> +			     struct vm_area_struct *vma, unsigned long *start)
>>>> +{
>>>> +	unsigned long end = vma->vm_end;
>>>> +	unsigned long addr = *start;
>>>> +	struct page **pagep;
>>>> +	unsigned long *addrp;
>>>> +	int cow, nr, ret = 0;
>>>> +
>>>> +	nr = pgarr->nleft;
>>>> +	pagep = &pgarr->pages[pgarr->nused];
>>>> +	addrp = &pgarr->addrs[pgarr->nused];
>>>> +	cow = !!vma->vm_file;
>>>> +
>>>> +	while (addr < end) {
>>>> +		struct page *page;
>>>> +
>>>> +		/* simplified version of get_user_pages(): already have vma,
>>>> +		* only need FOLL_TOUCH, and (for now) ignore fault stats */
>>>> +
>>>> +		cond_resched();
>>>> +		while (!(page = follow_page(vma, addr, FOLL_TOUCH))) {
>>>> +			ret = handle_mm_fault(vma->vm_mm, vma, addr, 0);
>>>> +			if (ret & VM_FAULT_ERROR) {
>>>> +				if (ret & VM_FAULT_OOM)
>>>> +					ret = -ENOMEM;
>>>> +				else if (ret & VM_FAULT_SIGBUS)
>>>> +					ret = -EFAULT;
>>>> +				else
>>>> +					BUG();
>>>> +				break;
>>>> +			}
>>>> +			cond_resched();
>>>> +		}
>>> I guess that 'ret' should be checked somewhere after this loop.
>> yes; this is where a "break(2)" construct in C would come handy :)
> 
> Alternatively, putting the inner loop in a separate function often helps to
> handle errors in a cleaner way.

Also true. I opted to keep it that way to keep the code as similar as
possible to get_user_pages().

Note that the logic can be optimized by, instead of traversing the page
table once for each page, we could aggregate a few pages in each round.
I wanted to keep the code simple.

> 
>>>> +
>>>> +		if (IS_ERR(page)) {
>>>> +			ret = PTR_ERR(page);
>>>> +			break;
>>>> +		}
>>>> +
>>>> +		if (page == ZERO_PAGE(0))
>>>> +			page = NULL;	/* zero page: ignore */
>>>> +		else if (cow && page_mapping(page) != NULL)
>>>> +			page = NULL;	/* clean cow: ignore */
>>>> +		else {
>>>> +			get_page(page);
>>>> +			*(addrp++) = addr;
>>>> +			*(pagep++) = page;
>>>> +			if (--nr == 0) {
>>>> +				addr += PAGE_SIZE;
>>>> +				break;
>>>> +			}
>>>> +		}
>>>> +
>>>> +		addr += PAGE_SIZE;
>>>> +	}
>>>> +
>>>> +	if (unlikely(ret < 0)) {
>>>> +		nr = pgarr->nleft - nr;
>>>> +		while (nr--)
>>>> +			page_cache_release(*(--pagep));
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	*start = addr;
>>>> +	return (pgarr->nleft - nr);
>>>> +}
>>>> +
> 
> 
>>>> +int cr_write_mm(struct cr_ctx *ctx, struct task_struct *t)
>>>> +{
>>>> +	struct cr_hdr h;
>>>> +	struct cr_hdr_mm *hh = ctx->tbuf;
>>>> +	struct mm_struct *mm;
>>>> +	struct vm_area_struct *vma;
>>>> +	int ret;
>>>> +
>>>> +	h.type = CR_HDR_MM;
>>>> +	h.len = sizeof(*hh);
>>>> +	h.id = ctx->pid;
>>>> +
>>>> +	mm = get_task_mm(t);
>>>> +
>>>> +	hh->tag = 1;	/* non-zero will mean first time encounter */
>>>> +
>>>> +	hh->start_code = mm->start_code;
>>>> +	hh->end_code = mm->end_code;
>>>> +	hh->start_data = mm->start_data;
>>>> +	hh->end_data = mm->end_data;
>>>> +	hh->start_brk = mm->start_brk;
>>>> +	hh->brk = mm->brk;
>>>> +	hh->start_stack = mm->start_stack;
>>>> +	hh->arg_start = mm->arg_start;
>>>> +	hh->arg_end = mm->arg_end;
>>>> +	hh->env_start = mm->env_start;
>>>> +	hh->env_end = mm->env_end;
>>>> +
>>>> +	hh->map_count = mm->map_count;
>>> Some fields above should also be protected with mmap_sem, like ->brk,
>>> ->map_count, and possibly others (I'm not a memory expert though).
>> true; keep in mind, though, that the container will be frozen during
>> this time, so nothing should change at all. The only exception would
>> be if, for instance, someone is killing the container while we save
>> its state.
> 
> Sure. So you think that taking mm->mmap_sem below is useless? I tend to believe
> so, since no other task should share this mm_struct at this time, and we could
> state that ptrace should not interfere during restart. However, I'm never
> confident when ptrace considerations come in...

Not quite.

Probing the value of mm->brk is always safe, although it may turn out to
yield incorrect value. Traversing the vma's isn't safe, because - if for
instance the target task dies in the middle, it may alter the vma list.
So the mmap_sem protects against the latter.

Anyway, it won't hurt to be extra safe and take the semaphore earlier.

Ptrace, btw, cannot come in because the container is (supposedly) frozen.

Oren.

>>>> +
>>>> +	/* FIX: need also mm->flags */
>>>> +
>>>> +	ret = cr_write_obj(ctx, &h, hh);
>>>> +	if (ret < 0)
>>>> +		goto out;
>>>> +
>>>> +	/* write the vma's */
>>>> +	down_read(&mm->mmap_sem);
>>>> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
>>>> +		if ((ret = cr_write_vma(ctx, vma)) < 0)
>>>> +			break;
>>>> +	}
>>>> +	up_read(&mm->mmap_sem);
>>>> +
>>>> +	if (ret < 0)
>>>> +		goto out;
>>>> +
>>>> +	ret = cr_write_mm_context(ctx, mm);
>>>> +
>>>> + out:
>>>> +	mmput(mm);
>>>> +	return ret;
>>>> +}
> 
> Thanks,
> 
> Louis
> 

  parent reply	other threads:[~2008-07-31 14:44 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-30  3:27 [RFC][PATCH 2/2] CR: handle a single task with private memory maps Oren Laadan
     [not found] ` <20080730161535.GB22403@hawkmoon.kerlabs.com>
     [not found]   ` <20080730161535.GB22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-30 18:27     ` Oren Laadan
     [not found]       ` <4890B2A8.8010808-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 14:08         ` Louis Rilling
     [not found]           ` <20080731140844.GE22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 14:44             ` Oren Laadan [this message]
     [not found] ` <Pine.LNX.4.64.0807292325290.9868-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2008-07-30  4:51   ` KOSAKI Motohiro
     [not found]     ` <20080730132257.9DF2.KOSAKI.MOTOHIRO-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2008-07-30 18:22       ` Oren Laadan
2008-07-30 20:58   ` Dave Hansen
2008-07-30 22:07   ` Serge E. Hallyn
     [not found]     ` <20080730220752.GA3518-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-30 22:20       ` Oren Laadan
     [not found]         ` <4890E930.9090204-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 13:57           ` Louis Rilling
     [not found]             ` <20080731135703.GC22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 15:09               ` Oren Laadan
     [not found]                 ` <4891D5C2.8090000-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 15:58                   ` Louis Rilling
     [not found]                     ` <20080731155856.GH22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 16:28                       ` Oren Laadan
     [not found]                         ` <4891E849.1050701-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 17:50                           ` Louis Rilling
     [not found]                             ` <20080731175058.GI22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 19:12                               ` Oren Laadan
     [not found]                                 ` <48920EA0.1060608-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-01 10:26                                   ` Louis Rilling
     [not found]                                     ` <20080801102600.GJ22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-08-01 14:15                                       ` Oren Laadan
     [not found]                                         ` <48931A7E.1040302-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-01 18:00                                           ` Louis Rilling
     [not found]                                             ` <20080801180038.GL22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-08-01 18:51                                               ` Oren Laadan
     [not found]                                                 ` <48935B4D.7070302-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-04 10:16                                                   ` Louis Rilling
2008-08-05  2:37                                                     ` Oren Laadan
     [not found]                                                       ` <4897BCE0.1080508-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-05  3:51                                                         ` Joseph Ruscio
     [not found]                                                           ` <1FA56146-7C30-4C36-982D-A50AA8BC8392-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>
2008-08-05  9:19                                                             ` Louis Rilling
2008-08-05 16:20                                                               ` Oren Laadan
     [not found]                                                                 ` <48987DE7.3060408-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-06 15:41                                                                   ` Joseph Ruscio
     [not found]                                                                     ` <3A99F254-E9B3-484B-85B0-29023ADA04C4-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>
2008-08-07  9:25                                                                       ` Louis Rilling
2008-08-05 16:23                                                             ` Dave Hansen
2008-08-06 16:15                                                               ` Joseph Ruscio
     [not found]                                                                 ` <FE4D936E-06F1-45D2-8E7C-85D87149BDC0-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>
2008-08-07  9:29                                                                   ` Louis Rilling
2008-08-08 17:20                                                               ` Joseph Ruscio
     [not found]                                                                 ` <03CE5BD3-E84A-4617-93BC-722ECB846C63-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>
2008-08-08 17:24                                                                   ` Dave Hansen
2008-08-05  9:32                                                         ` Louis Rilling
2008-07-31 21:25           ` Serge E. Hallyn
  -- strict thread matches above, loose matches on Subject: below --
2008-07-30 16:52 Serge E. Hallyn
     [not found] ` <20080730165249.GA23802-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-30 17:40   ` Dave Hansen
2008-07-31 13:59     ` Louis Rilling
     [not found]       ` <20080731135910.GD22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 14:14         ` Serge E. Hallyn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4891CFDF.8060108@cs.columbia.edu \
    --to=orenl-eqauephvms7envbuuze7ea@public.gmane.org \
    --cc=Louis.Rilling-aw0BnHfMbSpBDgjK7y7TUQ@public.gmane.org \
    --cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.