From: Oren Laadan <orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
To: "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Subject: Re: [PATCH] c/r: do not hold mmap_sem while checkpointing vma's
Date: Mon, 26 Oct 2009 13:39:14 -0400
Message-ID: <4AE5DEC2.5090904@librato.com>
In-Reply-To: <20091026172423.GG23564-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
>> This patch modifies the memory checkpoint code to _not_ hold the
>> mmap_sem while dumping out the vma's.
>>
>> The problem with holding the mmap_sem is that it first takes the
>> mmap_sem and then takes the file's inode semaphore. This violates the
>> normal locking order, e.g. when taking a page fault during a copyout,
>> which is inode sem and then the mmap_sem.
>>
>> Normally this reverse locking order won't cause a lockup because the
>> output file for the checkpoint image isn't used by the checkpointee.
>> However, there are a couple of cases where it may be a problem, e.g. when
>> some async-IO happens to complete and triggers a page fault at the
>> wrong time.
>>
>> This fixes complaints from lockdep about this reverse ordering.
>>
>> Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
>> ---
>>  checkpoint/memory.c |  133 ++++++++++++++++++++++++++++++++++++---------------
>>  1 files changed, 94 insertions(+), 39 deletions(-)
> ...
>> @@ -1288,9 +1343,9 @@ static struct mm_struct *do_restore_mm(struct ckpt_ctx *ctx)
>>  		}
>>  		set_mm_exe_file(mm, file);
>>  	}
>> +	up_write(&mm->mmap_sem);
>>
>>  	ret = _ckpt_read_buffer(ctx, mm->saved_auxv, sizeof(mm->saved_auxv));
>> -	up_write(&mm->mmap_sem);
>>  	if (ret < 0)
>>  		goto out;
> 
> Could there be a race here?  (If only with someone reading /proc/PID/auxv
> while this is happening, though maybe with another task sharing the mm at
> restart)  I wonder whether you should read into a tmpbuf without mm->mmap_sem,
> then re-acquire and write into mm->saved_auxv?

There is a race, but it is a harmless race: a user who does weird
things will get weird results (*).

Note that proc_pid_auxv() does not take the mmap_sem anyway.

If another process shared mm with a restarting task, then that other
process will crash soon after the mm is restored (see *).

Oren.
