linux-fsdevel.vger.kernel.org archive mirror
From: Oren Laadan <orenl@cs.columbia.edu>
To: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: steve@chygwyn.com, serue@us.ibm.com,
	Matt Helsley <matthltc@us.ibm.com>,
	matthew@wil.cx, linux-fsdevel@vger.kernel.org,
	Containers <containers@lists.linux-foundation.org>
Subject: Re: [PATCH 9/9][cr][v2]: Restore file-locks
Date: Tue, 15 Jun 2010 00:22:24 -0400
Message-ID: <4C170000.8040600@cs.columbia.edu>
In-Reply-To: <20100526235713.GA12768@us.ibm.com>



On 05/26/2010 07:57 PM, Sukadev Bhattiprolu wrote:
> steve@chygwyn.com [steve@chygwyn.com] wrote:
> | Hi,
> | 
> | On Tue, May 18, 2010 at 08:07:32PM -0700, Sukadev Bhattiprolu wrote:
> | > Restore POSIX file-locks of an application from its checkpoint image.
> | > 
> | > Read the saved file-locks from the checkpoint image and for each POSIX
> | > lock, call flock_set() to set the lock on the file.
> | > 
> | > As pointed out by Matt Helsley, no special handling is necessary for a
> | > process P2 in the checkpointed container that is blocked on a lock, L1,
> | > held by another process P1, because processes in the restarted container
> | > begin execution only after all processes have been restored. If the
> | > blocked process P2 is restored first, it will prepare to return
> | > -ERESTARTSYS from the fcntl() system call, but wait for P1 to be
> | > restored. When P1 is restored, it will re-acquire the lock L1 before P1
> | > and P2 begin actual execution. This ensures that even if P2 is scheduled
> | > to run before P1, P2 will go back to waiting for the lock L1.
> | >
> | Does that imply certain conditions wrt checkpointed processes and
> | NFS exports? I'm not sure I exactly understand the use case which
> | this is intended to address.
> 
> Well, yes, this assumes some prerequisites are met.
> 
> First, let's look at a single system. We expect that the application
> process tree is run inside a container. This means that the file
> system(s) (and other resources like pipes, IPC) that the application
> is working with are not modified by a process outside the container.

To be precise, we require that (a) resources won't change during
the checkpoint, and (b) the filesystem view at restart would be
the same as at checkpoint.

Running applications inside an isolated container is one way to
achieve that (and, more importantly, to provide guarantees about it).
Doing so provides some assurance about the resulting checkpoint image.

However, the requirements may be satisfied even outside a container
by, for example, a well-behaved application, except that then we
can't say it's safe: it depends on the application.

> We also require that the application process tree be frozen before
> checkpointing the application. So even if the checkpoint process takes
> a few minutes, the state of the resources (files, pipes, signals etc.)
> does not change, since (a) the application is containerized and (b) the
> container is frozen.
> 
> We already have the ability to run applications inside containers, using
> the clone() system call (see lxc.sf.net for example), and the ability to
> freeze the application using the freezer cgroup in the Linux kernel.
> 
> | 
> | I was hoping to figure out whether it would still be safe on
> | a cluster filesystem as well,
> 
> For clusters and NFS, an external protocol has to be established so that
> the distributed application can be started/frozen/checkpointed/restarted
> in a coordinated way.
> 
> I think that is something that would have to be built on top of the
> checkpoint/restart functionality that we are working on. Or maybe there
> are existing implementations that we would need to plug into.
> 
> Hope that helps, but it's possible I missed your question :-). If so,
> please let me know.

What you refer to is checkpoint in a cluster, or distributed checkpoint
of multiple cooperating processes/applications that run on multiple
hosts. Indeed, one simple way to do it is coordinated distributed
checkpoint and restart.

However, I think the question was about a single application (or
container) that is accessing remote and clustered (and possibly
distributed) *file systems*.

Ideally, we would like to have a method to snapshot a filesystem
at checkpoint to guarantee that at restart we have a consistent
view of that file system. This applies regardless of whether the
file system is local or remote. In the absence of such a mechanism,
we will have to rely on the file system not being changed (at least
those parts that the checkpointed application expects to remain as
they had been) until the restart.
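To illustrate the restore step from the quoted patch description (for each saved POSIX lock, set it on the file again), here is a minimal user-space sketch. It is not the kernel code from this series, which calls flock_set() from patch 7/9; instead it uses the standard fcntl(F_SETLK) interface. The struct saved_posix_lock record and both helper functions are hypothetical and do not reflect the actual checkpoint image format.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record for one POSIX lock as read back from a
 * checkpoint image; NOT the image format used by the patches. */
struct saved_posix_lock {
    short type;    /* F_RDLCK or F_WRLCK */
    off_t start;   /* absolute byte offset of the locked range */
    off_t len;     /* 0 means "through end of file" */
};

/* Re-acquire one saved lock on fd. F_SETLK (non-blocking) is used
 * rather than F_SETLKW: locks held at checkpoint time coexisted, so
 * in a consistent image no other restored process should hold a
 * conflicting lock, and a conflict is reported as an error instead
 * of hanging the restart. Returns 0 on success, -errno on failure. */
static int restore_posix_lock(int fd, const struct saved_posix_lock *sl)
{
    struct flock fl;

    if (sl->type != F_RDLCK && sl->type != F_WRLCK)
        return -EINVAL;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = sl->type;
    fl.l_whence = SEEK_SET;   /* saved offsets are absolute */
    fl.l_start = sl->start;
    fl.l_len = sl->len;

    if (fcntl(fd, F_SETLK, &fl) == -1)
        return -errno;
    return 0;
}

/* Restore every saved lock for fd, stopping at the first failure. */
static int restore_posix_locks(int fd, const struct saved_posix_lock *locks,
                               size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int err = restore_posix_lock(fd, &locks[i]);
        if (err)
            return err;
    }
    return 0;
}
```

Note that F_WRLCK requires fd to be open for writing and F_RDLCK requires it to be open for reading, so the file descriptors must be restored with their original access modes before their locks are re-acquired.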

Oren.


Thread overview: 22+ messages
2010-05-19  3:07 [PATCH 0/9][cr][v2]: C/R file owner and posix file locks Sukadev Bhattiprolu
2010-05-19  3:07 ` [PATCH 1/9][cr][v2]: Add uid, euid params to f_modown() Sukadev Bhattiprolu
2010-05-19  3:07 ` [PATCH 2/9][cr][v2]: Add uid, euid params to __f_setown() Sukadev Bhattiprolu
2010-05-19  3:07 ` [PATCH 3/9][cr][v2]: Checkpoint file-owner information Sukadev Bhattiprolu
2010-05-19  3:07 ` [PATCH 4/9][cr][v2]: Restore file_owner info Sukadev Bhattiprolu
2010-06-15  4:05   ` Oren Laadan
2010-07-28 19:25     ` Sukadev Bhattiprolu
2010-07-28 22:20       ` Matt Helsley
2010-07-29 19:00         ` Serge E. Hallyn
2010-05-19  3:07 ` [PATCH 5/9][cr][v2]: Move file_lock macros into linux/fs.h Sukadev Bhattiprolu
2010-05-19  3:07 ` [PATCH 6/9][cr][v2]: Checkpoint file-locks Sukadev Bhattiprolu
2010-06-15  4:13   ` Oren Laadan
2010-07-28 19:26     ` Sukadev Bhattiprolu
     [not found]       ` <20100728192649.GB14570-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-07-28 19:42         ` Oren Laadan
2010-07-28 21:29           ` Sukadev Bhattiprolu
2010-07-28 23:39             ` Oren Laadan
2010-05-19  3:07 ` [PATCH 7/9][cr][v2]: Define flock_set() Sukadev Bhattiprolu
2010-05-19  3:07 ` [PATCH 8/9][cr][v2]: Define flock64_set() Sukadev Bhattiprolu
2010-05-19  3:07 ` [PATCH 9/9][cr][v2]: Restore file-locks Sukadev Bhattiprolu
2010-05-26  7:48   ` steve
2010-05-26 23:57     ` Sukadev Bhattiprolu
2010-06-15  4:22       ` Oren Laadan [this message]
