linux-fsdevel.vger.kernel.org archive mirror
From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
To: Oren Laadan <orenl@cs.columbia.edu>
Cc: Serge Hallyn <serge@hallyn.com>,
	Matt Helsley <matthltc@us.ibm.com>, Dan Smith <danms@us.ibm.com>,
	Matthew Wilcox <matthew@wil.cx>,
	Jamie Lokier <jamie@shareable.org>,
	Steven Whitehouse <swhiteho@redhat.com>,
	<linux-fsdevel@vger.kernel.org>,
	Containers <containers@lists.linux-foundation.org>
Subject: [PATCH 5/5][v5][cr]: Document design of C/R of file-locks
Date: Thu, 28 Oct 2010 23:16:41 -0700
Message-ID: <1288333001-28838-6-git-send-email-sukadev@linux.vnet.ibm.com>
In-Reply-To: <1288333001-28838-1-git-send-email-sukadev@linux.vnet.ibm.com>

Summarize the file-system consistency requirements and the design of
the C/R of file-locks.

Changelog[v5]:
	- This version of the patchset only checkpoints/restores file-locks.
	  C/R of file-owner information requires additional work with struct
	  pids and will be addressed in a follow-on patch. C/R of file-leases
	  depends on C/R of file-owner info, so the design information for
	  C/R of file-leases is removed from the Documentation for now.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 Documentation/checkpoint/file-locks |   52 +++++++++++++++++++++++++++++++++++
 1 files changed, 52 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/checkpoint/file-locks

diff --git a/Documentation/checkpoint/file-locks b/Documentation/checkpoint/file-locks
new file mode 100644
index 0000000..ccffdef
--- /dev/null
+++ b/Documentation/checkpoint/file-locks
@@ -0,0 +1,52 @@
+
+Filesystem consistency across C/R.
+==================================
+
+To checkpoint/restart a process that is using any filesystem resource, the
+kernel assumes that the file system state at the time of restart is consistent
+with its state at the time of checkpoint. In general, this consistency can be
+achieved by:
+
+	a. running the application inside a container (to ensure no process
+	   outside the container modifies the filesystem/IPC or other states)
+
+	b. freezing the application before checkpoint
+	c. taking a snapshot of the file system while it is frozen
+	d. checkpointing the application while it is frozen
+
+	e. restoring the file system state to its snapshot
+	f. restarting the application inside a container
+
+That is, the kernel assumes that the file system state is consistent, but it
+does not, and cannot, verify that it is. The administrator must provide this
+consistency, taking into account the file system type, whether it is local
+or remote, and the tools available for that file system (for example, btrfs
+snapshots or rsync).
+
+For distributed applications operating on distributed filesystems, it is
+expected that an external mechanism will coordinate the freeze, checkpoint,
+snapshot and restart across the nodes. In other words, the current semantics
+in the kernel provide for C/R on a single node only.
+
+Checkpoint/restart of file-locks.
+=================================
+
+To checkpoint file-locks in an application, we start with each file-descriptor
+and count the number of file-locks on that file-descriptor. We save this count
+in the checkpoint image, followed by a record describing each file-lock on the
+file-descriptor.
+
+When restarting the application from the checkpoint, we read the file-lock
+count for each file-descriptor and then read the information about each
+file-lock. For each file-lock, we call flock_set() to set a new file-lock.
+
+No special handling is necessary for a process P2 in the checkpointed container
+that is blocked on a file-lock L1 held by another process P1. Processes in the
+restarted container begin execution only after all processes have been
+restored. If the blocked process P2 is restored first, it will prepare to
+return -ERESTARTSYS from the fcntl() system call, but will wait for P1 to be
+restored. When P1 is restored, it will re-acquire the file-lock L1 before P1
+and P2 begin actual execution.
+
+This ensures that even if P2 is scheduled to run before P1, P2 will restart
+the fcntl() call, find L1 held by P1, and go back to waiting for L1.
-- 
1.6.0.4

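For readers who want to experiment with the "count, then one record per
file-lock" layout described in the document above, here is a minimal
userspace sketch. It is only an analogy: the patchset restores locks in the
kernel via flock_set(), whose signature is not shown in this message, so the
sketch uses fcntl(F_SETLK) as a stand-in. The names struct lock_rec,
struct fd_lock_image, MAX_LOCKS, image_add_lock() and image_restore_locks()
are all hypothetical and do not reflect the actual checkpoint image format.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record for one file-lock; NOT the layout used by the patch. */
struct lock_rec {
	short type;	/* F_RDLCK or F_WRLCK */
	off_t start;	/* byte offset where the locked range begins */
	off_t len;	/* length of the range; 0 means "to end of file" */
};

/* Hypothetical per-descriptor image: a count followed by the records. */
#define MAX_LOCKS 8
struct fd_lock_image {
	unsigned int count;
	struct lock_rec recs[MAX_LOCKS];
};

/* "Checkpoint" side: append one record and bump the saved count. */
static int image_add_lock(struct fd_lock_image *img, short type,
			  off_t start, off_t len)
{
	assert(img != NULL);
	if (img->count >= MAX_LOCKS)
		return -1;
	img->recs[img->count].type = type;
	img->recs[img->count].start = start;
	img->recs[img->count].len = len;
	img->count++;
	return 0;
}

/*
 * "Restart" side: read the count, then set one lock per record.
 * fcntl(F_SETLK) stands in for the in-kernel flock_set() here.
 * Returns the number of locks set, or -1 on the first failure.
 */
static int image_restore_locks(int fd, const struct fd_lock_image *img)
{
	unsigned int i;

	assert(img != NULL);
	for (i = 0; i < img->count; i++) {
		struct flock fl;

		memset(&fl, 0, sizeof(fl));
		fl.l_type = img->recs[i].type;
		fl.l_whence = SEEK_SET;
		fl.l_start = img->recs[i].start;
		fl.l_len = img->recs[i].len;
		if (fcntl(fd, F_SETLK, &fl) == -1)
			return -1;
	}
	return (int)img->count;
}
```

A caller would fill the image with image_add_lock() while "checkpointing" and
later pass it, together with a descriptor opened O_RDWR on the same file, to
image_restore_locks().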

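The P1/P2 ordering argument in the document above rests on one property: once
P1 holds L1 again, any retried request from P2 conflicts with it. The sketch
below shows that property from userspace, assuming ordinary POSIX record
locks. The parent plays P1 and takes a write lock, and a forked child plays
P2 and probes with fcntl(F_GETLK). It does not exercise the restart path or
-ERESTARTSYS. The function names and the 16-byte range are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Return 1 if another process holds a lock that conflicts with a write
 * lock on [start, start + len), 0 if the range is free, -1 on error.
 */
static int write_lock_would_block(int fd, off_t start, off_t len)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = start;
	fl.l_len = len;
	if (fcntl(fd, F_GETLK, &fl) == -1)
		return -1;
	return fl.l_type != F_UNLCK;
}

/*
 * P1 (this process) takes L1, then P2 (a forked child) probes the same
 * range. Returns what P2 saw: 1 for "would block", 0 for "free", -1 on
 * error. Record locks are not inherited across fork(), so the child
 * really is a second contender for the lock.
 */
static int waiter_view_after_holder_locks(const char *path)
{
	struct flock l1;
	pid_t p2;
	int status, fd;

	assert(path != NULL);
	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return -1;

	memset(&l1, 0, sizeof(l1));
	l1.l_type = F_WRLCK;
	l1.l_whence = SEEK_SET;
	l1.l_start = 0;
	l1.l_len = 16;
	if (fcntl(fd, F_SETLK, &l1) == -1) {
		close(fd);
		return -1;
	}

	p2 = fork();
	if (p2 == 0) {
		int r = write_lock_would_block(fd, 0, 16);

		_exit(r == 1 ? 1 : (r == 0 ? 0 : 2));
	}
	if (p2 < 0 || waitpid(p2, &status, 0) != p2) {
		close(fd);
		return -1;
	}
	close(fd);	/* also drops P1's lock */
	if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
		return -1;
	return WEXITSTATUS(status);
}
```

If P2 issued F_SETLKW instead of probing, it would sleep until P1 released
L1, which is the "go back to waiting" behaviour the document describes.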
Thread overview: 8+ messages
2010-10-29  6:16 [PATCH 0/5][v5][cr] Checkpoint/restart file locks Sukadev Bhattiprolu
2010-10-29  6:16 ` [PATCH 1/5][v5][cr]: Move file_lock macros into linux/fs.h Sukadev Bhattiprolu
2010-10-29  6:16 ` [PATCH 2/5][v5][cr]: Define flock_set() Sukadev Bhattiprolu
2010-10-29  6:16 ` [PATCH 3/5][v5][cr]: Define flock64_set() Sukadev Bhattiprolu
2010-10-29  6:16 ` [PATCH 4/5][v5][cr]: Checkpoint/restore file-locks Sukadev Bhattiprolu
2010-10-29  6:16 ` Sukadev Bhattiprolu [this message]
2010-10-29 14:31 ` [PATCH 0/5][v5][cr] Checkpoint/restart file locks Lin Ming
2010-10-29 18:35   ` Sukadev Bhattiprolu
