From mboxrd@z Thu Jan  1 00:00:00 1970
From: Matt Helsley <matthltc@us.ibm.com>
Subject: Re: [RFC][PATCH 2/3][cr][v2]: Checkpoint/restart file leases
Date: Fri, 30 Jul 2010 18:36:53 -0700
Message-ID: <20100731013653.GC2927@count0.beaverton.ibm.com>
References: <1274836063-13271-1-git-send-email-sukadev@linux.vnet.ibm.com>
 <1274836063-13271-3-git-send-email-sukadev@linux.vnet.ibm.com>
 <4C170430.2030708@cs.columbia.edu>
 <20100730191607.GA16238@us.ibm.com>
 <4C532BC9.6050109@cs.columbia.edu>
 <1280525871.2451.23.camel@localhost.localdomain>
 <4C5352EF.9080601@cs.columbia.edu>
 <20100731003504.GA3544@us.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Oren Laadan <orenl@cs.columbia.edu>,
	john stultz <johnstul@us.ibm.com>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Matt Helsley <matthltc@us.ibm.com>, matthew@wil.cx,
	linux-fsdevel@vger.kernel.org,
	Containers <containers@lists.linux-foundation.org>
To: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from e6.ny.us.ibm.com ([32.97.182.146]:33643 "EHLO e6.ny.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754360Ab0GaBg5 (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Fri, 30 Jul 2010 21:36:57 -0400
Received: from d01relay07.pok.ibm.com (d01relay07.pok.ibm.com [9.56.227.147])
	by e6.ny.us.ibm.com (8.14.4/8.13.1) with ESMTP id o6V1aCCA016933
	for <linux-fsdevel@vger.kernel.org>; Fri, 30 Jul 2010 21:36:12 -0400
Received: from d01av03.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217])
	by d01relay07.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id o6V1atU12396404
	for <linux-fsdevel@vger.kernel.org>; Fri, 30 Jul 2010 21:36:55 -0400
Received: from d01av03.pok.ibm.com (loopback [127.0.0.1])
	by d01av03.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id o6V1asaX032603
	for <linux-fsdevel@vger.kernel.org>; Fri, 30 Jul 2010 22:36:55 -0300
Content-Disposition: inline
In-Reply-To: <20100731003504.GA3544@us.ibm.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Fri, Jul 30, 2010 at 05:35:04PM -0700, Sukadev Bhattiprolu wrote:
> Oren Laadan [orenl@cs.columbia.edu] wrote:
> >
> >
> > john stultz wrote:
> >> On Fri, 2010-07-30 at 15:45 -0400, Oren Laadan wrote:
> >>> Sukadev Bhattiprolu wrote:
> >>>> Oren Laadan [orenl@cs.columbia.edu] wrote:
> >>>> | | | >  		h->fl_type = lock->fl_type;
> >>>> | > +		h->fl_type_prev = lock->fl_type_prev;
> >>>> | >  		h->fl_flags = lock->fl_flags;
> >>>> | > +		if (h->fl_type & F_INPROGRESS && | > 
> >>>> +					(lock->fl_break_time > jiffies))
> >>>> | > +			h->fl_rem_lease = (lock->fl_break_time - jiffies) / HZ;
> >>>> | | Hmmm -- we have a ctx->ktime_begin marking the start of the 
> >>>> checkpoint.
> >>>> | It is used for relative-time calculations, for example, the expiry of
> >>>> | restart-blocks and timeouts.  I suggest that we use it here too to be
> >>>> | consistent.
> >>>>
> >>>> Well, I started off using ktime_begin but discussed this with John Stultz
> >>>> (CC'd here) who pointed out that mixing different domains of time is likely
> >>>> to cause errors. ktime is an absolute time and fl_break_time uses jiffies -
> >>>> a relative time. 
> >>>>
> >>>> I think use of ktime_begin for restart_blocks is fine (since they use
> >>>> ktime_t) but using ktime_t for file leases and converting between jiffies
> >>>> and nanoseconds could be a problem, unless we convert fl_break_time to
> >>>> seconds.
> >>>>
> >>>> IOW, can we leave the above computation of ->fl_rem_lease for now ?
> >>> The data on restart_blocks keep relative time - it's the the time
> >>> to expiry relative to ktime_begin (which is absolute).
> >>>
> >>> ktime_begin merely gives a reference point in time against which
> >>> all other time-related values should be saved. The advantage is
> >>> that all time computation are relative to the same point in time
> >>> at checkpoint/restart - the time when the ktime_begin is set. It's
> >>> more consistent this way.
> >>
> >> First, forgive me for not being very aware of the checkpoint/restart
> >> code. 
> >
> > On the contrary, forgive me if I'm stating the obvious below ...
> >
> >>
> >> So, ktime_begin is an absolute CLOCK_MONOTONIC time, relative to the
> >> time the system booted (more or less). And it represents the checkpoint
> >> time, correct?
> >
> > As a rule, all time measurements in the checkpoint image are
> > saved as relative values, using the start-of-checkpoint as the
> > reference point in time (*).
> >
> > So at checkpoint, every absolute time value should be converted
> > to a value relative to the start-of-checkpoint. At restart, every
> > relative time value from the image is converted back (if needed)
> > to an absolute time value using a respective start-of-restart.
> 
> One general observation, slightly off-topic. You mention 
> "start-of-restart" here and ...
> >
> > This makes sense for the most common case, where if a process
> > had 5 more seconds to sleep at the time of checkpoint, we would
> > like it to have 5 more seconds to sleep after it restarts.
> 
> ... "after it restarts" here. These two can be quite different if,
> as you mention below, the C/R is a lengthy operation.

Worse! I think the big concern is not the duration of checkpoint but the
amount of time userspace expects to store a checkpoint before
restarting.

It's an arbitrary amount of time. A user could restart 50 days later.
Or 100. etc. Sure, it's _unlikely_, but we (checkpoint/restart implementers)
shouldn't count on seeing only "reasonable" times between completion of
checkpoint and initiation of restart. We need to be robust for all
the times we see.

I think that's why Oren made the choices he did. Making times relative
to ktime_begin certainly helps keep the time values semi-sane for those
crazy arbitrary cases in addition to being nice for the "reasonable"
cases.

Cheers,
	-Matt