From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
To: Oren Laadan <orenl@cs.columbia.edu>
Cc: john stultz <johnstul@us.ibm.com>,
"Serge E. Hallyn" <serge@hallyn.com>,
Matt Helsley <matthltc@us.ibm.com>,
matthew@wil.cx, linux-fsdevel@vger.kernel.org,
Containers <containers@lists.linux-foundation.org>
Subject: Re: [RFC][PATCH 2/3][cr][v2]: Checkpoint/restart file leases
Date: Fri, 30 Jul 2010 17:35:04 -0700 [thread overview]
Message-ID: <20100731003504.GA3544@us.ibm.com> (raw)
In-Reply-To: <4C5352EF.9080601@cs.columbia.edu>
Oren Laadan [orenl@cs.columbia.edu] wrote:
>
>
> john stultz wrote:
>> On Fri, 2010-07-30 at 15:45 -0400, Oren Laadan wrote:
>>> Sukadev Bhattiprolu wrote:
>>>> Oren Laadan [orenl@cs.columbia.edu] wrote:
>>>> | | | > h->fl_type = lock->fl_type;
>>>> | > + h->fl_type_prev = lock->fl_type_prev;
>>>> | > h->fl_flags = lock->fl_flags;
>>>> | > + if (h->fl_type & F_INPROGRESS && | >
>>>> + (lock->fl_break_time > jiffies))
>>>> | > + h->fl_rem_lease = (lock->fl_break_time - jiffies) / HZ;
>>>> | | Hmmm -- we have a ctx->ktime_begin marking the start of the
>>>> checkpoint.
>>>> | It is used for relative-time calculations, for example, the expiry of
>>>> | restart-blocks and timeouts. I suggest that we use it here too to be
>>>> | consistent.
>>>>
>>>> Well, I started off using ktime_begin but discussed this with John Stultz
>>>> (CC'd here) who pointed out that mixing different domains of time is likely
>>>> to cause errors. ktime is an absolute time and fl_break_time uses jiffies -
>>>> a relative time.
>>>>
>>>> I think use of ktime_begin for restart_blocks is fine (since they use
>>>> ktime_t) but using ktime_t for file leases and converting between jiffies
>>>> and nanoseconds could be a problem, unless we convert fl_break_time to
>>>> seconds.
>>>>
>>>> IOW, can we leave the above computation of ->fl_rem_lease for now ?
>>> The data on restart_blocks keep relative time - it's the the time
>>> to expiry relative to ktime_begin (which is absolute).
>>>
>>> ktime_begin merely gives a reference point in time against which
>>> all other time-related values should be saved. The advantage is
>>> that all time computation are relative to the same point in time
>>> at checkpoint/restart - the time when the ktime_begin is set. It's
>>> more consistent this way.
>>
>> First, forgive me for not being very aware of the checkpoint/restart
>> code.
>
> On the contrary, forgive me if I'm stating the obvious below ...
>
>>
>> So, ktime_begin is an absolute CLOCK_MONOTONIC time, relative to the
>> time the system booted (more or less). And it represents the checkpoint
>> time, correct?
>
> As a rule, all time measurements in the checkpoint image are
> saved as relative values, using the start-of-checkpoint as the
> reference point in time (*).
>
> So at checkpoint, every absolute time value should be converted
> to a value relative to the start-of-checkpoint. At restart, every
> relative time value from the image is converted back (if needed)
> to an absolute time value using a respective start-of-restart.
One general observation, slightly off-topic. You mention
"start-of-restart" here and ...
>
> This makes sense for the most common case, where if a process
> had 5 more seconds to sleep at the time of checkpoint, we would
> like it to have 5 more seconds to sleep after it restarts.
... "after it restarts" here. These two can be quite different if,
as you mention below, the C/R is a lengthy operation.
If application had 5 seconds remaining on the lease, and C/R takes
more than 5 seconds, the application would have spent the remaining
lease frozen. Of course trying to compute the values relative to the
"end-of-restart" is theoretically impossible :-).
Maybe the user can adjust the lease-expiry in the checkpoint-image
to allow for this as well.
>
> (For the rare case where you care about the absolute timeout,
> userspace can modify the checkpoint image on-the-fly during
> restart to adjust the timeout value to be 0, or close enough
> to zero that the timeout will happen as soon as the restart
> completes).
>
> That said, ktime_begin has two purposes:
>
> 1) It stores the start-of-{checkpoint,restart} during checkpoint
> and restart respectively, and used for all conversions from
> absolute to relative time values. This is purely internal use
> during the checkpoint/restart operations.
> It make sense to use a single value, as opposed to compute the
> deltas "on the go", because checkpoint/restart can be lengthy
> operations (depending on the image size and IO speed). Doing
> so against current time, which is a moving target, can yield
> inconsistent results.
>
> (*) Looking at the code, I see that the hrtimer expiry times
> are actually calculated against current time - I'll fix that...
>
> 2) It provides information for the admin/user about what was
> the absolute time when the checkpoint operation began (maybe
> we should also end ktime_end...). Otherwise, we would have to
> wrap each checkpoint image with additional meta-data from
> userspace.
> (In particular, having that absolute time in the checkpoint
> image will allow a userspace filter to modify timeout values
> as needed in the rare case mentioned above).
>
>> Is there a similar checkpointed jiffies value?
>
> No. but ...
>
>>> I don't see the problem with jiffies vs ktime - there are functions
>>> to convert between different units (see jiffies.h). Even if you are
>>> concerned about reducing the resolution because of a conversion -
>>> well, it isn't realistic to expect nano-sec resolution after restart...
>>
>> You're right, there are functions to convert from relative jiffies to
>> relative nanoseconds, but there really aren't good functions to convert
>> from absolute jiffies to absolute CLOCK_MONOTONIC time.
>>
>> One can make a rough approximation, but its not a very precise method
>> with a number of complications (ie: 32bit jiffies wraps every ~50 days,
>> the INITIAL_JIFFIES offset has to be remembered, and the larger issue of
>> the fact that CLOCK_MONOTONIC is NTP corrected, while jiffies may or may
>> not be). So I'd strongly advise against trying to directly convert abs
>> jiffies values to abs CLOCK_MONOTONIC time.
>
> ... given this, to be able to compute the relative time where
> jiffies are involved (e.g. the remaining lease time relative to
> ktime_begin) we could also keep a ctx->jiffies_begin, calculate
> the delta, and then convert to ktime_t delta.
>
> Suka: if that works for you, can you please add this piece to
> your patch ?
Yes, sure I can do that.
Thanks John, Oren.
Sukadev
next prev parent reply other threads:[~2010-07-31 0:29 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-05-26 1:07 [RFC][PATCH 0/3][cr][v2]: Checkpoint/restart file leases Sukadev Bhattiprolu
2010-05-26 1:07 ` [RFC][PATCH 1/3][cr][v2]: Define do_setlease() Sukadev Bhattiprolu
[not found] ` <1274836063-13271-2-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2010-05-26 13:52 ` Serge E. Hallyn
2010-05-26 13:52 ` Serge E. Hallyn
2010-05-26 17:14 ` Sukadev Bhattiprolu
[not found] ` <20100526135256.GA25799-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-05-26 17:14 ` Sukadev Bhattiprolu
[not found] ` <1274836063-13271-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2010-05-26 1:07 ` Sukadev Bhattiprolu
2010-05-26 1:07 ` [RFC][PATCH 2/3][cr][v2]: Checkpoint/restart file leases Sukadev Bhattiprolu
2010-05-26 1:07 ` [RFC][PATCH 3/3][cr][v2]: fileleases: C/R of an in-progress lease Sukadev Bhattiprolu
2010-05-26 1:07 ` [RFC][PATCH 2/3][cr][v2]: Checkpoint/restart file leases Sukadev Bhattiprolu
2010-06-15 4:40 ` Oren Laadan
2010-07-30 19:16 ` Sukadev Bhattiprolu
[not found] ` <20100730191607.GA16238-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-07-30 19:45 ` Oren Laadan
2010-07-30 19:45 ` Oren Laadan
2010-07-30 21:37 ` john stultz
2010-07-30 22:32 ` Oren Laadan
2010-07-31 0:35 ` Sukadev Bhattiprolu [this message]
2010-07-31 1:36 ` Matt Helsley
[not found] ` <20100731003504.GA3544-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-07-31 1:36 ` Matt Helsley
2010-07-31 4:52 ` Oren Laadan
2010-07-31 4:52 ` Oren Laadan
[not found] ` <4C5352EF.9080601-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-07-31 0:35 ` Sukadev Bhattiprolu
[not found] ` <1280525871.2451.23.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-07-30 22:32 ` Oren Laadan
[not found] ` <4C532BC9.6050109-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-07-30 21:37 ` john stultz
[not found] ` <4C170430.2030708-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-07-30 19:16 ` Sukadev Bhattiprolu
[not found] ` <1274836063-13271-3-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2010-06-15 4:40 ` Oren Laadan
2010-05-26 1:07 ` [RFC][PATCH 3/3][cr][v2]: fileleases: C/R of an in-progress lease Sukadev Bhattiprolu
2010-06-15 4:43 ` Oren Laadan
2010-06-16 11:18 ` Jamie Lokier
[not found] ` <20100616111843.GA15054-yetKDKU6eevNLxjTenLetw@public.gmane.org>
2010-06-16 15:08 ` Oren Laadan
2010-06-16 17:46 ` Matt Helsley
2010-06-16 15:08 ` Oren Laadan
2010-06-16 17:46 ` Matt Helsley
[not found] ` <1274836063-13271-4-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2010-06-15 4:43 ` Oren Laadan
2010-06-16 11:18 ` Jamie Lokier
2010-06-16 14:52 ` Oren Laadan
2010-06-16 14:52 ` Oren Laadan
[not found] ` <4C18E534.80700-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-06-16 19:57 ` Sukadev Bhattiprolu
2010-06-16 19:57 ` Sukadev Bhattiprolu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100731003504.GA3544@us.ibm.com \
--to=sukadev@linux.vnet.ibm.com \
--cc=containers@lists.linux-foundation.org \
--cc=johnstul@us.ibm.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=matthew@wil.cx \
--cc=matthltc@us.ibm.com \
--cc=orenl@cs.columbia.edu \
--cc=serge@hallyn.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.