From: Jeff Garzik <jeff@garzik.org>
To: Pete Zaitcev <zaitcev@redhat.com>
Cc: Project Hail List <hail-devel@vger.kernel.org>
Subject: Re: [Patch 4/7] tabled: retry conflicting locks
Date: Wed, 20 Jan 2010 17:00:45 -0500
Message-ID: <4B577D0D.4040908@garzik.org>
In-Reply-To: <20100120131635.64346caa@redhat.com>
On 01/20/2010 03:16 PM, Pete Zaitcev wrote:
> On Wed, 20 Jan 2010 14:53:17 -0500, Jeff Garzik <jeff@garzik.org> wrote:
>> On 01/14/2010 11:13 PM, Pete Zaitcev wrote:
>
>>> This problem was with us for a while, and even with this fix our start-up
>>> is not reliable. But at least we will not be 100% guaranteed to hang as
>>> before when restarting too quickly. So although the whole area needs some
>>> serious reworking, this specific case was just too annoying to let it
>>> continue.
>
>> This is not correct. CLD has blocking locks. You issue the LOCK op,
>> and will be notified when you have acquired the lock, possibly hours or
>> days later. There is no need to retry anything...
>
> Meanwhile, there's no way to cancel an outstanding lock request
> short of blowing off the whole session. I'll switch to LOCK when
> you fix that, but currently TRYLOCK is the only way (which BTW you
> use in cldcli too).
Do you mean cancelling someone else's lock request? That is not
something that meshes with the design. If you mean cancelling your own
lock request, that's probably reasonable.

But the entire logic behind LOCK is central to what needs to be done:
ensure one and only one session holds a lock, until the lock is released
or the client dies (thus forcing the server to time out and release the
dead session's locks).

If you are restarting quickly, a lock-timeout wait does not seem
unreasonable.
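To make the restart case concrete, here is a rough sketch of those
semantics as a toy in-memory model. It is not CLD's protocol or API:
ToyLockServer, its methods, the session names, and the 30-second timeout
are all made up for illustration.

```python
class ToyLockServer:
    """Toy model: one lock, held by at most one session at a time."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.holder = None       # session id currently holding the lock
        self.last_seen = {}      # session id -> time of last contact

    def expire_dead_sessions(self, now):
        # A session that has been silent for too long is considered dead,
        # and any lock it held is released on its behalf.
        for session, seen in list(self.last_seen.items()):
            if now - seen > self.session_timeout:
                del self.last_seen[session]
                if self.holder == session:
                    self.holder = None

    def trylock(self, session, now):
        self.expire_dead_sessions(now)
        self.last_seen[session] = now
        if self.holder is None:
            self.holder = session
        return self.holder == session

    def unlock(self, session):
        if self.holder == session:
            self.holder = None


server = ToyLockServer(session_timeout=30)

# The old daemon takes the lock, then dies without unlocking.
old_granted = server.trylock("old-tabled", now=0)

# A quick restart still conflicts with the dead session's lock...
early_granted = server.trylock("new-tabled", now=10)

# ...until the server has timed the dead session out.
late_granted = server.trylock("new-tabled", now=31)
```

In this model the restarted daemon is refused at t=10 and granted at
t=31, once the old session has been silent for longer than the timeout.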
> N.B. ncld continues with this approach. In fact currently it does not
> even have a method that performs a blocking lock.
That's definitely a problem, as blocking locks are pretty central to
CLD's design. If you want to own a resource, you get a blocking lock.
You only own the resource as long as the session is alive, and you have
not released the lock yourself. If you do not immediately acquire the
lock, (1) you should not access the shared resource as master, and (2)
you will be notified immediately when atomic lock acquisition occurs.
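As a hedged illustration of that contract, the sketch below queues lock
requests and notifies each requester through a callback when the lock is
granted. ToyBlockingLock and the on_acquired callback are hypothetical
names, not the CLD or ncld interface, and FIFO ordering is an assumption
of this sketch.

```python
from collections import deque


class ToyBlockingLock:
    """Toy blocking lock: requests queue up and are granted in arrival order."""

    def __init__(self):
        self.holder = None
        self.waiters = deque()   # (session, callback) pairs, oldest first

    def lock(self, session, on_acquired):
        """Request the lock; on_acquired(session) fires once it is granted."""
        if self.holder is None:
            self.holder = session
            on_acquired(session)
        else:
            # Do not fail and do not make the client retry: just remember it.
            self.waiters.append((session, on_acquired))

    def release(self, session):
        """Release by the current holder (explicit unlock, or the server
        timing out a dead session); the oldest waiter is granted next."""
        if self.holder != session:
            return
        self.holder = None
        if self.waiters:
            self.holder, callback = self.waiters.popleft()
            callback(self.holder)


events = []                      # order in which sessions were notified
lock = ToyBlockingLock()
lock.lock("A", events.append)    # granted at once
lock.lock("B", events.append)    # queued behind A
lock.lock("C", events.append)    # queued behind B
lock.release("A")                # B is notified
lock.release("B")                # C is notified
```

Each waiter issues exactly one request and is granted the lock exactly
once, in order, so no client can be starved by another client's polling.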
TRYLOCK is painful in the cloud because it encourages programmers, with
patch #4 being a perfect example, to create racy polling-lock solutions
where forward [lock] progress is not guaranteed. IOW, the lock-polling
loop should be in the server, with the client being asynchronously
notified of acquisition. TRYLOCK mainly exists for the less-common
situation of "if (!trylock) exit(0)" type of cloud client execution.
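For contrast, the one case where a non-blocking attempt fits is sketched
below, using Python's threading.Lock purely as a local stand-in for a CLD
lock. run_once_if_free is a hypothetical helper; the point is that it
tries once and gives up, with no retry loop.

```python
import threading

resource_lock = threading.Lock()


def run_once_if_free(job):
    """Run job only if the lock is free right now; never wait or retry."""
    if not resource_lock.acquire(blocking=False):
        return None              # someone else owns it: give up, like exit(0)
    try:
        return job()
    finally:
        resource_lock.release()


# Lock is free: the job runs.
first = run_once_if_free(lambda: "did work")

# Lock is held elsewhere: the job is skipped, not retried.
resource_lock.acquire()
second = run_once_if_free(lambda: "did work")
resource_lock.release()
```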
NFS and other protocols in this space have repeatedly shown that polling
locks is a painful, racy, byte-heavy solution for lock acquisition.

If there is a problem implementing blocking locks in the protocol or
client, let me know, and we'll fix it.

Jeff