* Re: Fw: Spam: [Bugme-new] [Bug 2829] New: posix_locks_deadlock() loops infinitely
@ 2004-06-05 7:25 Stephen Rothwell
2004-06-06 3:04 ` Stephen Rothwell
0 siblings, 1 reply; 7+ messages in thread
From: Stephen Rothwell @ 2004-06-05 7:25 UTC (permalink / raw)
To: sfr, trond.myklebust; +Cc: akpm, linux-fsdevel, willy
From: Trond Myklebust <trond.myklebust@fys.uio.no>
> On Fri, 04/06/2004 at 23:29, Stephen Rothwell wrote:
> >
> > I used to think so, but Andrew Tridgell convinced me that since it
> > actually returns false positives and false negatives in the presence of
> > threads, all we are really doing is trying to paper over user mode
> > programming errors ...
>
> So you are saying that extending a lock is basically a programming
> error?
No, I am saying that (given POSIX semantics) if you have a set of
cooperating processes or threads that use POSIX advisory file locks for
resource management (or synchronization) (and any set of processes
using file locks had better be cooperating) and you get a deadlock,
then you have a programming or design error. Especially if you depend
on the OS to detect that deadlock, as POSIX says that the OS does not
have to do the detection.
So any correctly written program that suspects (for some unforeseen
reason) that its use of POSIX file locking may cause deadlock MUST be
able to cope if the OS does not detect that deadlock anyway. If it
doesn't, then there are some POSIX-compliant OS platforms on
which that program will deadlock, and that is not the OS platform's
fault/problem.
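One common way for a program to avoid depending on the kernel noticing a deadlock is to acquire its fcntl() locks in one fixed global order. The following is a minimal sketch under stated assumptions: `lock_all_ordered`, `set_whole_file_lock` and `struct ordered_fd` are hypothetical names, the locks are exclusive and whole-file, the descriptors are open for writing, and every cooperating process uses the same helper, ordering files by (st_dev, st_ino).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: one file to lock, keyed by its (device, inode) identity. */
struct ordered_fd {
    int fd;
    dev_t dev;
    ino_t ino;
};

/* Sort by (st_dev, st_ino) so every cooperating process uses one order. */
static int cmp_ordered_fd(const void *a, const void *b)
{
    const struct ordered_fd *x = a;
    const struct ordered_fd *y = b;

    if (x->dev != y->dev)
        return x->dev < y->dev ? -1 : 1;
    if (x->ino != y->ino)
        return x->ino < y->ino ? -1 : 1;
    return 0;
}

/* Set (F_WRLCK) or clear (F_UNLCK) a whole-file lock, waiting if needed. */
static int set_whole_file_lock(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                     /* 0 means "through end of file" */
    while (fcntl(fd, F_SETLKW, &fl) == -1) {
        if (errno != EINTR)
            return -1;                /* includes EDEADLK, if reported */
    }
    return 0;
}

/* Hypothetical helper: exclusively lock fds[0..n) in (dev, ino) order.
 * On failure, release whatever was already taken and return -1. */
static int lock_all_ordered(const int *fds, size_t n)
{
    struct ordered_fd *v;
    size_t i, taken;
    int rc = 0;

    if (n == 0)
        return 0;
    assert(fds != NULL);
    v = calloc(n, sizeof *v);
    if (v == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        struct stat st;

        if (fstat(fds[i], &st) == -1) {
            free(v);
            return -1;
        }
        v[i].fd = fds[i];
        v[i].dev = st.st_dev;
        v[i].ino = st.st_ino;
    }
    qsort(v, n, sizeof *v, cmp_ordered_fd);
    for (taken = 0; taken < n; taken++) {
        if (set_whole_file_lock(v[taken].fd, F_WRLCK) == -1) {
            rc = -1;
            break;
        }
    }
    if (rc == -1) {
        while (taken > 0)
            set_whole_file_lock(v[--taken].fd, F_UNLCK);
    }
    free(v);
    return rc;
}
```

Because all participants acquire in the same order, no circular wait can form among processes that use the helper. It orders locking between processes only: POSIX record locks give no exclusion between threads of the same process, since they share one lock owner.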
I am saying that for the sake of less complicated kernel code Linux
should become one such platform.
At the moment we return false positives and false negatives (and
apparently freeze solid under some circumstances), so we would be much
better off not trying to do something that is possibly impossible for
us to do anyway.
Besides which, the POSIX spec does not even talk about the interaction
of POSIX file locks with POSIX threads as far as I can tell, so we are
basically making up the semantics ...
Cheers,
Stephen Rothwell
P.S. If you want the longer version, ask Tridge how Samba copes with
doing locking ...
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Fw: Spam: [Bugme-new] [Bug 2829] New: posix_locks_deadlock() loops infinitely
2004-06-05 7:25 Fw: Spam: [Bugme-new] [Bug 2829] New: posix_locks_deadlock() loops infinitely Stephen Rothwell
@ 2004-06-06 3:04 ` Stephen Rothwell
2004-06-06 13:27 ` Killing POSIX deadlock detection Matthew Wilcox
0 siblings, 1 reply; 7+ messages in thread
From: Stephen Rothwell @ 2004-06-06 3:04 UTC (permalink / raw)
To: trond.myklebust; +Cc: akpm, linux-fsdevel, willy
[-- Attachment #1: Type: text/plain, Size: 1539 bytes --]
On Sat, 5 Jun 2004 17:25:03 +1000 (EST) Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> At the moment we return false positives and false negatives (and
> apparently freeze solid under some circumstances), so we would be much
> better off not trying to do something that is possibly impossible for
> us to do anyway.
>
> Besides which, the POSIX spec does not even talk about the interaction
> of POSIX file locks with POSIX threads as far as I can tell, so we are
> basically making up the semantics ...
Here's my (contrived) example:
Process P1 contains threads T1 and T2
Process P2
I am using "process id" and "thread id" in the POSIX sense. These are
exclusive, whole file locks for simplicity.
T1 locks file F1 -> lock (P1, F1)
P2 locks file F2 -> lock (P2, F2)
P2 locks file F1 -> blocks against (P1, F1)
T1 locks file F2 -> blocks against (P2, F2)
Is this deadlocked? If you use "process id" as the lock owner, then maybe.
Note that T2 is allowed to remove lock (P1, F1) ...
So if the OS said "deadlocked" then it may have given a false positive ...
If the OS didn't say "deadlocked" then it may have given a false negative ...
Whether there is a deadlock is entirely up to the behaviour of the
application P1, and the OS cannot predict that ...
Of course, if you use "thread id" as the lock owner, then this is
definitely a deadlock.
Currently Linux (2.6) and Solaris (2.8) report deadlock when T1 tries to
lock F2.
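The owner question in this example can be made concrete with a toy wait-for-chain model. This is a hypothetical sketch, not the code in fs/locks.c: files and owners are small integers, each owner blocks on at most one file, and a request is flagged as a deadlock if following "holder, then the file that holder waits on, then that file's holder" leads back to the requester. `owner_of`, `would_deadlock` and `example_reports_deadlock` are made-up names.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy wait-for model, not fs/locks.c. */
enum { F1, F2, NFILES };
enum { T1, T2, P2, P1, NOWNERS };   /* T1 and T2 are threads of process P1 */
#define NONE (-1)

struct lock_state {
    int holder[NFILES];       /* owner holding each file, or NONE */
    int waits_for[NOWNERS];   /* file each owner is blocked on, or NONE */
};

/* Lock owner for a task: P1 for its threads when keyed by process,
 * the thread itself when keyed by thread. */
static int owner_of(int task, bool by_thread)
{
    if (!by_thread && (task == T1 || task == T2))
        return P1;
    return task;
}

/* Walk holder -> file it waits on -> that file's holder, and report a
 * deadlock if the chain returns to the caller. Capped at NOWNERS steps. */
static bool would_deadlock(const struct lock_state *s, int caller, int file)
{
    int owner, steps;

    assert(file >= 0 && file < NFILES);
    owner = s->holder[file];
    for (steps = 0; owner != NONE && steps < NOWNERS; steps++) {
        if (owner == caller)
            return true;
        if (s->waits_for[owner] == NONE)
            return false;             /* holder is not blocked */
        owner = s->holder[s->waits_for[owner]];
    }
    return false;
}

/* Replay the example: T1 holds F1, P2 holds F2, P2 blocks on F1,
 * then 'requester' asks for F2. */
static bool example_reports_deadlock(int requester, bool by_thread)
{
    struct lock_state s;
    int i;

    for (i = 0; i < NFILES; i++)
        s.holder[i] = NONE;
    for (i = 0; i < NOWNERS; i++)
        s.waits_for[i] = NONE;
    s.holder[F1] = owner_of(T1, by_thread);
    s.holder[F2] = P2;
    s.waits_for[P2] = F1;
    return would_deadlock(&s, owner_of(requester, by_thread), F2);
}
```

Keyed by process, the model reports a deadlock for T1's request, and it would report the same if T2 asked for F2 instead, even though T1 would then still be free to release F1. Keyed by thread, only T1's request closes the cycle.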
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
* Killing POSIX deadlock detection
2004-06-06 3:04 ` Stephen Rothwell
@ 2004-06-06 13:27 ` Matthew Wilcox
2004-06-06 19:49 ` Trond Myklebust
0 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2004-06-06 13:27 UTC (permalink / raw)
To: Stephen Rothwell
Cc: trond.myklebust, akpm, linux-fsdevel, willy, linux-kernel
On Sun, Jun 06, 2004 at 01:04:22PM +1000, Stephen Rothwell wrote:
> Here's my (contrived) example:
>
> Process P1 contains threads T1 and T2
> Process P2
>
> I am using "process id" and "thread id" in the POSIX sense. These are
> exclusive, whole file locks for simplicity.
>
> T1 locks file F1 -> lock (P1, F1)
> P2 locks file F2 -> lock (P2, F2)
> P2 locks file F1 -> blocks against (P1, F1)
> T1 locks file F2 -> blocks against (P2, F2)
Less contrived example -- T2 locks file F2. We report deadlock here too,
even though T1 is about to unlock file F1.
I pointed this out over a year ago when NPTL first went in and nobody
seemed interested in having the discussion then. All I got was a private
reply from Andi Kleen suggesting that we shouldn't remove it.
So, final call. Any objections to never returning -EDEADLK?
--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain
* RE: Killing POSIX deadlock detection
@ 2004-06-06 15:24 Lever, Charles
0 siblings, 0 replies; 7+ messages in thread
From: Lever, Charles @ 2004-06-06 15:24 UTC (permalink / raw)
To: Matthew Wilcox, Stephen Rothwell
Cc: trond.myklebust, akpm, linux-fsdevel, linux-kernel
> So, final call. Any objections to never returning -EDEADLK?
not an objection, but a consideration.
is this a change that belongs in 2.6? it does significantly change the
behavior of the system call API, and could "break" applications.
unless this fixes a significant bug, perhaps it should wait for 2.7?
that would give fair warning to developers who need to fix their broken
programs.
* Re: Killing POSIX deadlock detection
2004-06-06 13:27 ` Killing POSIX deadlock detection Matthew Wilcox
@ 2004-06-06 19:49 ` Trond Myklebust
2004-06-06 20:09 ` Eric W. Biederman
0 siblings, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2004-06-06 19:49 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Stephen Rothwell, Andrew Morton, linux-fsdevel, linux-kernel
On Sun, 06/06/2004 at 09:27, Matthew Wilcox wrote:
> > T1 locks file F1 -> lock (P1, F1)
> > P2 locks file F2 -> lock (P2, F2)
> > P2 locks file F1 -> blocks against (P1, F1)
> > T1 locks file F2 -> blocks against (P2, F2)
>
> Less contrived example -- T2 locks file F2. We report deadlock here too,
> even though T1 is about to unlock file F1.
So which is better: report an error and give the user a chance to
recover, or allow the potential deadlock?
Only the user can resolve problems such as the above threaded problem,
given the SuS definitions.
> So, final call. Any objections to never returning -EDEADLK?
Yes: As Chuck points out, that is a fairly nasty change of the userland
API.
Worse: it is a change that fixes only one problem for only a minority of
users (those that combine locking over multiple NPTL threads - a
situation which after the "fix" remains just as poorly defined) at the
expense of reintroducing a series of deadlocking problems for those
single threaded users that rely on the EDEADLK (and have done so
throughout the entire 2.4.x series).
Finally, EDEADLK does actually appear to be mandatory to implement in
SUSv3, given that it states:
A potential for deadlock occurs if a process controlling a
locked region is put to sleep by attempting to lock another
process' locked region. If the system detects that sleeping
until a locked region is unlocked would cause a deadlock,
fcntl() shall fail with an [EDEADLK] error.
(again see
http://www.opengroup.org/onlinepubs/009695399/functions/fcntl.html)
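For programs that do take advantage of EDEADLK when the system reports it, the usual response is to release what they hold, back off, and retry. The following is a minimal sketch under stated assumptions: `set_region`, `write_lock_wait` and `lock_pair_with_backoff` are hypothetical names, both regions live in one file opened for writing, the regions do not overlap, and the delay schedule is arbitrary. Whether EDEADLK is ever returned depends on whether the system detects the deadlock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

enum lock_result { LOCK_OK, LOCK_DEADLK, LOCK_ERROR };

/* Apply a lock of 'type' to [start, start + len) using 'cmd'
 * (F_SETLKW waits, F_SETLK does not). Retries on EINTR. */
static int set_region(int fd, int cmd, short type, off_t start, off_t len)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    while (fcntl(fd, cmd, &fl) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/* Blocking write lock; EDEADLK is reported separately from other errors. */
static enum lock_result write_lock_wait(int fd, off_t start, off_t len)
{
    if (set_region(fd, F_SETLKW, F_WRLCK, start, len) == 0)
        return LOCK_OK;
    return errno == EDEADLK ? LOCK_DEADLK : LOCK_ERROR;
}

/* Hypothetical helper: hold region a, then take region b. If fcntl()
 * reports EDEADLK for b, release a, wait briefly and start over. */
static int lock_pair_with_backoff(int fd, off_t a, off_t b, off_t len,
                                  int max_tries)
{
    int t;

    assert(max_tries > 0);
    for (t = 0; t < max_tries; t++) {
        enum lock_result r;
        struct timespec delay;

        if (write_lock_wait(fd, a, len) != LOCK_OK)
            return -1;
        r = write_lock_wait(fd, b, len);
        if (r == LOCK_OK)
            return 0;
        set_region(fd, F_SETLK, F_UNLCK, a, len);   /* drop region a */
        if (r == LOCK_ERROR)
            return -1;
        delay.tv_sec = 0;
        delay.tv_nsec = 1000000L * ((t % 100) + 1);  /* 1 to 100 ms */
        nanosleep(&delay, NULL);
    }
    return -1;
}
```

The retry loop only helps when the system actually reports the deadlock; a program that must also run where detection is absent still needs something like a consistent lock order.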
Trond
* Re: Killing POSIX deadlock detection
2004-06-06 19:49 ` Trond Myklebust
@ 2004-06-06 20:09 ` Eric W. Biederman
2004-06-06 20:52 ` Trond Myklebust
0 siblings, 1 reply; 7+ messages in thread
From: Eric W. Biederman @ 2004-06-06 20:09 UTC (permalink / raw)
To: Trond Myklebust
Cc: Matthew Wilcox, Stephen Rothwell, Andrew Morton, linux-fsdevel,
linux-kernel
Trond Myklebust <trond.myklebust@fys.uio.no> writes:
> On Sun, 06/06/2004 at 09:27, Matthew Wilcox wrote:
> > > T1 locks file F1 -> lock (P1, F1)
> > > P2 locks file F2 -> lock (P2, F2)
> > > P2 locks file F1 -> blocks against (P1, F1)
> > > T1 locks file F2 -> blocks against (P2, F2)
> >
> > Less contrived example -- T2 locks file F2. We report deadlock here too,
> > even though T1 is about to unlock file F1.
There is a fairly sane Linux-specific definition here. We should
track these things not by pid or tid, but by struct files_struct.
> So which is better: report an error and give the user a chance to
> recover, or allow the potential deadlock?
Reading the SUS definition below, we should only report a deadlock when
it is certain.
For multiple processes with the same set of file descriptors open,
that is an interesting graph problem: a deadlock is only certain when
there is nothing another process can do to remove it.
> Only the user can resolve problems such as the above threaded problem,
> given the SuS definitions.
>
> > So, final call. Any objections to never returning -EDEADLK?
>
> Yes: As Chuck points out, that is a fairly nasty change of the userland
> API.
???? Failing to detect a deadlock is not a change in the API.
It is simply a change in behavior.
> Worse: it is a change that fixes only one problem for only a minority of
> users (those that combine locking over multiple NPTL threads - a
> situation which after the "fix" remains just as poorly defined) at the
> expense of reintroducing a series of deadlocking problems for those
> single threaded users that rely on the EDEADLK (and have done so
> throughout the entire 2.4.x series).
Relying on EDEADLK is broken. That is about as bad as relying on
getting -EACCES instead of SIGSEGV.
Detecting deadlocks is certainly a quality of implementation issue.
But unless my memory is shaky, detecting deadlocks is a hard problem.
Perhaps what we should do is simply not attempt to detect deadlocks
involving threaded processes.
With threads the problem escalates from one of cycle detection
to something fairly weird.
> Finally, EDEADLK does actually appear to be mandatory to implement in
> SUSv3, given that it states:
>
> A potential for deadlock occurs if a process controlling a
> locked region is put to sleep by attempting to lock another
> process' locked region. If the system detects that sleeping
> until a locked region is unlocked would cause a deadlock,
> fcntl() shall fail with an [EDEADLK] error.
>
> (again see
> http://www.opengroup.org/onlinepubs/009695399/functions/fcntl.html)
Hmm. I don't see that the system is required to detect a deadlock.
Just what it does after it has detected one.
Eric
* Re: Killing POSIX deadlock detection
2004-06-06 20:09 ` Eric W. Biederman
@ 2004-06-06 20:52 ` Trond Myklebust
0 siblings, 0 replies; 7+ messages in thread
From: Trond Myklebust @ 2004-06-06 20:52 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Matthew Wilcox, Stephen Rothwell, Andrew Morton, linux-fsdevel,
linux-kernel
On Sun, 06/06/2004 at 16:09, Eric W. Biederman wrote:
> Trond Myklebust <trond.myklebust@fys.uio.no> writes:
>
> > On Sun, 06/06/2004 at 09:27, Matthew Wilcox wrote:
> > > > T1 locks file F1 -> lock (P1, F1)
> > > > P2 locks file F2 -> lock (P2, F2)
> > > > P2 locks file F1 -> blocks against (P1, F1)
> > > > T1 locks file F2 -> blocks against (P2, F2)
> > >
> > > Less contrived example -- T2 locks file F2. We report deadlock here too,
> > > even though T1 is about to unlock file F1.
>
> There is a fairly sane Linux-specific definition here. We should
> track these things not by pid or tid, but by struct files_struct.
RTFC... Look carefully in fs/locks.c at stuff like posix_same_owner().
We currently use both the tgid and the struct files_struct (although
there are a few notable bugs where we only check the one or the
other)...
That is, however, a definition which breaks the SUS standards, and it
therefore ends up introducing pathologies such as the steal_locks crap.
struct files_struct is NOT a sane basis for tracking locks.
> > Yes: As Chuck points out, that is a fairly nasty change of the userland
> > API.
>
> ???? Failing to detect a deadlock is not a change in the API.
> It is simply a change in behavior.
It is a change in functionality from one where potential deadlocks are
detected and reported as errors to one where deadlocks are suddenly
possible. Are you saying that functionality is not a part of the API?
> Perhaps what we should do is simply not attempt to detect deadlocks
> involving threaded processes.
So how do you define (and detect) a threaded process?
Trond