public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [Q] [VFS] NULL f_dentry for opened files ; possible race condition
@ 2001-08-31 17:18 Jean-Marc Saffroy
  2001-08-31 20:56 ` [RFD] readonly/read-write semantics Alexander Viro
  0 siblings, 1 reply; 25+ messages in thread
From: Jean-Marc Saffroy @ 2001-08-31 17:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jean-Marc Saffroy

Hello,


In 2.4.9, I have encountered a strange condition while playing with file
structs chained on a superblock list (sb->s_files) : some of them can have
a NULL f_dentry pointer. The only case I found which can cause this is
when fput is called and f_count drops to zero. Is that the only case ?

While exploring the corresponding code for an explanation, I found what
looks like a possible race condition : do_remount_sb calls
fs_may_remount_ro, and only then uses lock_super to do the actual remount.

Isn't it possible for a program to open a file for writing just after
fs_may_remount_ro ? The whole thing seems to be protected by the BKL and
mount_sem, but I guess it won't stop an open.


Regards,

-- 
Jean-Marc Saffroy - Research Engineer - Silicomp Research Institute
mailto:saffroy@ri.silicomp.fr


^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [RFD] readonly/read-write semantics
@ 2001-08-31 23:35 Bryan Henderson
  2001-09-01  4:23 ` Alexander Viro
  0 siblings, 1 reply; 25+ messages in thread
From: Bryan Henderson @ 2001-08-31 23:35 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-fsdevel-owner, linux-kernel,
	Jean-Marc Saffroy, Linus Torvalds


1) I want to see files open for write have nothing to do with it.  Unix
open/close is not a transaction, it's just a connection.  Some applications
manage to use open/close as a transaction, but we're seeing less and less
of that as more sophisticated facilities for transactions become available.

How many times have we all been frustrated trying to remount read only when
some log file that hasn't been written to for hours is open for write?

A file write is in progress when a write() system call hasn't returned, not
when the file is open for write.

Someone who wants to coordinate his mounts with the applications that use
them should use an external locking scheme.

2) I'd like to see a readonly mount state defined as "the filesystem will
not change.  Period."  Not for system calls in progress, not for cache
synchronization, not to set an "unmounted" flag, not for writes that are
queued in the device driver or device.  (That last one may stretch
feasability, but it's a worthy goal anyway).

3) A system call to put a mount into readonly state should not return until
all writes in progress have completed out to the medium, and the cache is
clean.  It should sync the cache, of course, and do whatever closing of the
filesystem an unmount would do.  Any attempt to start a new write during
this wait (which constitutes another mount state) should fail.

I was thinking an option to fail immediately instead of waiting for writes
to complete might be useful, but then I couldn't think of any write in
progress that would take enough time to make it worthwhile.  As long as any
new system call counts as a new write.

The same thinking applies to an option to kill writes in progress without
waiting.  Unless maybe it means to skip the cache synchronization.

4) I don't think it has any semantic relevance, but as part of this, I'd
also like to see the FS implementation stop considering read only mount
status to be a file permission issue.  (Today, it does in some places, but
doesn't in others).

I don't know enough about how filesystem drivers use the "readonly" state
today for damage control when errors happen, so I won't give an opinion on
that.  But it sounds like it would probably be that quiescing state I
mentioned in (3), not the readonly state I mentioned in (2).



^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [RFD] readonly/read-write semantics
@ 2001-09-01  0:38 Bryan Henderson
  0 siblings, 0 replies; 25+ messages in thread
From: Bryan Henderson @ 2001-09-01  0:38 UTC (permalink / raw)
  To: Goswin Brederlow
  Cc: Goswin Brederlow, linux-fsdevel, linux-fsdevel-owner,
	linux-kernel, Jean-Marc Saffroy, Linus Torvalds, Alexander Viro


>> 2) I'd like to see a readonly mount state defined as "the filesystem
will
>> not change.  Period."  Not for system calls in progress, not for cache
>> synchronization, not to set an "unmounted" flag, not for writes that are
>> queued in the device driver or device...
>Thats what readonly does now, isn't it?

No, it's much more sloppy than that today.  The mount is slammed into the
readonly state and writes in progress continue to modify the filesystem
medium.  The remount system call does a weak check to see that there is no
writing going on before proceeding to set the readonly flag on.  A mkdir in
progress continues to complete.  A properly timed open for write could even
slip in.  The the opener can write away while the mount is in "readonly"
state.  And finally, Al points out that a filesystem driver that gets
scared by some consistency error may stamp its foot and declare "readonly"
state right under the nose of of open-for-write file descriptors, which
allow continued writing.

>But you want that also to be
>when the filesystem is writeable but not being written to at the
>moment, i.e. if its in a consistent state or "clean".

Well, no I don't.  That's another state.  And the mount goes in and out of
that state all the time without any help from us, so the only issue would
be detecting the state to make it useful.  I used to think Unix filesystems
always did that, but I don't seem to be able to get an ext2 filesystem
(mounted r/w) into no-fsck-required state without actually unmounting it.



^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [RFD] readonly/read-write semantics
@ 2001-09-04 18:39 Bryan Henderson
  0 siblings, 0 replies; 25+ messages in thread
From: Bryan Henderson @ 2001-09-04 18:39 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-fsdevel, linux-kernel, Jean-Marc Saffroy, Linus Torvalds,
	Alexander Viro


>Okay, make the definition
>
>"this kernel will not attempt to change anything on that filesystem".

Not quite.  As mentioned a little earlier, you need to add "through this
mount."  It's a state of the mount, not the kernel image.

It's a good point, though, that we shouldn't kid ourselves that having a
mount in read-only state means the filesystem is read-only.  In the cases
where it does, though, it's an especially useful state.





^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [RFD] readonly/read-write semantics
@ 2001-09-04 19:50 Bryan Henderson
  2001-09-05  2:15 ` Alexander Viro
  0 siblings, 1 reply; 25+ messages in thread
From: Bryan Henderson @ 2001-09-04 19:50 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Jean-Marc Saffroy, Linus Torvalds


>>
>> 1) I want to see files open for write have nothing to do with it.  Unix
>> open/close is not a transaction, it's just a connection.  Some
applications
>> manage to use open/close as a transaction, but we're seeing less and
less
>> of that as more sophisticated facilities for transactions become
available.
>>
>> How many times have we all been frustrated trying to remount read only
when
>> some log file that hasn't been written to for hours is open for write?
>>
>> A file write is in progress when a write() system call hasn't returned,
not
>> when the file is open for write.
>
>Uh-oh...  How about shared mappings?

It's always shared mappings, isn't it?  :-)

Virtual memory access to the file is even easier, though.  A write in
progress is an individual store to virtual memory.  The only way you could
even see it is if a page fault is in progress.  So the most you would need
to wait for in going into the hard "read only" state I defined is for any
page I/O to complete.  And for the "no new writes" state, you just write
protect all the pages (and any new ones that fault in too).




^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [RFD] readonly/read-write semantics
@ 2001-09-05  2:34 Bryan Henderson
  0 siblings, 0 replies; 25+ messages in thread
From: Bryan Henderson @ 2001-09-05  2:34 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Jean-Marc Saffroy, Linus Torvalds


>So the most you would need
>> to wait for in going into the hard "read only" state I defined is for
any
>> page I/O to complete.  And for the "no new writes" state, you just write
>> protect all the pages (and any new ones that fault in too).
>
>It's not that simple.  At the very least you need an equivalent of msync()
>on each of these mappings before you can do anything of that kind.

I agree.  An ordinary remount shouldn't immediately go into hard readonly
state.  It should spend some time in no-new-writes state, during which it
flushes buffered writes, and I include in that dirty VM mapped pages, and
closes the filesystem.

My most basic point underlying all this, though, is that it should _not_
wait for all the files open for write to close (or fail because because
they haven't).

I thought there were also emergency cases where the filesystem driver
didn't want any more writing going on for fear of causing more damage.
That's why I mentioned the case where you might want to go straight to hard
readonly state.

>BTW, for real fun think of the situation when you have one of the swap
>components in a regular file on your filesystem.  Do you seriously want
>do_remount() to do automagical swapoff(2) on relevant swap components?

There are all kinds of ways I can shoot myself in the foot by making a
mount readonly that I really want to be writing through.  Is this one
special?

>IMO it's a userland job.

Sounds right to me.  We weren't going to talk about implementation yet,
though.  For starters, it would just be nice to agree what MS_RDONLY means
(and perhaps a few other similar flags).



^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2001-09-05  2:34 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-08-31 17:18 [Q] [VFS] NULL f_dentry for opened files ; possible race condition Jean-Marc Saffroy
2001-08-31 20:56 ` [RFD] readonly/read-write semantics Alexander Viro
2001-09-01 13:08   ` Juan Quintela
2001-09-04  1:16   ` Jean-Marc Saffroy
2001-09-04  4:00     ` [PATCH] " Alexander Viro
  -- strict thread matches above, loose matches on Subject: below --
2001-08-31 23:35 Bryan Henderson
2001-09-01  4:23 ` Alexander Viro
2001-09-01 14:42   ` Ingo Oeser
2001-09-01 16:44     ` Alexander Viro
2001-09-01 17:13       ` Nicholas Knight
2001-09-04  2:07       ` Jean-Marc Saffroy
2001-09-04  4:09         ` Alexander Viro
2001-09-04  9:26           ` Xavier Bestel
2001-09-04 10:15             ` Alexander Viro
2001-09-04 10:20               ` Xavier Bestel
2001-09-04 10:28                 ` Alexander Viro
2001-09-04 10:59                   ` Xavier Bestel
2001-09-04 11:29                     ` Alexander Viro
2001-09-04 17:03               ` Pavel Machek
2001-09-04 16:58   ` Pavel Machek
2001-09-01  0:38 Bryan Henderson
2001-09-04 18:39 Bryan Henderson
2001-09-04 19:50 Bryan Henderson
2001-09-05  2:15 ` Alexander Viro
2001-09-05  2:34 Bryan Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox