public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL] configfs updates for 5.9
@ 2020-08-03 14:07 Christoph Hellwig
  2020-08-05  0:27 ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2020-08-03 14:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Joel Becker, linux-kernel

The following changes since commit 5714ee50bb4375bd586858ad800b1d9772847452:

  copy_xstate_to_kernel: Fix typo which caused GDB regression (2020-07-19 17:09:10 -0700)

are available in the Git repository at:

  git://git.infradead.org/users/hch/configfs.git tags/configfs-for-5.9

for you to fetch changes up to 059ccbfff8a826c0e7bfe5d69a55e1179672d8f8:

  configfs: use flush file op to commit writes to a binary file (2020-07-20 15:00:08 +0200)

----------------------------------------------------------------
configfs update for 5.9

 - propagate binary attribute errors (Lenny Szubowicz)

----------------------------------------------------------------
Lenny Szubowicz (1):
      configfs: use flush file op to commit writes to a binary file

 fs/configfs/file.c | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [GIT PULL] configfs updates for 5.9
  2020-08-03 14:07 [GIT PULL] configfs updates for 5.9 Christoph Hellwig
@ 2020-08-05  0:27 ` Linus Torvalds
  2020-08-05  0:59   ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2020-08-05  0:27 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Joel Becker, Linux Kernel Mailing List

On Mon, Aug 3, 2020 at 7:07 AM Christoph Hellwig <hch@infradead.org> wrote:
>
>  - propagate binary attribute errors (Lenny Szubowicz)

I pulled this, but I then unpulled again.

The commit is completely confused and wrong.

In particular, this part:

+       /* Only commit the data if no more shared refs to file */
+       if (file_count(file) > 1)
+               return 0;

is bogus and prone to races, meaning that if there are multiple
openers, *none* of them flush.

You can have two threads that close() at (roughly) the same time, both
call ->flush(), and both of those see "file_count()" being 2 and never
do anything.

They then both call "fput()" and are done, and that's when the file
count gets decremented.

The fact is, "flush()" is called for each close(), and you should
flush any pending data. Checking for some file_count() thing is bogus
and completely wrong.

So make up your mind: either you flush synchronously at *every*
close() (->flush) and you can get error notification.

Or you flush at the last close (->release), but then you absolutely
cannot get any error reporting, because the last put of the file may
be long after the last close (because other things than just
open/close can increase the refcount).

You can't try to do some middle ground, because it _will_ be buggy.
You have the above two options, not some racy third one.

             Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [GIT PULL] configfs updates for 5.9
  2020-08-05  0:27 ` Linus Torvalds
@ 2020-08-05  0:59   ` Linus Torvalds
  2020-08-05  8:28     ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2020-08-05  0:59 UTC (permalink / raw)
  To: Christoph Hellwig, Kai Mäkisara, James E.J. Bottomley
  Cc: Joel Becker, Linux Kernel Mailing List

On Tue, Aug 4, 2020 at 5:27 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The commit is completely confused and wrong.
>
> In particular, this part:
>
> +       /* Only commit the data if no more shared refs to file */
> +       if (file_count(file) > 1)
> +               return 0;
>
> is bogus and prone to races, meaning that if there are multiple
> openers, *none* of them flush.

Side note: this made me worry that this is some kind of pattern, so I
started grepping.

It isn't, thankfully. But I do notice that st_flush() has the same
bug, probably going back to forever.

In fact, that bogus use may be going back so far in time that it might
even have been almost ok. Back when we had the big kernel lock, the
fundamental race wasn't so fundamental, although it is still true that
it could possibly have been mixed up by other non-file references (ie
code sequences doing get_file() on it without ever doing a real
open/close).

I have to say, I cannot find it in myself to care about SCSI tapes and
possible races with multiple concurrent openers or whatever.

So I'm cc'ing the appropriate authorities, but I don't really expect
the bogus code in st.c to go away.

It _is_ very wrong, though, and I very much don't want to have new
cases of that wrongness.

For another example of where "file_count()" isn't reliable or useful
at close() time, you can have any of mmap/io_uring/proc and probably a
number of other cases do get_file/put_file to get a refcount on the
file that can remain active even after the last user has actually
closed it.

And yeah, maybe mmap() doesn't end up being relevant for these file
descriptors, but things like /proc/*/fdinfo/* are possible for all
files, and do that refcount increase exactly to avoid races.

So again, doing that

        if (file_count(file) > 1)

kind of check is very very wrong even outside of the fundamental race
with two close() calls at the same time.

It is a very dangerous pattern, because it likely works in practice
during testing, and looks like it might work.

But it is completely and unfixably wrong.

Again, the only reliable way to do that "last close" is "->release()",
but you will never get errors from that, since (for all the same
reasons) it might not even be done by a close. The last releaser of a
file descriptor might be that mmap/io_uring/proc case now releasing
the no longer used file, possibly long after the last "close()" call
has happened.

One acceptable half-way measure *may* be

 - do the flush with the above bogus test at ->flush() time, knowing
it might never trigger at all

 - do the flush *again* at ->release() time, in case it didn't trigger

 - add a big comment to the flush-time case to show you understand the
issue, and understand

but I'd discourage it because of how unreliable it is.

              Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [GIT PULL] configfs updates for 5.9
  2020-08-05  0:59   ` Linus Torvalds
@ 2020-08-05  8:28     ` Christoph Hellwig
  2020-08-05 19:29       ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2020-08-05  8:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Kai M??kisara, James E.J. Bottomley,
	Joel Becker, Linux Kernel Mailing List, Lenny Szubowicz

On Tue, Aug 04, 2020 at 05:59:19PM -0700, Linus Torvalds wrote:
> So again, doing that
> 
>         if (file_count(file) > 1)
> 
> kind of check is very very wrong even outside of the fundamental race
> with two close() calls at the same time.
> 
> It is a very dangerous pattern, because it likely works in practice
> during testing, and looks like it might work.
> 
> But it is completely and unfixably wrong.
> 
> Again, the only reliable way to do that "last close" is "->release()",
> but you will never get errors from that, since (for all the same
> reasons) it might not even be done by a close. The last releaser of a
> file descriptor might be that mmap/io_uring/proc case now releasing
> the no longer used file, possibly long after the last "close()" call
> has happened.
> 
> One acceptable half-way measure *may* be
> 
>  - do the flush with the above bogus test at ->flush() time, knowing
> it might never trigger at all
> 
>  - do the flush *again* at ->release() time, in case it didn't trigger
> 
>  - add a big comment to the flush-time case to show you understand the
> issue, and understand
> 
> but I'd discourage it because of how unreliable it is.

Yes, but this is a fundamental problem with the commit on close
model that the configfs binary attributes implement, and we have to
work around that somehow.  The current behavior is to entirely ignore
errors, which of course has major issues.  I guess the alternative might
be to just commit on every ->flush as the use case (and yes I know
people could abuse it) for the configfs files is a set of simple
write syscalls.  Then again the above might be better than that.

Lenny, can you look ingo that?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [GIT PULL] configfs updates for 5.9
  2020-08-05  8:28     ` Christoph Hellwig
@ 2020-08-05 19:29       ` Linus Torvalds
  0 siblings, 0 replies; 5+ messages in thread
From: Linus Torvalds @ 2020-08-05 19:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kai M??kisara, James E.J. Bottomley, Joel Becker,
	Linux Kernel Mailing List, Lenny Szubowicz

On Wed, Aug 5, 2020 at 1:28 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> Yes, but this is a fundamental problem with the commit on close
> model that the configfs binary attributes implement, and we have to
> work around that somehow.  The current behavior is to entirely ignore
> errors, which of course has major issues.

Ignoring errors is better than randomly not doing the commit at all,
which is what that patch does.

> I guess the alternative might be to just commit on every ->flush

That's how ->flush() is designed to be used, yes. It's also why it's
named that way.

It can obviously cause problems too, if you have multiple opens (ie
dup/fork) and one file descriptor is closed while another is in the
middle of a series of writes

But honestly, if you do binary writes to sysfs, you should do them as
one single write, and the flush function should take that mutex and
then skip the flush if write_in_progress is true.

Because in *that* case, it has serialized with other writers and we
know that those other writers will flush when closing, and there's no
race.

You can still screw up by doing multiple independent writes and do s
close() in between the writes, but at some point it becomes a "yeah,
that's broken and nobody actually does that" issue.

                Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-08-05 19:29 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-08-03 14:07 [GIT PULL] configfs updates for 5.9 Christoph Hellwig
2020-08-05  0:27 ` Linus Torvalds
2020-08-05  0:59   ` Linus Torvalds
2020-08-05  8:28     ` Christoph Hellwig
2020-08-05 19:29       ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox