* [GIT PULL] configfs updates for 5.9
@ 2020-08-03 14:07 Christoph Hellwig
2020-08-05 0:27 ` Linus Torvalds
0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2020-08-03 14:07 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Joel Becker, linux-kernel
The following changes since commit 5714ee50bb4375bd586858ad800b1d9772847452:
copy_xstate_to_kernel: Fix typo which caused GDB regression (2020-07-19 17:09:10 -0700)
are available in the Git repository at:
git://git.infradead.org/users/hch/configfs.git tags/configfs-for-5.9
for you to fetch changes up to 059ccbfff8a826c0e7bfe5d69a55e1179672d8f8:
configfs: use flush file op to commit writes to a binary file (2020-07-20 15:00:08 +0200)
----------------------------------------------------------------
configfs update for 5.9
- propagate binary attribute errors (Lenny Szubowicz)
----------------------------------------------------------------
Lenny Szubowicz (1):
configfs: use flush file op to commit writes to a binary file
fs/configfs/file.c | 32 ++++++++++++++++++++++++++------
1 file changed, 26 insertions(+), 6 deletions(-)
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [GIT PULL] configfs updates for 5.9 2020-08-03 14:07 [GIT PULL] configfs updates for 5.9 Christoph Hellwig @ 2020-08-05 0:27 ` Linus Torvalds 2020-08-05 0:59 ` Linus Torvalds 0 siblings, 1 reply; 5+ messages in thread From: Linus Torvalds @ 2020-08-05 0:27 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Joel Becker, Linux Kernel Mailing List On Mon, Aug 3, 2020 at 7:07 AM Christoph Hellwig <hch@infradead.org> wrote: > > - propagate binary attribute errors (Lenny Szubowicz) I pulled this, but I then unpulled again. The commit is completely confused and wrong. In particular, this part: + /* Only commit the data if no more shared refs to file */ + if (file_count(file) > 1) + return 0; is bogus and prone to races, meaning that if there are multiple openers, *none* of them flush. You can have two threads that close() at (roughly) the same time, both call ->flush(), and both of those see "file_count()" being 2 and never do anything. They then both call "fput()" and are done, and that's when the file count gets decremented. The fact is, "flush()" is called for each close(), and you should flush any pending data. Checking for some file_count() thing is bogus and completely wrong. So make up your mind: either you flush synchronously at *every* close() (->flush) and you can get error notification. Or you flush at the last close (->release), but then you absolutely cannot get any error reporting, because the last put of the file may be long after the last close (because other things than just open/close can increase the refcount). You can't try to do some middle ground, because it _will_ be buggy. You have the above two options, not some racy third one. Linus ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [GIT PULL] configfs updates for 5.9 2020-08-05 0:27 ` Linus Torvalds @ 2020-08-05 0:59 ` Linus Torvalds 2020-08-05 8:28 ` Christoph Hellwig 0 siblings, 1 reply; 5+ messages in thread From: Linus Torvalds @ 2020-08-05 0:59 UTC (permalink / raw) To: Christoph Hellwig, Kai Mäkisara, James E.J. Bottomley Cc: Joel Becker, Linux Kernel Mailing List On Tue, Aug 4, 2020 at 5:27 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > The commit is completely confused and wrong. > > In particular, this part: > > + /* Only commit the data if no more shared refs to file */ > + if (file_count(file) > 1) > + return 0; > > is bogus and prone to races, meaning that if there are multiple > openers, *none* of them flush. Side note: this made me worry that this is some kind of pattern, so I started grepping. It isn't, thankfully. But I do notice that st_flush() has the same bug, probably going back to forever. In fact, that bogus use may be going back so far in time that it might even have been almost ok. Back when we had the big kernel lock, the fundamental race wasn't so fundamental, although it is still true that it could possibly have been mixed up by other non-file references (ie code sequences doing get_file() on it without ever doing a real open/close). I have to say, I cannot find it in myself to care about SCSI tapes and possible races with multiple concurrent openers or whatever. So I'm cc'ing the appropriate authorities, but I don't really expect the bogus code in st.c to go away. It _is_ very wrong, though, and I very much don't want to have new cases of that wrongness. For another example of where "file_count()" isn't reliable or useful at close() time, you can have any of mmap/io_uring/proc and probably a number of other cases do get_file/put_file to get a refcount on the file that can remain active even after the last user has actually closed it. And yeah, maybe mmap() doesn't end up being relevant for these file descriptors, but things like /proc/*/fdinfo/* are possible for all files, and do that refcount increase exactly to avoid races. So again, doing that if (file_count(file) > 1) kind of check is very very wrong even outside of the fundamental race with two close() calls at the same time. It is a very dangerous pattern, because it likely works in practice during testing, and looks like it might work. But it is completely and unfixably wrong. Again, the only reliable way to do that "last close" is "->release()", but you will never get errors from that, since (for all the same reasons) it might not even be done by a close. The last releaser of a file descriptor might be that mmap/io_uring/proc case now releasing the no longer used file, possibly long after the last "close()" call has happened. One acceptable half-way measure *may* be - do the flush with the above bogus test at ->flush() time, knowing it might never trigger at all - do the flush *again* at ->release() time, in case it didn't trigger - add a big comment to the flush-time case to show you understand the issue, and understand but I'd discourage it because of how unreliable it is. Linus ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [GIT PULL] configfs updates for 5.9 2020-08-05 0:59 ` Linus Torvalds @ 2020-08-05 8:28 ` Christoph Hellwig 2020-08-05 19:29 ` Linus Torvalds 0 siblings, 1 reply; 5+ messages in thread From: Christoph Hellwig @ 2020-08-05 8:28 UTC (permalink / raw) To: Linus Torvalds Cc: Christoph Hellwig, Kai M??kisara, James E.J. Bottomley, Joel Becker, Linux Kernel Mailing List, Lenny Szubowicz On Tue, Aug 04, 2020 at 05:59:19PM -0700, Linus Torvalds wrote: > So again, doing that > > if (file_count(file) > 1) > > kind of check is very very wrong even outside of the fundamental race > with two close() calls at the same time. > > It is a very dangerous pattern, because it likely works in practice > during testing, and looks like it might work. > > But it is completely and unfixably wrong. > > Again, the only reliable way to do that "last close" is "->release()", > but you will never get errors from that, since (for all the same > reasons) it might not even be done by a close. The last releaser of a > file descriptor might be that mmap/io_uring/proc case now releasing > the no longer used file, possibly long after the last "close()" call > has happened. > > One acceptable half-way measure *may* be > > - do the flush with the above bogus test at ->flush() time, knowing > it might never trigger at all > > - do the flush *again* at ->release() time, in case it didn't trigger > > - add a big comment to the flush-time case to show you understand the > issue, and understand > > but I'd discourage it because of how unreliable it is. Yes, but this is a fundamental problem with the commit on close model that the configfs binary attributes implement, and we have to work around that somehow. The current behavior is to entirely ignore errors, which of course has major issues. I guess the alternative might be to just commit on every ->flush as the use case (and yes I know people could abuse it) for the configfs files is a set of simple write syscalls. Then again the above might be better than that. Lenny, can you look ingo that? ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [GIT PULL] configfs updates for 5.9 2020-08-05 8:28 ` Christoph Hellwig @ 2020-08-05 19:29 ` Linus Torvalds 0 siblings, 0 replies; 5+ messages in thread From: Linus Torvalds @ 2020-08-05 19:29 UTC (permalink / raw) To: Christoph Hellwig Cc: Kai M??kisara, James E.J. Bottomley, Joel Becker, Linux Kernel Mailing List, Lenny Szubowicz On Wed, Aug 5, 2020 at 1:28 AM Christoph Hellwig <hch@infradead.org> wrote: > > Yes, but this is a fundamental problem with the commit on close > model that the configfs binary attributes implement, and we have to > work around that somehow. The current behavior is to entirely ignore > errors, which of course has major issues. Ignoring errors is better than randomly not doing the commit at all, which is what that patch does. > I guess the alternative might be to just commit on every ->flush That's how ->flush() is designed to be used, yes. It's also why it's named that way. It can obviously cause problems too, if you have multiple opens (ie dup/fork) and one file descriptor is closed while another is in the middle of a series of writes But honestly, if you do binary writes to sysfs, you should do them as one single write, and the flush function should take that mutex and then skip the flush if write_in_progress is true. Because in *that* case, it has serialized with other writers and we know that those other writers will flush when closing, and there's no race. You can still screw up by doing multiple independent writes and do s close() in between the writes, but at some point it becomes a "yeah, that's broken and nobody actually does that" issue. Linus ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-08-05 19:29 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-08-03 14:07 [GIT PULL] configfs updates for 5.9 Christoph Hellwig 2020-08-05 0:27 ` Linus Torvalds 2020-08-05 0:59 ` Linus Torvalds 2020-08-05 8:28 ` Christoph Hellwig 2020-08-05 19:29 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox