[RFC] Re: broken userland ABI in configfs binary attributes

From: Al Viro <viro@zeniv.linux.org.uk>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-fsdevel@vger.kernel.org,
	"Octavian Purdila" <octavian.purdila@intel.com>,
	"Pantelis Antoniou" <pantelis.antoniou@konsulko.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Kai Mäkisara" <Kai.Makisara@kolumbus.fi>,
	linux-scsi@vger.kernel.org
Subject: [RFC] Re: broken userland ABI in configfs binary attributes
Date: Mon, 26 Aug 2019 17:29:49 +0100
Message-ID: <20190826162949.GA9980@ZenIV.linux.org.uk>
In-Reply-To: <20190826024838.GN1131@ZenIV.linux.org.uk>

On Mon, Aug 26, 2019 at 03:48:38AM +0100, Al Viro wrote:

> 	We might be able to paper over that mess by doing what /dev/st does -
> checking that file_count(file) == 1 in ->flush() instance and doing commit
> there in such case.  It's not entirely reliable, though, and it's definitely
> not something I'd like to see spreading.

	This "not entirely reliable" turns out to be an understatement.
If you have /proc/*/fdinfo/* being read from at the time of final close(2),
you'll get file_count(file) > 1 the last time ->flush() is called.  In other
words, the data would not be committed at all.

	And that problem is shared with /dev/st*, unfortunately ;-/
We could somewhat mitigate that by having fs/proc/fd.c:seq_show() call
->flush() before fput(), but that would still hide errors from close(2)
(and still have close(2) return before the data is flushed).
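	The failure mode described above can be shown with a small userspace
model.  This is only a sketch, not kernel code: struct model_file,
model_flush(), model_close() and model_scenario() are hypothetical names,
and the "count" field merely stands in for file_count(file).

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for struct file: only the fields the model needs. */
struct model_file {
	long count;	/* plays the role of file_count(file) */
	bool dirty;	/* data written but not yet committed */
	bool committed;	/* set once the commit has happened */
};

/* Models a ->flush() that commits only if it sees a single reference. */
static int model_flush(struct model_file *f)
{
	if (f->count > 1)
		return 0;	/* somebody else holds a reference: skip */
	if (f->dirty) {
		f->committed = true;
		f->dirty = false;
	}
	return 0;
}

/* Models final close(2): ->flush() first, then drop the reference. */
static void model_close(struct model_file *f)
{
	model_flush(f);
	assert(f->count > 0);
	f->count--;
}

/*
 * Returns true if the data was committed by the time all references
 * are gone.  fdinfo_reader_active models a concurrent read of
 * /proc/<pid>/fdinfo/<fd> that has bumped the refcount and has not
 * called fput() yet.
 */
static bool model_scenario(bool fdinfo_reader_active)
{
	struct model_file f = { .count = 1, .dirty = true, .committed = false };

	if (fdinfo_reader_active)
		f.count++;	/* reader took its reference */
	model_close(&f);	/* the only close(2), hence the only ->flush() */
	if (fdinfo_reader_active)
		f.count--;	/* reader's fput(); no ->flush() happens here */
	return f.committed;
}
```

In this model model_scenario(false) commits the data, while
model_scenario(true) loses it: the only ->flush() ran while the reader's
reference was still held.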

	read() on /proc/*/fdinfo/* does the following:

find the task_struct
grab its descriptor table, drop task_struct
lock the table, pick struct file out of it
bump struct file refcount, unlock the table
        seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\n",
                   (long long)file->f_pos, f_flags,
                   real_mount(file->f_path.mnt)->mnt_id);
        show_fd_locks(m, file, files);
        if (seq_has_overflowed(m))
                goto out;
        if (file->f_op->show_fdinfo)
                file->f_op->show_fdinfo(m, file);
drop the file reference (with fput()).
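
	From the userland side, triggering that path is just a read of the
fdinfo file.  A minimal sketch, assuming Linux with procfs mounted on /proc;
read_fdinfo() is a hypothetical helper name, and a single read(2) is assumed
to be enough for the short output:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Read /proc/self/fdinfo/<fd> into buf (NUL-terminated).
 * Returns the number of bytes read, or -1 on error.
 * While the kernel services this read it holds an extra reference
 * to the struct file behind fd.
 */
static ssize_t read_fdinfo(int fd, char *buf, size_t len)
{
	char path[64];
	ssize_t n;
	int procfd;

	if (len == 0)
		return -1;
	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
	procfd = open(path, O_RDONLY);
	if (procfd < 0)
		return -1;
	n = read(procfd, buf, len - 1);	/* output is short; one read suffices */
	close(procfd);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```

Calling it on any open descriptor should yield the "pos:", "flags:" and
"mnt_id:" lines produced by the seq_printf() above.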

	Before "procfs: Convert /proc/pid/fdinfo/ handling routines to
seq-file v2" (in 2012), we did just snprintf() while under the lock on
descriptor table.  That commit moved the printf part from under the lock,
at the cost of grabbing and dropping a file reference.  Shortly after that,
"procfs: add ability to plug in auxiliary fdinfo providers" added
->show_fdinfo() there, making it impossible to call under the descriptor
table lock - that method can block (and does so for eventpoll, idiotify,
etc.)

	We really want ->show_fdinfo() to happen before __fput() gets
anywhere near ->release().  And even the non-blocking cases can be too
costly to do under the descriptor table lock.  OTOH, it can very well be
done after or during ->flush(); the only problematic case right now
is /dev/st*, whose ->flush() does nothing if file_count(file) is
greater than 1.

	One kludgy way to handle that would be to have something like
FMODE_SUCKY_FLUSH that would have fs/proc/fd.c:seq_show() just do
the damn thing still under descriptor table lock and skip the rest
of it - /dev/st* has nothing in ->show_fdinfo(), and show_fd_locks()
is not too terribly costly.  Still best avoided in default case,
but...

	Another possibility is to have a secondary counter, with
__fput() waiting for it to go down to zero and fdinfo reads bumping
(and then dropping) that instead of the primary counter.  Not sure
which approach is better - adding extra logic in __fput() for the
sake of one (and not terribly common) device is not nice, but the other
variant really is an ugly kludge ;-/  OTOH, this kind of "take
a secondary reference, ->release() will block until you drop it"
interface can breed deadlocks; procfs situation, AFAICS, allows it to
be used safely, but it's begging to be abused...
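
	The secondary-counter variant can be modelled in userspace as
follows.  Again only a sketch under stated assumptions: struct pinned_file,
pf_pin(), pf_unpin() and pf_fput() are hypothetical names, and where the
real __fput() would block waiting for the counter, this single-threaded
model simply defers the release until the last pin is dropped.

```c
#include <assert.h>
#include <stdbool.h>

struct pinned_file {
	long count;	/* primary refcount, what file_count() would report */
	long pins;	/* hypothetical secondary counter for fdinfo readers */
	bool released;	/* set once the model's ->release() has run */
};

/* ->release() may run only when both counters have reached zero. */
static void pf_try_release(struct pinned_file *f)
{
	if (f->count == 0 && f->pins == 0)
		f->released = true;
}

/* fdinfo read takes a pin instead of bumping the primary counter. */
static void pf_pin(struct pinned_file *f)
{
	f->pins++;
}

/* Dropping the last pin lets a pending release proceed. */
static void pf_unpin(struct pinned_file *f)
{
	assert(f->pins > 0);
	f->pins--;
	pf_try_release(f);
}

/* Drop a primary reference; release is held back while pins remain. */
static void pf_fput(struct pinned_file *f)
{
	assert(f->count > 0);
	f->count--;
	pf_try_release(f);
}
```

The point of the model: with a pin held, count is still 1, so a
file_count(file) == 1 check in ->flush() would not be fooled, and the
release still cannot happen before pf_unpin().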

	Ideas?  I don't like either approach, to put it very mildly,
so any cleaner suggestions would be very welcome.

PS: just dropping the check in st_flush() is probably a bad idea -
as it is, it can't overlap with st_write() and after such a change it
will...

Thread overview: 22+ messages
2019-08-26  2:48 broken userland ABI in configfs binary attributes Al Viro
2019-08-26 16:29 ` Al Viro [this message]
2019-08-26 18:20   ` [RFC] " Matthew Wilcox
2019-08-26 19:28     ` Al Viro
2019-08-27  8:51       ` Miklos Szeredi
2019-08-27 11:58         ` Al Viro
2019-08-27 12:21           ` Miklos Szeredi
2019-08-27 12:53             ` Al Viro
2019-08-31  8:32       ` Christoph Hellwig
2019-08-31 13:35         ` Al Viro
2019-08-31 14:44           ` Christoph Hellwig
2019-08-31 15:58             ` Al Viro
2019-08-26 18:34   ` "Kai Mäkisara (Kolumbus)"
2019-08-26 19:32     ` Al Viro
2019-08-27 15:01       ` Boaz Harrosh
2019-08-27 17:27         ` Al Viro
2019-08-27 17:59           ` Boaz Harrosh
2019-08-29 22:22           ` Al Viro
2019-08-29 23:32             ` Al Viro
2019-08-30  4:10             ` Dave Chinner
2019-08-30  4:44               ` Al Viro
2019-08-31  8:28                 ` Christoph Hellwig
