All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: Christian Brauner <brauner@kernel.org>
Cc: Daniel Vetter <daniel@ffwll.ch>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	keescook@chromium.org, axboe@kernel.dk, christian.koenig@amd.com,
	dri-devel@lists.freedesktop.org, io-uring@vger.kernel.org,
	jack@suse.cz, laura@labbott.name, linaro-mm-sig@lists.linaro.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-media@vger.kernel.org, minhquangbui99@gmail.com,
	sumit.semwal@linaro.org,
	syzbot+045b454ab35fd82a35fb@syzkaller.appspotmail.com,
	syzkaller-bugs@googlegroups.com
Subject: Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
Date: Tue, 7 May 2024 12:58:14 +0200	[thread overview]
Message-ID: <ZjoJRs0Svrb9ELDu@phenom.ffwll.local> (raw)
In-Reply-To: <20240506-zweisamkeit-zinsniveau-615a2e6d7c67@brauner>

On Mon, May 06, 2024 at 04:46:54PM +0200, Christian Brauner wrote:
> On Mon, May 06, 2024 at 02:47:23PM +0200, Daniel Vetter wrote:
> > On Sun, May 05, 2024 at 01:53:48PM -0700, Linus Torvalds wrote:
> > > On Sun, 5 May 2024 at 13:30, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > > 0.      special-cased ->f_count rule for ->poll() is a wart and it's
> > > > better to get rid of it.
> > > >
> > > > 1.      fs/eventpoll.c is a steaming pile of shit and I'd be glad to see
> > > > git rm taken to it.  Short of that, by all means, let's grab reference
> > > > in there around the call of vfs_poll() (see (0)).
> > > 
> > > Agreed on 0/1.
> > > 
> > > > 2.      having ->poll() instances grab extra references to file passed
> > > > to them is not something that should be encouraged; there's a plenty
> > > > of potential problems, and "caller has it pinned, so we are fine with
> > > > grabbing extra refs" is nowhere near enough to eliminate those.
> > > 
> > > So it's not clear why you hate it so much, since those extra
> > > references are totally normal in all the other VFS paths.
> > > 
> > > I mean, they are perhaps not the *common* case, but we have a lot of
> > > random get_file() calls sprinkled around in various places when you
> > > end up passing a file descriptor off to some asynchronous operation
> > > thing.
> > > 
> > > Yeah, I think most of them tend to be special operations (eg the tty
> > > TIOCCONS ioctl to redirect the console), but it's not like vfs_ioctl()
> > > is *that* different from vfs_poll. Different operation, not somehow
> > > "one is more special than the other".
> > > 
> > > cachefiles and backing-file does it for regular IO, and drop it at IO
> > > completion - not that different from what dma-buf does. It's in
> > > ->read_iter() rather than ->poll(), but again: different operations,
> > > but not "one of them is somehow fundamentally different".
> > > 
> > > > 3.      dma-buf uses of get_file() are probably safe (epoll shite aside),
> > > > but they do look fishy.  That has nothing to do with epoll.
> > > 
> > > Now, what dma-buf basically seems to do is to avoid ref-counting its
> > > own fundamental data structure, and replaces that by refcounting the
> > > 'struct file' that *points* to it instead.
> > > 
> > > And it is a bit odd, but it actually makes some amount of sense,
> > > because then what it passes around is that file pointer (and it allows
> > > passing it around from user space *as* that file).
> > > 
> > > And honestly, if you look at why it then needs to add its refcount to
> > > it all, it actually makes sense.  dma-bufs have this notion of
> > > "fences" that are basically completion points for the asynchronous
> > > DMA. Doing a "poll()" operation will add a note to the fence to get
> > > that wakeup when it's done.
> > > 
> > > And yes, logically it takes a ref to the "struct dma_buf", but because
> > > of how the lifetime of the dma_buf is associated with the lifetime of
> > > the 'struct file', that then turns into taking a ref on the file.
> > > 
> > > Unusual? Yes. But not illogical. Not obviously broken. Tying the
> > > lifetime of the dma_buf to the lifetime of a file that is passed along
> > > makes _sense_ for that use.
> > > 
> > > I'm sure dma-bufs could add another level of refcounting on the
> > > 'struct dma_buf' itself, and not make it be 1:1 with the file, but
> > > it's not clear to me what the advantage would really be, or why it
> > > would be wrong to re-use a refcount that is already there.
> > 
> > So there is generally another refcount, because dma_buf is just the
> > cross-driver interface to some kind of real underlying buffer object from
> > the various graphics related subsystems we have.
> > 
> > And since it's a pure file based api thing that ceases to serve any
> > function once the fd/file is gone we tied all the dma_buf refcounting to
> > the refcount struct file already maintains. But the underlying buffer
> > object can easily outlive the dma_buf, and over the lifetime of an
> > underlying buffer object you might actually end up creating different
> > dma_buf api wrappers for it (but at least in drm we guarantee there's at
> > most one, hence why vmwgfx does the atomic_inc_unless_zero trick, which I
> > don't particularly like and isn't really needed).
> > 
> > But we could add another refcount, it just means we have 3 of those then
> > when only really 2 are needed.
> 
> Fwiw, the TTM thing described upthread and in the other thread really
> tries hard to work around the dma_buf == file lifetime choice by hooking
> into the dma-buf specific release function so it can access the dmabuf
> and then the file. All that seems like a pretty error prone thing to me.
> So a separate refcount for dma_buf wouldn't be the worst as that would
> allow that TTM thing to benefit and remove that nasty hacking into your
> generic dma_buf ops. But maybe I'm the only one who sees it that way and
> I'm certainly not familiar enough with dma-buf.

So the tricky part is the uniqueness requirement drm has for buffer
objects (and hence dma_buf wrappers), which together with the refcounting
makes dma_buf quite tricky:

- dma_buf needs to hold some reference onto the underlying object, or it
  wont work

- but you're not allowed to just create a new dma_buf every time someone
  exports an underlying object to a dma_buf, because that would break the
  uniqueness requirement. Which means the underlying object must also hold
  some kind of reference to its dma_buf, if it exists. So that on buffer
  export it can just increment the refcount for that and return it,
  instead of creating a new one.

Which would be a reference loop that never gets freed, so you need one of
two tricks:

- Either a weak reference, i.e. just a pointer plus
  atomic_inc_unless_zero trickery like ttm does. Splitting that refcount
  into more refcounts doesn't fundamentally solve the problem, it just
  adds even more refcounts.

- Or you do what all other drm drivers do in drm_prime.c do and careful
  clean up the dma_buf re-export cache when the userspace references (but
  not all kernel internal ones) disappear, to unbreak that reference loop.
  This needs to be done with extreme care and took a lot of screaming to
  get right, because if you have a race you might end up breaking the
  uniqueness requirement and have two dma_buf floating around.

So neither of these solutions really are simple, but I agree with you that
the atomic_inc_unless_zero trickery is less simple. It's definitely not
cool that it's done by digging around in struct file internals.
-Sima
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

  reply	other threads:[~2024-05-07 10:58 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-08  8:26 [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove syzbot
2024-04-15 14:31 ` Jens Axboe
2024-04-15 14:57   ` Pavel Begunkov
2024-05-03 11:54 ` Bui Quang Minh
2024-05-03 18:26   ` get_file() unsafe under epoll (was Re: [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove) Kees Cook
2024-05-03 18:49     ` Jens Axboe
2024-05-03 19:22       ` Kees Cook
2024-05-03 19:35         ` Jens Axboe
2024-05-03 19:59           ` Kees Cook
2024-05-03 20:28             ` Kees Cook
2024-05-03 21:11               ` Al Viro
2024-05-03 21:24                 ` Linus Torvalds
2024-05-03 21:30                   ` Al Viro
2024-05-06 17:46                   ` Stefan Metzmacher
2024-05-06 18:17                     ` Linus Torvalds
2024-05-08  8:47                       ` David Laight
2024-05-03 21:36                 ` Al Viro
2024-05-03 21:42                   ` Linus Torvalds
2024-05-03 21:53                     ` Al Viro
2024-05-06 12:23                       ` Daniel Vetter
2024-05-04  9:59             ` Christian Brauner
2024-05-03 21:11     ` [PATCH] epoll: try to be a _bit_ better about file lifetimes Linus Torvalds
2024-05-03 21:24       ` Al Viro
2024-05-03 21:33         ` Linus Torvalds
2024-05-03 21:45           ` Al Viro
2024-05-03 21:52             ` Linus Torvalds
2024-05-03 22:01               ` Al Viro
2024-05-03 22:07                 ` Al Viro
2024-05-03 23:16                   ` Linus Torvalds
2024-05-03 23:39                     ` Al Viro
2024-05-03 23:54                       ` Linus Torvalds
2024-05-04 10:44                       ` Christian Brauner
2024-05-03 22:46               ` Kees Cook
2024-05-03 23:03                 ` Al Viro
2024-05-03 23:23                   ` Kees Cook
2024-05-03 23:41                     ` Linus Torvalds
2024-05-04  9:19                       ` Christian Brauner
2024-05-06 12:37                       ` Daniel Vetter
2024-05-04  9:37           ` Christian Brauner
2024-05-04 15:32             ` Linus Torvalds
2024-05-04 15:40               ` Linus Torvalds
2024-05-04 15:53                 ` Linus Torvalds
2024-05-05 19:46                   ` Al Viro
2024-05-05 20:03                     ` Linus Torvalds
2024-05-05 20:30                       ` Al Viro
2024-05-05 20:53                         ` Linus Torvalds
2024-05-06 12:47                           ` Daniel Vetter
2024-05-06 14:46                             ` Christian Brauner
2024-05-07 10:58                               ` Daniel Vetter [this message]
2024-05-06 16:15                           ` Christian König
2024-05-05 10:50                 ` Christian Brauner
2024-05-05 16:46                   ` Linus Torvalds
2024-05-05 17:55                     ` [PATCH v2] epoll: be " Linus Torvalds
2024-05-05 18:04                       ` Jens Axboe
2024-05-05 20:01                       ` David Laight
2024-05-05 20:16                         ` Linus Torvalds
2024-05-05 20:12                     ` [PATCH] epoll: try to be a _bit_ " Al Viro
2024-05-06  8:45                     ` Christian Brauner
2024-05-06  9:26                       ` Christian Brauner
2024-05-06 14:19                         ` Christian Brauner
2024-05-07 21:02                       ` David Laight
2024-05-04 18:20               ` Linus Torvalds
2024-05-06 14:29                 ` [Linaro-mm-sig] " Christian König
2024-05-07 11:02                   ` Daniel Vetter
2024-05-07 16:46                     ` Linus Torvalds
2024-05-07 17:45                       ` Christian König
2024-05-08  7:51                         ` Michel Dänzer
2024-05-08  7:59                           ` Christian König
2024-05-08  8:23                         ` Christian Brauner
2024-05-08  9:10                           ` Christian König
2024-05-07 18:04                       ` Daniel Vetter
2024-05-07 19:07                         ` Linus Torvalds
2024-05-08  5:55                           ` Christian König
2024-05-08  8:32                             ` Daniel Vetter
2024-05-08 10:16                               ` Christian Brauner
2024-05-08  8:05                           ` Christian Brauner
2024-05-08 16:19                           ` Linus Torvalds
2024-05-08 17:14                             ` Linus Torvalds
2024-05-09 11:38                               ` Christian Brauner
2024-05-09 15:48                                 ` Linus Torvalds
2024-05-10  6:33                                   ` Christian Brauner
2024-05-08 10:08                   ` Christian Brauner
2024-05-08 15:45                     ` Daniel Vetter
2024-05-10 10:55                       ` Christian Brauner
2024-05-11 18:25                         ` David Laight
2024-05-04  9:25         ` Hillf Danton
2024-05-05 17:31       ` Jens Axboe
2024-05-04  9:45     ` get_file() unsafe under epoll (was Re: [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove) Christian Brauner
2024-05-04  3:23 ` [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove Hillf Danton
2024-05-04  3:46   ` [syzbot] [fs] " syzbot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZjoJRs0Svrb9ELDu@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=io-uring@vger.kernel.org \
    --cc=jack@suse.cz \
    --cc=keescook@chromium.org \
    --cc=laura@labbott.name \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=minhquangbui99@gmail.com \
    --cc=sumit.semwal@linaro.org \
    --cc=syzbot+045b454ab35fd82a35fb@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.