netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>, LKP <lkp@01.org>
Subject: Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
Date: Fri, 29 Jun 2018 00:37:21 +0100	[thread overview]
Message-ID: <20180628233720.GN30522@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFz1PQ86=6xEFM9ajZRoKF7NyAQmf8Gd8qnOOguFieWWbA@mail.gmail.com>

On Thu, Jun 28, 2018 at 03:55:35PM -0700, Linus Torvalds wrote:
> > You are misreading that mess.  What he's trying to do (other than surviving
> > the awful clusterfuck around cancels) is to handle the decision what to
> > report to userland right in the wakeup callback.  *That* is what really
> > drives the "make the second-pass ->poll() or something similar to it
> > non-blocking" (in addition to the fact that it is such in considerable
> > majority of instances).
> 
> That's just crazy BS.
> 
> Just call poll() again when you copy the data to userland (which by
> definition can block, again).
> 
> Stop the idiotic "let's break poll for stupid AIO reasons, because the
> AIO people are morons".

You underestimate the nastiness of that thing (and for the record, I'm
a lot *less* fond of AIO than you are, what with having had to read that
nest of horrors lately).  It does not "copy the data to userland"; what it
does instead is copying into an array of pages it keeps, right from
IO completion callback.  In read/write case.  This
        ev_page = kmap_atomic(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
        event = ev_page + pos % AIO_EVENTS_PER_PAGE;

        event->obj = (u64)(unsigned long)iocb->ki_user_iocb;
        event->data = iocb->ki_user_data;
        event->res = res;
        event->res2 = res2;

        kunmap_atomic(ev_page);
        flush_dcache_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
is what does the copying.  And that might be done from IRQ context.
Yes, really.

They do have a slightly saner syscall that does copying from the damn
ring buffer, but its use is optional - userland can (and does) direct
read access to mmapped buffer.

Single-consumer ABIs suck and AIO is one such...

It could do schedule_work() and do blocking stuff from that - does so, in
case if it can't grab ->ctx_lock.  Earlier iteration used to try doing
everything straight from wakeup callback, and *that* was racy as hell;
I'd rather have Christoph explain which races he'd been refering to,
but there had been a whole lot of that.  Solution I suggested in the
last round of that was to offload __aio_poll_complete() via schedule_work()
both for cancel and poll wakeup cases.  Doing the common case right
from poll wakeup callback was argued to avoid noticable overhead in
common situation - that's what "aio: try to complete poll iocbs without
context switch" is about.  I'm more than slightly unhappy about the
lack of performance regression testing in non-AIO case...

At that point I would really like to see replies from Christoph - he's
on CET usually, no idea what his effective timezone is...

  reply	other threads:[~2018-06-28 23:37 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-28 14:20 [RFC] replace ->get_poll_head with a waitqueue pointer in struct file Christoph Hellwig
2018-06-28 14:20 ` [PATCH 1/6] net: remove sock_poll_busy_flag Christoph Hellwig
2018-06-28 14:20 ` [PATCH 2/6] net: remove bogus RCU annotations on socket.wq Christoph Hellwig
2018-06-28 14:20 ` [PATCH 3/6] net: don't detour through struct to find the poll head Christoph Hellwig
2018-06-28 14:20 ` [PATCH 4/6] net: remove busy polling from sock_get_poll_head Christoph Hellwig
2018-06-28 14:20 ` [PATCH 5/6] net: remove sock_poll_busy_loop Christoph Hellwig
2018-06-28 14:20 ` [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer Christoph Hellwig
2018-06-28 16:40   ` Linus Torvalds
2018-06-28 18:17     ` Al Viro
2018-06-28 19:31       ` Linus Torvalds
2018-06-28 20:28         ` Al Viro
2018-06-28 20:37           ` Al Viro
2018-06-28 21:16             ` Linus Torvalds
2018-06-28 21:11           ` Linus Torvalds
2018-06-28 21:30             ` Al Viro
2018-06-28 21:39               ` Linus Torvalds
2018-06-28 22:20               ` Al Viro
2018-06-28 22:35                 ` Linus Torvalds
2018-06-28 22:49                   ` Al Viro
2018-06-28 22:55                     ` Linus Torvalds
2018-06-28 23:37                       ` Al Viro [this message]
2018-06-29  0:13                         ` Linus Torvalds
2018-06-29 13:29               ` Christoph Hellwig
2018-06-29 13:40                 ` Linus Torvalds
2018-06-29 13:28             ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180628233720.GN30522@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=lkp@01.org \
    --cc=netdev@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).