From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Network Development <netdev@vger.kernel.org>, LKP <lkp@01.org>
Subject: Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
Date: Fri, 29 Jun 2018 00:37:21 +0100 [thread overview]
Message-ID: <20180628233720.GN30522@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFz1PQ86=6xEFM9ajZRoKF7NyAQmf8Gd8qnOOguFieWWbA@mail.gmail.com>
On Thu, Jun 28, 2018 at 03:55:35PM -0700, Linus Torvalds wrote:
> > You are misreading that mess. What he's trying to do (other than surviving
> > the awful clusterfuck around cancels) is to handle the decision what to
> > report to userland right in the wakeup callback. *That* is what really
> > drives the "make the second-pass ->poll() or something similar to it
> > non-blocking" (in addition to the fact that it is such in considerable
> > majority of instances).
>
> That's just crazy BS.
>
> Just call poll() again when you copy the data to userland (which by
> definition can block, again).
>
> Stop the idiotic "let's break poll for stupid AIO reasons, because the
> AIO people are morons".
You underestimate the nastiness of that thing (and for the record, I'm
a lot *less* fond of AIO than you are, what with having had to read that
nest of horrors lately). It does not "copy the data to userland"; what it
does instead is copying into an array of pages it keeps, right from
IO completion callback. In read/write case. This
ev_page = kmap_atomic(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
event = ev_page + pos % AIO_EVENTS_PER_PAGE;
event->obj = (u64)(unsigned long)iocb->ki_user_iocb;
event->data = iocb->ki_user_data;
event->res = res;
event->res2 = res2;
kunmap_atomic(ev_page);
flush_dcache_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
is what does the copying. And that might be done from IRQ context.
Yes, really.
They do have a slightly saner syscall that does copying from the damn
ring buffer, but its use is optional - userland can (and does) direct
read access to mmapped buffer.
Single-consumer ABIs suck and AIO is one such...
It could do schedule_work() and do blocking stuff from that - does so, in
case if it can't grab ->ctx_lock. Earlier iteration used to try doing
everything straight from wakeup callback, and *that* was racy as hell;
I'd rather have Christoph explain which races he'd been refering to,
but there had been a whole lot of that. Solution I suggested in the
last round of that was to offload __aio_poll_complete() via schedule_work()
both for cancel and poll wakeup cases. Doing the common case right
from poll wakeup callback was argued to avoid noticable overhead in
common situation - that's what "aio: try to complete poll iocbs without
context switch" is about. I'm more than slightly unhappy about the
lack of performance regression testing in non-AIO case...
At that point I would really like to see replies from Christoph - he's
on CET usually, no idea what his effective timezone is...
next prev parent reply other threads:[~2018-06-28 23:37 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-06-28 14:20 [RFC] replace ->get_poll_head with a waitqueue pointer in struct file Christoph Hellwig
2018-06-28 14:20 ` [PATCH 1/6] net: remove sock_poll_busy_flag Christoph Hellwig
2018-06-28 14:20 ` [PATCH 2/6] net: remove bogus RCU annotations on socket.wq Christoph Hellwig
2018-06-28 14:20 ` [PATCH 3/6] net: don't detour through struct to find the poll head Christoph Hellwig
2018-06-28 14:20 ` [PATCH 4/6] net: remove busy polling from sock_get_poll_head Christoph Hellwig
2018-06-28 14:20 ` [PATCH 5/6] net: remove sock_poll_busy_loop Christoph Hellwig
2018-06-28 14:20 ` [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer Christoph Hellwig
2018-06-28 16:40 ` Linus Torvalds
2018-06-28 18:17 ` Al Viro
2018-06-28 19:31 ` Linus Torvalds
2018-06-28 20:28 ` Al Viro
2018-06-28 20:37 ` Al Viro
2018-06-28 21:16 ` Linus Torvalds
2018-06-28 21:11 ` Linus Torvalds
2018-06-28 21:30 ` Al Viro
2018-06-28 21:39 ` Linus Torvalds
2018-06-28 22:20 ` Al Viro
2018-06-28 22:35 ` Linus Torvalds
2018-06-28 22:49 ` Al Viro
2018-06-28 22:55 ` Linus Torvalds
2018-06-28 23:37 ` Al Viro [this message]
2018-06-29 0:13 ` Linus Torvalds
2018-06-29 13:29 ` Christoph Hellwig
2018-06-29 13:40 ` Linus Torvalds
2018-06-29 13:28 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180628233720.GN30522@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=lkp@01.org \
--cc=netdev@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).