From: Al Viro <viro@ZenIV.linux.org.uk>
To: lkp@lists.01.org
Subject: Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
Date: Fri, 29 Jun 2018 00:37:21 +0100 [thread overview]
Message-ID: <20180628233720.GN30522@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFz1PQ86=6xEFM9ajZRoKF7NyAQmf8Gd8qnOOguFieWWbA@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 2687 bytes --]
On Thu, Jun 28, 2018 at 03:55:35PM -0700, Linus Torvalds wrote:
> > You are misreading that mess. What he's trying to do (other than surviving
> > the awful clusterfuck around cancels) is to handle the decision what to
> > report to userland right in the wakeup callback. *That* is what really
> > drives the "make the second-pass ->poll() or something similar to it
> > non-blocking" (in addition to the fact that it is such in considerable
> > majority of instances).
>
> That's just crazy BS.
>
> Just call poll() again when you copy the data to userland (which by
> definition can block, again).
>
> Stop the idiotic "let's break poll for stupid AIO reasons, because the
> AIO people are morons".
You underestimate the nastiness of that thing (and for the record, I'm
a lot *less* fond of AIO than you are, what with having had to read that
nest of horrors lately). It does not "copy the data to userland"; what it
does instead is copying into an array of pages it keeps, right from
IO completion callback. In read/write case. This
ev_page = kmap_atomic(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
event = ev_page + pos % AIO_EVENTS_PER_PAGE;
event->obj = (u64)(unsigned long)iocb->ki_user_iocb;
event->data = iocb->ki_user_data;
event->res = res;
event->res2 = res2;
kunmap_atomic(ev_page);
flush_dcache_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
is what does the copying. And that might be done from IRQ context.
Yes, really.
They do have a slightly saner syscall that does copying from the damn
ring buffer, but its use is optional - userland can (and does) direct
read access to mmapped buffer.
Single-consumer ABIs suck and AIO is one such...
It could do schedule_work() and do blocking stuff from that - does so, in
case if it can't grab ->ctx_lock. Earlier iteration used to try doing
everything straight from wakeup callback, and *that* was racy as hell;
I'd rather have Christoph explain which races he'd been refering to,
but there had been a whole lot of that. Solution I suggested in the
last round of that was to offload __aio_poll_complete() via schedule_work()
both for cancel and poll wakeup cases. Doing the common case right
from poll wakeup callback was argued to avoid noticable overhead in
common situation - that's what "aio: try to complete poll iocbs without
context switch" is about. I'm more than slightly unhappy about the
lack of performance regression testing in non-AIO case...
At that point I would really like to see replies from Christoph - he's
on CET usually, no idea what his effective timezone is...
WARNING: multiple messages have this Message-ID (diff)
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Network Development <netdev@vger.kernel.org>, LKP <lkp@01.org>
Subject: Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
Date: Fri, 29 Jun 2018 00:37:21 +0100 [thread overview]
Message-ID: <20180628233720.GN30522@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFz1PQ86=6xEFM9ajZRoKF7NyAQmf8Gd8qnOOguFieWWbA@mail.gmail.com>
On Thu, Jun 28, 2018 at 03:55:35PM -0700, Linus Torvalds wrote:
> > You are misreading that mess. What he's trying to do (other than surviving
> > the awful clusterfuck around cancels) is to handle the decision what to
> > report to userland right in the wakeup callback. *That* is what really
> > drives the "make the second-pass ->poll() or something similar to it
> > non-blocking" (in addition to the fact that it is such in considerable
> > majority of instances).
>
> That's just crazy BS.
>
> Just call poll() again when you copy the data to userland (which by
> definition can block, again).
>
> Stop the idiotic "let's break poll for stupid AIO reasons, because the
> AIO people are morons".
You underestimate the nastiness of that thing (and for the record, I'm
a lot *less* fond of AIO than you are, what with having had to read that
nest of horrors lately). It does not "copy the data to userland"; what it
does instead is copying into an array of pages it keeps, right from
IO completion callback. In read/write case. This
ev_page = kmap_atomic(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
event = ev_page + pos % AIO_EVENTS_PER_PAGE;
event->obj = (u64)(unsigned long)iocb->ki_user_iocb;
event->data = iocb->ki_user_data;
event->res = res;
event->res2 = res2;
kunmap_atomic(ev_page);
flush_dcache_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
is what does the copying. And that might be done from IRQ context.
Yes, really.
They do have a slightly saner syscall that does copying from the damn
ring buffer, but its use is optional - userland can (and does) direct
read access to mmapped buffer.
Single-consumer ABIs suck and AIO is one such...
It could do schedule_work() and do blocking stuff from that - does so, in
case if it can't grab ->ctx_lock. Earlier iteration used to try doing
everything straight from wakeup callback, and *that* was racy as hell;
I'd rather have Christoph explain which races he'd been refering to,
but there had been a whole lot of that. Solution I suggested in the
last round of that was to offload __aio_poll_complete() via schedule_work()
both for cancel and poll wakeup cases. Doing the common case right
from poll wakeup callback was argued to avoid noticable overhead in
common situation - that's what "aio: try to complete poll iocbs without
context switch" is about. I'm more than slightly unhappy about the
lack of performance regression testing in non-AIO case...
At that point I would really like to see replies from Christoph - he's
on CET usually, no idea what his effective timezone is...
next prev parent reply other threads:[~2018-06-28 23:37 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-06-28 14:20 [RFC] replace ->get_poll_head with a waitqueue pointer in struct file Christoph Hellwig
2018-06-28 14:20 ` Christoph Hellwig
2018-06-28 14:20 ` [PATCH 1/6] net: remove sock_poll_busy_flag Christoph Hellwig
2018-06-28 14:20 ` Christoph Hellwig
2018-06-28 14:20 ` [PATCH 2/6] net: remove bogus RCU annotations on socket.wq Christoph Hellwig
2018-06-28 14:20 ` Christoph Hellwig
2018-06-28 14:20 ` [PATCH 3/6] net: don't detour through struct to find the poll head Christoph Hellwig
2018-06-28 14:20 ` Christoph Hellwig
2018-06-28 14:20 ` [PATCH 4/6] net: remove busy polling from sock_get_poll_head Christoph Hellwig
2018-06-28 14:20 ` Christoph Hellwig
2018-06-28 14:20 ` [PATCH 5/6] net: remove sock_poll_busy_loop Christoph Hellwig
2018-06-28 14:20 ` Christoph Hellwig
2018-06-28 14:20 ` [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer Christoph Hellwig
2018-06-28 14:20 ` Christoph Hellwig
2018-06-28 16:40 ` Linus Torvalds
2018-06-28 16:40 ` Linus Torvalds
2018-06-28 18:17 ` Al Viro
2018-06-28 18:17 ` Al Viro
2018-06-28 19:31 ` Linus Torvalds
2018-06-28 19:31 ` Linus Torvalds
2018-06-28 20:28 ` Al Viro
2018-06-28 20:28 ` Al Viro
2018-06-28 20:37 ` Al Viro
2018-06-28 20:37 ` Al Viro
2018-06-28 21:16 ` Linus Torvalds
2018-06-28 21:16 ` Linus Torvalds
2018-06-28 21:11 ` Linus Torvalds
2018-06-28 21:11 ` Linus Torvalds
2018-06-28 21:30 ` Al Viro
2018-06-28 21:30 ` Al Viro
2018-06-28 21:39 ` Linus Torvalds
2018-06-28 21:39 ` Linus Torvalds
2018-06-28 22:20 ` Al Viro
2018-06-28 22:20 ` Al Viro
2018-06-28 22:35 ` Linus Torvalds
2018-06-28 22:35 ` Linus Torvalds
2018-06-28 22:49 ` Al Viro
2018-06-28 22:49 ` Al Viro
2018-06-28 22:55 ` Linus Torvalds
2018-06-28 22:55 ` Linus Torvalds
2018-06-28 23:37 ` Al Viro [this message]
2018-06-28 23:37 ` Al Viro
2018-06-29 0:13 ` Linus Torvalds
2018-06-29 0:13 ` Linus Torvalds
2018-06-29 13:29 ` Christoph Hellwig
2018-06-29 13:29 ` Christoph Hellwig
2018-06-29 13:40 ` Linus Torvalds
2018-06-29 13:40 ` Linus Torvalds
2018-06-29 13:28 ` Christoph Hellwig
2018-06-29 13:28 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180628233720.GN30522@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=lkp@lists.01.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.