From: Roman Penyaev <rpenyaev@suse.de>
To: Renzo Davoli <renzo@cs.unibo.it>
Cc: Greg KH <gregkh@linuxfoundation.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Davide Libenzi <davidel@xmailserver.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-api@vger.kernel.org, linux-kernel-owner@vger.kernel.org
Subject: Re: [PATCH 1/1] eventfd new tag EFD_VPOLL: generate epoll events
Date: Fri, 31 May 2019 13:48:39 +0200
Message-ID: <cd20672aaf13f939b4f798d0839d2438@suse.de>
In-Reply-To: <20190531104502.GE3661@cs.unibo.it>
On 2019-05-31 12:45, Renzo Davoli wrote:
> Hi Roman,
>
> On Fri, May 31, 2019 at 11:34:08AM +0200, Roman Penyaev wrote:
>> On 2019-05-27 15:36, Renzo Davoli wrote:
>> > Unfortunately this approach cannot be applied to
>> > poll/select/ppoll/pselect/epoll.
>>
>> If you have to override other system calls, what is the problem with
>> overriding the poll family? It will add, let's say, 50 extra lines of
>> code to your userspace code. All you need is to be woken up by *any*
>> event and check one mask variable in order to understand what you need
>> to do: read or write, basically exactly what you do in your eventfd
>> modification, but only in userspace.
>
> This approach would not scale. If I want to use both a (user-space)
> network stack and an (emulated) device (or more stacks and devices),
> which (overridden) poll would I use?
>
> The poll of the first stack is not able to deal with the third
> device.

Since each such stack has a set of read/write/etc. functions, you can
always extend your stack with another call that returns an event mask
specifying exactly what you have to do, e.g.:

    nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
    for (n = 0; n < nfds; ++n) {
        struct sock *sock;
        poll_t mask;    /* separate name: must not clobber events[] */

        sock = events[n].data.ptr;
        mask = sock->get_events(sock, &events[n]);
        if (mask & EPOLLIN)
            sock->read(sock);
        if (mask & EPOLLOUT)
            sock->write(sock);
    }

With such a virtual table you can mix all userspace stacks, and even
normal sockets, for which the 'get_events' function can be declared as

    static poll_t kernel_sock_get_events(struct sock *sock,
                                         struct epoll_event *ev)
    {
        return ev->events;
    }

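To make the idea concrete, here is a minimal, self-contained sketch of such a mixed setup. It is only an illustration under stated assumptions: `struct sock`, `user_get_events`, `count_read`, `count_write`, `dispatch_once` and `run_demo` are hypothetical names, both descriptors are plain eventfds, and the "userspace stack" uses its descriptor purely as a wakeup source while keeping the real mask in a `pending` field.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define MAX_EVENTS 8

/* Hypothetical per-stack object; the three callbacks form the virtual table. */
struct sock {
    uint32_t (*get_events)(struct sock *sock, struct epoll_event *ev);
    void (*read)(struct sock *sock);
    void (*write)(struct sock *sock);
    int fd;            /* descriptor registered with epoll */
    uint32_t pending;  /* mask maintained by a userspace stack */
    int reads, writes; /* demo counters */
};

/* Kernel-backed descriptor: epoll already reported the real mask. */
static uint32_t kernel_get_events(struct sock *sock, struct epoll_event *ev)
{
    (void)sock;
    return ev->events;
}

/* Userspace stack: the fd only wakes us up, the stack owns the mask. */
static uint32_t user_get_events(struct sock *sock, struct epoll_event *ev)
{
    (void)ev;
    return sock->pending;
}

static void count_read(struct sock *sock)  { sock->reads++; }
static void count_write(struct sock *sock) { sock->writes++; }

/* One pass of the dispatch loop; returns what epoll_wait() returned. */
static int dispatch_once(int epollfd)
{
    struct epoll_event events[MAX_EVENTS];
    int n, nfds = epoll_wait(epollfd, events, MAX_EVENTS, 1000);

    for (n = 0; n < nfds; ++n) {
        struct sock *sock = events[n].data.ptr;
        uint32_t mask = sock->get_events(sock, &events[n]);

        if (mask & EPOLLIN)
            sock->read(sock);
        if (mask & EPOLLOUT)
            sock->write(sock);
    }
    return nfds;
}

/* Registers one kernel-style and one userspace-style sock, dispatches once. */
int run_demo(struct sock *ksock, struct sock *usock)
{
    struct sock *socks[2] = { ksock, usock };
    uint64_t one = 1;
    int i, nfds, epollfd = epoll_create1(0);

    if (epollfd < 0)
        return -1;

    /* eventfd(1, 0) is readable immediately; eventfd(0, 0) is not. */
    *ksock = (struct sock){ .get_events = kernel_get_events,
                            .read = count_read, .write = count_write,
                            .fd = eventfd(1, 0) };
    *usock = (struct sock){ .get_events = user_get_events,
                            .read = count_read, .write = count_write,
                            .fd = eventfd(0, 0) };

    for (i = 0; i < 2; i++) {
        struct epoll_event ev = { .events = EPOLLIN, .data.ptr = socks[i] };

        if (socks[i]->fd < 0 ||
            epoll_ctl(epollfd, EPOLL_CTL_ADD, socks[i]->fd, &ev) < 0)
            return -1;
    }

    /* The userspace stack becomes writable: publish the mask, then kick. */
    usock->pending = EPOLLOUT;
    if (write(usock->fd, &one, sizeof(one)) != (ssize_t)sizeof(one))
        return -1;

    nfds = dispatch_once(epollfd);

    close(ksock->fd);
    close(usock->fd);
    close(epollfd);
    return nfds;
}
```

Calling `run_demo()` with two `struct sock` objects should dispatch one read on the kernel-style descriptor and one write on the userspace-style one, even though epoll reported EPOLLIN for both: the second mask comes from the stack, not from the kernel.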
Am I missing something?

>> > > Why can it not be less than 64?
>> > This is the implementation of 'write'. The 64 bits include the
>> > 'command' (EFD_VPOLL_ADDEVENTS, EFD_VPOLL_DELEVENTS or
>> > EFD_VPOLL_MODEVENTS, in the most significant 32 bits) and the set of
>> > events (in the lowest 32 bits).
>>
>> Do you really need add/del/mod semantics? Userspace still has to keep
>> the mask somewhere, so you can have one simple command, which does:
>>
>>     ctx->count = events;
>>
>> in the kernel, so no masks and no games with bits are needed. That
>> will simplify the API.
>
> That is true, at the price of more complex code in user space.
> Other system calls could have been implemented as "set the value";
> instead there are ADD/DEL modification flags.
> I mean, for example, sigprocmask (SIG_BLOCK, SIG_UNBLOCK, SIG_SETMASK),
> or even epoll_ctl.
> While poll requires the program to keep the struct pollfd array stored
> somewhere, epoll is more powerful and flexible, as different file
> descriptors can be added and deleted by different modules/components.
>
> If I have two threads implementing the send and receive path of a
> socket in a user-space

Eventually you will end up with such a lock anyway, to protect your TCP
(or whatever) state machine. Or do you have a real example where the read
and write paths can work completely independently?
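
For illustration, a rough sketch of what keeping the mask in userspace could look like. Everything here is an assumption rather than part of the patch: `vpoll_set()` is a hypothetical stand-in for a single "set" command (the `ctx->count = events` behaviour), stubbed with a plain variable, and `struct user_sock` with its helpers is invented for the example. The add/del helpers reuse the one lock that guards the stack state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Stub for the hypothetical single "set" command: ctx->count = events. */
static uint32_t kernel_side_mask;

static void vpoll_set(uint32_t events)
{
    kernel_side_mask = events;
}

struct user_sock {
    pthread_mutex_t lock; /* the lock that protects the stack state anyway */
    uint32_t mask;        /* userspace copy of the current event mask */
};

/* ADD semantics built on top of "set": update the copy, push it down. */
void sock_add_events(struct user_sock *s, uint32_t events)
{
    pthread_mutex_lock(&s->lock);
    s->mask |= events;
    vpoll_set(s->mask);
    pthread_mutex_unlock(&s->lock);
}

/* DEL semantics built on top of "set". */
void sock_del_events(struct user_sock *s, uint32_t events)
{
    pthread_mutex_lock(&s->lock);
    s->mask &= ~events;
    vpoll_set(s->mask);
    pthread_mutex_unlock(&s->lock);
}

/* Small walk-through: add IN, add OUT, drop IN; returns the pushed mask. */
uint32_t demo_final_mask(void)
{
    struct user_sock s = { .lock = PTHREAD_MUTEX_INITIALIZER, .mask = 0 };

    sock_add_events(&s, EPOLLIN);   /* receive path: data arrived */
    sock_add_events(&s, EPOLLOUT);  /* send path: buffer space available */
    sock_del_events(&s, EPOLLIN);   /* receive path: queue drained */
    return kernel_side_mask;
}
```

With these helpers the send and receive paths can still call add/del independently; the read-modify-write on the mask simply happens in userspace under the existing lock instead of in the kernel.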
--
Roman