From: Renzo Davoli <renzo@cs.unibo.it>
To: Roman Penyaev <rpenyaev@suse.de>
Cc: Greg KH <gregkh@linuxfoundation.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Davide Libenzi <davidel@xmailserver.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-api@vger.kernel.org, linux-kernel-owner@vger.kernel.org
Subject: Re: [PATCH 1/1] eventfd new tag EFD_VPOLL: generate epoll events
Date: Mon, 3 Jun 2019 17:00:10 +0200 [thread overview]
Message-ID: <20190603150010.GE4312@cs.unibo.it> (raw)
In-Reply-To: <cd20672aaf13f939b4f798d0839d2438@suse.de>
Hi Roman,

I'm sorry for the delay in my answer, but I needed to set up a minimal
tutorial to show what I am working on and why I need a feature like the
one I am proposing.

Please have a look at the README.md page here:
https://github.com/virtualsquare/vuos
(everything can be downloaded and tested).
On Fri, May 31, 2019 at 01:48:39PM +0200, Roman Penyaev wrote:
> Since each such stack has a set of read/write/etc functions you can
> always extend your stack with another call which returns an event mask,
> specifying what exactly you have to do, e.g.:
>
>	nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
>	for (n = 0; n < nfds; ++n) {
>		struct sock *sock;
>		uint32_t mask;
>
>		sock = events[n].data.ptr;
>		mask = sock->get_events(sock, &events[n]);
>
>		if (mask & EPOLLIN)
>			sock->read(sock);
>		if (mask & EPOLLOUT)
>			sock->write(sock);
>	}
>
> With such a virtual table you can mix all userspace stacks and even
> normal sockets, for which the 'get_events' function can be declared as
>
>	static poll_t kernel_sock_get_events(struct sock *sock,
>					     struct epoll_event *ev)
>	{
>		return ev->events;
>	}
>
> Do I miss something?
I am not trying to port some tools to user-space implemented stacks or
device drivers/emulators; I am seeking a general-purpose approach.
I think that the example in the README section "mount a user-level
networking stack" explains the situation.
The submodule vunetvdestack uses a namespace to define a networking stack
connected to a VDE network (see https://github.com/rd235/vdeplug4).
The API is clean (as can be seen at the end of the file
vunet_modules/vunetvdestack.c).
All the methods but "socket" are directly mapped to their system call
counterparts:
	struct vunet_operations vunet_ops = {
		.socket = vdestack_socket,
		.bind = bind,
		.connect = connect,
		.listen = listen,
		.accept4 = accept4,
		....
		.epoll_ctl = epoll_ctl,
		...
	};
(The elegance of the API can also be seen in vunet_modules/vunetreal.c: a
38-line module implementing a gateway to the real networking of the
hosting machine.)
Unfortunately I cannot use the same clean interface to support stacks
implemented as user libraries, like lwip/lwipv6/picotcp, because I cannot
generate EPOLL events...
Byzantine workarounds based on data structures exchanged in the data.ptr
field of epoll_event, which the hypervisor must decode to retrieve the
missing information about the event, could be implemented... but it would
be a pity ;-)
The same problem arises in umdev modules: virtual devices should generate
the same EPOLL events as their real counterparts.
I feel that the ability to generate/synthesize EPOLL events could be
useful for many projects.
(In my first message I included some URLs from people seeking this
feature, retrieved by some queries on a web search engine.)
Implementations may vary, as may the kernel API to support such a feature.
As I said, my proposal has a minimal impact on the code: it does not
require the definition of new syscalls, it simply extends the features
of eventfd.
>
> Eventually you come up with such a lock to protect your tcp or whatever
> state machine. Or you have a real example where read and write paths
> can work completely independently?
Actually the umvu hypervisor uses concurrent tracing of concurrent
processes.
We have named this technique "guardian angels": each process/thread
running in the partial virtual machine has a corresponding thread in the
hypervisor.
So if a process uses two threads to manage a network connection (say a
TCP stream), the two guardian angels replicate their requests towards the
networking module.
So I am looking for a general solution, not a pattern to port some
projects (and I cannot use two different approaches for event-driven and
multi-threaded implementations, as I have to support both).
If you have reached this point... thank you for your patience.
I am more than pleased to receive further comments or proposals.
renzo
Thread overview: 12+ messages
2019-05-26 14:25 [PATCH 1/1] eventfd new tag EFD_VPOLL: generate epoll events Renzo Davoli
2019-05-26 20:24 ` kbuild test robot
2019-05-26 20:49 ` kbuild test robot
2019-05-27 3:09 ` kbuild test robot
2019-05-27 7:33 ` Greg KH
2019-05-27 13:36 ` Renzo Davoli
2019-05-31 9:34 ` Roman Penyaev
2019-05-31 10:45 ` Renzo Davoli
2019-05-31 11:48 ` Roman Penyaev
2019-06-03 15:00 ` Renzo Davoli [this message]
2019-06-06 20:11 ` Roman Penyaev
2019-06-07 9:40 ` Renzo Davoli