linux-api.vger.kernel.org archive mirror
From: Renzo Davoli <renzo@cs.unibo.it>
To: Roman Penyaev <rpenyaev@suse.de>
Cc: Greg KH <gregkh@linuxfoundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Davide Libenzi <davidel@xmailserver.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-api@vger.kernel.org, linux-kernel-owner@vger.kernel.org
Subject: Re: [PATCH 1/1] eventfd new tag EFD_VPOLL: generate epoll events
Date: Fri, 31 May 2019 12:45:02 +0200
Message-ID: <20190531104502.GE3661@cs.unibo.it>
In-Reply-To: <480f1bda66b67f740f5da89189bbfca3@suse.de>

Hi Roman,

On Fri, May 31, 2019 at 11:34:08AM +0200, Roman Penyaev wrote:
> On 2019-05-27 15:36, Renzo Davoli wrote:
> > Unfortunately this approach cannot be applied to
> > poll/select/ppoll/pselect/epoll.
> 
> If you have to override other systemcalls, what is the problem to override
> poll family?  It will add, let's say, 50 extra code lines complexity to your
> userspace code.  All you need is to be woken up by *any* event and check
> one mask variable, in order to understand what you need to do: read or
> write, basically exactly what you do in your eventfd modification, but only
> in userspace.

This approach would not scale. If I want to use both a (user-space) network stack
and an (emulated) device (or several stacks and devices), which (overridden) poll should I use?

The poll overridden by the first stack is not able to deal with the third device.

> 
> 
> > > Why can it not be less than 64?
> > This is the implementation of 'write'. The 64 bits include the 'command'
> > EFD_VPOLL_ADDEVENTS, EFD_VPOLL_DELEVENTS or EFD_VPOLL_MODEVENTS (in the
> > most significant 32 bits) and the set of events (in the lowest 32 bits).
> 
> Do you really need add/del/mod semantics?  Userspace still has to keep mask
> somewhere, so you can have one simple command, which does:
>    ctx->count = events;
> in kernel, so no masks and these games with bits are needed.  That will
> simplify API.

That is true, but at the price of more complex code in user space.
Other system calls could have been implemented as "set the value"; instead they provide
ADD/DEL modification flags: for example sigprocmask (SIG_BLOCK, SIG_UNBLOCK, SIG_SETMASK),
or even epoll_ctl (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD).
While poll requires the program to keep its struct pollfd array stored somewhere,
epoll is more powerful and flexible because different file descriptors can be added
and deleted by different modules/components.
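A minimal sketch of the 64-bit command word quoted above, assuming the kernel applies ADDEVENTS as a bitwise OR, DELEVENTS as an AND-NOT, and MODEVENTS as a full replacement of the mask (as SIG_SETMASK does for sigprocmask). The numeric values of the EFD_VPOLL_* constants and the helpers vpoll_word() and vpoll_apply() are hypothetical, not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>  /* EPOLLIN, EPOLLOUT, EPOLLPRI for callers */

/* Hypothetical command values (assumed, not from the patch): the
 * command lives in the upper 32 bits, the event set in the lower 32. */
#define EFD_VPOLL_ADDEVENTS (UINT64_C(1) << 32)
#define EFD_VPOLL_DELEVENTS (UINT64_C(2) << 32)
#define EFD_VPOLL_MODEVENTS (UINT64_C(3) << 32)

/* Build the 64-bit word a program would write() to the descriptor. */
static uint64_t vpoll_word(uint64_t cmd, uint32_t events)
{
	/* the command must not overlap the event bits */
	assert((cmd & UINT64_C(0xffffffff)) == 0);
	return cmd | events;
}

/* Apply one command word to a pending-event mask, the way a
 * kernel-side handler might; returns -1 on an unknown command. */
static int vpoll_apply(uint32_t *mask, uint64_t word)
{
	uint64_t cmd = word & ~UINT64_C(0xffffffff);
	uint32_t events = (uint32_t)word;

	switch (cmd) {
	case EFD_VPOLL_ADDEVENTS:
		*mask |= events;   /* set these bits, leave the others alone */
		break;
	case EFD_VPOLL_DELEVENTS:
		*mask &= ~events;  /* clear these bits, leave the others alone */
		break;
	case EFD_VPOLL_MODEVENTS:
		*mask = events;    /* replace the whole mask */
		break;
	default:
		return -1;         /* mask left unchanged */
	}
	return 0;
}
```

For example, applying vpoll_word(EFD_VPOLL_DELEVENTS, EPOLLOUT) clears only EPOLLOUT and leaves an EPOLLIN bit set by another component untouched; with a single "set" command the writer would need to know the whole mask.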

If two threads implement the send and receive paths of a socket in a user-space
network stack, the epoll pending-event bitmap is shared, so I have to create
critical sections like the following one every time I need to set or reset a bit:
	pthread_mutex_lock(mylock);
	events |= EPOLLIN;	/* events: 64-bit mask shared by both threads */
	write(efd, &events, sizeof(events));
	pthread_mutex_unlock(mylock);
With add/del semantics no locking is required, as the send-path thread deals only with EPOLLOUT
while its sibling receive thread uses EPOLLIN or EPOLLPRI.
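The difference can be sketched with a user-space simulation, assuming the kernel applies each add/del request atomically under its own internal lock. struct vpoll_sim, vpoll_sim_update(), the VPOLL_ADD/VPOLL_DEL names and the thread functions are hypothetical stand-ins for the eventfd context and the two network-stack paths; no real eventfd is involved:

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical stand-in for the kernel-side context: the object
 * serializes its own updates, so callers need no lock of their own. */
struct vpoll_sim {
	pthread_mutex_t lock;
	uint32_t mask;
};

enum vpoll_op { VPOLL_ADD, VPOLL_DEL };  /* hypothetical names */

/* Apply a delta (add or delete some bits) atomically. */
static void vpoll_sim_update(struct vpoll_sim *v, enum vpoll_op op,
			     uint32_t events)
{
	pthread_mutex_lock(&v->lock);
	if (op == VPOLL_ADD)
		v->mask |= events;
	else
		v->mask &= ~events;
	pthread_mutex_unlock(&v->lock);
}

#define ROUNDS 100000

/* Send path: touches only EPOLLOUT, with no user-space lock. */
static void *send_path(void *arg)
{
	struct vpoll_sim *v = arg;

	for (int i = 0; i < ROUNDS; i++) {
		vpoll_sim_update(v, VPOLL_DEL, EPOLLOUT);  /* e.g. TX queue full */
		vpoll_sim_update(v, VPOLL_ADD, EPOLLOUT);  /* TX queue drained */
	}
	return NULL;
}

/* Receive path: touches only EPOLLIN, with no user-space lock. */
static void *recv_path(void *arg)
{
	struct vpoll_sim *v = arg;

	for (int i = 0; i < ROUNDS; i++) {
		vpoll_sim_update(v, VPOLL_ADD, EPOLLIN);   /* data arrived */
		vpoll_sim_update(v, VPOLL_DEL, EPOLLIN);   /* data consumed */
	}
	vpoll_sim_update(v, VPOLL_ADD, EPOLLIN);       /* leave data pending */
	return NULL;
}

/* Run both paths concurrently and return the final mask. */
static uint32_t run_both_paths(void)
{
	struct vpoll_sim v = { PTHREAD_MUTEX_INITIALIZER, 0 };
	pthread_t tx, rx;

	pthread_create(&tx, NULL, send_path, &v);
	pthread_create(&rx, NULL, recv_path, &v);
	pthread_join(tx, NULL);
	pthread_join(rx, NULL);
	return v.mask;
}
```

Whatever the interleaving, run_both_paths() ends with EPOLLIN | EPOLLOUT set, because each thread only ever sends the delta for its own bits. With set semantics each thread would write the whole mask, and one thread could overwrite the other's bit unless both hold a common user-space lock.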

I would prefer the add/del/mod semantics, but if they are generally perceived as unnecessary
complexity in the kernel code, I can update my patch.

Thank you Roman,
			
			renzo

Thread overview: 12+ messages
2019-05-26 14:25 [PATCH 1/1] eventfd new tag EFD_VPOLL: generate epoll events Renzo Davoli
2019-05-26 20:24 ` kbuild test robot
2019-05-26 20:49 ` kbuild test robot
2019-05-27  3:09 ` kbuild test robot
2019-05-27  7:33 ` Greg KH
2019-05-27 13:36   ` Renzo Davoli
2019-05-31  9:34     ` Roman Penyaev
2019-05-31 10:45       ` Renzo Davoli [this message]
2019-05-31 11:48         ` Roman Penyaev
2019-06-03 15:00           ` Renzo Davoli
2019-06-06 20:11             ` Roman Penyaev
2019-06-07  9:40               ` Renzo Davoli
