linux-fsdevel.vger.kernel.org archive mirror
From: Martin Sustrik <sustrik@250bpm.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Sha Zhengju <handai.szj@taobao.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] eventfd: implementation of EFD_MASK flag
Date: Thu, 07 Feb 2013 21:11:12 +0100
Message-ID: <51140A60.4070705@250bpm.com>
In-Reply-To: <5113FCA7.4020207@mit.edu>

On 07/02/13 20:12, Andy Lutomirski wrote:
> On 02/06/2013 10:41 PM, Martin Sustrik wrote:
>> When implementing network protocols in user space, one has to implement
>> fake user-space file descriptors to represent the sockets for the protocol.
>>
>> While all the BSD socket API functionality for such descriptors may be faked as
>> well (myproto_send(), myproto_recv() etc.), this approach doesn't work for
>> polling (select, poll, epoll). For polling, a real system-level file descriptor
>> is needed.
>>
>> In theory, eventfd may be used for this purpose, except that it is well suited
>> only for signaling POLLIN. With some hacking it can be also used to signal
>> POLLOUT and POLLERR, however:
>>
>> I.  There's no way to signal POLLPRI, POLLHUP etc.
>> II. There's no way to signal an arbitrary combination of POLL* flags. Most notably,
>>      !POLLIN && !POLLOUT, which is a perfectly valid combination for a network
>>      protocol (rx buffer is empty and tx buffer is full), cannot be signaled
>>      using the current implementation of eventfd.
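To make points I and II concrete: stock eventfd derives its readiness purely from its 64-bit counter, reporting POLLIN whenever the counter is non-zero and POLLOUT whenever writing 1 would not block, i.e. while the counter is below 0xfffffffffffffffe (see eventfd(2)). The following is a minimal Linux-only sketch that samples both ends of the counter range; the helper name eventfd_revents_after is made up for illustration. At one end only POLLOUT is reported, at the other only POLLIN, and no counter value reports neither.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: create a fresh eventfd, optionally add 'add' to its
   counter (0 means "don't write"), and return the POLLIN/POLLOUT bits that
   poll() reports for it. */
static short eventfd_revents_after(uint64_t add)
{
    int fd = eventfd(0, EFD_NONBLOCK);
    assert(fd >= 0);

    if (add != 0) {
        ssize_t n = write(fd, &add, sizeof add);
        assert(n == (ssize_t)sizeof add);
    }

    struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLOUT };
    int rc = poll(&pfd, 1, 0);   /* timeout 0: just sample readiness */
    assert(rc == 1);             /* at least one of the two is always set */

    close(fd);
    return pfd.revents;
}
```

With the counter at 0 the descriptor is writable but not readable; with the counter at its maximum (0xfffffffffffffffe) it is readable but not writable. The "rx empty, tx full" state from point II has no counter value that represents it.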
>>
>> This patch implements new EFD_MASK flag which attempts to solve this problem.
>>
>> Additionally, when implementing network protocols in user space, there's a
>> need to associate user-space state with each "socket". If an eventfd object is
>> used as a reference to the socket, it should be possible to associate an opaque
>> pointer to user-space data with it.
>>
>> The semantics of EFD_MASK are as follows:
>>
>> eventfd(2):
>>
>> If eventfd is created with the EFD_MASK flag set, it is initialised in such a way
>> as to signal no events on the file descriptor when it is polled. The 'initval'
>> argument is ignored.
>>
>> write(2):
>>
>> The user may only write buffers containing the following structure:
>>
>> struct efd_mask {
>>    short events;
>>    void *ptr;
>> };
>
> IMO that should be a u64 ptr to avoid compat problems.

I was following the user space declaration of epoll_data:

            typedef union epoll_data {
                void        *ptr;  <-----
                int          fd;
                uint32_t     u32;
                uint64_t     u64;
            } epoll_data_t;
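For comparison, this is how the existing epoll API already carries an opaque user pointer alongside a descriptor: the caller stores it in epoll_event.data.ptr at registration time and gets the same value back from epoll_wait(), without the kernel ever interpreting it. The sketch below assumes Linux and uses an ordinary eventfd; struct myproto_sock and roundtrip_ptr are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical per-"socket" state kept by a user-space protocol. */
struct myproto_sock {
    int id;
};

/* Create an eventfd, register it with epoll while stashing 'sock' in
   epoll_data.ptr, make the eventfd readable, and return the pointer that
   epoll_wait() hands back. */
static struct myproto_sock *roundtrip_ptr(struct myproto_sock *sock)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    int ep = epoll_create1(0);
    assert(efd >= 0 && ep >= 0);

    struct epoll_event ev = { .events = EPOLLIN };
    ev.data.ptr = sock;                         /* opaque to the kernel */
    int rc = epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev);
    assert(rc == 0);

    uint64_t one = 1;
    ssize_t n = write(efd, &one, sizeof one);   /* makes efd POLLIN-ready */
    assert(n == (ssize_t)sizeof one);

    struct epoll_event out;
    int ready = epoll_wait(ep, &out, 1, 0);     /* timeout 0: no blocking */
    assert(ready == 1);

    close(ep);
    close(efd);
    return out.data.ptr;
}
```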

However, looking at the kernel-side definition, the whole union is
declared like this (which obviously assumes that a pointer is never
longer than 64 bits):

          __u64 data;

Hm, not very helpful. Anyway, I am not a kernel developer, so any 
concrete suggestion about what type to use to map cleanly to user-space 
void* is welcome.
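One possible answer, sketched under assumptions: carry the pointer as a uint64_t and fix the layout explicitly. A plain uint64_t after a short is not quite enough on its own, because on i386 64-bit integers are only 4-byte aligned inside structs, so the field would land at offset 4 there and at offset 8 on x86-64 (the kernel's own struct epoll_event is marked packed on x86-64 via EPOLL_PACKED for a related reason). The struct name efd_mask_u64 and the helpers below are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ABI-stable variant of the proposed struct efd_mask.
   The 64-bit field comes first so its offset is 0 on every ABI, and the
   tail is padded explicitly so 32-bit and 64-bit user space agree on the
   total size. */
struct efd_mask_u64 {
    uint64_t ptr;     /* user pointer, widened to 64 bits */
    int16_t  events;  /* POLL* flags, same width as 'short' in the proposal */
    int16_t  pad[3];  /* explicit padding, expected to be zero */
};

_Static_assert(sizeof(struct efd_mask_u64) == 16, "size must not vary by ABI");
_Static_assert(offsetof(struct efd_mask_u64, events) == 8, "events at offset 8");

/* Convert a user pointer to and from the 64-bit field via uintptr_t,
   which is the portable way to round-trip a pointer through an integer. */
static uint64_t ptr_to_u64(void *p) { return (uint64_t)(uintptr_t)p; }
static void *u64_to_ptr(uint64_t v) { return (void *)(uintptr_t)v; }
```

User space would then fill the struct with m.ptr = ptr_to_u64(state) and recover the pointer with u64_to_ptr(m.ptr), exactly as it would with epoll_data.u64.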

>> The value of 'events' may be any combination of the event flags defined by
>> poll(2) (POLLIN, POLLOUT, POLLERR, POLLHUP etc.). The specified events will
>> be signaled when the eventfd is later polled (select, poll, epoll).
>> 'ptr' is an opaque pointer that is not interpreted by the eventfd object.
>
> How does this interact with EPOLLET?

That's an interesting question. The original eventfd code doesn't do 
anything specific to either edge or level mode. Neither does my patch.

Inspecting the code suggests that the edge vs. level distinction is
handled elsewhere (ep_send_events_proc), which keeps a separate list of
ready events; after returning an event, it decides whether to leave the
event in the list (level) or delete it from the list (edge).
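That reading can be checked from user space with a stock eventfd: make it readable once, never drain the counter, and call epoll_wait() twice. In level-triggered mode both calls should report the descriptor; with EPOLLET only the first should, since nothing new happened in between. This is a Linux-only sketch; the helper reports_without_read is a hypothetical name for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: make an eventfd readable once, register it with
   EPOLLIN | extra_flags, then call epoll_wait() twice without reading the
   counter. Returns how many of the two calls reported the descriptor. */
static int reports_without_read(uint32_t extra_flags)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    int ep = epoll_create1(0);
    assert(efd >= 0 && ep >= 0);

    uint64_t one = 1;
    ssize_t n = write(efd, &one, sizeof one);   /* counter becomes 1 */
    assert(n == (ssize_t)sizeof one);

    struct epoll_event ev = { .events = EPOLLIN | extra_flags };
    ev.data.fd = efd;
    int rc = epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev);
    assert(rc == 0);

    int hits = 0;
    for (int i = 0; i < 2; i++) {
        struct epoll_event out;
        hits += epoll_wait(ep, &out, 1, 0);     /* timeout 0: no blocking */
    }

    close(ep);
    close(efd);
    return hits;
}
```

If EFD_MASK only changes what the eventfd's poll callback reports, this generic epoll behaviour should apply to it unchanged, provided that updating the mask via write(2) wakes up waiters the same way a counter update does today.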

In any case, a review from someone experienced with the epoll
implementation would help.

Martin


Thread overview: 20+ messages
2013-02-07  6:41 [PATCH 1/1] eventfd: implementation of EFD_MASK flag Martin Sustrik
2013-02-07 19:12 ` Andy Lutomirski
2013-02-07 20:11   ` Martin Sustrik [this message]
2013-02-08  1:03     ` Andy Lutomirski
2013-02-08  5:26       ` Martin Sustrik
2013-02-08  6:36         ` Andy Lutomirski
2013-02-08  6:55           ` Martin Sustrik
2013-02-08 22:08       ` Eric Wong
2013-02-09  3:26         ` Martin Sustrik
2013-02-07 22:44 ` Andrew Morton
2013-02-07 23:30   ` Martin Sustrik
2013-02-08 12:43   ` Martin Sustrik
2013-02-08 22:21     ` Eric Wong
2013-02-09  2:40       ` Martin Sustrik
2013-02-09  3:54         ` Eric Wong
2013-02-09  7:36           ` Martin Sustrik
2013-02-09 11:51             ` Eric Wong
2013-02-09 12:04               ` Martin Sustrik
  -- strict thread matches above, loose matches on Subject: below --
2013-02-07 23:29 Martin Sustrik
2013-02-15  2:45 ` Michał Mirosław