From: Evgeniy Polyakov <zbr@ioremap.net>
To: Eric Paris <eparis@redhat.com>
Cc: Jamie Lokier <jamie@shareable.org>,
David Miller <davem@davemloft.net>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
netdev@vger.kernel.org, viro@zeniv.linux.org.uk,
alan@linux.intel.com, hch@infradead.org,
torvalds@linux-foundation.org
Subject: Re: fanotify as syscalls
Date: Wed, 16 Sep 2009 16:05:23 +0400
Message-ID: <20090916120523.GA12830@ioremap.net>
In-Reply-To: <1253051699.5213.18.camel@dhcp231-106.rdu.redhat.com>
On Tue, Sep 15, 2009 at 05:54:59PM -0400, Eric Paris (eparis@redhat.com) wrote:
> Nothing's impossible, but is netlink a square peg for this round hole?
> One of the great benefits of netlink, the attribute matching and
> filtering, although possibly useful isn't some panacea as we have to do
> that well before netlink to have anything like decent performance.
> Imagine every single fs event creating an skb and sending it with
> netlink only to have most of them dropped.
There is no performance problem even with a single IO per skb.
Consider ordinary send/recv calls, which may also end up with one skb per
syscall: most of the overhead comes from the data copy or the syscall
machinery (for small writes), not from the allocation path.
I have a 3.5-year-old performance graph at
http://www.ioremap.net/gallery/netlink_perf.png
which shows 400 MB/s of bandwidth for 4k writes; I'm pretty sure it is
limited by copy performance only.
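As a rough sanity check, the quoted figure can be turned into a
per-message budget. This is only a back-of-the-envelope sketch: it assumes
1 MB = 10**6 bytes, 4k = 4096 bytes and one skb per write, none of which
is stated with the graph, and the function names are made up for
illustration.

```python
# Per-message cost implied by 400 MB/s of 4k netlink writes.
# Assumptions (not stated with the graph): 1 MB = 10**6 bytes,
# 4k = 4096 bytes, one skb per write.
BANDWIDTH_BYTES_PER_SEC = 400 * 10**6
WRITE_SIZE_BYTES = 4096


def messages_per_second(bandwidth, write_size):
    """Number of fixed-size writes needed to sustain the given bandwidth."""
    return bandwidth / write_size


def microseconds_per_message(bandwidth, write_size):
    """Average time budget per write, in microseconds."""
    return 1e6 / messages_per_second(bandwidth, write_size)


rate = messages_per_second(BANDWIDTH_BYTES_PER_SEC, WRITE_SIZE_BYTES)
cost = microseconds_per_message(BANDWIDTH_BYTES_PER_SEC, WRITE_SIZE_BYTES)
print(f"{rate:.0f} messages/s, about {cost:.2f} us per message")
```

Under these assumptions that is roughly 100,000 messages per second, or
about 10 microseconds per message, and that budget covers the copy, the
syscall and the skb allocation together.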
> The only other benefit to netlink that I know of is the reasonable,
> easy, and clean addition of information later in time with backwards
> compatibility as needed. That's really cool, I admit, but with the
> limited amount of additional info that users have wanted out of inotify
> I think my data type extensibility should be enough.
I want a lot from inotify which I'm afraid will not be easy with fanotify
either, but inotify's existing model just does not allow extending it. I
would not be 100% sure that fanotify will have no additional needs in a
year or so.
> > Moreover you can implement a pool of working threads and
> > postpone all the work to them and appropriate event queues, which will
> > allow to use rlimits for the listeners and open files 'kind of' on
> > behalf of those processes.
>
> I'm sorry, I don't understand. I don't see how worker threads help
> anything here. Can you explain what you are thinking?
I meant that all the work of queueing, event allocation, fd opening and
population could be postponed and done by some other threads in the
system, with only the original process's credentials checked to satisfy
the various limits. In that case there is no question about the context
in which a given fd was created, and it becomes possible to use the
asynchronous nature of netlink.
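The worker-pool idea can be illustrated with a toy user-space model. This
is not kernel code: the `Listener` class, its `max_fds` limit (standing in
for something like an open-files rlimit) and the `run` helper are all
hypothetical names invented for the sketch. The point is only that the
workers do the "open and populate" step while the limit being checked
belongs to the listener.

```python
import queue
import threading


class Listener:
    """Toy stand-in for a registered listener with an open-files limit."""

    def __init__(self, name, max_fds):
        self.name = name
        self.max_fds = max_fds   # hypothetical per-listener limit
        self.fds = []            # pretend fd table: one entry per opened file
        self.rejected = 0        # events refused because the limit was hit
        self.lock = threading.Lock()


def worker(work_queue):
    """Process queued events on behalf of the listener they belong to."""
    while True:
        item = work_queue.get()
        try:
            if item is None:     # sentinel: shut this worker down
                return
            listener, path = item
            with listener.lock:
                # The limit checked here is the listener's, not the worker's.
                if len(listener.fds) >= listener.max_fds:
                    listener.rejected += 1
                else:
                    listener.fds.append(path)
        finally:
            work_queue.task_done()


def run(listener, paths, num_workers=2):
    """Queue one event per path and let the worker pool handle them all."""
    work_queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(work_queue,))
               for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for path in paths:
        work_queue.put((listener, path))
    for _ in threads:
        work_queue.put(None)
    work_queue.join()
    for thread in threads:
        thread.join()


scanner = Listener("scanner", max_fds=3)
run(scanner, [f"/tmp/file{i}" for i in range(5)])
print(len(scanner.fds), "opened,", scanner.rejected, "rejected")
```

Which three paths end up in the table depends on thread scheduling; only
the counts are fixed by the limit.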
I am not forcing you to do this of course, but there is already quite a
large infrastructure for similar tasks, and it could be worth
changing/reconsidering things to use existing models rather than
inventing your own. Of course this is a matter of overall benefit.
> > But it is quite different from the approach you selected and which is
> > more obvious indeed. So if you ask a question whether fanotify should
> > use sockets or syscalls, I would prefer sockets
>
> I've heard someone else off list say this as well. I'm not certain why.
> I actually spent the day yesterday and have fanotify working over 5 new
> syscalls (good thing I wrote the code with separate back and front
> ends for just this purpose). And I really don't hate it. I think 3
> might be enough.
>
> fanotify_init() ---- very much like inotify_init
> fanotify_modify_mark_at() --- like inotify_add_watch and rm_watch
> fanotify_modify_mark_fd() --- same but with an fd instead of a path
Those two can be combined I think.
> fanotify_response() --- userspace tells the kernel what to do if requested/allowed
> (could probably be done using write() to the fanotify fd)
> fanotify_exclude() --- a horrid syscall which might be better as an ioctl since it isn't strongly typed....
It all sounds good and simple, but what if you need a modify command
with new arguments? Instead of adding a new typed option you will need to
add another syscall. I already did that for inotify, but via ioctl, and I
am pretty sure the much wider fanotify will need the same at some time in
the future.
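The "new typed option" point can be sketched with netlink-style
attributes. The layout below follows struct nlattr (a 16-bit length that
includes the 4-byte header, a 16-bit type, payload padded to 4 bytes), but
the attribute numbers and names such as FAN_ATTR_NEW_OPTION are purely
hypothetical; nothing like them exists in the posted patches. The sketch
shows an old parser skipping an attribute type it does not know while a
newer parser picks it up from the same message.

```python
import struct

NLA_HDR = struct.Struct("=HH")   # nla_len, nla_type in host byte order
NLA_ALIGNTO = 4


def nla_align(length):
    """Round length up to the attribute alignment boundary."""
    return (length + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


def pack_attr(attr_type, payload):
    """Encode one attribute: header, payload, zero padding."""
    length = NLA_HDR.size + len(payload)   # nla_len excludes the padding
    padding = b"\0" * (nla_align(length) - length)
    return NLA_HDR.pack(length, attr_type) + payload + padding


def parse_attrs(buf, known_types):
    """Return {type: payload} for known types, skipping everything else."""
    attrs = {}
    offset = 0
    while offset + NLA_HDR.size <= len(buf):
        length, attr_type = NLA_HDR.unpack_from(buf, offset)
        if length < NLA_HDR.size or offset + length > len(buf):
            raise ValueError("truncated attribute")
        if attr_type in known_types:
            attrs[attr_type] = buf[offset + NLA_HDR.size:offset + length]
        offset += nla_align(length)        # unknown types are stepped over
    return attrs


# Hypothetical attribute numbers, for illustration only.
FAN_ATTR_MASK = 1
FAN_ATTR_PATH = 2
FAN_ATTR_NEW_OPTION = 3   # imagined as added in a later kernel

msg = (pack_attr(FAN_ATTR_MASK, struct.pack("=I", 0x3))
       + pack_attr(FAN_ATTR_PATH, b"/tmp\0")
       + pack_attr(FAN_ATTR_NEW_OPTION, struct.pack("=Q", 42)))

old_view = parse_attrs(msg, {FAN_ATTR_MASK, FAN_ATTR_PATH})
new_view = parse_attrs(msg, {FAN_ATTR_MASK, FAN_ATTR_PATH,
                             FAN_ATTR_NEW_OPTION})
print(sorted(old_view), sorted(new_view))
```

A fixed syscall argument list has no equivalent of the skip step, which is
why a new argument there tends to mean a new syscall or an ioctl.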
> I don't see what's gained using netlink. I am already reusing the
> fsnotify code to do all my queuing. Someone help me understand the
> benefit of netlink and help me understand how we can reasonably meet the
> needs and I'll try to prototype it.
>
> 1) fd's must be opened in the recv process
Or just injected into the registered process's fd table with appropriate
limit checks? In that case it can be done by whatever other worker.
> 2) reliability, if loss must know on the send side
You have this knowledge at netlink send time, but there is no way to
wait until the 'fail' condition goes away, the way you can block when
writing into a socket until enough buffer space becomes available.
And there is no way to tell how many listeners got the message and how
many dropped it in a multicast delivery, only that there were drops.
This can be trivially extended though.
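What such an extension might report can be shown with a toy user-space
model. It is not the netlink implementation: `Receiver`, its `capacity`
and the `multicast` helper are hypothetical, and a bounded deque stands in
for a socket receive buffer. The only point is that the sender gets back
delivered and dropped counts instead of a bare "there were drops".

```python
from collections import deque


class Receiver:
    """Toy listener with a bounded receive queue."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def try_enqueue(self, msg):
        """Queue msg if there is room; report whether it was accepted."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(msg)
        return True


def multicast(receivers, msg):
    """Deliver msg to every receiver and return (delivered, dropped)."""
    delivered = sum(1 for receiver in receivers if receiver.try_enqueue(msg))
    return delivered, len(receivers) - delivered


receivers = [Receiver(1), Receiver(2), Receiver(0)]
first = multicast(receivers, "event-1")
second = multicast(receivers, "event-2")
print(first, second)
```

Here the first send reaches two of the three receivers and the second
reaches only one, because the receiver with capacity 1 is already full.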
--
Evgeniy Polyakov