From: Eric Dumazet <dada1@cosmosbay.com>
To: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Ulrich Drepper <drepper@gmail.com>,
lkml <linux-kernel@vger.kernel.org>,
David Miller <davem@davemloft.net>,
Ulrich Drepper <drepper@redhat.com>,
Andrew Morton <akpm@osdl.org>, netdev <netdev@vger.kernel.org>,
Zach Brown <zach.brown@oracle.com>,
Christoph Hellwig <hch@infradead.org>,
Chase Venters <chase.venters@clientec.com>,
Johann Borck <johann.borck@densedata.com>
Subject: Re: [take19 1/4] kevent: Core files.
Date: Thu, 5 Oct 2006 12:45:03 +0200 [thread overview]
Message-ID: <200610051245.03880.dada1@cosmosbay.com> (raw)
In-Reply-To: <20061005102106.GE1015@2ka.mipt.ru>
On Thursday 05 October 2006 12:21, Evgeniy Polyakov wrote:
> On Thu, Oct 05, 2006 at 11:56:24AM +0200, Eric Dumazet (dada1@cosmosbay.com)
> > I may be wrong, but what is currently missing for me is :
> >
> > - No hardcoded limit on the max number of events. (A process that can
> > open XXX.XXX files should be allowed to open a kevent queue with at least
> > XXX.XXX events). Right now thats not clear what happens IF the current
> > limit is reached.
>
> This forces to overflows in fixed sized memory mapped buffer.
> If we remove memory mapped buffer or will allow to have overflows (and
> thus skipped entries) keven can easily scale to that limits (tested with
> xx.xxx events though).
What is missing or not obvious is : If events are skipped because of
overflows, What happens ? Connections stuck forever ? Hope that everything
will restore itself ? Is kernel able to SIGNAL this problem to user land ?
>
> > - In order to avoid touching the whole ring buffer, it might be good to
> > be able to reset the indexes to the beginning when ring buffer is empty.
> > (So if the user land is responsive enough to consume events, only first
> > pages of the mapping would be used : that saves L1/L2 cpu caches)
>
> And what happens when there are 3 empty at the beginning and \we need to
> put there 4 ready events?
Re-read what I said : when ring buffer is empty.
When ring buffer is empty, kernel can reset index right before adding XX new
events. You read 3 events consumed, I said : When all ring buffer is empty,
because all previous events were consumed by user land, then we can reset
indexes to 0.
>
> > A plus would be
> >
> > - A working/usable mmap ring buffer implementation, but I think its not
> > mandatory. System calls are not that expensive, especially if you can
> > batch XX events per syscall (like epoll). Nice thing with a ring buffer
> > is that we touch less cache lines than say epoll that have lot of linked
> > structures.
> >
> > About mmap, I think you might want a hybrid thing :
> >
> > One writable page where userland can write its index, (and hold one or
> > more futex shared by kernel) (with appropriate thread locking in case
> > multiple threads want to dequeue events). In fast path, no syscalls are
> > needed to maintain this user index.
> >
> > XXX readonly pages (for user, but r/w for kernel), where kernel write its
> > own index, and events of course.
>
> The problem is in that xxx pages - how many can we eat per kevent
> descriptor? It is pinned memory and thus it is possible to have a DoS.
> If xxx above is not enough to store all events, we will have
> yet-another-broken behaviour like rt-signal queue overflow.
>
Re-read : I have a process that has the right to open XXX.XXX handles,
allocating XXX.XXX tcp sockets, dentries, files structures, inodes, epoll
events, its obviously already a DOS risk, but controled by 'ulimit -n'
Allocating XXX.XXX * (32 or 64) bytes is a win if I can zap epoll structures
(currently more than 256 bytes per event)
epoll structures are pinned too... what's wrong with that ?
# egrep "filp|poll|TCP|dentries|sock_inode" /proc/slabinfo |cut -c1-50
tw_sock_TCP 1302 2200 192 20 1 :
request_sock_TCP 2046 4260 128 30 1 :
TCP 151509 196910 1472 5 2 :
eventpoll_pwq 146718 199439 72 53 1 :
eventpoll_epi 146718 199360 192 20 1 :
sock_inode_cache 149182 197940 640 6 1 :
filp 149537 202515 256 15 1 :
If you want to protect from DOS, just use ulimit -n 100
Eric
next prev parent reply other threads:[~2006-10-05 10:45 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <115a6230591036@2ka.mipt.ru>
2006-09-12 8:41 ` [take18 0/4] kevent: Generic event handling mechanism Evgeniy Polyakov
2006-09-12 8:41 ` [take18 1/4] kevent: Core files Evgeniy Polyakov
2006-09-12 8:41 ` [take18 2/4] kevent: poll/select() notifications Evgeniy Polyakov
2006-09-12 8:41 ` [take18 3/4] kevent: Socket notifications Evgeniy Polyakov
2006-09-12 8:41 ` [take18 4/4] kevent: Timer notifications Evgeniy Polyakov
2006-09-20 9:35 ` [take19 0/4] kevent: Generic event handling mechanism Evgeniy Polyakov
2006-09-20 9:35 ` [take19 1/4] kevent: Core files Evgeniy Polyakov
2006-09-20 9:35 ` [take19 2/4] kevent: poll/select() notifications Evgeniy Polyakov
2006-09-20 9:35 ` [take19 3/4] kevent: Socket notifications Evgeniy Polyakov
2006-09-20 9:35 ` [take19 4/4] kevent: Timer notifications Evgeniy Polyakov
2006-10-04 6:34 ` [take19 1/4] kevent: Core files Ulrich Drepper
2006-10-04 6:48 ` Evgeniy Polyakov
2006-10-04 17:57 ` Ulrich Drepper
2006-10-05 8:57 ` Evgeniy Polyakov
2006-10-05 9:56 ` Eric Dumazet
2006-10-05 10:21 ` Evgeniy Polyakov
2006-10-05 10:45 ` Eric Dumazet [this message]
2006-10-05 10:55 ` Evgeniy Polyakov
2006-10-05 12:09 ` Eric Dumazet
2006-10-05 12:37 ` Evgeniy Polyakov
2006-10-15 23:22 ` Ulrich Drepper
2006-10-16 7:33 ` Evgeniy Polyakov
2006-10-16 10:16 ` Ulrich Drepper
2006-10-16 11:23 ` Evgeniy Polyakov
2006-10-17 5:10 ` Johann Borck
2006-10-17 5:59 ` Chase Venters
2006-10-17 10:42 ` Evgeniy Polyakov
2006-10-17 13:12 ` Chase Venters
2006-10-17 13:35 ` Evgeniy Polyakov
2006-10-17 10:39 ` Evgeniy Polyakov
2006-10-17 13:19 ` Eric Dumazet
2006-10-17 13:42 ` Evgeniy Polyakov
2006-10-17 13:52 ` Eric Dumazet
2006-10-17 14:07 ` Evgeniy Polyakov
2006-10-17 14:25 ` Eric Dumazet
2006-10-17 15:09 ` Evgeniy Polyakov
2006-10-17 15:32 ` Eric Dumazet
2006-10-17 16:01 ` Evgeniy Polyakov
2006-10-17 16:26 ` Eric Dumazet
2006-10-17 16:35 ` Evgeniy Polyakov
2006-10-17 16:45 ` Eric Dumazet
2006-10-18 4:10 ` Evgeniy Polyakov
2006-10-18 4:45 ` Eric Dumazet
2006-10-17 15:33 ` Hans Henrik Happe
2006-10-05 14:01 ` Hans Henrik Happe
2006-10-05 14:15 ` Evgeniy Polyakov
2006-10-05 15:07 ` Hans Henrik Happe
2006-09-22 19:22 ` [take19 0/4] kevent: Generic event handling mechanism Andrew Morton
2006-09-23 4:23 ` Evgeniy Polyakov
2006-10-04 6:09 ` Ulrich Drepper
2006-10-04 6:10 ` Ulrich Drepper
2006-10-04 6:27 ` Evgeniy Polyakov
2006-10-04 6:24 ` Evgeniy Polyakov
2006-09-26 15:54 ` Christoph Hellwig
2006-09-27 4:46 ` Evgeniy Polyakov
2006-09-27 15:09 ` Evgeniy Polyakov
2006-10-04 4:50 ` Ulrich Drepper
2006-10-04 4:55 ` Evgeniy Polyakov
2006-10-04 7:33 ` Ulrich Drepper
2006-10-04 7:48 ` Evgeniy Polyakov
2006-10-04 17:20 ` Ulrich Drepper
2006-10-05 9:02 ` Evgeniy Polyakov
2006-10-05 14:45 ` Ulrich Drepper
2006-10-06 8:36 ` Evgeniy Polyakov
2006-10-15 22:43 ` Ulrich Drepper
2006-10-16 7:23 ` Evgeniy Polyakov
2006-10-16 9:59 ` Ulrich Drepper
2006-10-16 10:38 ` Evgeniy Polyakov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200610051245.03880.dada1@cosmosbay.com \
--to=dada1@cosmosbay.com \
--cc=akpm@osdl.org \
--cc=chase.venters@clientec.com \
--cc=davem@davemloft.net \
--cc=drepper@gmail.com \
--cc=drepper@redhat.com \
--cc=hch@infradead.org \
--cc=johann.borck@densedata.com \
--cc=johnpol@2ka.mipt.ru \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=zach.brown@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.