From: Eric Dumazet <dada1@cosmosbay.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: dipankar@in.ibm.com, Jean Delvare <khali@linux-fr.org>,
Serge Belyshev <belyshev@depni.sinp.msu.ru>,
LKML <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@osdl.org>,
Manfred Spraul <manfred@colorfullife.com>
Subject: Re: VFS: file-max limit 50044 reached
Date: Mon, 17 Oct 2005 21:12:53 +0200
Message-ID: <4353F7B5.1040101@cosmosbay.com>
In-Reply-To: <Pine.LNX.4.64.0510171112040.3369@g5.osdl.org>
Linus Torvalds wrote:
>
> On Mon, 17 Oct 2005, Eric Dumazet wrote:
>
>><lazy_mode=ON>
>>Do we really need a TIF_RCUUPDATE flag, or could we just ask for a resched ?
>></lazy_mode>
>
>
> Hmm.. Your patch looks very much like one I tried already, but the big
> difference being that I just cleared the count when doing the rcu
> callback. That was because I hadn't realized the importance of the
> maxbatch thing (so it didn't work for me, like it did for you).
>
> Still - the actual RCU callback will only be called at the next timer tick
> or whatever as far as I can tell, so the first time you'll still have a
> _long_ RCU queue (and thus bad latency).
>
> I guess that's inevitable - and TIF_RCUUPDATE wouldn't even help, because
> we still need to wait for the _other_ CPUs to get to their RCU quiescent
> event.
>
> However, that leaves us with the nasty situation that we'll be very
> inefficient: we'll do "maxbatch" RCU entries, and then return, and then
> force a whole re-schedule. That just can't be good.
>
That's strange, because in my tests I don't seem to get one reschedule
per 'maxbatch' items. Watching 'grep filp /proc/slabinfo', it looks like there
is one 'schedule', after which the filp count goes back to 1000.
vmstat shows about 150 context switches per second.
(This machine does 1,000,000 open/close pairs in 4.88 seconds.)
oprofile data shows very little schedule overhead:
CPU: P4 / Xeon with 2 hyper-threads, speed 1993.83 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000
samples  %        symbol name
132578   11.3301  path_lookup
104788   8.9551   __d_lookup
85220    7.2829   link_path_walk
63013    5.3851   sysenter_past_esp
53287    4.5539   _atomic_dec_and_lock
45825    3.9162   chrdev_open
43105    3.6837   get_unused_fd
39948    3.4139   kmem_cache_alloc
38308    3.2738   strncpy_from_user
35738    3.0542   rcu_do_batch
31850    2.7219   __link_path_walk
31355    2.6796   get_empty_filp
25941    2.2169   kmem_cache_free
24455    2.0899   __fput
24422    2.0871   sys_close
19814    1.6933   filp_dtor
19616    1.6764   free_block
19000    1.6237   open_namei
18214    1.5566   fput
15991    1.3666   fd_install
14394    1.2301   file_kill
14365    1.2276   call_rcu
14338    1.2253   kref_put
13679    1.1690   file_move
13646    1.1662   schedule
13456    1.1499   getname
13019    1.1126   kref_get
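
As a rough cross-check of the vmstat and timing figures above, the arithmetic can be sketched in C. This assumes the 150 context switches per second were measured during the same 4.88 second run and that all of them belong to the test, which the message does not state explicitly; the function names are illustrative only.

```c
#include <assert.h>

/* Open/close pairs completed per second of wall time. */
double pairs_per_second(double pairs, double seconds)
{
    assert(seconds > 0.0);
    return pairs / seconds;
}

/*
 * Open/close pairs completed per context switch, given the context
 * switch rate reported by vmstat (assumed to cover the same run).
 */
double pairs_per_context_switch(double pairs, double seconds,
                                double cs_per_sec)
{
    assert(seconds > 0.0 && cs_per_sec > 0.0);
    return pairs / (seconds * cs_per_sec);
}
```

With the numbers quoted here, pairs_per_second(1000000.0, 4.88) is roughly 205,000 and pairs_per_context_switch(1000000.0, 4.88, 150.0) is roughly 1,366. Under those assumptions that is far more than one reschedule per ten system calls.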
> How about instead of depending on "maxbatch", we'd depend on
> "need_resched()"? Maybe the "maxbatch" should be a _minbatch_ thing, and then once
> we've done the minimum amount we _need_ to do (or emptied the RCU queue)
> we start honoring need_resched(), and return early if we do?
>
> That, together with your patch, should work, without causing ludicrous
> "reschedule every ten system calls" behaviour..
>
> Hmm?
>
> Linus
>
>
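
The "minbatch, then honor need_resched()" idea quoted above can be sketched as a small user-space simulation. This is not kernel code and not the patch under discussion: struct cb, do_batch(), resched_pending and demo_run() are hypothetical stand-ins (the kernel would use struct rcu_head and need_resched()), and the sketch only illustrates the loop's stopping rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the kernel's struct rcu_head. */
struct cb {
    struct cb *next;
    void (*func)(struct cb *);
};

/* Simulated "reschedule requested" flag; the kernel would ask need_resched(). */
static bool resched_pending;

/* Number of callbacks invoked so far, used only by the demo. */
static int invoked;

static void count_cb(struct cb *c)
{
    (void)c;
    invoked++;
}

/*
 * Run callbacks from 'list'. At least 'minbatch' of them are always run
 * (or all of them, if the list is shorter). Past that minimum, stop as
 * soon as a reschedule is pending. Returns the unprocessed remainder.
 */
struct cb *do_batch(struct cb *list, int minbatch)
{
    int count = 0;

    assert(minbatch >= 0);
    while (list) {
        struct cb *next = list->next;  /* fetch first: callback may free the node */

        list->func(list);
        list = next;
        if (++count >= minbatch && resched_pending)
            break;
    }
    return list;
}

/* Demo helper: queue 'queued' callbacks, run one batch, return how many ran. */
int demo_run(int queued, int minbatch, bool pending)
{
    static struct cb nodes[64];

    assert(queued >= 0 && queued <= 64);
    for (int i = 0; i < queued; i++) {
        nodes[i].func = count_cb;
        nodes[i].next = (i + 1 < queued) ? &nodes[i + 1] : NULL;
    }
    invoked = 0;
    resched_pending = pending;
    do_batch(queued ? &nodes[0] : NULL, minbatch);
    return invoked;
}
```

In this sketch, demo_run(50, 10, true) stops after the 10 guaranteed callbacks, while demo_run(50, 10, false) drains all 50 because nothing asks for the CPU. A queue shorter than the minimum is simply emptied.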