public inbox for linux-kernel@vger.kernel.org
* [rfc: patch 0/6] scalable fd management
@ 2005-05-30 10:50 Dipankar Sarma
  2005-05-30 10:52 ` [rfc: patch 1/6] Fix rcu initializers Dipankar Sarma
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Dipankar Sarma @ 2005-05-30 10:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: Al Viro, Andrew Morton

This is RFC-only at the moment, but if things look good,
I would like to see this patchset get some testing in -mm.

This patchset probably doesn't set the record for longest
gestation period, but I am still glad that we can use it
to solve a problem that a lot of people care about *now*.
Maneesh and I developed this patch in 2001,
when we used somewhat dodgy locking and copious memory
barriers to demonstrate making file descriptor look-up
(then fget()) lock-free using RCU. The main advantage
was with threads, but there were other problems confronting threads
users then and I decided not to push for it. Since threads
performance is now important for a lot of people, it is
time to revisit the issue. Whether we like Java or not, it
is a reality, and so are threaded apps. Andrew Tridgell has a
test (http://samba.org/ftp/unpacked/junkcode/thread_perf.c)
which shows that on a 4-cpu P4 box, a "readwrite"
syscall test ran twice as fast using processes as threads.

An earlier version of this patchset was published and
discussed a few months ago :
(http://marc.theaimsgroup.com/?t=109144217400003&r=1&w=2)
The consensus there was that it makes sense to make
fget()/fget_light() lock-free in order to avoid the
cache line bouncing on ->files_lock typically when the fd table
is shared. The problem with that version of the patchset
was that it piggybacked on kref and complicated a simple
kref api meant for use with strict safety rules. It required
invasive changes to wherever f_count was being used. It also
used an RCU model from 2001 with explicit memory barriers
which we don't need to use anymore.

Recently, I rewrote the patchset with the following ideas :

1. Instead of using explicit memory barriers to make the fd table
   array and fdset updates appear atomic to lock-free readers,
   I split the fd table in files_struct and put it in a separate
   structure (struct fdtable). Whenever fd array/set expansion
   happens, I allocate a new fdtable, copy the contents and
   atomically update the pointer. This allows me to use
   the recent rcu_assign_pointer() and rcu_dereference() macros.
   However, this required significant changes in the file management
   code in VFS. With this new locking model, all the known issues
   of the past have been taken care of.

2. Greg and I agreed not to loosen the kref apis. Instead I wrote
   a set of rcuref APIs that work on regular atomic_t counters.
   This is *not* a separate refcounting API set, it is meant for
   use in regular refcounters when needed with RCU. With this,
   f_count users, however wrong they are, are spared. There is
   a separate patchset to clean some of them up, but that does not
   affect this patchset.

3. I added documentation for both the rcuref apis and for the new
   locking model I used for file descriptor table and file
   reference counting.
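
The copy-and-publish scheme from item 1 can be sketched in userspace
roughly as follows. This is only an illustration, not the code from the
patches: C11 atomics stand in for rcu_assign_pointer() and
rcu_dereference(), the struct layouts and the helper names lookup_fd(),
expand_fdtable() and free_fdtable() are assumptions made for the example,
and the writer is assumed to already hold whatever lock serializes updates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for the split-out table; not the kernel definition. */
struct fdtable {
    unsigned int max_fds;
    void **fd;                      /* slots stand in for struct file pointers */
};

struct files_struct {
    _Atomic(struct fdtable *) fdt;  /* the one pointer readers dereference */
};

/* Reader: one acquire load (stand-in for rcu_dereference()), then index. */
static void *lookup_fd(struct files_struct *files, unsigned int fd)
{
    struct fdtable *fdt = atomic_load_explicit(&files->fdt, memory_order_acquire);

    if (fdt == NULL || fd >= fdt->max_fds)
        return NULL;
    return fdt->fd[fd];
}

/*
 * Writer (assumed to hold the update-side lock): allocate a larger table,
 * copy the old slots, then publish with a release store (stand-in for
 * rcu_assign_pointer()). The old table is handed back through *oldp; it
 * may be freed only once no reader can still be using it.
 * Returns 0 on success, -1 on allocation failure or a shrinking request.
 */
static int expand_fdtable(struct files_struct *files, unsigned int new_max,
                          struct fdtable **oldp)
{
    struct fdtable *old = atomic_load_explicit(&files->fdt, memory_order_relaxed);
    struct fdtable *nfdt;

    if (old != NULL && new_max < old->max_fds)
        return -1;
    nfdt = malloc(sizeof(*nfdt));
    if (nfdt == NULL)
        return -1;
    nfdt->fd = calloc(new_max, sizeof(*nfdt->fd));
    if (nfdt->fd == NULL) {
        free(nfdt);
        return -1;
    }
    nfdt->max_fds = new_max;
    if (old != NULL)
        memcpy(nfdt->fd, old->fd, old->max_fds * sizeof(*old->fd));

    atomic_store_explicit(&files->fdt, nfdt, memory_order_release);
    *oldp = old;
    return 0;
}

/* Release a table that is known to have no remaining readers. */
static void free_fdtable(struct fdtable *fdt)
{
    if (fdt != NULL) {
        free(fdt->fd);
        free(fdt);
    }
}
```

A reader never sees a half-updated table: it either loads the old
pointer and works on the old, still intact array, or loads the new
pointer and sees a fully copied one. In the kernel the old table would
be freed after an RCU grace period; the sketch leaves that to the caller.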


Testing :
-------
I have been beating up this patchset with multiple instances of
LTP and a special test I wrote to exercise the vmalloced fdtable
path that uses keventd for freeing. It has survived 24+ hour
tests as well as a 72 hour run with the chat benchmark
and fd_vmalloc tests. All this was on a 4(8)-way P4 Xeon
system.

No slab leak or vmalloc leak.

I would appreciate it if someone tested this on an arch without
cmpxchg (sparc32??). I intend to run some more tests
with preemption enabled and also on ppc64 myself.
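
The cmpxchg dependence presumably comes from the operation a lock-free
reader needs on a plain counter: take a reference only if the count has
not already dropped to zero. A minimal userspace sketch of that pattern
follows, with C11 atomic_int standing in for atomic_t. The names
ref_inc_not_zero() and ref_dec_and_test() are hypothetical and are not
the rcuref API from patch 2/6.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper: take a reference only while the count is non-zero.
 * A count of zero means the object is already on its way to being freed,
 * so a lock-free reader must not resurrect it.
 */
static bool ref_inc_not_zero(atomic_int *count)
{
    int old = atomic_load_explicit(count, memory_order_relaxed);

    while (old != 0) {
        /* On failure 'old' is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak_explicit(count, &old, old + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool ref_dec_and_test(atomic_int *count)
{
    return atomic_fetch_sub_explicit(count, 1, memory_order_acq_rel) == 1;
}
```

On an architecture with no compare-and-exchange instruction, the same
semantics need some other mechanism, which is why testing there is
worth asking for.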

Performance results :
-------------------

tiobench on a 4(8)-way (HT) P4 system on ramdisk :

                                    2.6.10-fd
Test        2.6.10-vanilla  Stdev   (lockfree)  Stdev
------------------------------------------------------
Seqread     1400.8          11.52   1465.4      34.27
Randread    1594            8.86    2397.2      29.21
Seqwrite    242.72          3.47    238.46      6.53
Randwrite   445.74          9.15    446.4       9.75

With Tridge's thread_perf test on a 4(8)-way (HT) P4 Xeon system :

2.6.12-rc5-vanilla :

Running test 'readwrite' with 8 tasks
Threads     0.34 +/- 0.01 seconds
Processes   0.16 +/- 0.00 seconds

2.6.12-rc5-fd :

Running test 'readwrite' with 8 tasks
Threads     0.17 +/- 0.02 seconds
Processes   0.17 +/- 0.02 seconds

So, the lock-free file table patchset gets rid of the overhead
of doing I/O with threads: the threaded run drops from 0.34 to
0.17 seconds, matching the process run.

Thanks
Dipankar


end of thread, other threads:[~2005-06-01 12:54 UTC | newest]

Thread overview: 9+ messages
2005-05-30 10:50 [rfc: patch 0/6] scalable fd management Dipankar Sarma
2005-05-30 10:52 ` [rfc: patch 1/6] Fix rcu initializers Dipankar Sarma
2005-05-30 10:55 ` [rfc: patch 2/6] rcuref APIs Dipankar Sarma
2005-05-30 10:57 ` [rfc: patch 3/6] Break up files struct Dipankar Sarma
2005-05-30 10:58 ` [rfc: patch 4/6] files struct with rcu Dipankar Sarma
2005-05-30 10:59 ` [rfc: patch 5/6] lock free fd lookup Dipankar Sarma
2005-05-30 11:00 ` [rfc: patch 6/6] files locking doc Dipankar Sarma
2005-06-01 11:25 ` [rfc: patch 0/6] scalable fd management William Lee Irwin III
2005-06-01 12:50   ` Dipankar Sarma
