From: Eric Dumazet <dada1@cosmosbay.com>
To: ego@in.ibm.com
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>,
	linux-kernel@vger.kernel.org, paulmck@us.ibm.com
Subject: Re: New filesystem for Linux
Date: Sat, 04 Nov 2006 19:27:48 +0100
Message-ID: <454CDBA4.4040503@cosmosbay.com>
In-Reply-To: <20061104173716.GA618@in.ibm.com>

Gautham R Shenoy wrote:
> On Thu, Nov 02, 2006 at 10:52:47PM +0100, Mikulas Patocka wrote:
>> Hi
> 
> Hi Mikulas
>> For my PhD thesis, I am designing and writing a filesystem, and it's now in
>> a state where it can be released. You can download it from
>> http://artax.karlin.mff.cuni.cz/~mikulas/spadfs/
>>
>> It has some new features, such as keeping inode information directly in
>> the directory (until you create a hard link) so that ls -la doesn't seek
>> much, a new method to keep data consistent in case of crashes (instead of
>> journaling), and free space organized in lists of free runs, converted
>> to a bitmap only in case of extreme fragmentation.
>>
>> It is not very widely tested, so if you want, test it.
>>
>> I have these questions:
>>
>> * There is an rw semaphore that is locked for read for nearly all
>> operations and locked for write only rarely. However, locking for read
>> causes cache line ping-pong on SMP systems. Do you have an idea how to
>> make it better?
>>
>> It could be improved by making a semaphore for each CPU, locking only
>> the local CPU's semaphore for read and all semaphores for write. Or is
>> there a better method?
> 
> I am currently experimenting with a lightweight reader-writer semaphore
> with the objective of doing away with what you call reader-side cache
> line "ping-pong". It achieves this by using a per-cpu refcount.
> 
> A drawback of this approach, as Eric Dumazet mentioned elsewhere in this
> thread, would be that each instance of the rw_semaphore would require
> (NR_CPUS * sizeof(int)) bytes of memory to keep track of the per-cpu
> refcount, which can prove costly if this rw_semaphore is for something
> like inode->i_alloc_sem.
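
A minimal user-space sketch of the per-CPU idea discussed above, assuming C11 atomics and a fixed NR_SLOTS of 8 (both assumptions, as are all the pcpu_* names). A reader increments only its own slot's counter; a writer raises a flag and waits until every slot drains to zero. This is not Gautham's implementation: kernel code would pad each counter to its own cache line (or use real per-cpu data) and sleep instead of spinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_SLOTS 8  /* assumption: one counter per CPU (or per CPU group) */

/* Hypothetical per-CPU reader/writer lock. Unlock must pass the same cpu
 * index as the matching lock, because the writer checks slots one by one. */
struct pcpu_rwlock {
    atomic_int readers[NR_SLOTS];
    atomic_bool writer;
};

static void pcpu_init(struct pcpu_rwlock *l) {
    for (int i = 0; i < NR_SLOTS; i++)
        atomic_init(&l->readers[i], 0);
    atomic_init(&l->writer, false);
}

static void pcpu_read_lock(struct pcpu_rwlock *l, int cpu) {
    assert(cpu >= 0);
    atomic_int *c = &l->readers[cpu % NR_SLOTS];
    for (;;) {
        atomic_fetch_add(c, 1);            /* announce the reader */
        if (!atomic_load(&l->writer))
            return;                        /* no writer: read lock held */
        atomic_fetch_sub(c, 1);            /* writer pending: back off */
        while (atomic_load(&l->writer))
            ;                              /* spin until it is gone */
    }
}

static void pcpu_read_unlock(struct pcpu_rwlock *l, int cpu) {
    atomic_fetch_sub(&l->readers[cpu % NR_SLOTS], 1);
}

static bool pcpu_write_trylock(struct pcpu_rwlock *l) {
    if (atomic_exchange(&l->writer, true))
        return false;                      /* another writer holds it */
    for (int i = 0; i < NR_SLOTS; i++) {
        if (atomic_load(&l->readers[i]) != 0) {
            atomic_store(&l->writer, false);
            return false;                  /* readers still active */
        }
    }
    return true;
}

static void pcpu_write_lock(struct pcpu_rwlock *l) {
    while (atomic_exchange(&l->writer, true))
        ;                                  /* serialize writers */
    for (int i = 0; i < NR_SLOTS; i++)
        while (atomic_load(&l->readers[i]) != 0)
            ;                              /* drain existing readers */
}

static void pcpu_write_unlock(struct pcpu_rwlock *l) {
    atomic_store(&l->writer, false);
}
```

All atomics here use the default sequentially consistent ordering, so a reader that increments its slot and then sees no writer cannot race with a writer that sets the flag and then sees every slot at zero: one of the two must observe the other. The read side touches only one counter, which is what removes the shared cache line; the price is a write side that scans all slots.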

We might use a hybrid approach: use a per-cpu counter if NR_CPUS <= 8:

#define refcount_addr(zone, cpu) (zone)[(cpu)]

For larger setups, have a fixed limit of 8 counters and use a modulo:

#define refcount_addr(zone, cpu) (zone)[(cpu) & 7]
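
The hybrid mapping above could look like this in C11. The default of NR_CPUS = 64, the obj_refs struct and the obj_* helpers are hypothetical; the macro here yields a pointer rather than an lvalue. The counters are atomic because on a large machine two CPUs can share a slot (CPU 1 and CPU 9 both map to slot 1).

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumption: NR_CPUS is a compile-time constant, as in the kernel config. */
#ifndef NR_CPUS
#define NR_CPUS 64
#endif

/* One counter per CPU on small machines, capped at 8 on larger ones. */
#define NR_REFCOUNTS (NR_CPUS <= 8 ? NR_CPUS : 8)

#if NR_CPUS <= 8
#define refcount_addr(zone, cpu) (&(zone)[(cpu)])      /* private slot */
#else
#define refcount_addr(zone, cpu) (&(zone)[(cpu) & 7])  /* cpu modulo 8 */
#endif

/* Hypothetical object embedding the capped counter array. */
struct obj_refs {
    atomic_int zone[NR_REFCOUNTS];
};

static void obj_init(struct obj_refs *o) {
    for (int i = 0; i < NR_REFCOUNTS; i++)
        atomic_init(&o->zone[i], 0);
}

static void obj_get(struct obj_refs *o, int cpu) {
    assert(cpu >= 0 && cpu < NR_CPUS);
    atomic_fetch_add(refcount_addr(o->zone, cpu), 1);
}

static void obj_put(struct obj_refs *o, int cpu) {
    assert(cpu >= 0 && cpu < NR_CPUS);
    atomic_fetch_sub(refcount_addr(o->zone, cpu), 1);
}

/* Slots may be shared, so only the sum over all slots is meaningful. */
static int refs_total(struct obj_refs *o) {
    int sum = 0;
    for (int i = 0; i < NR_REFCOUNTS; i++)
        sum += atomic_load(&o->zone[i]);
    return sum;
}
```

Capping the array at 8 bounds the per-object cost at 8 counters regardless of NR_CPUS, at the price of some cache line sharing between CPUs that land in the same slot.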

In order not to use too much memory, we could use a kind of vmalloc() space,
with one page per cpu, so that addr(cpu) = base + cpu * PAGE_SIZE
(vmalloc space allows NUMA-aware allocation where possible).

So instead of storing a table of 8 pointers in each object, we store only
the address of the cpu0 counter.
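
A rough user-space model of the page-per-slot layout, assuming 4 KiB pages and 8 slots. One calloc() block stands in for the vmalloc() area, and every name here (refcount_area, area_init, area_alloc, slot_counter) is hypothetical. Each object keeps only the address of its slot-0 counter; the counter for another CPU is found by adding a multiple of PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096u   /* assumption: 4 KiB pages */
#endif
#define NR_SLOTS  8u      /* counters per object, as in the hybrid scheme */

/* Hypothetical area of NR_SLOTS pages; page c holds every object's
 * counter for CPU slot c, at the same offset as in page 0. */
struct refcount_area {
    char *base;           /* start of page 0 (slot 0) */
    size_t next_offset;   /* next free int within page 0 */
};

static int area_init(struct refcount_area *a) {
    a->base = calloc(NR_SLOTS, PAGE_SIZE);   /* zeroed counters */
    a->next_offset = 0;
    return a->base ? 0 : -1;
}

/* Hand out one counter set; the object stores only the slot-0 address. */
static int *area_alloc(struct refcount_area *a) {
    if (a->next_offset + sizeof(int) > PAGE_SIZE)
        return NULL;      /* page 0 full; a real allocator would grow */
    int *cpu0 = (int *)(a->base + a->next_offset);
    a->next_offset += sizeof(int);
    return cpu0;
}

/* addr(cpu) = base + cpu * PAGE_SIZE, with base = the object's cpu0 address. */
static int *slot_counter(int *cpu0, unsigned cpu) {
    assert(cpu0 != NULL);
    return (int *)((char *)cpu0 + (size_t)(cpu & (NR_SLOTS - 1)) * PAGE_SIZE);
}
```

With this layout an object stores a single pointer (8 bytes on a 64-bit machine) instead of a table of 8 pointers (64 bytes), and one 4 KiB page per slot holds the counters of 1024 objects. Freeing counters and growing the area are left out of the sketch.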


> 
> So the question I am interested in is, how many *live* instances of this
> rw_semaphore are you expecting to have at any given time?
> If this number is a constant (and/or not very big!), the light-weight
> reader writer semaphore might be useful.
> 
> Regards
> Gautham.


