Re: [PATCH RFC] mm: Implement balance_dirty_pages() through waiting for flusher thread

From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	hch@infradead.org, peterz@infradead.org
Subject: Re: [PATCH RFC] mm: Implement balance_dirty_pages() through waiting for flusher thread
Date: Tue, 22 Jun 2010 21:52:34 +0800
Message-ID: <20100622135234.GA11561@localhost>
In-Reply-To: <20100622131745.GB3338@quack.suse.cz>

>   On the other hand I think we will have to come up with something
> more clever than what I do now because for some huge machines with
> nr_cpu_ids == 256, the error of the counter is 256*9*8 = 18432 so that's
> already unacceptable given the amounts we want to check (like 1536) -
> already for nr_cpu_ids == 32, the error is the same as the difference we
> want to check.  I think we'll have to come up with some scheme whose error
> is not dependent on the number of cpus or if it is dependent, it's only a
> weak dependency (like a logarithm or so).
>   Or we could rely on the fact that IO completions for a bdi won't happen on
> all CPUs and thus the error would be much more bounded. But I'm not sure
> how much that is true or not.
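
The arithmetic in the quoted text can be reproduced with a small sketch.
It assumes the per-CPU batch is 8 * (1 + ilog2(nr_cpu_ids)); that formula
is not stated in the thread, but it matches both figures quoted above
(256*9*8 and 1536 for 32 CPUs). The helper names are made up for
illustration.

```c
#include <assert.h>

/* floor(log2(n)) for n >= 1; stands in for the kernel's ilog2(). */
static unsigned int ilog2_u(unsigned int n)
{
        unsigned int r = 0;

        assert(n >= 1);
        while (n >>= 1)
                r++;
        return r;
}

/* ASSUMPTION: per-CPU batch size of 8 * (1 + ilog2(nr_cpu_ids)). */
static unsigned int stat_batch(unsigned int nr_cpu_ids)
{
        return 8 * (1 + ilog2_u(nr_cpu_ids));
}

/* Worst case: every CPU holds a full batch not yet folded into the sum. */
static unsigned int counter_error_bound(unsigned int nr_cpu_ids)
{
        return nr_cpu_ids * stat_batch(nr_cpu_ids);
}
```

Under that assumption counter_error_bound(256) is 18432 and
counter_error_bound(32) is 1536, so the bound grows as n * log(n) in the
number of CPUs.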

Yes, the per-CPU counter seems tricky. How about plain atomic operations?

This test shows that atomic_dec_and_test() is about 4.2 times slower
than a plain i-- on a 4-core CPU (904 ms vs. 215 ms of task-clock for
100 million iterations). Not bad.

Note that
1) we can avoid the atomic operations when there are no active waiters
2) most writeback will be submitted by one per-bdi flusher, so cache
   line bouncing should not be a concern (this also means the per-CPU
   counter error is normally bounded by the batch size)
3) the cost of atomic inc/dec will depend weakly on the number of cores
   but not on the number of sockets (because of 2), so it should not
   scale too badly
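
Point 1 could look roughly like the sketch below, written with C11
atomics in user space. The struct and function names are hypothetical
and not taken from the patch. The sketch also ignores the ordering
between a waiter registering itself and the completion path checking
for waiters, which a real implementation would have to handle.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* HYPOTHETICAL bookkeeping; not a structure from the patch. */
struct wb_waitq {
        atomic_int nr_waiters;  /* tasks currently throttled */
        atomic_long written;    /* pages completed while someone waits */
};

/*
 * Called once per completed page. Returns true if the page was counted.
 * With no waiters this costs a plain load and no locked instruction.
 */
static bool account_written_page(struct wb_waitq *q)
{
        if (atomic_load_explicit(&q->nr_waiters, memory_order_relaxed) == 0)
                return false;

        atomic_fetch_add_explicit(&q->written, 1, memory_order_relaxed);
        return true;
}
```

The common case (nobody throttled) then pays only for a read of
nr_waiters, which stays shared in cache as long as no waiter shows up.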

Thanks,
Fengguang
---
$ perf stat ./atomic

 Performance counter stats for './atomic':

         903.875304  task-clock-msecs         #      0.998 CPUs 
                 76  context-switches         #      0.000 M/sec
                  0  CPU-migrations           #      0.000 M/sec
                 98  page-faults              #      0.000 M/sec
         3011186459  cycles                   #   3331.418 M/sec
         1608926490  instructions             #      0.534 IPC  
          301481656  branches                 #    333.543 M/sec
              94932  branch-misses            #      0.031 %    
              88687  cache-references         #      0.098 M/sec
               1286  cache-misses             #      0.001 M/sec

        0.905576197  seconds time elapsed

$ perf stat ./non-atomic

 Performance counter stats for './non-atomic':

         215.315814  task-clock-msecs         #      0.996 CPUs 
                 18  context-switches         #      0.000 M/sec
                  0  CPU-migrations           #      0.000 M/sec
                 99  page-faults              #      0.000 M/sec
          704358635  cycles                   #   3271.281 M/sec
          303445790  instructions             #      0.431 IPC  
          100574889  branches                 #    467.104 M/sec
              39323  branch-misses            #      0.039 %    
              36064  cache-references         #      0.167 M/sec
                850  cache-misses             #      0.004 M/sec

        0.216175521  seconds time elapsed
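
The ratios can be computed directly from the two perf runs above. This
is only a worked-arithmetic sketch: the numbers are copied from the perf
output, the iteration count comes from the test programs below, and the
struct and helper names are made up.

```c
#include <assert.h>

#define ITERATIONS 100000000.0  /* loop count used by both test programs */

struct perf_run {
        double task_clock_ms;
        double cycles;
        double instructions;
};

/* Values copied from the perf stat output above. */
static const struct perf_run atomic_run = { 903.875304, 3011186459.0, 1608926490.0 };
static const struct perf_run plain_run  = { 215.315814,  704358635.0,  303445790.0 };

/* How many times longer 'slow' ran than 'fast', by task-clock. */
static double slowdown(const struct perf_run *slow, const struct perf_run *fast)
{
        assert(fast->task_clock_ms > 0.0);
        return slow->task_clock_ms / fast->task_clock_ms;
}

static double cycles_per_iter(const struct perf_run *r)
{
        return r->cycles / ITERATIONS;
}

static double instructions_per_iter(const struct perf_run *r)
{
        return r->instructions / ITERATIONS;
}
```

This gives a slowdown of roughly 4.2, at about 30 vs. 7 cycles per
iteration. The instruction counts (about 16 vs. 3 per iteration) suggest
that part of the gap may be function-call overhead in an unoptimized
build rather than the locked instruction itself, but that is a guess
from the counters, not something the test isolates.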


--------------------------------------------------------------------------------
$ cat atomic.c 
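#include <stdio.h>

/* Minimal user-space stand-in for the kernel's atomic_t. */
typedef struct {
        int counter;
} atomic_t;

/*
 * Atomically decrement v->counter and return non-zero iff the result
 * is zero. x86 only: uses a locked decl followed by sete.
 */
static inline int atomic_dec_and_test(atomic_t *v)
{
        unsigned char c;

        asm volatile("lock; decl %0; sete %1"
                     : "+m" (v->counter), "=qm" (c)
                     : : "memory");
        return c != 0;
}

int main(void)
{
        atomic_t i;

        i.counter = 100000000;

        /* 100 million locked decrements */
        while (!atomic_dec_and_test(&i))
                ;

        return 0;
}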

--------------------------------------------------------------------------------
$ cat non-atomic.c 
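#include <stdio.h>

int main(void)
{
        int i;

        /*
         * 100 million plain decrements. Build without optimization,
         * otherwise the compiler may remove this empty loop.
         */
        for (i = 100000000; i; i--)
                ;

        return 0;
}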
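
For anyone who wants to repeat the test on a non-x86 machine, the inline
assembly in atomic.c can be swapped for C11 atomics. This is a sketch of
an equivalent, not the code that produced the numbers above; the
function names are made up.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically decrement *v; return true if the new value is zero. */
static bool dec_and_test(atomic_int *v)
{
        /* atomic_fetch_sub() returns the value held before the subtraction. */
        return atomic_fetch_sub(v, 1) == 1;
}

/* Count down from n (n >= 1) to zero; returns the number of decrements. */
static long countdown(int n)
{
        atomic_int i;
        long steps = 0;

        atomic_init(&i, n);
        do {
                steps++;
        } while (!dec_and_test(&i));

        return steps;
}
```

Calling countdown(100000000) from main() performs the same number of
atomic decrements as the atomic.c loop. The generated instruction
sequence depends on the compiler and target, so timings are not directly
comparable with the figures in this mail.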
