From: Wu Fengguang <fengguang.wu@intel.com>
To: Peter Staubach <staubach@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Theodore Tso <tytso@mit.edu>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	Chris Mason <chris.mason@oracle.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"Li, Shaohua" <shaohua.li@intel.com>,
	Myklebust Trond <Trond.Myklebust@netapp.com>,
	"jens.axboe@oracle.com" <jens.axboe@oracle.com>,
	Jan Kara <jack@suse.cz>, Nick Piggin <npiggin@suse.de>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 00/45] some writeback experiments
Date: Thu, 8 Oct 2009 13:44:21 +0800
Message-ID: <20091008054421.GA20128@localhost>
In-Reply-To: <20091008053335.GA19458@localhost>

On Thu, Oct 08, 2009 at 01:33:35PM +0800, Wu Fengguang wrote:
> On Wed, Oct 07, 2009 at 11:18:22PM +0800, Wu Fengguang wrote:
> > On Wed, Oct 07, 2009 at 09:47:14PM +0800, Peter Staubach wrote:
> > > 
> > > > # vmmon -d 1 nr_writeback nr_dirty nr_unstable      # (per 1-second samples)
> > > >      nr_writeback         nr_dirty      nr_unstable
> > > >             11227            41463            38044
> > > >             11227            41463            38044
> > > >             11227            41463            38044
> > > >             11227            41463            38044
> > 
> > I guess in the above 4 seconds, either client or (more likely) server
> > is blocked. A blocked server cannot send ACKs to knock down both
> 
> Yeah, the server side is blocked.  The nfsd threads are mostly blocked in
> generic_file_aio_write(), in particular on the i_mutex lock! I'm copying
> one or two big files over NFS, so the i_mutex lock is heavily contended.
> 
> I'm using the default wsize=4096 for NFS-root.

I just switched to a 512k wsize, and things improved: most of the time
the 8 nfsd threads are not all blocked. However, the bumpiness remains:

     nr_writeback         nr_dirty      nr_unstable
            11105            58080            15042
            11105            58080            15042
            11233            54583            18626
            11101            51964            22036
            11105            51978            22065
            11233            52362            22577
            10985            58538            13500
            11233            53748            19721
            11047            51999            21778
            11105            50262            23572
            11105            50262            20441
            10985            52772            20721
            10977            52109            21516
            11105            48296            26629
            11105            48296            26629
            10985            52191            21042
            11166            51456            22296
            10980            50681            24466
            11233            45352            30488
            11233            45352            30488
            11105            45475            30616
            11131            45313            20355
            11233            51126            22637
            11233            51126            22637
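
The stalls in the samples above show up as consecutive identical rows.
A minimal sketch for counting them, assuming the samples are available
as whitespace-separated text in the same three-column layout (the
function names parse_samples and count_stalls are made up for
illustration and are not part of vmmon):

```python
from typing import List, Tuple

# One per-second sample: (nr_writeback, nr_dirty, nr_unstable)
Sample = Tuple[int, ...]


def parse_samples(text: str) -> List[Sample]:
    """Parse three-column counter rows, skipping headers and blank lines."""
    samples: List[Sample] = []
    for line in text.splitlines():
        fields = line.split()
        # Keep only rows made of exactly three non-negative integers.
        if len(fields) == 3 and all(f.isdigit() for f in fields):
            samples.append(tuple(int(f) for f in fields))
    return samples


def count_stalls(samples: List[Sample]) -> int:
    """Count samples that are identical to the one right before them."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev == cur)


if __name__ == "__main__":
    demo = """
         nr_writeback         nr_dirty      nr_unstable
                11105            58080            15042
                11105            58080            15042
                11233            54583            18626
    """
    rows = parse_samples(demo)
    print(f"{count_stalls(rows)} stalled of {len(rows)} samples")
```

A repeated row only means that none of the three counters moved for
about one second; it does not say which side was blocked.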


wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
  329   329 TS       -  -5  24   1  0.0 S<   worker_thread            nfsiod
 4690  4690 TS       -  -5  24   1  0.1 S<   svc_recv                 nfsd
 4691  4691 TS       -  -5  24   0  0.1 S<   svc_recv                 nfsd
 4692  4692 TS       -  -5  24   0  0.1 R<   ?                        nfsd
 4693  4693 TS       -  -5  24   0  0.1 S<   svc_recv                 nfsd
 4694  4694 TS       -  -5  24   0  0.1 S<   svc_recv                 nfsd
 4695  4695 TS       -  -5  24   0  0.1 S<   svc_recv                 nfsd
 4696  4696 TS       -  -5  24   0  0.1 S<   svc_recv                 nfsd
 4697  4697 TS       -  -5  24   0  0.1 R<   ?                        nfsd
wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
  329   329 TS       -  -5  24   1  0.0 S<   worker_thread            nfsiod
 4690  4690 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
 4691  4691 TS       -  -5  24   0  0.1 S<   svc_recv                 nfsd
 4692  4692 TS       -  -5  24   1  0.1 D<   log_wait_commit          nfsd
 4693  4693 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
 4694  4694 TS       -  -5  24   0  0.1 S<   svc_recv                 nfsd
 4695  4695 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
 4696  4696 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
 4697  4697 TS       -  -5  24   1  0.1 D<   generic_file_aio_write   nfsd
wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
  329   329 TS       -  -5  24   1  0.0 S<   worker_thread            nfsiod
 4690  4690 TS       -  -5  24   1  0.1 S<   svc_recv                 nfsd
 4691  4691 TS       -  -5  24   0  0.1 S<   svc_recv                 nfsd
 4692  4692 TS       -  -5  24   1  0.1 R<   ?                        nfsd
 4693  4693 TS       -  -5  24   1  0.1 R<   ?                        nfsd
 4694  4694 TS       -  -5  24   1  0.1 R<   ?                        nfsd
 4695  4695 TS       -  -5  24   1  0.1 S<   svc_recv                 nfsd
 4696  4696 TS       -  -5  24   0  0.1 S<   svc_recv                 nfsd
 4697  4697 TS       -  -5  24   1  0.1 S<   svc_recv                 nfsd
wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
  329   329 TS       -  -5  24   1  0.0 S<   worker_thread            nfsiod
 4690  4690 TS       -  -5  24   1  0.1 D<   generic_file_aio_write   nfsd
 4691  4691 TS       -  -5  24   0  0.1 S<   svc_recv                 nfsd
 4692  4692 TS       -  -5  24   1  0.1 D<   nfsd_sync                nfsd
 4693  4693 TS       -  -5  24   1  0.1 D<   sync_buffer              nfsd
 4694  4694 TS       -  -5  24   1  0.1 D<   generic_file_aio_write   nfsd
 4695  4695 TS       -  -5  24   1  0.1 S<   svc_recv                 nfsd
 4696  4696 TS       -  -5  24   0  0.1 S<   svc_recv                 nfsd
 4697  4697 TS       -  -5  24   1  0.1 D<   generic_file_aio_write   nfsd
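
To compare snapshots like the ones above without eyeballing them, the
ps output can be tallied by state and wait channel. This is only a
sketch: it assumes the exact 11-column layout produced by the
"ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm" command
used here, and the helper names tally_nfsd and blocked_count are
hypothetical.

```python
from collections import Counter


def tally_nfsd(ps_output: str) -> Counter:
    """Count (STAT, WCHAN) pairs for nfsd threads in the ps output."""
    counts: Counter = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        # Columns: pid tid class rtprio ni pri psr pcpu stat wchan comm
        if len(fields) != 11 or fields[-1] != "nfsd":
            continue  # skips the shell prompt line and the nfsiod thread
        stat, wchan = fields[8], fields[9]
        counts[(stat, wchan)] += 1
    return counts


def blocked_count(counts: Counter) -> int:
    """Number of threads in uninterruptible sleep (STAT starts with 'D')."""
    return sum(n for (stat, _), n in counts.items() if stat.startswith("D"))


if __name__ == "__main__":
    snapshot = """
  329   329 TS       -  -5  24   1  0.0 S<   worker_thread            nfsiod
 4690  4690 TS       -  -5  24   1  0.1 D<   generic_file_aio_write   nfsd
 4691  4691 TS       -  -5  24   0  0.1 S<   svc_recv                 nfsd
 4692  4692 TS       -  -5  24   1  0.1 D<   nfsd_sync                nfsd
"""
    counts = tally_nfsd(snapshot)
    for (stat, wchan), n in sorted(counts.items()):
        print(f"{n:3d}  {stat:3s} {wchan}")
    print("blocked:", blocked_count(counts))
```

A wchan of generic_file_aio_write by itself does not prove the thread
is waiting on i_mutex; that is an interpretation of where the function
can sleep.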

Thanks,
Fengguang

> wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
>   329   329 TS       -  -5  24   1  0.0 S<   worker_thread            nfsiod
>  4690  4690 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
>  4691  4691 TS       -  -5  24   0  0.0 D<   generic_file_aio_write   nfsd
>  4692  4692 TS       -  -5  24   0  0.0 D<   generic_file_aio_write   nfsd
>  4693  4693 TS       -  -5  24   0  0.0 D<   generic_file_aio_write   nfsd
>  4694  4694 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
>  4695  4695 TS       -  -5  24   1  0.1 D<   generic_file_aio_write   nfsd
>  4696  4696 TS       -  -5  24   1  0.0 D<   log_wait_commit          nfsd
>  4697  4697 TS       -  -5  24   0  0.0 D<   generic_file_aio_write   nfsd
> wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
>   329   329 TS       -  -5  24   1  0.0 S<   worker_thread            nfsiod
>  4690  4690 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
>  4691  4691 TS       -  -5  24   0  0.0 D<   generic_file_aio_write   nfsd
>  4692  4692 TS       -  -5  24   0  0.0 D<   generic_file_aio_write   nfsd
>  4693  4693 TS       -  -5  24   0  0.0 D<   sync_buffer              nfsd
>  4694  4694 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
>  4695  4695 TS       -  -5  24   1  0.1 D<   generic_file_aio_write   nfsd
>  4696  4696 TS       -  -5  24   1  0.0 D<   generic_file_aio_write   nfsd
>  4697  4697 TS       -  -5  24   0  0.0 D<   generic_file_aio_write   nfsd
> 
> wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
>   329   329 TS       -  -5  24   1  0.0 S<   worker_thread            nfsiod
>  4690  4690 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
>  4691  4691 TS       -  -5  24   0  0.1 D<   get_request_wait         nfsd
>  4692  4692 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
>  4693  4693 TS       -  -5  24   0  0.1 S<   svc_recv                 nfsd
>  4694  4694 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
>  4695  4695 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
>  4696  4696 TS       -  -5  24   0  0.1 S<   svc_recv                 nfsd
>  4697  4697 TS       -  -5  24   1  0.1 D<   generic_file_aio_write   nfsd
> 
> wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
>   329   329 TS       -  -5  24   1  0.0 S<   worker_thread            nfsiod
>  4690  4690 TS       -  -5  24   1  0.1 D<   get_write_access         nfsd
>  4691  4691 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
>  4692  4692 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
>  4693  4693 TS       -  -5  24   1  0.1 D<   generic_file_aio_write   nfsd
>  4694  4694 TS       -  -5  24   1  0.1 D<   get_write_access         nfsd
>  4695  4695 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
>  4696  4696 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
>  4697  4697 TS       -  -5  24   0  0.1 D<   generic_file_aio_write   nfsd
> 
> Thanks,
> Fengguang
> 
> > nr_writeback/nr_unstable. The stuck nr_writeback will freeze
> > nr_dirty as well: the dirtying process is throttled until it
> > receives enough "PG_writeback cleared" events, but the bdi-flush
> > thread is also blocked when trying to clear more PG_writeback, because
> > the client-side nr_writeback limit has been reached. In summary,
> > 
> > server blocked => nr_writeback stuck => nr_writeback limit reached
> > => bdi-flush blocked => no end_page_writeback() => dirtier blocked
> > => nr_dirty stuck
> > 
> > Thanks,
> > Fengguang
> > 
> > > >             11045            53987             6490
> > > >             11033            53120             8145
> > > >             11195            52143            10886
> > > >             11211            52144            10913
> > > >             11211            52144            10913
> > > >             11211            52144            10913
> > > > 
> > > > btrfs seems to maintain a private pool of writeback pages, which can go out of
> > > > control:
> > > > 
> > > >      nr_writeback         nr_dirty
> > > >            261075              132
> > > >            252891              195
> > > >            244795              187
> > > >            236851              187
> > > >            228830              187
> > > >            221040              218
> > > >            212674              237
> > > >            204981              237
> > > > 
> > > > XFS has a very interesting "bumpy writeback" behavior: it tends to wait
> > > > until it has collected enough pages and then writes the whole world.
> > > > 
> > > >      nr_writeback         nr_dirty
> > > >             80781                0
> > > >             37117            37703
> > > >             37117            43933
> > > >             81044                6
> > > >             81050                0
> > > >             43943            10199
> > > >             43930            36355
> > > >             43930            36355
> > > >             80293                0
> > > >             80285                0
> > > >             80285                0
> > > > 
> > > > Thanks,
> > > > Fengguang
