From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Hans-Peter Jansen"
To: Christoph Hellwig
Cc: linux-kernel@vger.kernel.org
Subject: Re: howto combat highly pathologic latencies on a server?
Date: Thu, 11 Mar 2010 01:15:14 +0100
User-Agent: KMail/1.9.10
Message-Id: <201003110115.14555.hpj@urpla.net>
In-Reply-To: <20100310181548.GA25684@infradead.org>
References: <201003101817.42812.hpj@urpla.net> <20100310181548.GA25684@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wednesday 10 March 2010, 19:15:48 Christoph Hellwig wrote:
> On Wed, Mar 10, 2010 at 06:17:42PM +0100, Hans-Peter Jansen wrote:
> > While this system usually operates fine, it suffers from delays that
> > latencytop displays as "Writing page to disk: 8425,5 ms":
> > ftp://urpla.net/lat-8.4sec.png, but we also see them in the 1.7-4.8 sec
> > range: ftp://urpla.net/lat-1.7sec.png, ftp://urpla.net/lat-2.9sec.png,
> > ftp://urpla.net/lat-4.6sec.png and ftp://urpla.net/lat-4.8sec.png.
> >
> > From other observations, this issue "feels" like it is induced by
> > single synchronisation points in the block layer, e.g. if I create
> > heavy IO load on one RAID array, say by resizing a VMware disk image,
> > it can take up to a minute to log in by ssh, although the ssh login
> > does not touch this area at all (different RAID arrays). Note that the
> > latencytop snapshots above were taken during normal operation, not
> > under this kind of load.
>
> I had very similar issues on various systems (mostly using xfs, but
> some with ext3, too) with kernels before ~2.6.30 when using the cfq I/O
> scheduler. Switching to noop fixed that for me, as did upgrading to a
> recent kernel where cfq behaves better again.

Christoph, thanks for this valuable suggestion: I've switched to noop
right away, and also set:

vm.dirty_ratio = 20
vm.dirty_background_ratio = 1

since the defaults of 40 and 10 also don't seem to fit my needs. Even 20
might still be oversized with 8 GB of total memory.

Thanks,
Pete
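For anyone following along, the two changes above can be applied at runtime roughly like this. This is only a sketch: the device name (sda) is a placeholder for whatever block device backs the affected RAID array, and both settings revert on reboot unless made persistent (e.g. via the elevator= boot parameter and /etc/sysctl.conf).

```shell
# Switch the I/O scheduler for one block device (sda is a placeholder;
# repeat for each member device of the affected array). Requires root.
echo noop > /sys/block/sda/queue/scheduler

# Verify: the active scheduler is shown in square brackets.
cat /sys/block/sda/queue/scheduler

# Tighten the writeback thresholds, as discussed above.
sysctl -w vm.dirty_ratio=20
sysctl -w vm.dirty_background_ratio=1
```

Lowering vm.dirty_background_ratio makes background writeback start earlier, so dirty pages are flushed in smaller batches instead of accumulating until a large, latency-inducing burst.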