From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932665Ab0CJSPv (ORCPT );
	Wed, 10 Mar 2010 13:15:51 -0500
Received: from bombadil.infradead.org ([18.85.46.34]:32851 "EHLO
	bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754727Ab0CJSPt (ORCPT );
	Wed, 10 Mar 2010 13:15:49 -0500
Date: Wed, 10 Mar 2010 13:15:48 -0500
From: Christoph Hellwig 
To: Hans-Peter Jansen 
Cc: linux-kernel@vger.kernel.org
Subject: Re: howto combat highly pathologic latencies on a server?
Message-ID: <20100310181548.GA25684@infradead.org>
References: <201003101817.42812.hpj@urpla.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201003101817.42812.hpj@urpla.net>
User-Agent: Mutt/1.5.19 (2009-01-05)
X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org
	See http://www.infradead.org/rpr.html
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 10, 2010 at 06:17:42PM +0100, Hans-Peter Jansen wrote:
> While this system usually operates fine, it suffers from delays, that are
> displayed in latencytop as: "Writing page to disk: 8425,5 ms":
> ftp://urpla.net/lat-8.4sec.png, but we see them also in the 1.7-4.8 sec
> range: ftp://urpla.net/lat-1.7sec.png, ftp://urpla.net/lat-2.9sec.png,
> ftp://urpla.net/lat-4.6sec.png and ftp://urpla.net/lat-4.8sec.png.
> 
> From other observations, this issue "feels" like it is induced by single
> syncronisation points in the block layer, eg. if I create heavy IO load on
> one RAID array, say resizing a VMware disk image, it can take up to a
> minute to log in by ssh, although the ssh login does not touch this area at
> all (different RAID arrays). Note, that the latencytop snapshots above are
> made during normal operation, not this kind of load..
I had very similar issues on various systems (mostly using XFS, but some with ext3, too) running kernels before ~2.6.30 with the cfq I/O scheduler. Switching to noop fixed it for me, as did upgrading to a recent kernel where cfq behaves better again.
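
For reference, the scheduler can be inspected and switched per device at runtime via sysfs; this is just a sketch, and the device name sda is only an example (adjust for the actual devices backing your RAID arrays):

```shell
# Show the available schedulers for a device; the active one is in
# brackets, e.g. "noop anticipatory deadline [cfq]" on a 2.6.x kernel.
cat /sys/block/sda/queue/scheduler

# Switch this device to noop at runtime (needs root, takes effect
# immediately, does not persist across reboots).
echo noop > /sys/block/sda/queue/scheduler

# To make it the default for all devices at boot, add elevator=noop
# to the kernel command line instead.
```

Note this is per block device, so you can leave the scheduler alone on arrays that behave and switch only the one showing the pathological latencies.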