From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeff Moyer
Subject: Re: iowait stability regressed from kernel 2.6.22 under heavy load of multi-thread aio writing/reading
Date: Tue, 15 Apr 2008 09:37:03 -0400
Message-ID:
References: <91b13c310804110217g1d2c3ee4p69fee6fd43f4abd2@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
To: "rae l"
Return-path:
In-Reply-To: <91b13c310804110217g1d2c3ee4p69fee6fd43f4abd2@mail.gmail.com> (rae l.'s message of "Fri, 11 Apr 2008 17:17:31 +0800")
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

"rae l" writes:

> I found a problem with the vanilla kernel from 2.6.22 (including 23, 24):
>
> the situation is testing a customized linux distribution under heavy IO load:
> 1. the client process starts tens of POSIX threads (using
>    libpthread); each thread uses aio_write or aio_read (using librt)
>    to operate on one small file, then closes it and writes another
>    small file; the whole objective is to get maximum throughput of
>    small files by generating heavy aio stress on the system;
>
> I have tested the vanilla kernels 2.6.22/23/24.y; all these kernels
> have the same problems:
> 1. I use top, vmstat, and iostat to monitor system performance. I
>    found that the CPU iowait time is high most of the time, above
>    60%, and not stable while the client process is running;
>    although the throughput is stable, the CPU iowait time is not.
>    As a result of the instability, processes see long and
>    unacceptable delays.
> 2. in rare test cases the system even stopped: bi/bo are 0 and
>    iowait is 100%, and all writing processes are blocked in
>    uninterruptible state (D state in ps output); these cases all
>    occurred after the system had been writing for several days, but
>    I cannot guarantee to reproduce them;
>
> First I suspected that the filesystem or the storage medium had
> problems. I tried ext2/ext3/reiserfs/xfs for different filesystems,
> and SATA/SAS, IDE disks, and several commercial hardware RAID cards
> for different storage media, but the bad result remained;
>
> Then I tested the 2.6.21.7 kernel; this results in a stable CPU
> iowait time (below 10%) but not such good throughput of small files;
>
> Now I think the improvements in IO efficiency during the development
> of 2.6.22 also caused the instability of iowait time, right? But how
> to find out? Simple bisecting does not seem to work.

Why doesn't bisecting work? Can you provide your test code so others
can verify your findings?

> by the way, a question is how to guarantee a kernel does not regress
> under heavy multi-thread aio writing load? The ltp project does not
> seem to give the answer:

Well, providing your test code would be a step closer to achieving this
guarantee. If it is scriptable, then there is a chance we could
integrate it into the aio-dio regression test suite:

  http://git.kernel.org/?p=linux/kernel/git/zab/aio-dio-regress.git;a=summary

Cheers,
Jeff
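
To make the iowait observation from the report scriptable rather than read off top/vmstat by eye, something along these lines might serve as a starting point. It is only a minimal sketch and not the reporter's test code: it assumes the Linux /proc/stat layout, where the aggregate "cpu" line lists user, nice, system, idle, iowait, irq, softirq and steal in that order, and the helper names (read_cpu_line, parse_cpu_line, iowait_percent) are made up for illustration.

```python
def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line (the first line) of /proc/stat."""
    with open(path) as f:
        return f.readline()


def parse_cpu_line(line):
    """Split an aggregate 'cpu' line into a list of integer tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' line samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    # Only the first eight counters are summed (assumed order: user, nice,
    # system, idle, iowait, irq, softirq, steal); guest time is already
    # accounted inside user/nice.
    deltas = [y - x for x, y in zip(b[:8], a[:8])]
    if len(deltas) < 5:
        raise ValueError("no iowait column in this 'cpu' line")
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # iowait is the fifth counter, index 4.
    return 100.0 * deltas[4] / total
```

Taking one sample with read_cpu_line(), sleeping for a fixed interval, taking a second sample and passing both to iowait_percent() gives one data point; logging that in a loop while the aio load runs would show whether iowait stays flat (as reported for 2.6.21.7) or swings (as reported for 2.6.22 and later).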