From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Subject: Re: Running a separate fio process for each disk?
References: <56464ACC.9030605@kernel.dk> <56465F00.1060504@kernel.dk> <564F798A.8050009@kernel.dk> <564FB905.7060609@kernel.dk>
From: Jens Axboe
Message-ID: <5654876F.9080803@kernel.dk>
Date: Tue, 24 Nov 2015 08:51:11 -0700
MIME-Version: 1.0
In-Reply-To: <564FB905.7060609@kernel.dk>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
To: Akash Verma
Cc: Caio Villela, Allen Schade, fio
List-ID:

Did you try current -git yet? I think it should work for both scenarios.
It's a silly bug, would be great to have confirmation that it's fixed.
Then I'll spin a new release.

On 11/20/2015 05:21 PM, Jens Axboe wrote:
> And finally, there's a potential fix, if you run commit
> 99afcdb53dc3 or later. So please do try that as well, and
> see if that behaves any better for you.
>
>
> On 11/20/2015 05:03 PM, Jens Axboe wrote:
>> Hi,
>>
>> OK, I see. Can you pull the latest -git, and then run fio
>> --cpuclock-test on one of the boxes where you see the issue? It should
>> have commit 5896d827e1e2 or later.
>>
>>
>> On Fri, Nov 20, 2015 at 3:20 PM, Akash Verma wrote:
>>
>> Hi Jens,
>> The issue is not seen with non-cpu clock sources, or when using a
>> single process (with individual threads, the only config I tried). We
>> only see the issue when using multiple processes and the cpu clock
>> source.
>>
>> On Fri, Nov 20, 2015 at 11:50 AM, Jens Axboe wrote:
>> > On 11/20/2015 12:37 PM, Caio Villela wrote:
>> >>
>> >> Hello Allen and Jens,
>> >>
>> >> Sorry for the long output, this is just in case you want the details.
>> >> Here is a simple explanation for the problem. I want to run a 15 minute
>> >> random write, using 1 Meg requests, and measure throughput and latency.
>> >> What seems to be the problem is that if the test system has a large
>> >> number of drives - the system that I am testing here has 28 drives -
>> >> then the time accounting seems to go bad for some of the processes.
>> >> What you see below is that during the 15 minutes from start, all disks
>> >> are getting hit the same, as they should. Then, after 15 minutes, there
>> >> are 15 drives that are still running.... after 5 minutes over the
>> >> specified 15 minutes, there is still one drive running. Then looking at
>> >> the amount of IOs sent to each drive, the ones that ran on that excess
>> >> time have much more IOs. FIO still reports that all drives ran for 15
>> >> minutes, although some ran for more than 20 minutes.
>> >>
>> >> We will attempt to run a single process instead of 28 instances of FIO
>> >> to see if this goes away.
>> >
>> >
>> > Could you also check if adding clocksource=gettimeofday makes any
>> > difference? This sounds very odd.
>> >
>> > Assuming this was run with fio -git?
>> >
>> >
>> > --
>> > Jens Axboe
>> >
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe fio" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>

--
Jens Axboe