From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Subject: Re: Running a separate fio process for each disk?
References: <56464ACC.9030605@kernel.dk> <56465F00.1060504@kernel.dk> <564F798A.8050009@kernel.dk>
From: Jens Axboe
Message-ID: <564FB905.7060609@kernel.dk>
Date: Fri, 20 Nov 2015 17:21:25 -0700
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
To: Akash Verma
Cc: Caio Villela, Allen Schade, fio
List-ID:

And finally, there's a potential fix, if you run commit 99afcdb53dc3 or
later. So please do try that as well, and see if that behaves any better
for you.

On 11/20/2015 05:03 PM, Jens Axboe wrote:
> Hi,
>
> OK, I see. Can you pull the latest -git, and then run fio
> --cpuclock-test on one of the boxes where you see the issue? It should
> have commit 5896d827e1e2 or later.
>
>
> On Fri, Nov 20, 2015 at 3:20 PM, Akash Verma
> wrote:
>
> Hi Jens,
> The issue is not seen with non-cpu clock sources, or when using a
> single process (with individual threads, the only config I tried). We
> only see the issue when using multiple processes and the cpu clock
> source.
>
> On Fri, Nov 20, 2015 at 11:50 AM, Jens Axboe
> wrote:
> > On 11/20/2015 12:37 PM, Caio Villela wrote:
> >>
> >> Hello Allen and Jens,
> >>
> >> Sorry for the long output, this is just in case you want the details.
> >> Here is a simple explanation for the problem. I want to run a 15 minute
> >> random write, using 1 Meg requests, and measure throughput and latency.
> >> What seems to be the problem is that if the test system has a large
> >> number of drives - the system that I am testing here has 28 drives -
> >> then the time accounting seems to go bad for some of the processes.
> >> What you see below is that during the 15 minutes from start, all disks
> >> are getting hit the same, as they should. Then, after 15 minutes, there
> >> are 15 drives that are still running.... after 5 minutes over the
> >> specified 15 minutes, there is still one drive running. Then looking at
> >> the amount of IOs sent to each drive, the ones that ran on that excess
> >> time have much more IOs. FIO still reports that all drives ran for 15
> >> minutes, although some ran for more than 20 minutes.
> >>
> >> We will attempt to run a single process instead of 28 instances of FIO
> >> to see if this goes away.
> >
> >
> > Could you also check if adding clocksource=gettimeofday makes any
> > difference? This sounds very odd.
> >
> > Assuming this was run with fio -git?
> >
> >
> > --
> > Jens Axboe
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe fio" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >

-- 
Jens Axboe
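
For anyone wanting to repeat the clocksource comparison discussed in this thread, here is a minimal sketch of how the per-disk command lines might be generated. It is not the job that was actually run: the device names, the job naming, and the --direct=1 setting are assumptions for illustration, and only the 15 minute runtime, the 1M random writes, and the clocksource option come from the messages above. It only builds and prints the commands, it does not start fio.

```python
import shlex


def fio_argv(device, clocksource="gettimeofday", runtime_s=900):
    """Build the argv for one fio process that targets a single device.

    All option values here are illustrative assumptions, except that the
    thread mentions 15 minute random writes with 1M requests and the
    clocksource setting.
    """
    short_name = device.rsplit("/", 1)[-1]
    return [
        "fio",
        f"--name=randwrite-{short_name}",   # hypothetical job name
        f"--filename={device}",
        "--rw=randwrite",
        "--bs=1M",
        "--direct=1",                        # assumption, not from the thread
        "--time_based",
        f"--runtime={runtime_s}",            # 900 s = 15 minutes
        f"--clocksource={clocksource}",      # gettimeofday, clock_gettime or cpu
    ]


def commands_for(devices, **kwargs):
    """Return one fio argv per device, i.e. one process per disk."""
    return [fio_argv(dev, **kwargs) for dev in devices]


if __name__ == "__main__":
    # Hypothetical device names; substitute the disks under test.
    for argv in commands_for(["/dev/sdb", "/dev/sdc"]):
        print(shlex.join(argv))
```

Passing clocksource="cpu" to commands_for() gives the configuration where the problem was reported, so the two sets of commands can be compared on the same box.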