From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261177AbULWIKD (ORCPT ); Thu, 23 Dec 2004 03:10:03 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261176AbULWIKD (ORCPT ); Thu, 23 Dec 2004 03:10:03 -0500 Received: from ns.virtualhost.dk ([195.184.98.160]:14793 "EHLO virtualhost.dk") by vger.kernel.org with ESMTP id S261177AbULWIIU (ORCPT ); Thu, 23 Dec 2004 03:08:20 -0500 Date: Thu, 23 Dec 2004 09:08:07 +0100 From: Jens Axboe To: Marcelo Tosatti Cc: "M. Edward Borasky" , Linux Kernel Mailing List Subject: Re: Negative "ios_in_flight" in the 2.4 kernel Message-ID: <20041223080806.GG12463@suse.de> References: <1103691937.23157.14.camel@DreamGate> <20041222111642.GD12463@suse.de> <1103728782.26340.34.camel@DreamGate> <20041222155816.GF3088@logos.cnet> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20041222155816.GF3088@logos.cnet> Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Dec 22 2004, Marcelo Tosatti wrote: > On Wed, Dec 22, 2004 at 07:19:42AM -0800, M. Edward Borasky wrote: > > On Wed, 2004-12-22 at 12:16 +0100, Jens Axboe wrote: > > > > > > > Question: wouldn't a simple refusal to decrement ios_in_flight in > > > > "down_ios" if it's zero fix this, or am I missing something? > > > > > > That would paper over the real bug, but it will work for you. > > What is the "real bug", then? What will "work for me" is accurate disk > > usage tick counts. The intent of these statistics is something known as > > Operational Analysis of Queueing Networks. > > > > The "requirement" is that the operations on each device be accurately > > counted, and the "wall clock" time spent *waiting* for requests and the > > time spent *servicing* requests be accurately accumulated for each > > device. The sector count is a bonus. > > > > >From these raw counters, one can, and iostat does, compute throughput, > > utilization, average service time, average wait time and average queue > > length. An excellent and highly readable reference for the math involved > > can be found at > > > > http://www.cs.washington.edu/homes/lazowska/qsp/Images/Chap_03.pdf > > > > That is the intent behind these counters, and what will "work for me" is > > a kernel that captures the raw counters correctly. If forcing > > ios_in_flight to be non-negative is done at the expense of losing or > > gaining ticks in the wait or service time accumulators, then it will not > > work for me. > > Well something is deaccounting uncorrectly (doh), probably the disk/partition > accounting logic is doing wrong in some condition, Jens? > > void req_merged_io(struct request *req) > { > struct hd_struct *hd1, *hd2; > > locate_hd_struct(req, &hd1, &hd2); > if (hd1) > down_ios(hd1); > if (hd2) > down_ios(hd2); > } > > void req_finished_io(struct request *req) > { > struct hd_struct *hd1, *hd2; > > locate_hd_struct(req, &hd1, &hd2); > if (hd1) > account_io_end(hd1, req); > if (hd2) > account_io_end(hd2, req); > } > > We could eliminate that possibility if you ran your tests with a single > non-partitioned disk, but thats just a guess. It would be nice to know if this was a vanilla kernel or patched in some way. The only recent bug in this area I remember was a bad merge in the SUSE tree with the io_request_lock scaling patch. (and don't trim the cc list when replying, at least not if you want people to see your message) -- Jens Axboe