Date: Tue, 15 Apr 2008 08:47:32 -0400
From: "Alan D. Brunelle"
To: linux-kernel@vger.kernel.org
Cc: Jens Axboe
Subject: Block IO: more io-cpu-affinity results
Message-ID: <4804A3E4.1060605@hp.com>

On a 4-way IA64 box we are seeing definite improvements in overall system responsiveness with the patch series currently in Jens' io-cpu-affinity branch of his block IO git repository.

In this microbenchmark, I peg 4 processes to 4 separate processors: 2 do CPU-intensive work (sqrts) and 2 do IO-intensive work (4KB direct reads from the RAID array cache, thus limiting physical disk accesses). There are 2 variables: whether rq_affinity is on or off for the devices under test, and whether each IO-intensive proc is pegged onto the same CPU that handles IRQs for its device. The results are averaged over 4-minute runs per permutation.

When the IO-intensive procs are pegged onto the CPUs handling IRQs for their devices, we see no real difference between rq_affinity on and off:

rq=0 local=1  66.616 (M sqrt/sec)  12.312 (K ios/sec)
rq=1 local=1  66.616 (M sqrt/sec)  12.314 (K ios/sec)

Both see 66.616 million sqrts per second and about 12,300 IOs per second.
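For readers curious how such a worker might be set up, here is a minimal sketch, not Alan's actual harness: the names pin_to_cpu, run_io_worker, and the nreads parameter are all illustrative. It pins the calling process to one CPU with sched_setaffinity() and issues 4KB O_DIRECT reads, which bypass the page cache so repeated reads of the same block are served from the array cache.

```c
/*
 * Hypothetical sketch of one IO-intensive worker in the microbenchmark
 * described above. Assumes Linux and glibc; error handling is minimal.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Bind the calling process to a single CPU; returns 0 on success. */
static int pin_to_cpu(int cpu)
{
	cpu_set_t mask;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	return sched_setaffinity(0, sizeof(mask), &mask);
}

/*
 * Pin to `cpu`, then issue up to `nreads` 4KB O_DIRECT reads of the
 * first block of `dev`. Returns the number of completed reads, or -1
 * on setup failure.
 */
static long run_io_worker(const char *dev, int cpu, long nreads)
{
	char *buf;
	long done = 0;
	int fd;

	if (pin_to_cpu(cpu) != 0)
		return -1;
	/* O_DIRECT bypasses the page cache. */
	fd = open(dev, O_RDONLY | O_DIRECT);
	if (fd < 0)
		return -1;
	/* O_DIRECT buffers must be suitably aligned (4KB here). */
	if (posix_memalign((void **)&buf, 4096, 4096)) {
		close(fd);
		return -1;
	}
	while (done < nreads && pread(fd, buf, 4096, 0) == 4096)
		done++;
	free(buf);
	close(fd);
	return done;
}
```

The CPU-intensive procs would be pegged the same way with pin_to_cpu(), running a tight sqrt loop instead of the read loop.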
However, when we move the 2 IO-intensive procs onto CPUs that are not handling their IRQs, we see a definite improvement with rq_affinity enabled - both in the amount of CPU-intensive work we can do (about 4%) and in the number of IOs per second achieved (about 1%):

rq=0 local=0  61.929 (M sqrt/sec)  11.911 (K ios/sec)
rq=1 local=0  64.386 (M sqrt/sec)  12.026 (K ios/sec)

Alan