Date: Thu, 13 Mar 2008 13:13:21 +0100
From: Jens Axboe
To: Max Krasnyanskiy
Cc: linux-kernel@vger.kernel.org, npiggin@suse.de, dgc@sgi.com
Subject: Re: [PATCH 0/7] IO CPU affinity testing series
Message-ID: <20080313121321.GJ17940@kernel.dk>
References: <1205322940-20127-1-git-send-email-jens.axboe@oracle.com> <47D83EF5.6070302@qualcomm.com>
In-Reply-To: <47D83EF5.6070302@qualcomm.com>

On Wed, Mar 12 2008, Max Krasnyanskiy wrote:
> Jens Axboe wrote:
> > Hi,
> >
> > Here's a new round of patches to play with io cpu affinity. It can,
> > as always, also be found in the block git repo. The branch name is
> > 'io-cpu-affinity'.
> >
> > The major change since the last post is the abandonment of the
> > kthread approach. It was definitely slower than my 'add IPI to
> > signal remote block softirq' hack. So I decided to base this on the
> > scalable smp_call_function_single() that Nick posted. I tweaked it a
> > bit to make it more suitable for my use and also faster.
> >
> > As for functionality, the only change is that I added a bio hint
> > that the submitter can use to ask for completion on the same CPU
> > that submitted the IO. Pass in BIO_CPU_AFFINE for that to occur.
> >
> > Otherwise the modes are the same as last time:
> >
> > - You can set a specific cpumask for queuing IO, and the block layer
> >   will move submitters to one of those CPUs.
> > - You can set a specific cpumask for completion of IO, in which case
> >   the block layer will move the completion to one of those CPUs.
> > - You can set rq_affinity mode, in which case IOs will always be
> >   completed on the CPU that submitted them.
> >
> > Look in /sys/block/<dev>/queue/ for the three sysfs variables that
> > modify this behaviour.
> >
> > I'd be interested in getting some testing done on this, to see if it
> > really helps the larger end of the scale. Dave, I know you have a
> > lot of experience in this area and would appreciate your input
> > and/or testing. I'm not sure if any of the above modes will allow
> > you to do what you need for e.g. XFS - if you want all metadata IO
> > completed on one (or a set of) CPU(s), then I can add a mode that
> > will allow you to play with that. Or if something else, give me some
> > input and we can take it from there!
>
> Very cool stuff. I think I can use it for cpu isolation purposes,
> i.e. isolating a cpu from IO activity.
>
> You may have noticed that I started a bunch of discussion on CPU
> isolation. One thing that came out of that is the suggestion to use
> cpusets for managing these affinity masks. We're still discussing the
> details, but the general idea is to provide extra flags in the cpusets
> that enable/disable various activities on the cpus that belong to the
> set.
>
> For example, in this particular case we'd have something like a
> "cpusets.io" flag that would indicate whether cpus in the set are
> allowed to do IO or not.
>
> In other words:
>
> /dev/cpuset/io    (cpus=0,1,2; io=1)
> /dev/cpuset/no-io (cpus=3,4,5; io=0)
>
> I'm not sure whether this makes sense or not. One advantage is that
> it's more dynamic and more flexible. If, for example, you add a cpu to
> the io cpuset, it will automatically start handling io requests.

The code posted here works on the queue level, whereas you want this to
be a global setting. So it'll require a bit of extra stuff to handle
that case, but the base infrastructure would not care.
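As a rough illustration of the queue-level knobs (an untested sketch;
I'm assuming the per-queue file is named "rq_affinity" after the mode
above, and using sda as an example device), enabling same-CPU
completions for a single queue from userspace would look something
like this:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        /* Assumed per-queue sysfs knob from this series: writing 1
         * asks the block layer to complete IOs on the submitting CPU. */
        int fd = open("/sys/block/sda/queue/rq_affinity", O_WRONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        if (write(fd, "1", 1) != 1)
                perror("write");

        close(fd);
        return 0;
}

A global or cpuset-driven policy would have to walk every queue (or
hook in above them), which is the extra plumbing mentioned above.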
> btw, what did you mean by "to see if it really helps the larger end
> of the scale"? What problem were you guys trying to solve? I'm
> guessing cpu isolation would probably be an unexpected user of io cpu
> affinity :).

Nope, I didn't really consider isolation :-) It's meant to speed up IO
on larger SMP systems by reducing cache line contention (or bouncing),
keeping data and/or locks local to a CPU (or a set of CPUs).

-- 
Jens Axboe