Date: Thu, 13 Mar 2008 13:13:21 +0100
From: Jens Axboe
To: Max Krasnyanskiy
Cc: linux-kernel@vger.kernel.org, npiggin@suse.de, dgc@sgi.com
Subject: Re: [PATCH 0/7] IO CPU affinity testing series
Message-ID: <20080313121321.GJ17940@kernel.dk>
References: <1205322940-20127-1-git-send-email-jens.axboe@oracle.com> <47D83EF5.6070302@qualcomm.com>
In-Reply-To: <47D83EF5.6070302@qualcomm.com>

On Wed, Mar 12 2008, Max Krasnyanskiy wrote:
> Jens Axboe wrote:
> > Hi,
> >
> > Here's a new round of patches to play with io cpu affinity. It can,
> > as always, also be found in the block git repo. The branch name is
> > 'io-cpu-affinity'.
> >
> > The major change since the last post is the abandonment of the
> > kthread approach. It was definitely slower than my 'add IPI to
> > signal remote block softirq' hack. So I decided to base this on the
> > scalable smp_call_function_single() that Nick posted. I tweaked it a
> > bit to make it more suitable for my use and also faster.
> >
> > As for functionality, the only change is that I added a bio hint
> > that the submitter can use to ask for completion on the same CPU
> > that submitted the IO. Pass in BIO_CPU_AFFINE for that to occur.
> >
> > Otherwise the modes are the same as last time:
> >
> > - You can set a specific cpumask for queuing IO, and the block layer
> >   will move submitters to one of those CPUs.
> > - You can set a specific cpumask for completion of IO, in which case
> >   the block layer will move the completion to one of those CPUs.
> > - You can set rq_affinity mode, in which case IOs will always be
> >   completed on the CPU that submitted them.
> >
> > Look in /sys/block/<dev>/queue/ for the three sysfs variables that
> > modify this behaviour.
> >
> > I'd be interested in getting some testing done on this, to see if it
> > really helps the larger end of the scale. Dave, I know you have a
> > lot of experience in this area and would appreciate your input
> > and/or testing. I'm not sure if any of the above modes will allow
> > you to do what you need for e.g. XFS - if you want all metadata IO
> > completed on one (or a set of) CPU(s), then I can add a mode that
> > will allow you to play with that. Or if something else, give me some
> > input and we can take it from there!
>
> Very cool stuff. I think I can use it for cpu isolation purposes,
> i.e. isolating a cpu from IO activity.
>
> You may have noticed that I started a bunch of discussion on CPU
> isolation. One thing that came out of that is the suggestion to use
> cpusets for managing these affinity masks. We're still discussing the
> details, but the general idea is to provide extra flags in the cpusets
> that enable/disable various activities on the cpus that belong to the
> set.
>
> For example, in this particular case we'd have something like a
> "cpusets.io" flag that would indicate whether cpus in the set are
> allowed to do IO or not.
>
> In other words:
>
> /dev/cpuset/io    (cpus=0,1,2; io=1)
> /dev/cpuset/no-io (cpus=3,4,5; io=0)
>
> I'm not sure whether this makes sense or not. One advantage is that
> it's more dynamic and more flexible. If, for example, you add a cpu to
> the io cpuset, it will automatically start handling io requests.

The code posted here works on the queue level, whereas you want this to
be a global setting. So it'll require a bit of extra stuff to handle
that case, but the base infrastructure would not care.
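As a rough illustration of the queue-level knobs (an untested sketch;
I'm assuming the per-queue file is named "rq_affinity" after the mode
above, and using sda as an example device), enabling same-CPU
completions for a single queue from userspace would look something
like this:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        /* Assumed per-queue sysfs knob from this series: writing 1
         * asks the block layer to complete IOs on the submitting CPU. */
        int fd = open("/sys/block/sda/queue/rq_affinity", O_WRONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        if (write(fd, "1", 1) != 1)
                perror("write");

        close(fd);
        return 0;
}

A global or cpuset-driven policy would have to walk every queue (or
hook in above them), which is the extra plumbing mentioned above.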
> btw, what did you mean by "to see if it really helps the larger end
> of the scale"? What problem were you guys trying to solve? I'm
> guessing cpu isolation would probably be an unexpected user of io cpu
> affinity :).

Nope, I didn't really consider isolation :-) It's meant to speed up IO
on larger SMP systems by reducing cache line contention (or bouncing),
keeping data and/or locks local to a CPU (or a set of CPUs).

-- 
Jens Axboe