From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:48603 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752959AbcJERVh (ORCPT ); Wed, 5 Oct 2016 13:21:37 -0400
Received: from pps.filterd (m0098413.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.17/8.16.0.17) with SMTP id u95HImLl036115 for ; Wed, 5 Oct 2016 13:21:36 -0400
Received: from e24smtp02.br.ibm.com (e24smtp02.br.ibm.com [32.104.18.86]) by mx0b-001b2d01.pphosted.com with ESMTP id 25w3a151jd-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Wed, 05 Oct 2016 13:21:35 -0400
Received: from localhost by e24smtp02.br.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 5 Oct 2016 14:21:34 -0300
Subject: Re: aio: questions with ioctx_alloc() and large num_possible_cpus()
To: Kent Overstreet
References: <20161005063435.mtw2keukyxwbwo2k@kmo-pixel>
Cc: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-kernel@vger.kernel.org
From: Mauricio Faria de Oliveira
Date: Wed, 5 Oct 2016 14:21:27 -0300
MIME-Version: 1.0
In-Reply-To: <20161005063435.mtw2keukyxwbwo2k@kmo-pixel>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Message-Id: <965fc993-97c2-48b9-82e3-6c3444d0ffe5@linux.vnet.ibm.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Hi Kent,

Thanks for commenting. I understood more of the code in trying to make sense of your point, but there are some things still unclear about it; if you could help a bit more, please.

Can you describe how a single thread might not be able to use all the slots because 'up to about half of the reqs_available slots might be on other percpu reqs_available'?

I see that the thread might be scheduled on different CPUs (say, only 2 possible CPUs) and perform get_reqs_available() on both, but that only gives one req_batch to each CPU. For req_batch to be half of reqs_available, its denominator needs to be 2, which doesn't happen with num_possible_cpus() * 4, which is 8. So I'm a bit confused here.

	atomic_set(&ctx->reqs_available, ctx->nr_events - 1);
	ctx->req_batch = (ctx->nr_events - 1) / (num_possible_cpus() * 4);

On 10/05/2016 03:34 AM, Kent Overstreet wrote:
>> - why "num_possible_cpus() * 4", and why "max(nr_events, )" ?
> For the scheme to work - percpu allocation of slots - we have to ensure that
> there aren't too many unused slots stranded on other CPUs. The stranding is
> limited to 1/4th of the slots
[snip]

By 'unused slots', do you mean the slots included in the batch allocated to a particular CPU but not actually used by a thread on that CPU? (e.g., get_reqs_available() called once, unused_slots == req_batch - 1)

Can you please detail a bit more how the limit to 1/4th of the slots is ensured by "num_possible_cpus() * 4", and what scenario the math is based on? I've been thinking and assuming values for a while now, and haven't figured out where / how it occurs.

Thanks for your support,

-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center
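
For reference, the arithmetic as I currently understand it can be sketched in a few lines of userspace C (not kernel code). It assumes each possible CPU holds exactly one req_batch at a time, which may well not be what the percpu code really allows; the helper names and sample values are made up for illustration, and only the req_batch formula comes from the two lines quoted above.

```c
#include <assert.h>

/* Mirrors the quoted line:
 *   req_batch = (nr_events - 1) / (num_possible_cpus() * 4)
 * 'ncpus' stands in for num_possible_cpus(). */
static unsigned int req_batch(unsigned int nr_events, unsigned int ncpus)
{
	assert(ncpus > 0 && nr_events > 0);
	return (nr_events - 1) / (ncpus * 4);
}

/* Hypothetical helper: total slots sitting in percpu counters when
 * every CPU has taken exactly one batch (ASSUMPTION, see above).
 * Integer division makes this at most (nr_events - 1) / 4. */
static unsigned int held_one_batch_per_cpu(unsigned int nr_events,
					   unsigned int ncpus)
{
	return ncpus * req_batch(nr_events, ncpus);
}
```

With nr_events = 129 and 2 CPUs, req_batch is 128 / 8 = 16, and one batch on each CPU totals 32 slots, which is a quarter of the 128 available rather than half.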