Message-ID: <550B30D5.9080207@kernel.dk>
Date: Thu, 19 Mar 2015 14:25:57 -0600
From: Jens Axboe
To: Tejun Heo, kent.overstreet@gmail.com, Benjamin LaHaise
CC: linux-kernel@vger.kernel.org
Subject: Re: serial percpu_ref draining in exit_aio()
References: <20150319173413.GU25365@htj.duckdns.org>
In-Reply-To: <20150319173413.GU25365@htj.duckdns.org>

On 03/19/2015 11:34 AM, Tejun Heo wrote:
> Hello,
>
> So, Jens noticed that fio process exiting takes seconds when there are
> multiple aio contexts and the culprit seems to be the serial
> percpu_ref draining in exit_aio().  It's generally a bad idea to
> expose RCU latencies to userland because they add up really quickly
> and are unrelated to other performance parameters.  Can you guys
> please at least update the code so that it waits for all percpu_refs
> to drain at the same time rather than one after another?  That should
> resolve the worst part of the problem.

This works for me. Before:

real    0m5.872s
user    0m0.020s
sys     0m0.040s

After:

real    0m0.246s
user    0m0.020s
sys     0m0.040s

It solves the exit_aio() issue, but if the app calls io_destroy(), then
we are back to square one...
-- 
Jens Axboe

aio-exit-parallel.patch:

diff --git a/fs/aio.c b/fs/aio.c
index f8e52a1854c1..73b0de46577b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -805,18 +805,35 @@ EXPORT_SYMBOL(wait_on_sync_kiocb);
 void exit_aio(struct mm_struct *mm)
 {
 	struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
+	struct completion *comp = NULL;
 	int i;
 
 	if (!table)
 		return;
 
+	if (table->nr > 1) {
+		comp = kmalloc(table->nr * sizeof(struct completion),
+				GFP_KERNEL);
+		if (comp)
+			for (i = 0; i < table->nr; i++)
+				init_completion(&comp[i]);
+	}
+
 	for (i = 0; i < table->nr; ++i) {
 		struct kioctx *ctx = table->table[i];
 		struct completion requests_done =
 			COMPLETION_INITIALIZER_ONSTACK(requests_done);
 
-		if (!ctx)
+		/*
+		 * Complete it early, so the below wait_for_completion()
+		 * doesn't expect a complete() from the RCU callback
+		 */
+		if (!ctx) {
+			if (comp)
+				complete(&comp[i]);
 			continue;
+		}
+
 		/*
 		 * We don't need to bother with munmap() here - exit_mmap(mm)
 		 * is coming and it'll unmap everything. And we simply can't,
@@ -825,10 +842,20 @@ void exit_aio(struct mm_struct *mm)
 		 * that it needs to unmap the area, just set it to 0.
 		 */
 		ctx->mmap_size = 0;
-		kill_ioctx(mm, ctx, &requests_done);
+		if (comp)
+			kill_ioctx(mm, ctx, &comp[i]);
+		else {
+			kill_ioctx(mm, ctx, &requests_done);
+			wait_for_completion(&requests_done);
+		}
+	}
 
-		/* Wait until all IO for the context are done. */
-		wait_for_completion(&requests_done);
+	if (comp) {
+		for (i = 0; i < table->nr; i++) {
+			/* Wait until all IO for the context are done. */
+			wait_for_completion(&comp[i]);
+		}
+		kfree(comp);
 	}
 
 	RCU_INIT_POINTER(mm->ioctx_table, NULL);