From: Chris Mason
Subject: Re: [PATCH] fs-writeback: drop wb->list_lock during blk_finish_plug()
Date: Thu, 17 Sep 2015 19:56:47 -0400
Message-ID: <20150917235647.GG8624@ret.masoncoding.com>
References: <20150916151621.GA8624@ret.masoncoding.com>
 <20150916195806.GD29530@quack.suse.cz>
 <20150916200012.GB8624@ret.masoncoding.com>
 <20150916220704.GM3902@dastard>
 <20150917003738.GN3902@dastard>
 <20150917021453.GO3902@dastard>
 <20150917224230.GF8624@ret.masoncoding.com>
To: Linus Torvalds
Cc: Dave Chinner, Jan Kara, Josef Bacik, LKML, linux-fsdevel, Neil Brown,
 Christoph Hellwig, Tejun Heo

On Thu, Sep 17, 2015 at 04:08:19PM -0700, Linus Torvalds wrote:
> On Thu, Sep 17, 2015 at 3:42 PM, Chris Mason wrote:
> >
> > Playing around with the plug a little, most of the unplugs are coming
> > from the cond_resched_lock().  Not really sure why we are doing the
> > cond_resched() there; we should be doing it before we retake the lock
> > instead.
> >
> > This patch takes my box (with dirty thresholds at 1.5GB/3GB) from 195K
> > files/sec up to 213K.  Average IO size is the same as 4.3-rc1.
>
> Ok, so at least for you, part of the problem really ends up being that
> there's a mix of the "synchronous" unplugging (by the actual explicit
> "blk_finish_plug(&plug);") and the writeback that is handed off to
> kblockd_workqueue.
>
> I'm not seeing why that should be an issue. Sure, there's some CPU
> overhead to context switching, but I don't see that it should be that
> big of a deal.
>
> I wonder if there is something more serious wrong with the kblockd_workqueue.

I'm driving the box pretty hard; it's right on the line between CPU
bound and IO bound.  So I've got 32 fs_mark processes banging away and
32 CPUs (16 really, with hyperthreading).  They are popping in and out
of balance_dirty_pages(), so I have high CPU utilization alternating
with high IO wait times.  There are no reads at all, so all of these
waits are for buffered writes.

People in balance_dirty_pages() are indirectly waiting on the unplug,
so maybe the context switch overhead on a loaded box is enough to
explain it.  We've definitely gotten more than 9% by inlining small
synchronous items in btrfs in the past, but those were more explicitly
synchronous.  I know it's painfully hand-wavy.

I don't see any other users of the kblockd workqueues, and the perf
profiles don't jump out at me.  I'll feel better about the patch if
Dave confirms any gains.

-chris
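
[For illustration, a minimal sketch of the ordering question discussed
above.  This is not the posted patch: more_work() and
do_some_writeback() are made-up placeholders, and the loop is
simplified, but blk_start_plug()/blk_flush_plug()/cond_resched() and
wb->list_lock are the real interfaces involved.]

	/*
	 * Illustrative sketch only -- not the posted patch.
	 * more_work() and do_some_writeback() are placeholders; the
	 * point is where cond_resched() happens relative to
	 * wb->list_lock and the plug.
	 */
	#include <linux/backing-dev.h>
	#include <linux/blkdev.h>
	#include <linux/sched.h>
	#include <linux/spinlock.h>

	static void writeback_loop_sketch(struct bdi_writeback *wb)
	{
		struct blk_plug plug;

		blk_start_plug(&plug);
		spin_lock(&wb->list_lock);
		while (more_work(wb)) {			/* placeholder */
			do_some_writeback(wb);		/* placeholder */

			/*
			 * cond_resched_lock(&wb->list_lock) here would
			 * let schedule() run with the plug still held,
			 * and schedule() hands a held plug to kblockd
			 * to flush asynchronously.  Instead: drop the
			 * lock, flush the plug synchronously on this
			 * CPU, then reschedule.
			 */
			if (need_resched()) {
				spin_unlock(&wb->list_lock);
				blk_flush_plug(current);
				cond_resched();
				spin_lock(&wb->list_lock);
			}
		}
		spin_unlock(&wb->list_lock);
		blk_finish_plug(&plug);
	}

[The difference being measured: cond_resched_lock() can call
schedule() with the plug still held, in which case the pending I/O is
flushed asynchronously via kblockd, while flushing the plug before
rescheduling keeps the submission on the CPU that built it.]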