From: Chris Mason
Subject: Re: [PATCH] fs-writeback: drop wb->list_lock during blk_finish_plug()
Date: Thu, 17 Sep 2015 19:56:47 -0400
Message-ID: <20150917235647.GG8624@ret.masoncoding.com>
References: <20150916151621.GA8624@ret.masoncoding.com>
 <20150916195806.GD29530@quack.suse.cz>
 <20150916200012.GB8624@ret.masoncoding.com>
 <20150916220704.GM3902@dastard>
 <20150917003738.GN3902@dastard>
 <20150917021453.GO3902@dastard>
 <20150917224230.GF8624@ret.masoncoding.com>
To: Linus Torvalds
Cc: Dave Chinner, Jan Kara, Josef Bacik, LKML, linux-fsdevel, Neil Brown,
 Christoph Hellwig, Tejun Heo

On Thu, Sep 17, 2015 at 04:08:19PM -0700, Linus Torvalds wrote:
> On Thu, Sep 17, 2015 at 3:42 PM, Chris Mason wrote:
> >
> > Playing around with the plug a little, most of the unplugs are coming
> > from the cond_resched_lock().  Not really sure why we are doing the
> > cond_resched() there; we should be doing it before we retake the lock
> > instead.
> >
> > This patch takes my box (with dirty thresholds at 1.5GB/3GB) from 195K
> > files/sec up to 213K.  Average IO size is the same as 4.3-rc1.
>
> Ok, so at least for you, part of the problem really ends up being that
> there's a mix of the "synchronous" unplugging (by the actual explicit
> "blk_finish_plug(&plug);") and the writeback that is handed off to
> kblockd_workqueue.
>
> I'm not seeing why that should be an issue. Sure, there's some CPU
> overhead to context switching, but I don't see that it should be that
> big of a deal.
>
> I wonder if there is something more serious wrong with the kblockd_workqueue.

I'm driving the box pretty hard; it's right on the line between CPU
bound and IO bound.  So I've got 32 fs_mark processes banging away and
32 CPUs (16 really, with hyperthreading).  They are popping in and out
of balance_dirty_pages(), so I have high CPU utilization alternating
with high IO wait times.  There are no reads at all, so all of these
waits are for buffered writes.

People in balance_dirty_pages() are indirectly waiting on the unplug,
so maybe the context switch overhead on a loaded box is enough to
explain it.  We've definitely gotten more than 9% by inlining small
synchronous items in btrfs in the past, but those were more explicitly
synchronous.  I know it's painfully hand-wavy.

I don't see any other users of the kblockd workqueues, and the perf
profiles don't jump out at me.  I'll feel better about the patch if
Dave confirms any gains.

-chris
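
[For illustration, a minimal sketch of the ordering question discussed
above.  This is not the posted patch: more_work() and
do_some_writeback() are made-up placeholders, and the loop is
simplified, but blk_start_plug()/blk_flush_plug()/cond_resched() and
wb->list_lock are the real interfaces involved.]

	/*
	 * Illustrative sketch only -- not the posted patch.
	 * more_work() and do_some_writeback() are placeholders; the
	 * point is where cond_resched() happens relative to
	 * wb->list_lock and the plug.
	 */
	#include <linux/backing-dev.h>
	#include <linux/blkdev.h>
	#include <linux/sched.h>
	#include <linux/spinlock.h>

	static void writeback_loop_sketch(struct bdi_writeback *wb)
	{
		struct blk_plug plug;

		blk_start_plug(&plug);
		spin_lock(&wb->list_lock);
		while (more_work(wb)) {			/* placeholder */
			do_some_writeback(wb);		/* placeholder */

			/*
			 * cond_resched_lock(&wb->list_lock) here would
			 * let schedule() run with the plug still held,
			 * and schedule() hands a held plug to kblockd
			 * to flush asynchronously.  Instead: drop the
			 * lock, flush the plug synchronously on this
			 * CPU, then reschedule.
			 */
			if (need_resched()) {
				spin_unlock(&wb->list_lock);
				blk_flush_plug(current);
				cond_resched();
				spin_lock(&wb->list_lock);
			}
		}
		spin_unlock(&wb->list_lock);
		blk_finish_plug(&plug);
	}

[The difference being measured: cond_resched_lock() can call
schedule() with the plug still held, in which case the pending I/O is
flushed asynchronously via kblockd, while flushing the plug before
rescheduling keeps the submission on the CPU that built it.]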