From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752662AbbIQWmq (ORCPT );
	Thu, 17 Sep 2015 18:42:46 -0400
Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:62054 "EHLO
	m0041696.ppops.net" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org
	with ESMTP id S1751706AbbIQWmm (ORCPT );
	Thu, 17 Sep 2015 18:42:42 -0400
Date: Thu, 17 Sep 2015 18:42:30 -0400
From: Chris Mason
To: Linus Torvalds
CC: Dave Chinner, Jan Kara, Josef Bacik, LKML, linux-fsdevel, Neil Brown,
	Christoph Hellwig, Tejun Heo
Subject: Re: [PATCH] fs-writeback: drop wb->list_lock during blk_finish_plug()
Message-ID: <20150917224230.GF8624@ret.masoncoding.com>
Mail-Followup-To: Chris Mason, Linus Torvalds, Dave Chinner, Jan Kara,
	Josef Bacik, LKML, linux-fsdevel, Neil Brown, Christoph Hellwig,
	Tejun Heo
References: <20150913231258.GS26895@dastard>
	<20150916151621.GA8624@ret.masoncoding.com>
	<20150916195806.GD29530@quack.suse.cz>
	<20150916200012.GB8624@ret.masoncoding.com>
	<20150916220704.GM3902@dastard>
	<20150917003738.GN3902@dastard>
	<20150917021453.GO3902@dastard>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: 
User-Agent: Mutt/1.5.23.1 (2014-03-12)
X-Originating-IP: [192.168.52.123]
X-Proofpoint-Spam-Reason: safe
X-FB-Internal: Safe
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.14.151,1.0.33,0.0.0000
	definitions=2015-09-17_07:2015-09-17,2015-09-17,1970-01-01 signatures=0
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 17, 2015 at 12:39:51PM -0700, Linus Torvalds wrote:
> On Wed, Sep 16, 2015 at 7:14 PM, Dave Chinner wrote:
> >>
> >> Dave, if you're testing my current -git, the other performance issue
> >> might still be the spinlock thing.
> >
> > I have the fix as the first commit in my local tree - it'll remain
> > there until I get a conflict after an update. :)
>
> Ok. I'm happy to report that you should get a conflict now, and that
> the spinlock code should work well for your virtualized case again.
>
> No updates on the plugging thing yet, I'll wait a bit and follow this
> thread and see if somebody comes up with any explanations or theories
> in the hope that we might not need to revert (or at least have a more
> targeted change).

Playing around with the plug a little, most of the unplugs are coming
from the cond_resched_lock().  Not really sure why we are doing the
cond_resched() there; we should be doing it before we retake the lock
instead.

This patch takes my box (with dirty thresholds at 1.5GB/3GB) from 195K
files/sec up to 213K.  Average IO size is the same as 4.3-rc1.

It probably won't help Dave, since most of his unplugs should have been
coming from the cond_resched_lock() too.

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 587ac08..05ed541 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1481,6 +1481,19 @@ static long writeback_sb_inodes(struct super_block *sb,
 		wbc_detach_inode(&wbc);
 		work->nr_pages -= write_chunk - wbc.nr_to_write;
 		wrote += write_chunk - wbc.nr_to_write;
+
+		if (need_resched()) {
+			/*
+			 * we're plugged and don't want to hand off to kblockd
+			 * for the actual unplug work. But we do want to
+			 * reschedule. So flush our plug and then
+			 * schedule away
+			 */
+			blk_flush_plug(current);
+			cond_resched();
+		}
+
+
 		spin_lock(&wb->list_lock);
 		spin_lock(&inode->i_lock);
 		if (!(inode->i_state & I_DIRTY_ALL))
@@ -1488,7 +1501,7 @@ static long writeback_sb_inodes(struct super_block *sb,
 			requeue_inode(inode, wb, &wbc);
 		inode_sync_complete(inode);
 		spin_unlock(&inode->i_lock);
-		cond_resched_lock(&wb->list_lock);
+
 		/*
 		 * bail out to wb_writeback() often enough to check
 		 * background threshold and other termination conditions.
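
For clarity, here's roughly how the tail of the writeback_sb_inodes() loop
reads with the hunks above applied.  This is only a sketch, with the
untouched lines between the locks elided as "...", not an exact copy of the
tree:

	/* wb->list_lock is not held at this point in the loop */
	if (need_resched()) {
		/* flush the plug in our own context instead of punting to kblockd */
		blk_flush_plug(current);
		cond_resched();
	}

	spin_lock(&wb->list_lock);
	spin_lock(&inode->i_lock);
	...
	spin_unlock(&inode->i_lock);
	/* the old cond_resched_lock(&wb->list_lock) call is gone */

So the task that built the plug flushes it itself before scheduling away,
and wb->list_lock is only retaken once the reschedule is done, rather than
letting cond_resched_lock() sleep us while plugged and have the unplug work
handed off to kblockd.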