From: Neil Brown
Subject: Re: MD write performance issue - found Catalyst patches
Date: Thu, 29 Oct 2009 17:41:21 +1100
Message-ID: <19177.14609.138378.581065@notabene.brown>
References: <66781b10910180300j2006a4b7q21444bb27dd9434e@mail.gmail.com>
In-Reply-To: message from mark delfman on Sunday October 18
Sender: linux-raid-owner@vger.kernel.org
To: mark delfman
Cc: Mattias Hellström, Linux RAID Mailing List, npiggin@suse.de
List-Id: linux-raid.ids

On Sunday October 18, markdelfman@googlemail.com wrote:
> We have tracked the performance drop to the attached two commits in
> 2.6.28.6.  The performance never fully recovers in later kernels, so
> I presume that the change in the write cache is still affecting MD
> today.
>
> The problem for us is that although we have slowly tracked it down,
> we have no understanding of Linux at this level and simply wouldn't
> know where to go from this point.
>
> Considering this seems to affect only MD and not hardware-based RAID
> (in our tests), I thought that this would be an appropriate place to
> post these patches and findings.
>
> There are two patches which impact MD performance via a filesystem:
>
> a) commit 66c85494570396661479ba51e17964b2c82b6f39 - write-back: fix
>    nr_to_write counter
> b) commit fa76ac6cbeb58256cf7de97a75d5d7f838a80b32 - Fix page
>    writeback thinko, causing Berkeley DB slowdown

I've had a look at this and asked around, and I'm afraid there doesn't
seem to be an easy answer.

The most likely difference between 'before' and 'after' those patches
is that more pages are being written per call to generic_writepages in
the 'before' case.  This would generally improve throughput,
particularly with RAID5, which would get more full stripes.

However, that is largely a guess, as the bugs which were fixed by the
patches could interact in interesting ways with XFS (which decrements
->nr_to_write itself), and it isn't immediately clear to me that more
pages would actually be written.

In any case, the 'after' code is clearly correct, so if throughput can
really be increased, the change will have to be made somewhere else.

What might be useful would be to instrument write_cache_pages to count
how many pages are written each time it is called.  You could either
print this number out every time or, if that creates too much noise,
print out an average every 512 calls or similar (a rough, untested
sketch of what I mean is appended after my sig).  Seeing how this
differs with and without the patches in question could help us
understand what is going on and provide hints for how to fix it.

NeilBrown
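
The sketch below is only an illustration, not a tested patch: the
helper name is invented, and it relies on the fact that
write_cache_pages decrements wbc->nr_to_write for each page it
submits, so the drop in that field across one call approximates the
number of pages written.

	/*
	 * Untested sketch for mm/page-writeback.c.  Plain (non-atomic)
	 * counters are good enough for spotting a trend, even if the
	 * numbers are slightly off on SMP.
	 */
	static void count_writeback_pages(long pages_this_call)
	{
		static long calls, pages;

		pages += pages_this_call;
		if (++calls % 512 == 0) {
			printk(KERN_INFO "write_cache_pages: avg %ld "
			       "pages per call over last 512 calls\n",
			       pages / 512);
			pages = 0;
		}
	}

Then in write_cache_pages() itself, something like:

	long nr_before = wbc->nr_to_write;	/* near the top */
	...
	/* just before returning */
	count_writeback_pages(nr_before - wbc->nr_to_write);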