Date: Mon, 14 Apr 2008 23:59:59 +0200
From: Lars Ellenberg
To: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] Perf issues with DRBD when doing a lot of random I/O

On Mon, Apr 14, 2008 at 03:21:14PM -0400, Graham, Simon wrote:
> This is a follow on to the earlier conversation on the issues with the
> drbd_merge_bvec function - having modified this, I am still seeing
> performance with DRBD of about 66% of what I see with no DRBD.
>
> The specific workload is quite vicious and does a lot of random I/O
> across the entire disk, so I experimented with bumping the AL cache size
> up to the max; this got my performance up to 72% of 'native' - better
> but still not great.
>
> Then I started thinking about the change I submitted a while back to
> make meta data updates be barrier requests - given that this random
> workload causes a lot of AL cache turns, it's also causing a lot of
> meta-data activity, so a barrier request is likely to cause a lot of
> stalls.
>
> Now, thinking more about this, I'm not so sure that a barrier is
> appropriate here -- when we update the on-disk AL, we are actually
> throwing away information that a given block is modified,

We are throwing away information that a given block _may_ be modified,
and we are _adding_ information that another given block may be modified.

> so we need to
> be sure THAT block has been committed to the disk, however, it has
> nothing to do with the current set of outstanding I/O to the disk (at
> least, it seems so to me).
>
> I then tried a little test of simply commenting out the barrier in the
> meta data update path and voila I was up to 88% of native perf - finally
> within striking range of acceptable!
>
> So... the big question is whether or not having a barrier set on
> meta-data updates to the on disk AL is required for correctness

There is certainly room for improvement: we may be able to reduce the
number of individual meta data requests.
But with a volatile write cache, is there any other way than barrier
requests for us to get FUA?

I think the semantics we need for the typical AL transaction, i.e.
expiring one AL extent and reusing its slot for another one, are:
 - we have to be sure that all I/O to the region we expire has not only
   "completed" (reached the volatile cache) but also "reached stable
   storage" (the disk itself);
 - we need to be sure that the AL transaction reached stable storage
   before we start the real I/O to the corresponding new region.

That is exactly what a barrier request expresses.

Note that in 8.0.12 we made the use of barriers/cache flushes
configurable; you can switch it off if you know and trust your
hardware (non-volatile cache).

-- 
: Lars Ellenberg                          Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH    Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe   http://www.linbit.com :
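The two ordering requirements for an AL transaction described above can be illustrated with a small toy model in C. This is a sketch under stated assumptions, not DRBD code: `struct sim_disk`, `disk_write`, `disk_flush`, `disk_write_barrier`, `al_replace_then_write` and `AL_BLOCK` are hypothetical names, the disk is simulated in memory with a volatile write cache, and the "barrier" is emulated as flush, write, flush. The asserts inside `al_replace_then_write` encode the two requirements at the point where I/O to the new region is about to start.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS  16
#define AL_BLOCK 0   /* hypothetical on-disk location of the AL record */

/* Hypothetical model of a disk with a volatile write cache. */
struct sim_disk {
    int  cache[NBLOCKS];   /* contents as seen through the volatile cache */
    bool dirty[NBLOCKS];   /* true: cached data not yet on stable storage */
    int  stable[NBLOCKS];  /* contents that would survive a power loss */
};

/* A plain write "completes" as soon as it reaches the volatile cache. */
static void disk_write(struct sim_disk *d, int blk, int val)
{
    d->cache[blk] = val;
    d->dirty[blk] = true;
}

/* A cache flush pushes every dirty block to stable storage. */
static void disk_flush(struct sim_disk *d)
{
    for (int i = 0; i < NBLOCKS; i++) {
        if (d->dirty[i]) {
            d->stable[i] = d->cache[i];
            d->dirty[i] = false;
        }
    }
}

/* Barrier-style write emulated with flushes: everything issued earlier
 * becomes stable first, then the write itself is made stable. */
static void disk_write_barrier(struct sim_disk *d, int blk, int val)
{
    disk_flush(d);
    disk_write(d, blk, val);
    disk_flush(d);
}

/* Record new_ext in the AL slot (expiring whatever extent was there),
 * then issue the first write to the newly activated region. */
static void al_replace_then_write(struct sim_disk *d, int new_ext,
                                  int data_blk, int val)
{
    disk_write_barrier(d, AL_BLOCK, new_ext);

    /* Requirement 1: all earlier I/O, including I/O to the expired
     * region, has reached stable storage, not just the cache. */
    for (int i = 0; i < NBLOCKS; i++)
        assert(!d->dirty[i]);

    /* Requirement 2: the AL transaction itself is on stable storage. */
    assert(d->stable[AL_BLOCK] == new_ext);

    /* Only now may I/O to the new region start. */
    disk_write(d, data_blk, val);
}
```

For example, after `disk_write(&d, 5, 42)` to the region about to expire, a call to `al_replace_then_write(&d, 7, 9, 99)` leaves block 5 and the AL record on stable storage, while the new write to block 9 sits only in the cache. Dropping the flushes in `disk_write_barrier` makes the asserts fire, which is the correctness risk of simply removing the barrier.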