From mboxrd@z Thu Jan 1 00:00:00 1970
From: Brett Russ
Subject: Re: [BUG,PATCH] raid1 behind write ordering (barrier) protection
Date: Thu, 12 Dec 2013 09:45:12 -0500
Message-ID: <52A9CBF8.3050004@linux.vnet.ibm.com>
References: <528E72C8.7050909@linux.vnet.ibm.com> <529CBFBD.9070009@linux.vnet.ibm.com> <20131203100813.67814984@notabene.brown> <529D1941.6000507@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <529D1941.6000507@linux.vnet.ibm.com>
Sender: linux-raid-owner@vger.kernel.org
To: NeilBrown
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

On 12/02/2013 06:35 PM, Brett Russ wrote:
> On 12/02/2013 06:08 PM, NeilBrown wrote:
>> How about just keeping a record of whether there is a BIO_FLUSH request
>> outstanding on each "behind" leg. While there is we don't submit new
>> requests.
>> So we have a queue of bios for each leg which are waiting for a BIO_FLUSH to
>> complete, and we send them on down as soon as it does.
>
> In these circumstances, it's MD who's created the situation, not an upper
> layer's BIO_FLUSH. So, we can't key off of that. Additionally, the patch below
> also fixes another issue related to BIO_FLUSH:
>
> >>> + /* If this is a flush/fua request don't
> >>> + * ever let it go "behind". Keep all the
> >>> + * mirrors in sync.
> >>> + */
> >>> + if (bio_rw_flagged(bio, BIO_FLUSH | BIO_FUA)) {
> >>> + set_bit(R1BIO_BehindIO, &r1_bio->state);
> >>> + do_flush_fua = bio->bi_rw & (BIO_FLUSH | BIO_FUA);
> >>> + }
>
> so we avoid the BIO_FLUSH "behind" issue that way. This probably should be a
> separate patch...
>
> We could divide the behind write ordering problem into two:
> 1) detecting the condition to protect
> 2) protecting against that condition
>
> Solutions for (1) include:
> a) keeping a list of behind writes
> b) keeping a count of behind writes
> c) ?

One possible additional solution for (1), proposed by a colleague here, is to
use the bitmap as an indicator of an outstanding write to a region. I fear this
may be an incompatible overloading of the bitmap's in-sync vs. out-of-sync
role, though.

> Solutions for (2) include:
> i) blocking the I/O
> j) ?
>
> The advantages to solution (a) are:
> -nothing gets blocked unless it overlaps (previously all reads would)
> -list depth limited to max behind writes allowed (typically small)
>
> I wish there were alternatives to solution (i) but recognize that since barriers
> were removed in favor of the filesystem owning the ordering problem, MD is
> effectively assuming the role of the filesystem in this case.
>
> Thanks,
> BR

Additional thoughts on the above, Neil?

Thanks,
BR
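For illustration only, here is a rough userspace sketch of solution (a) combined with a narrow form of (i): keep a small table of outstanding behind writes and make a new request wait only if its sector range overlaps one of them. All names here (struct behind_list, behind_add, behind_done, must_wait, MAX_BEHIND) are hypothetical and are not md/raid1 code; MAX_BEHIND merely stands in for the max behind writes allowed, and the locking and wait queue a real kernel version would need are deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap standing in for md's max write-behind limit. */
#define MAX_BEHIND 16

typedef uint64_t sector_t;

/* One outstanding "behind" write covering sectors [start, start + len). */
struct behind_range {
	sector_t start;
	sector_t len;
	bool in_use;
};

/* Hypothetical per-array table of outstanding behind writes (solution (a)).
 * A real kernel version would need a lock and a wait queue; both omitted. */
struct behind_list {
	struct behind_range r[MAX_BEHIND];
};

/* True if half-open ranges [a, a + alen) and [b, b + blen) intersect. */
static bool ranges_overlap(sector_t a, sector_t alen, sector_t b, sector_t blen)
{
	return a < b + blen && b < a + alen;
}

/* Record a write that is allowed to go "behind". Returns the slot index,
 * or -1 if the table is full (the caller would then not let it go behind). */
static int behind_add(struct behind_list *l, sector_t start, sector_t len)
{
	for (int i = 0; i < MAX_BEHIND; i++) {
		if (!l->r[i].in_use) {
			l->r[i] = (struct behind_range){ start, len, true };
			return i;
		}
	}
	return -1;
}

/* Called when the behind write completes on the slow leg. */
static void behind_done(struct behind_list *l, int slot)
{
	assert(slot >= 0 && slot < MAX_BEHIND && l->r[slot].in_use);
	l->r[slot].in_use = false;
}

/* Blocking, applied narrowly: a new request must wait only if it overlaps
 * an outstanding behind write; everything else proceeds immediately. */
static bool must_wait(const struct behind_list *l, sector_t start, sector_t len)
{
	for (int i = 0; i < MAX_BEHIND; i++)
		if (l->r[i].in_use &&
		    ranges_overlap(start, len, l->r[i].start, l->r[i].len))
			return true;
	return false;
}
```

With a behind write to sectors 100..107 recorded, a request for 104..107 must wait while one starting at 108 proceeds; once behind_done() runs for that slot, nothing waits. A linear scan is used because, as noted above, the list depth is bounded by the (typically small) behind-write limit.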