From mboxrd@z Thu Jan 1 00:00:00 1970
From: NeilBrown
Subject: Re: [PATCH V4 00/13] MD: a caching layer for raid5/6
Date: Tue, 14 Jul 2015 08:22:54 +1000
Message-ID: <20150714082254.3889ef43@noble>
References: <20150708115636.6c972269@noble>
 <20150708054344.GA2709238@devbig257.prn2.facebook.com>
 <20150710092119.297fd9e1@noble>
 <20150710040847.GA1408097@devbig257.prn2.facebook.com>
 <20150710143656.4ee7e647@noble>
 <20150710045225.GA1746743@devbig257.prn2.facebook.com>
 <20150710151044.396f9645@noble>
 <20150710051815.GA1902680@devbig257.prn2.facebook.com>
 <20150710164209.5928d762@noble>
 <20150710174835.GA1837928@devbig257.prn2.facebook.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20150710174835.GA1837928@devbig257.prn2.facebook.com>
Sender: linux-raid-owner@vger.kernel.org
To: Shaohua Li
Cc: linux-raid@vger.kernel.org, songliubraving@fb.com, hch@infradead.org,
 dan.j.williams@intel.com, Kernel-team@fb.com
List-Id: linux-raid.ids

On Fri, 10 Jul 2015 10:48:45 -0700 Shaohua Li wrote:

> On Fri, Jul 10, 2015 at 04:42:09PM +1000, NeilBrown wrote:
> > On Thu, 9 Jul 2015 22:18:15 -0700 Shaohua Li wrote:
> >
> > > On Fri, Jul 10, 2015 at 03:10:44PM +1000, NeilBrown wrote:
> > > > On Thu, 9 Jul 2015 21:52:43 -0700 Shaohua Li wrote:
> > > >
> > > > > On Fri, Jul 10, 2015 at 02:36:56PM +1000, NeilBrown wrote:
> > > > > > On Thu, 9 Jul 2015 21:08:49 -0700 Shaohua Li wrote:
> > > > >
> > > > >
> > > > > > There is also the issue of what action commits a previous transaction.
> > > > > > I'm not sure what you had. I'm suggesting that each metadata block
> > > > > > commits previous transactions. Is that a close-enough match to what
> > > > > > you had?
> > > > >
> > > > > What did you mean about a transaction? In my implementation, metadata
> > > > > block and followed stripe data/parity consist of an io unit. io units can
> > > > > be finished out of order.
> > > > > but if io unit has flush request (the data has
> > > > > flush/flush bio or metadata is a flush block), the io unit can only
> > > > > start after all previous io units and disk cache flush finish. Such io
> > > > > unit is strictly ordered. The log patch describes this behavior. Does it
> > > > > match?
> > > >
> > > > Yes, a "transaction" is an "io unit". The flushing is the same.
> > > > I just couldn't remember how, when reading the log on restart, you
> > > > determined if a given "io unit" was reliably consistent, or whether it
> > > > should be ignored (having possibly only partially been written).
> > > >
> > > The metadata block has a checksum for data of the block. data/parity has
> > > checksum stored in metadata block. This way we can know if metadata and
> > > data is consistent.
> > >
> > >
> > OK .. though I'm not totally sold on the value of checksums. When a
> > checksum doesn't match, that means something. When a checksum does
> > match, it could just be a coincidence.
> > I'd rather have a process that made checksums unnecessary, and only use
> > the checksums as a double-check.
>
> We could do something like: write metadata/data, wait, write another
> metadata. the second metadata indicates the first is on disk. But this
> can impact performance very much.

The performance consideration is why I suggested a double-buffered
approach. Write metadata1, data1, metadata2, data2, then don't write
metadata3 until metadata1 and data1 have been written.
I haven't actually tried that so I don't know for certain it would help.

> I think checksum should be fine. It
> might be just a coincidence, but the rate should be extremely low. jbd2 is
> using checksum too now.

Maybe I'll have a look at jbd2 - do you know what sort of checksum it
uses? I'd be surprised if it didn't use something quite a bit stronger
than crc32 for a task like this.
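
To make the checksum scheme from the quoted text concrete, here is a
minimal sketch in Python rather than kernel C. Everything in it is an
assumption for illustration: the block layout, the names make_io_unit,
io_unit_is_consistent and replay, and the use of crc32 (the thread does
not settle which checksum the log uses). It also replays strictly in
order and stops at the first bad unit, which simplifies the out-of-order
completion described above.

```python
import struct
import zlib

# Hypothetical layout, NOT the on-disk format of the patch set:
#   metadata = payload + crc32(payload)
#   payload  = seq (u64) + block count (u32) + one crc32 per data block
HEADER = struct.Struct("<QI")   # 12 bytes, little-endian, no padding
CSUM = struct.Struct("<I")      # 4 bytes


def checksum(buf):
    # crc32 is only a stand-in; a stronger checksum could be swapped in here.
    return zlib.crc32(buf) & 0xFFFFFFFF


def make_io_unit(seq, blocks):
    """Build (metadata, data blocks) for one io unit."""
    payload = HEADER.pack(seq, len(blocks))
    payload += b"".join(CSUM.pack(checksum(b)) for b in blocks)
    return payload + CSUM.pack(checksum(payload)), list(blocks)


def io_unit_is_consistent(meta, blocks):
    """True only if the metadata checksum and every data checksum match."""
    if len(meta) < HEADER.size + CSUM.size:
        return False
    payload = meta[:-CSUM.size]
    (stored,) = CSUM.unpack(meta[-CSUM.size:])
    if checksum(payload) != stored:
        return False            # metadata block itself is torn or stale
    _seq, count = HEADER.unpack_from(payload, 0)
    if count != len(blocks) or len(payload) != HEADER.size + CSUM.size * count:
        return False
    sums = struct.unpack_from("<%dI" % count, payload, HEADER.size)
    return all(checksum(b) == s for b, s in zip(blocks, sums))


def replay(log):
    """Return the io units to replay, stopping at the first inconsistent one."""
    good = []
    for meta, blocks in log:
        if not io_unit_is_consistent(meta, blocks):
            break
        good.append((meta, blocks))
    return good
```

A matching checksum here only says the unit is very probably intact,
which is exactly the objection raised above; a mismatch is the only
definite answer.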

NeilBrown

>
> Thanks,
> Shaohua
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
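
The double-buffered ordering suggested in the reply above (write
metadata1, data1, metadata2, data2, and hold metadata3 until unit 1 is
on disk) can be modelled roughly as follows. This is a toy Python
model, not code from the patch set; the class DoubleBufferedLog and the
helper known_durable are hypothetical names, and the idea itself is
untested, as the message says.

```python
class DoubleBufferedLog:
    """Toy model: at most two io units in flight; unit n waits for unit n-2."""

    def __init__(self):
        self.next_seq = 1
        self.in_flight = set()   # submitted, not yet known to be on disk
        self.durable = set()     # completed and on disk

    def can_submit(self):
        # The first two units go out immediately; after that, unit n may
        # only be submitted once unit n-2 has completed.
        return self.next_seq <= 2 or (self.next_seq - 2) in self.durable

    def submit(self):
        if not self.can_submit():
            raise RuntimeError(
                "unit %d must wait for unit %d"
                % (self.next_seq, self.next_seq - 2)
            )
        seq = self.next_seq
        self.in_flight.add(seq)
        self.next_seq += 1
        return seq

    def complete(self, seq):
        self.in_flight.remove(seq)
        self.durable.add(seq)


def known_durable(last_metadata_seen):
    """On recovery, finding metadata n implies units 1..n-2 reached disk.

    Assumes recovery can recognise a genuine metadata block n at all
    (for example by a sequence number), which this model does not cover.
    """
    return max(last_metadata_seen - 2, 0)
```

With this ordering, recovery would trust every unit up to n-2 without
relying on a checksum and would need the checksum only as a double-check
on the last two units.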