From mboxrd@z Thu Jan 1 00:00:00 1970
From: Coly Li
Subject: Re: [PATCH] bcache: recover data from backing device when read request hit clean
Date: Fri, 17 Nov 2017 21:22:05 +0800
Message-ID:
References: <54dbc609-b448-68f6-bd9a-676fd5d21151@ehuk.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Return-path:
Received: from mx2.suse.de ([195.135.220.15]:40472 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756629AbdKQNWT (ORCPT ); Fri, 17 Nov 2017 08:22:19 -0500
In-Reply-To: <54dbc609-b448-68f6-bd9a-676fd5d21151@ehuk.net>
Content-Language: en-US
Sender: linux-bcache-owner@vger.kernel.org
List-Id: linux-bcache@vger.kernel.org
To: Eddie Chapman
Cc: Rui Hua, Stefan Priebe - Profihost AG, Coly Li, Michael Lyle, Kent Overstreet, linux-bcache@vger.kernel.org, linux-block@vger.kernel.org

On 17/11/2017 8:57 PM, Eddie Chapman wrote:
> On 17/11/17 10:20, Rui Hua wrote:
>> Hi, Stefan
>>
>> 2017-11-17 16:28 GMT+08:00 Stefan Priebe - Profihost AG:
>>> I‘m getting the same xfs error message under high load. Does this
>>> patch fix it?
>>>
>> Did you applied the patch "bcache: only permit to recovery read error
>> when cache device is clean" ?
>> If you did, maybe this patch can fix it. And you'd better check
>> /sys/fs/bcache/XXX/internal/cache_read_races in your environment,
>> meanwhile, it should not be zero when you get that err message.
>
> Hi all,
>
> I have 3 servers running a very recent 4.9 stable release, with several
> recent bcache patches cherry picked, including V4 of "bcache: only
> permit to recovery read error when cache device is clean".
>
> In the 3 weeks since using these cherry picks I've experienced a very
> small number of isolated read errors in the layer above bcache, on all 3
> servers.
>
> On one of the servers, 2 out of the 6 bcache resources have a value of 1
> in /sys/fs/bcache/XXX/internal/cache_read_races, and it is on these same
> 2 bcache resources where one read error has occurred on the upper layer.
> The other 4 bcache resources have 0 in cache_read_races and I haven't
> had any read errors on the layers above them.
>
> On another server, I have 1 bcache resource out of 10 with a value of 5
> in /sys/fs/bcache/XXX/internal/cache_read_races, and it is on that
> bcache resource where a read error occurred on one occasion. The other 9
> bcache resources have 0 in cache_read_races, and no read errors have
> occurred on the layers above any of them.
>
> On the 3rd server where some read errors occurred, I cannot verify if
> there were positive values in cache_read_races as I moved the data from
> there onto other storage, and shut down the bcache resources where the
> errors occurred.
>
> If I can provide any other info which might help with this issue, please
> let me know.

Hi Eddie,

This is very informative, thank you so much :-)

Coly Li