From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from cn.fujitsu.com ([222.73.24.84]:19584 "EHLO song.cn.fujitsu.com"
	rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP
	id S1754450Ab3JXKlf (ORCPT ); Thu, 24 Oct 2013 06:41:35 -0400
Message-ID: <5268F99A.8010907@cn.fujitsu.com>
Date: Thu, 24 Oct 2013 18:42:34 +0800
From: Miao Xie
Reply-To: miaox@cn.fujitsu.com
MIME-Version: 1.0
To: Chris Mason, Stefan Behrens, Bob Marley
CC: Wang Shilong, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
References: <1382156250-2336-1-git-send-email-wangshilong1991@gmail.com>
	<526247E3.9000804@giantdisaster.de> <5262914D.7030306@giantdisaster.de>
	<52663960.4060905@giantdisaster.de> <5266AE1F.6030304@shiftmail.org>
	<5268059E.707@giantdisaster.de>
	<20131024100842.14051.45479@localhost.localdomain>
In-Reply-To: <20131024100842.14051.45479@localhost.localdomain>
Content-Type: text/plain; charset=UTF-8
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On Thu, 24 Oct 2013 06:08:42 -0400, Chris Mason wrote:
> Quoting Stefan Behrens (2013-10-23 13:21:34)
>> On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote:
>>> On 22/10/2013 10:37, Stefan Behrens wrote:
>>>> I don't believe that this issue can ever happen. I don't believe that
>>>> somewhere on the path to the flash memory, to the magnetic disc or to
>>>> the drive's cache memory, someone interrupts a 4KB write in the middle
>>>> of operation to read from this 4KB area. This is not an issue IMHO.
>>>
>>> I think I have read that unfortunately it can happen.
>>> SAS and SATA specs for disks do not mandate that if a write is in-flight
>>> but still not completed, reads from the same sector should return the
>>> value it is being written; they can return the old value.
>>> I also think that Linux does not check either.
>>
>> If the _old_ 4KB block is returned, that's fine and won't cause a
>> checksum error.
>>
>> The patch in question addresses the case that Btrfs submits a write
>> request for a 4KB block, and a concurrent read request for that 4KB
>> block reads partially the old block and partially the new block,
>> resulting in a checksum error reported in the scrub statistic counters.
>
> Concurrent reads and writes to the device are completely undefined, and
> any combination of old, new, random memory corruption wouldn't
> surprise me...I'd rather avoid them ;)
>
> Doing the transaction join during the super read is probably the least
> complex choice.

But the transaction join cannot block the log tree sync. I think using
device_list_mutex is better: we have to acquire this mutex when writing
the super blocks, and by the time we unlock it the writes have completed,
so we can be sure that the super blocks are on non-volatile media.

Thanks
Miao

>
> -chris
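
To illustrate the locking idea discussed above, here is a minimal
user-space sketch in Python. It is not the Btrfs patch and not kernel
code. SuperBlockDevice, write_super, scrub_super and run are hypothetical
names made up for this example; only the name device_list_mutex comes
from the thread, and here it is just an ordinary Python lock. The writer
updates the payload and its checksum in two separate steps, so a reader
that did not take the lock could in principle see a half-updated block.

```python
import threading
import zlib

BLOCK_SIZE = 4096   # size of the toy super block
CSUM_SIZE = 4       # leading bytes that hold a CRC32 of the rest


class SuperBlockDevice:
    """Toy stand-in for one super block copy (hypothetical, user space only)."""

    def __init__(self):
        # Plays the role of device_list_mutex in this sketch.
        self.device_list_mutex = threading.Lock()
        self.block = bytearray(BLOCK_SIZE)
        self._store(0)

    def _store(self, generation):
        payload = bytes([generation % 256]) * (BLOCK_SIZE - CSUM_SIZE)
        # Two separate steps: between them the block is inconsistent.
        self.block[CSUM_SIZE:] = payload
        self.block[:CSUM_SIZE] = zlib.crc32(payload).to_bytes(CSUM_SIZE, "little")

    def write_super(self, generation):
        # The whole update happens while the mutex is held.
        with self.device_list_mutex:
            self._store(generation)

    def scrub_super(self):
        # Take the same mutex, so the read never overlaps a write.
        with self.device_list_mutex:
            snapshot = bytes(self.block)
        stored = int.from_bytes(snapshot[:CSUM_SIZE], "little")
        return stored == zlib.crc32(snapshot[CSUM_SIZE:])


def run(rounds=2000):
    """Scrub while a second thread keeps rewriting; return checksum errors seen."""
    dev = SuperBlockDevice()

    def writer():
        for gen in range(1, rounds + 1):
            dev.write_super(gen)

    t = threading.Thread(target=writer)
    t.start()
    errors = 0
    for _ in range(rounds):
        if not dev.scrub_super():
            errors += 1
    t.join()
    return errors
```

Because both paths take the same lock, the scrub side only ever sees a
fully old or a fully new block, so run() counts no checksum errors. The
sketch does not model the device cache or flush behaviour that the thread
also touches on.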