From mboxrd@z Thu Jan 1 00:00:00 1970
From: Boaz Harrosh
Subject: Re: [PATCH, RFC 2/2] dax: use range_lock instead of i_mmap_lock
Date: Wed, 12 Aug 2015 10:54:23 +0300
Message-ID: <55CAFBAF.104@plexistor.com>
References: <1439219664-88088-1-git-send-email-kirill.shutemov@linux.intel.com>
 <1439219664-88088-3-git-send-email-kirill.shutemov@linux.intel.com>
 <20150811081909.GD2650@quack.suse.cz> <20150811093708.GB906@dastard>
 <20150811135004.GC2659@quack.suse.cz> <55CA0728.7060001@plexistor.com>
 <20150811152850.GA2608@node.dhcp.inet.fi> <55CA2008.7070702@plexistor.com>
 <20150811202639.GA1408@node.dhcp.inet.fi>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: Jan Kara , Dave Chinner , "Kirill A. Shutemov" , Andrew Morton , Matthew Wilcox , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Davidlohr Bueso , Theodore Ts'o
To: "Kirill A. Shutemov"
Return-path:
In-Reply-To: <20150811202639.GA1408@node.dhcp.inet.fi>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On 08/11/2015 11:26 PM, Kirill A. Shutemov wrote:
> On Tue, Aug 11, 2015 at 07:17:12PM +0300, Boaz Harrosh wrote:
>> On 08/11/2015 06:28 PM, Kirill A. Shutemov wrote:
>>> We also used lock_page() to make sure we shoot out all pages as we don't
>>> exclude page faults during truncate. Consider this race:
>>>
>>>
>>> get_block
>>> check i_size
>>>                         update i_size
>>>                         unmap
>>> setup pte
>>>
>>
>> Please consider this scenario then:
>>
>>
>> read_lock(inode)
>>
>> get_block
>> check i_size
>>
>> read_unlock(inode)
>>
>>                         write_lock(inode)
>>
>>                         update i_size
>>                         * remove allocated blocks
>>                         unmap
>>
>>                         write_unlock(inode)
>>
>> setup pte
>>
>> That is what you are supposed to do in xfs.
>
> Do you realize that you describe a race? :-P
>
> Exactly in this scenario, the pfn your pte points to no longer belongs to
> the file. Have fun.
>

Sorry, yes, I wrote it wrong. I have now gone back and read the actual code, and the setup pte part is also done under the read lock inside the fault handler, before the r_lock is released. Duh, of course it is: it is the page_fault handler that does the vm_insert_mixed(vma,,pfn), and in the case of concurrent faults the second call to vm_insert_mixed will return -EBUSY, which means all is well.

So the only thing left is the fault-to-fault zero-the-page race that Matthew described, which Dave and I think we can make part of the FS's get_block, where it is more natural.

Thanks
Boaz