Re: [PATCH, RFC 2/2] dax: use range_lock instead of i_mmap_lock

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Boaz Harrosh <boaz@plexistor.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Jan Kara <jack@suse.cz>, Dave Chinner <david@fromorbit.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Davidlohr Bueso <dbueso@suse.de>,
	Theodore Ts'o <tytso@mit.edu>
Subject: Re: [PATCH, RFC 2/2] dax: use range_lock instead of i_mmap_lock
Date: Wed, 12 Aug 2015 10:54:23 +0300	[thread overview]
Message-ID: <55CAFBAF.104@plexistor.com> (raw)
In-Reply-To: <20150811202639.GA1408@node.dhcp.inet.fi>

On 08/11/2015 11:26 PM, Kirill A. Shutemov wrote:
> On Tue, Aug 11, 2015 at 07:17:12PM +0300, Boaz Harrosh wrote:
>> On 08/11/2015 06:28 PM, Kirill A. Shutemov wrote:
>>> We also used lock_page() to make sure we shoot out all pages as we don't
>>> exclude page faults during truncate. Consider this race:
>>>
>>> 	<fault>			<truncate>
>>> 	get_block
>>> 	check i_size
>>>     				update i_size
>>> 				unmap
>>> 	setup pte
>>>
>>
>> Please consider this senario then:
>>
>>  	<fault>			<truncate>
>> 	read_lock(inode)
>>
>>  	get_block
>>  	check i_size
>> 	
>> 	read_unlock(inode)
>>
>> 				write_lock(inode)
>>
>>      				update i_size
>> 				* remove allocated blocks
>>  				unmap
>>
>> 				write_unlock(inode)
>>
>>  	setup pte
>>
>> IS what you suppose to do in xfs
> 
> Do you realize that you describe a race? :-P
> 
> Exactly in this scenario pfn your pte point to is not belong to the file
> anymore. Have fun.
> 

Sorry yes I have written it wrong, I have now returned to read the actual code
and the setup pte part is also part of the read lock inside the fault handler
before the release of the r_lock.
Da of course it is, it is the page_fault handler that does the
vm_insert_mixed(vma,,pfn) and in the case of concurrent faults the second
call to vm_insert_mixed will return -EBUSY which means all is well.

So the only thing left is the fault-to-fault zero-the-page race as Matthew described
and as Dave and me think we can make this part of the FS's get_block where it is
more natural.

Thanks
Boaz

WARNING: multiple messages have this Message-ID (diff)

From: Boaz Harrosh <boaz@plexistor.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Jan Kara <jack@suse.cz>, Dave Chinner <david@fromorbit.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Davidlohr Bueso <dbueso@suse.de>,
	Theodore Ts'o <tytso@mit.edu>
Subject: Re: [PATCH, RFC 2/2] dax: use range_lock instead of i_mmap_lock
Date: Wed, 12 Aug 2015 10:54:23 +0300	[thread overview]
Message-ID: <55CAFBAF.104@plexistor.com> (raw)
In-Reply-To: <20150811202639.GA1408@node.dhcp.inet.fi>

On 08/11/2015 11:26 PM, Kirill A. Shutemov wrote:
> On Tue, Aug 11, 2015 at 07:17:12PM +0300, Boaz Harrosh wrote:
>> On 08/11/2015 06:28 PM, Kirill A. Shutemov wrote:
>>> We also used lock_page() to make sure we shoot out all pages as we don't
>>> exclude page faults during truncate. Consider this race:
>>>
>>> 	<fault>			<truncate>
>>> 	get_block
>>> 	check i_size
>>>     				update i_size
>>> 				unmap
>>> 	setup pte
>>>
>>
>> Please consider this senario then:
>>
>>  	<fault>			<truncate>
>> 	read_lock(inode)
>>
>>  	get_block
>>  	check i_size
>> 	
>> 	read_unlock(inode)
>>
>> 				write_lock(inode)
>>
>>      				update i_size
>> 				* remove allocated blocks
>>  				unmap
>>
>> 				write_unlock(inode)
>>
>>  	setup pte
>>
>> IS what you suppose to do in xfs
> 
> Do you realize that you describe a race? :-P
> 
> Exactly in this scenario pfn your pte point to is not belong to the file
> anymore. Have fun.
> 

Sorry yes I have written it wrong, I have now returned to read the actual code
and the setup pte part is also part of the read lock inside the fault handler
before the release of the r_lock.
Da of course it is, it is the page_fault handler that does the
vm_insert_mixed(vma,,pfn) and in the case of concurrent faults the second
call to vm_insert_mixed will return -EBUSY which means all is well.

So the only thing left is the fault-to-fault zero-the-page race as Matthew described
and as Dave and me think we can make this part of the FS's get_block where it is
more natural.

Thanks
Boaz

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2015-08-12  7:54 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-10 15:14 [PATCH, RFC 0/2] Recover some scalability for DAX Kirill A. Shutemov
2015-08-10 15:14 ` Kirill A. Shutemov
2015-08-10 15:14 ` [PATCH, RFC 1/2] lib: Implement range locks Kirill A. Shutemov
2015-08-10 15:14   ` Kirill A. Shutemov
2015-08-10 15:14 ` [PATCH, RFC 2/2] dax: use range_lock instead of i_mmap_lock Kirill A. Shutemov
2015-08-10 15:14   ` Kirill A. Shutemov
2015-08-11  8:19   ` Jan Kara
2015-08-11  8:19     ` Jan Kara
2015-08-11  9:37     ` Dave Chinner
2015-08-11  9:37       ` Dave Chinner
2015-08-11 11:09       ` Boaz Harrosh
2015-08-11 11:09         ` Boaz Harrosh
2015-08-11 12:03       ` Kirill A. Shutemov
2015-08-11 12:03         ` Kirill A. Shutemov
2015-08-11 13:50       ` Jan Kara
2015-08-11 13:50         ` Jan Kara
2015-08-11 14:31         ` Boaz Harrosh
2015-08-11 14:31           ` Boaz Harrosh
2015-08-11 15:28           ` Kirill A. Shutemov
2015-08-11 15:28             ` Kirill A. Shutemov
2015-08-11 16:17             ` Boaz Harrosh
2015-08-11 16:17               ` Boaz Harrosh
2015-08-11 20:26               ` Kirill A. Shutemov
2015-08-11 20:26                 ` Kirill A. Shutemov
2015-08-12  7:54                 ` Boaz Harrosh [this message]
2015-08-12  7:54                   ` Boaz Harrosh
2015-08-11 16:51           ` Wilcox, Matthew R
2015-08-11 16:51             ` Wilcox, Matthew R
2015-08-11 18:46             ` Boaz Harrosh
2015-08-11 18:46               ` Boaz Harrosh
2015-08-11 21:48             ` Dave Chinner
2015-08-11 21:48               ` Dave Chinner
2015-08-12  8:51               ` Boaz Harrosh
2015-08-12  8:51                 ` Boaz Harrosh
2015-08-13 11:30               ` Jan Kara
2015-08-13 11:30                 ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55CAFBAF.104@plexistor.com \
    --to=boaz@plexistor.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=dbueso@suse.de \
    --cc=jack@suse.cz \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=matthew.r.wilcox@intel.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.