[PATCH -next] tmpfs: fault in smaller chunks if large folio allocation not allowed

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH -next] tmpfs: fault in smaller chunks if large folio allocation not allowed
@ 2024-09-14 14:06 Kefeng Wang
  2024-09-15 10:40 ` Matthew Wilcox
  2024-09-20 14:36 ` [PATCH v2] " Kefeng Wang
  0 siblings, 2 replies; 23+ messages in thread
From: Kefeng Wang @ 2024-09-14 14:06 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Alexander Viro, Christian Brauner,
	Jan Kara, Matthew Wilcox
  Cc: Anna Schumaker, linux-fsdevel, linux-mm, Kefeng Wang

The tmpfs supports large folio, but there is some configurable options
to enable/disable large folio allocation, and for huge=within_size,
large folio only allowabled if it fully within i_size, so there is
performance issue when perform write without large folio, the issue is
similar to commit 4e527d5841e2 ("iomap: fault in smaller chunks for
non-large folio mappings").

Fix it by checking whether it allows large folio allocation or not
before perform write.

Fixes: 9aac777aaf94 ("filemap: Convert generic_perform_write() to support large folios")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 include/linux/fs.h | 2 ++
 mm/filemap.c       | 7 ++++++-
 mm/shmem.c         | 5 +++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1e25267e2e48..d642e323354b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -346,6 +346,8 @@ struct readahead_control;
 #define IOCB_DIO_CALLER_COMP	(1 << 22)
 /* kiocb is a read or write operation submitted by fs/aio.c. */
 #define IOCB_AIO_RW		(1 << 23)
+/* fault int small chunks(PAGE_SIZE) from userspace */
+#define IOCB_NO_LARGE_CHUNK	(1 << 24)
 
 /* for use in trace events */
 #define TRACE_IOCB_STRINGS \
diff --git a/mm/filemap.c b/mm/filemap.c
index 3e46ca45e13d..27e73f2680ef 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4132,9 +4132,14 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 	loff_t pos = iocb->ki_pos;
 	struct address_space *mapping = file->f_mapping;
 	const struct address_space_operations *a_ops = mapping->a_ops;
-	size_t chunk = mapping_max_folio_size(mapping);
 	long status = 0;
 	ssize_t written = 0;
+	size_t chunk;
+
+	if (iocb->ki_flags & IOCB_NO_LARGE_CHUNK)
+		chunk = PAGE_SIZE;
+	else
+		chunk = mapping_max_folio_size(mapping);
 
 	do {
 		struct folio *folio;
diff --git a/mm/shmem.c b/mm/shmem.c
index 530fe8cfcc21..2909bead774b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3228,6 +3228,7 @@ static ssize_t shmem_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
+	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
 	ssize_t ret;
 
 	inode_lock(inode);
@@ -3240,6 +3241,10 @@ static ssize_t shmem_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	ret = file_update_time(file);
 	if (ret)
 		goto unlock;
+
+	if (!shmem_allowable_huge_orders(inode, NULL, index, 0, false))
+		iocb->ki_flags |= IOCB_NO_LARGE_CHUNK;
+
 	ret = generic_perform_write(iocb, from);
 unlock:
 	inode_unlock(inode);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH -next] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-09-14 14:06 [PATCH -next] tmpfs: fault in smaller chunks if large folio allocation not allowed Kefeng Wang
@ 2024-09-15 10:40 ` Matthew Wilcox
  2024-09-18  3:55   ` Kefeng Wang
  2024-09-20 14:36 ` [PATCH v2] " Kefeng Wang
  1 sibling, 1 reply; 23+ messages in thread
From: Matthew Wilcox @ 2024-09-15 10:40 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, Hugh Dickins, Alexander Viro, Christian Brauner,
	Jan Kara, Anna Schumaker, linux-fsdevel, linux-mm

On Sat, Sep 14, 2024 at 10:06:13PM +0800, Kefeng Wang wrote:
> +++ b/mm/shmem.c
> @@ -3228,6 +3228,7 @@ static ssize_t shmem_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  {
>  	struct file *file = iocb->ki_filp;
>  	struct inode *inode = file->f_mapping->host;
> +	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
>  	ssize_t ret;
>  
>  	inode_lock(inode);
> @@ -3240,6 +3241,10 @@ static ssize_t shmem_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  	ret = file_update_time(file);
>  	if (ret)
>  		goto unlock;
> +
> +	if (!shmem_allowable_huge_orders(inode, NULL, index, 0, false))
> +		iocb->ki_flags |= IOCB_NO_LARGE_CHUNK;

Wouldn't it be better to call mapping_set_folio_order_range() so we
don't need this IOCB flag?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -next] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-09-15 10:40 ` Matthew Wilcox
@ 2024-09-18  3:55   ` Kefeng Wang
  0 siblings, 0 replies; 23+ messages in thread
From: Kefeng Wang @ 2024-09-18  3:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Hugh Dickins, Alexander Viro, Christian Brauner,
	Jan Kara, Anna Schumaker, linux-fsdevel, linux-mm



On 2024/9/15 18:40, Matthew Wilcox wrote:
> On Sat, Sep 14, 2024 at 10:06:13PM +0800, Kefeng Wang wrote:
>> +++ b/mm/shmem.c
>> @@ -3228,6 +3228,7 @@ static ssize_t shmem_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>>   {
>>   	struct file *file = iocb->ki_filp;
>>   	struct inode *inode = file->f_mapping->host;
>> +	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
>>   	ssize_t ret;
>>   
>>   	inode_lock(inode);
>> @@ -3240,6 +3241,10 @@ static ssize_t shmem_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>>   	ret = file_update_time(file);
>>   	if (ret)
>>   		goto unlock;
>> +
>> +	if (!shmem_allowable_huge_orders(inode, NULL, index, 0, false))
>> +		iocb->ki_flags |= IOCB_NO_LARGE_CHUNK;
> 
> Wouldn't it be better to call mapping_set_folio_order_range() so we
> don't need this IOCB flag?
> 

I think it before, but the comment of mapping_set_folio_order_range() said,

  "The filesystem should call this function in its inode constructor to
  indicate which base size (min) and maximum size (max) of folio the VFS
  can use to cache the contents of the file.  This should only be used
  if the filesystem needs special handling of folio sizes (ie there is
  something the core cannot know).
  Do not tune it based on, eg, i_size.

  Context: This should not be called while the inode is active as it
  is non-atomic."

and dynamically modify mappings without protect maybe a risk.

Or adding a new __generic_perform_write() with a chunk_size argument to
avoid to use a IOCB flag?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-09-14 14:06 [PATCH -next] tmpfs: fault in smaller chunks if large folio allocation not allowed Kefeng Wang
  2024-09-15 10:40 ` Matthew Wilcox
@ 2024-09-20 14:36 ` Kefeng Wang
  2024-09-22  0:35   ` Matthew Wilcox
                     ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Kefeng Wang @ 2024-09-20 14:36 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Alexander Viro, Christian Brauner,
	Jan Kara, Matthew Wilcox
  Cc: Anna Schumaker, linux-fsdevel, linux-mm, Baolin Wang, Kefeng Wang

The tmpfs supports large folio, but there is some configurable options
to enable/disable large folio allocation, and for huge=within_size,
large folio only allowabled if it fully within i_size, so there is
performance issue when perform write without large folio, the issue is
similar to commit 4e527d5841e2 ("iomap: fault in smaller chunks for
non-large folio mappings").

Fix it by checking whether it allows large folio allocation or not
before perform write.

Fixes: 9aac777aaf94 ("filemap: Convert generic_perform_write() to support large folios")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
v2:
- Don't use IOCB flags

 mm/filemap.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 3e46ca45e13d..b33f260fa32f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4126,13 +4126,28 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
 }
 EXPORT_SYMBOL(generic_file_direct_write);
 
+static size_t generic_mapping_max_folio_size(struct address_space *mapping,
+					     loff_t pos)
+{
+	struct inode *inode = mapping->host;
+	pgoff_t index = pos >> PAGE_SHIFT;
+
+	if (!shmem_mapping(mapping))
+		goto out;
+
+	if (!shmem_allowable_huge_orders(inode, NULL, index, 0, false))
+		return PAGE_SIZE;
+out:
+	return mapping_max_folio_size(mapping);
+}
+
 ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 {
 	struct file *file = iocb->ki_filp;
 	loff_t pos = iocb->ki_pos;
 	struct address_space *mapping = file->f_mapping;
 	const struct address_space_operations *a_ops = mapping->a_ops;
-	size_t chunk = mapping_max_folio_size(mapping);
+	size_t chunk = generic_mapping_max_folio_size(mapping, pos);
 	long status = 0;
 	ssize_t written = 0;
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-09-20 14:36 ` [PATCH v2] " Kefeng Wang
@ 2024-09-22  0:35   ` Matthew Wilcox
  2024-09-23  1:39     ` Kefeng Wang
  2024-10-11  6:59   ` [PATCH v3] tmpfs: don't enable large folios if not supported Kefeng Wang
  2024-10-17 14:17   ` [PATCH v4] " Kefeng Wang
  2 siblings, 1 reply; 23+ messages in thread
From: Matthew Wilcox @ 2024-09-22  0:35 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, Hugh Dickins, Alexander Viro, Christian Brauner,
	Jan Kara, Anna Schumaker, linux-fsdevel, linux-mm, Baolin Wang

On Fri, Sep 20, 2024 at 10:36:54PM +0800, Kefeng Wang wrote:
> The tmpfs supports large folio, but there is some configurable options
> to enable/disable large folio allocation, and for huge=within_size,
> large folio only allowabled if it fully within i_size, so there is
> performance issue when perform write without large folio, the issue is
> similar to commit 4e527d5841e2 ("iomap: fault in smaller chunks for
> non-large folio mappings").

No.  What's wrong with my earlier suggestion?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-09-22  0:35   ` Matthew Wilcox
@ 2024-09-23  1:39     ` Kefeng Wang
  2024-09-26  8:38       ` Pankaj Raghav (Samsung)
  0 siblings, 1 reply; 23+ messages in thread
From: Kefeng Wang @ 2024-09-23  1:39 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Hugh Dickins, Alexander Viro, Christian Brauner,
	Jan Kara, Anna Schumaker, linux-fsdevel, linux-mm, Baolin Wang



On 2024/9/22 8:35, Matthew Wilcox wrote:
> On Fri, Sep 20, 2024 at 10:36:54PM +0800, Kefeng Wang wrote:
>> The tmpfs supports large folio, but there is some configurable options
>> to enable/disable large folio allocation, and for huge=within_size,
>> large folio only allowabled if it fully within i_size, so there is
>> performance issue when perform write without large folio, the issue is
>> similar to commit 4e527d5841e2 ("iomap: fault in smaller chunks for
>> non-large folio mappings").
> 
> No.  What's wrong with my earlier suggestion?
> 

The tempfs has mount options(never/always/within_size/madvise) for large
folio, also has sysfs file /sys/kernel/mm/transparent_hugepage/shmem_enabled
to deny/force large folio at runtime, as replied in v1, I think it
breaks the rules of mapping_set_folio_order_range(),

   "Do not tune it based on, eg, i_size."
   --- for tmpfs, it does choose large folio or not based on the i_size

   "Context: This should not be called while the inode is active as it 
is non-atomic."
   --- during perform write, the inode is active

So this is why I don't use mapping_set_folio_order_range() here, but
correct me if I am wrong.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re: [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-09-23  1:39     ` Kefeng Wang
@ 2024-09-26  8:38       ` Pankaj Raghav (Samsung)
  2024-09-26 13:52         ` Matthew Wilcox
  0 siblings, 1 reply; 23+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-09-26  8:38 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Matthew Wilcox, Andrew Morton, Hugh Dickins, Alexander Viro,
	Christian Brauner, Jan Kara, Anna Schumaker, linux-fsdevel,
	linux-mm, Baolin Wang

On Mon, Sep 23, 2024 at 09:39:07AM +0800, Kefeng Wang wrote:
> 
> 
> On 2024/9/22 8:35, Matthew Wilcox wrote:
> > On Fri, Sep 20, 2024 at 10:36:54PM +0800, Kefeng Wang wrote:
> > > The tmpfs supports large folio, but there is some configurable options
> > > to enable/disable large folio allocation, and for huge=within_size,
> > > large folio only allowabled if it fully within i_size, so there is
> > > performance issue when perform write without large folio, the issue is
> > > similar to commit 4e527d5841e2 ("iomap: fault in smaller chunks for
> > > non-large folio mappings").
> > 
> > No.  What's wrong with my earlier suggestion?
> > 
> 
> The tempfs has mount options(never/always/within_size/madvise) for large
> folio, also has sysfs file /sys/kernel/mm/transparent_hugepage/shmem_enabled
> to deny/force large folio at runtime, as replied in v1, I think it
> breaks the rules of mapping_set_folio_order_range(),
> 
>   "Do not tune it based on, eg, i_size."
>   --- for tmpfs, it does choose large folio or not based on the i_size
> 
>   "Context: This should not be called while the inode is active as it is
> non-atomic."
>   --- during perform write, the inode is active
> 
> So this is why I don't use mapping_set_folio_order_range() here, but
> correct me if I am wrong.

Yeah, the inode is active here as the max folio size is decided based on
the write size, so probably mapping_set_folio_order_range() will not be
a safe option.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Re: [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-09-26  8:38       ` Pankaj Raghav (Samsung)
@ 2024-09-26 13:52         ` Matthew Wilcox
  2024-09-26 14:20           ` Kefeng Wang
  2024-09-30  2:02           ` Baolin Wang
  0 siblings, 2 replies; 23+ messages in thread
From: Matthew Wilcox @ 2024-09-26 13:52 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: Kefeng Wang, Andrew Morton, Hugh Dickins, Alexander Viro,
	Christian Brauner, Jan Kara, Anna Schumaker, linux-fsdevel,
	linux-mm, Baolin Wang

On Thu, Sep 26, 2024 at 10:38:34AM +0200, Pankaj Raghav (Samsung) wrote:
> > So this is why I don't use mapping_set_folio_order_range() here, but
> > correct me if I am wrong.
> 
> Yeah, the inode is active here as the max folio size is decided based on
> the write size, so probably mapping_set_folio_order_range() will not be
> a safe option.

You really are all making too much of this.  Here's the patch I think we
need:

+++ b/mm/shmem.c
@@ -2831,7 +2831,8 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
        cache_no_acl(inode);
        if (sbinfo->noswap)
                mapping_set_unevictable(inode->i_mapping);
-       mapping_set_large_folios(inode->i_mapping);
+       if (sbinfo->huge)
+               mapping_set_large_folios(inode->i_mapping);

        switch (mode & S_IFMT) {
        default:

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-09-26 13:52         ` Matthew Wilcox
@ 2024-09-26 14:20           ` Kefeng Wang
  2024-09-26 14:58             ` Matthew Wilcox
  2024-09-30  2:02           ` Baolin Wang
  1 sibling, 1 reply; 23+ messages in thread
From: Kefeng Wang @ 2024-09-26 14:20 UTC (permalink / raw)
  To: Matthew Wilcox, Pankaj Raghav (Samsung)
  Cc: Andrew Morton, Hugh Dickins, Alexander Viro, Christian Brauner,
	Jan Kara, Anna Schumaker, linux-fsdevel, linux-mm, Baolin Wang



On 2024/9/26 21:52, Matthew Wilcox wrote:
> On Thu, Sep 26, 2024 at 10:38:34AM +0200, Pankaj Raghav (Samsung) wrote:
>>> So this is why I don't use mapping_set_folio_order_range() here, but
>>> correct me if I am wrong.
>>
>> Yeah, the inode is active here as the max folio size is decided based on
>> the write size, so probably mapping_set_folio_order_range() will not be
>> a safe option.
> 
> You really are all making too much of this.  Here's the patch I think we
> need:
> 
> +++ b/mm/shmem.c
> @@ -2831,7 +2831,8 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
>          cache_no_acl(inode);
>          if (sbinfo->noswap)
>                  mapping_set_unevictable(inode->i_mapping);
> -       mapping_set_large_folios(inode->i_mapping);
> +       if (sbinfo->huge)
> +               mapping_set_large_folios(inode->i_mapping);
> 

But it can't solve all issue, eg,
   mount with huge = SHMEM_HUGE_WITHIN_SIZE, or
   mount with SHMEM_HUGE_ALWAYS  +  runtime SHMEM_HUGE_DENY

and the above change will break
   mount with SHMEM_HUGE_NEVER + runtime SHMEM_HUGE_FORCE



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-09-26 14:20           ` Kefeng Wang
@ 2024-09-26 14:58             ` Matthew Wilcox
  2024-09-30  1:27               ` Kefeng Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Matthew Wilcox @ 2024-09-26 14:58 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Pankaj Raghav (Samsung), Andrew Morton, Hugh Dickins,
	Alexander Viro, Christian Brauner, Jan Kara, Anna Schumaker,
	linux-fsdevel, linux-mm, Baolin Wang

On Thu, Sep 26, 2024 at 10:20:54PM +0800, Kefeng Wang wrote:
> On 2024/9/26 21:52, Matthew Wilcox wrote:
> > On Thu, Sep 26, 2024 at 10:38:34AM +0200, Pankaj Raghav (Samsung) wrote:
> > > > So this is why I don't use mapping_set_folio_order_range() here, but
> > > > correct me if I am wrong.
> > > 
> > > Yeah, the inode is active here as the max folio size is decided based on
> > > the write size, so probably mapping_set_folio_order_range() will not be
> > > a safe option.
> > 
> > You really are all making too much of this.  Here's the patch I think we
> > need:
> > 
> > -       mapping_set_large_folios(inode->i_mapping);
> > +       if (sbinfo->huge)
> > +               mapping_set_large_folios(inode->i_mapping);
> 
> But it can't solve all issue, eg,
>   mount with huge = SHMEM_HUGE_WITHIN_SIZE, or

The page cache will not create folios which overhang the end of the file
by more than the minimum folio size for that mapping.  So this is wrong.

>   mount with SHMEM_HUGE_ALWAYS  +  runtime SHMEM_HUGE_DENY

That's a tweak to this patch, not a fundamental problem with it.

> and the above change will break
>   mount with SHMEM_HUGE_NEVER + runtime SHMEM_HUGE_FORCE

Likewise.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-09-26 14:58             ` Matthew Wilcox
@ 2024-09-30  1:27               ` Kefeng Wang
  0 siblings, 0 replies; 23+ messages in thread
From: Kefeng Wang @ 2024-09-30  1:27 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Pankaj Raghav (Samsung), Andrew Morton, Hugh Dickins,
	Alexander Viro, Christian Brauner, Jan Kara, Anna Schumaker,
	linux-fsdevel, linux-mm, Baolin Wang



On 2024/9/26 22:58, Matthew Wilcox wrote:
> On Thu, Sep 26, 2024 at 10:20:54PM +0800, Kefeng Wang wrote:
>> On 2024/9/26 21:52, Matthew Wilcox wrote:
>>> On Thu, Sep 26, 2024 at 10:38:34AM +0200, Pankaj Raghav (Samsung) wrote:
>>>>> So this is why I don't use mapping_set_folio_order_range() here, but
>>>>> correct me if I am wrong.
>>>>
>>>> Yeah, the inode is active here as the max folio size is decided based on
>>>> the write size, so probably mapping_set_folio_order_range() will not be
>>>> a safe option.
>>>
>>> You really are all making too much of this.  Here's the patch I think we
>>> need:
>>>
>>> -       mapping_set_large_folios(inode->i_mapping);
>>> +       if (sbinfo->huge)
>>> +               mapping_set_large_folios(inode->i_mapping);
>>
>> But it can't solve all issue, eg,
>>    mount with huge = SHMEM_HUGE_WITHIN_SIZE, or
> 
> The page cache will not create folios which overhang the end of the file
> by more than the minimum folio size for that mapping.  So this is wrong.


Sorry for the late, not very familiar with it, will test after back to 
the office in next few days.

> 
>>    mount with SHMEM_HUGE_ALWAYS  +  runtime SHMEM_HUGE_DENY
> 
> That's a tweak to this patch, not a fundamental problem with it.
> 
>> and the above change will break
>>    mount with SHMEM_HUGE_NEVER + runtime SHMEM_HUGE_FORCE
> 
> Likewise.
> 

But the SHMEM_HUGE_DENY/FORCE could be changed at runtime, I don't find
a better way to fix, any more suggestion will be appreciate, thanks.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-09-26 13:52         ` Matthew Wilcox
  2024-09-26 14:20           ` Kefeng Wang
@ 2024-09-30  2:02           ` Baolin Wang
  2024-09-30  2:30             ` Kefeng Wang
  1 sibling, 1 reply; 23+ messages in thread
From: Baolin Wang @ 2024-09-30  2:02 UTC (permalink / raw)
  To: Matthew Wilcox, Pankaj Raghav (Samsung)
  Cc: Kefeng Wang, Andrew Morton, Hugh Dickins, Alexander Viro,
	Christian Brauner, Jan Kara, Anna Schumaker, linux-fsdevel,
	linux-mm



On 2024/9/26 21:52, Matthew Wilcox wrote:
> On Thu, Sep 26, 2024 at 10:38:34AM +0200, Pankaj Raghav (Samsung) wrote:
>>> So this is why I don't use mapping_set_folio_order_range() here, but
>>> correct me if I am wrong.
>>
>> Yeah, the inode is active here as the max folio size is decided based on
>> the write size, so probably mapping_set_folio_order_range() will not be
>> a safe option.
> 
> You really are all making too much of this.  Here's the patch I think we
> need:
> 
> +++ b/mm/shmem.c
> @@ -2831,7 +2831,8 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
>          cache_no_acl(inode);
>          if (sbinfo->noswap)
>                  mapping_set_unevictable(inode->i_mapping);
> -       mapping_set_large_folios(inode->i_mapping);
> +       if (sbinfo->huge)
> +               mapping_set_large_folios(inode->i_mapping);
> 
>          switch (mode & S_IFMT) {
>          default:

IMHO, we no longer need the the 'sbinfo->huge' validation after adding 
support for large folios in the tmpfs write and fallocate paths [1].

Kefeng, can you try if the following RFC patch [1] can solve your 
problem? Thanks.
(PS: I will revise the patch according to Matthew's suggestion)

[1] 
https://lore.kernel.org/all/c03ec1cb1392332726ab265a3d826fe1c408c7e7.1727338549.git.baolin.wang@linux.alibaba.com/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-09-30  2:02           ` Baolin Wang
@ 2024-09-30  2:30             ` Kefeng Wang
  2024-09-30  2:52               ` Baolin Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Kefeng Wang @ 2024-09-30  2:30 UTC (permalink / raw)
  To: Baolin Wang, Matthew Wilcox, Pankaj Raghav (Samsung)
  Cc: Andrew Morton, Hugh Dickins, Alexander Viro, Christian Brauner,
	Jan Kara, Anna Schumaker, linux-fsdevel, linux-mm



On 2024/9/30 10:02, Baolin Wang wrote:
> 
> 
> On 2024/9/26 21:52, Matthew Wilcox wrote:
>> On Thu, Sep 26, 2024 at 10:38:34AM +0200, Pankaj Raghav (Samsung) wrote:
>>>> So this is why I don't use mapping_set_folio_order_range() here, but
>>>> correct me if I am wrong.
>>>
>>> Yeah, the inode is active here as the max folio size is decided based on
>>> the write size, so probably mapping_set_folio_order_range() will not be
>>> a safe option.
>>
>> You really are all making too much of this.  Here's the patch I think we
>> need:
>>
>> +++ b/mm/shmem.c
>> @@ -2831,7 +2831,8 @@ static struct inode *__shmem_get_inode(struct 
>> mnt_idmap *idmap,
>>          cache_no_acl(inode);
>>          if (sbinfo->noswap)
>>                  mapping_set_unevictable(inode->i_mapping);
>> -       mapping_set_large_folios(inode->i_mapping);
>> +       if (sbinfo->huge)
>> +               mapping_set_large_folios(inode->i_mapping);
>>
>>          switch (mode & S_IFMT) {
>>          default:
> 
> IMHO, we no longer need the the 'sbinfo->huge' validation after adding 
> support for large folios in the tmpfs write and fallocate paths [1].
> 
> Kefeng, can you try if the following RFC patch [1] can solve your 
> problem? Thanks.
> (PS: I will revise the patch according to Matthew's suggestion)

Sure, will try once I come back, but [1] won't solve the issue when set
force/deny at runtime, eg, mount with always/within_size, but set deny 
when runtime, we still fault in large chunks, but we can't allocate 
large folio, the performance of write will be degradation.


> 
> [1] https://lore.kernel.org/all/ 
> c03ec1cb1392332726ab265a3d826fe1c408c7e7.1727338549.git.baolin.wang@linux.alibaba.com/


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-09-30  2:30             ` Kefeng Wang
@ 2024-09-30  2:52               ` Baolin Wang
  2024-09-30  3:15                 ` Kefeng Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Baolin Wang @ 2024-09-30  2:52 UTC (permalink / raw)
  To: Kefeng Wang, Matthew Wilcox, Pankaj Raghav (Samsung)
  Cc: Andrew Morton, Hugh Dickins, Alexander Viro, Christian Brauner,
	Jan Kara, Anna Schumaker, linux-fsdevel, linux-mm



On 2024/9/30 10:30, Kefeng Wang wrote:
> 
> 
> On 2024/9/30 10:02, Baolin Wang wrote:
>>
>>
>> On 2024/9/26 21:52, Matthew Wilcox wrote:
>>> On Thu, Sep 26, 2024 at 10:38:34AM +0200, Pankaj Raghav (Samsung) wrote:
>>>>> So this is why I don't use mapping_set_folio_order_range() here, but
>>>>> correct me if I am wrong.
>>>>
>>>> Yeah, the inode is active here as the max folio size is decided 
>>>> based on
>>>> the write size, so probably mapping_set_folio_order_range() will not be
>>>> a safe option.
>>>
>>> You really are all making too much of this.  Here's the patch I think we
>>> need:
>>>
>>> +++ b/mm/shmem.c
>>> @@ -2831,7 +2831,8 @@ static struct inode *__shmem_get_inode(struct 
>>> mnt_idmap *idmap,
>>>          cache_no_acl(inode);
>>>          if (sbinfo->noswap)
>>>                  mapping_set_unevictable(inode->i_mapping);
>>> -       mapping_set_large_folios(inode->i_mapping);
>>> +       if (sbinfo->huge)
>>> +               mapping_set_large_folios(inode->i_mapping);
>>>
>>>          switch (mode & S_IFMT) {
>>>          default:
>>
>> IMHO, we no longer need the the 'sbinfo->huge' validation after adding 
>> support for large folios in the tmpfs write and fallocate paths [1].
>>
>> Kefeng, can you try if the following RFC patch [1] can solve your 
>> problem? Thanks.
>> (PS: I will revise the patch according to Matthew's suggestion)
> 
> Sure, will try once I come back, but [1] won't solve the issue when set

Cool. Thanks.

> force/deny at runtime, eg, mount with always/within_size, but set deny 
> when runtime, we still fault in large chunks, but we can't allocate 
> large folio, the performance of write will be degradation.

Yes, but as Hugh mentioned before, the options 'force' and 'deny' are 
rather testing artifacts from the old ages. So I suspect if the 'deny' 
option will be set in the real products, that means do we really need 
consider this case too much?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-09-30  2:52               ` Baolin Wang
@ 2024-09-30  3:15                 ` Kefeng Wang
  2024-09-30  6:48                   ` Baolin Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Kefeng Wang @ 2024-09-30  3:15 UTC (permalink / raw)
  To: Baolin Wang, Matthew Wilcox, Pankaj Raghav (Samsung)
  Cc: Andrew Morton, Hugh Dickins, Alexander Viro, Christian Brauner,
	Jan Kara, Anna Schumaker, linux-fsdevel, linux-mm



On 2024/9/30 10:52, Baolin Wang wrote:
> 
> 
> On 2024/9/30 10:30, Kefeng Wang wrote:
>>
>>
>> On 2024/9/30 10:02, Baolin Wang wrote:
>>>
>>>
>>> On 2024/9/26 21:52, Matthew Wilcox wrote:
>>>> On Thu, Sep 26, 2024 at 10:38:34AM +0200, Pankaj Raghav (Samsung) 
>>>> wrote:
>>>>>> So this is why I don't use mapping_set_folio_order_range() here, but
>>>>>> correct me if I am wrong.
>>>>>
>>>>> Yeah, the inode is active here as the max folio size is decided 
>>>>> based on
>>>>> the write size, so probably mapping_set_folio_order_range() will 
>>>>> not be
>>>>> a safe option.
>>>>
>>>> You really are all making too much of this.  Here's the patch I 
>>>> think we
>>>> need:
>>>>
>>>> +++ b/mm/shmem.c
>>>> @@ -2831,7 +2831,8 @@ static struct inode *__shmem_get_inode(struct 
>>>> mnt_idmap *idmap,
>>>>          cache_no_acl(inode);
>>>>          if (sbinfo->noswap)
>>>>                  mapping_set_unevictable(inode->i_mapping);
>>>> -       mapping_set_large_folios(inode->i_mapping);
>>>> +       if (sbinfo->huge)
>>>> +               mapping_set_large_folios(inode->i_mapping);
>>>>
>>>>          switch (mode & S_IFMT) {
>>>>          default:
>>>
>>> IMHO, we no longer need the the 'sbinfo->huge' validation after 
>>> adding support for large folios in the tmpfs write and fallocate 
>>> paths [1].

Forget to mention, we still need to check sbinfo->huge, if mount with
huge=never, but we fault in large chunk, write is slower than without
9aac777aaf94, the above changes or my patch could fix it.

>>>
>>> Kefeng, can you try if the following RFC patch [1] can solve your 
>>> problem? Thanks.
>>> (PS: I will revise the patch according to Matthew's suggestion)
>>
>> Sure, will try once I come back, but [1] won't solve the issue when set
> 
> Cool. Thanks.
> 
>> force/deny at runtime, eg, mount with always/within_size, but set deny 
>> when runtime, we still fault in large chunks, but we can't allocate 
>> large folio, the performance of write will be degradation.
> 
> Yes, but as Hugh mentioned before, the options 'force' and 'deny' are 
> rather testing artifacts from the old ages. So I suspect if the 'deny' 
> option will be set in the real products, that means do we really need 
> consider this case too much?

OK, so maybe just update the document.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-09-30  3:15                 ` Kefeng Wang
@ 2024-09-30  6:48                   ` Baolin Wang
  2024-10-09  7:09                     ` Kefeng Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Baolin Wang @ 2024-09-30  6:48 UTC (permalink / raw)
  To: Kefeng Wang, Matthew Wilcox, Pankaj Raghav (Samsung)
  Cc: Andrew Morton, Hugh Dickins, Alexander Viro, Christian Brauner,
	Jan Kara, Anna Schumaker, linux-fsdevel, linux-mm



On 2024/9/30 11:15, Kefeng Wang wrote:
> 
> 
> On 2024/9/30 10:52, Baolin Wang wrote:
>>
>>
>> On 2024/9/30 10:30, Kefeng Wang wrote:
>>>
>>>
>>> On 2024/9/30 10:02, Baolin Wang wrote:
>>>>
>>>>
>>>> On 2024/9/26 21:52, Matthew Wilcox wrote:
>>>>> On Thu, Sep 26, 2024 at 10:38:34AM +0200, Pankaj Raghav (Samsung) 
>>>>> wrote:
>>>>>>> So this is why I don't use mapping_set_folio_order_range() here, but
>>>>>>> correct me if I am wrong.
>>>>>>
>>>>>> Yeah, the inode is active here as the max folio size is decided 
>>>>>> based on
>>>>>> the write size, so probably mapping_set_folio_order_range() will 
>>>>>> not be
>>>>>> a safe option.
>>>>>
>>>>> You really are all making too much of this.  Here's the patch I 
>>>>> think we
>>>>> need:
>>>>>
>>>>> +++ b/mm/shmem.c
>>>>> @@ -2831,7 +2831,8 @@ static struct inode *__shmem_get_inode(struct 
>>>>> mnt_idmap *idmap,
>>>>>          cache_no_acl(inode);
>>>>>          if (sbinfo->noswap)
>>>>>                  mapping_set_unevictable(inode->i_mapping);
>>>>> -       mapping_set_large_folios(inode->i_mapping);
>>>>> +       if (sbinfo->huge)
>>>>> +               mapping_set_large_folios(inode->i_mapping);
>>>>>
>>>>>          switch (mode & S_IFMT) {
>>>>>          default:
>>>>
>>>> IMHO, we no longer need the the 'sbinfo->huge' validation after 
>>>> adding support for large folios in the tmpfs write and fallocate 
>>>> paths [1].
> 
> Forget to mention, we still need to check sbinfo->huge, if mount with
> huge=never, but we fault in large chunk, write is slower than without
> 9aac777aaf94, the above changes or my patch could fix it.

My patch will allow allocating large folios in the tmpfs write and 
fallocate paths though the 'huge' option is 'never'.

My initial thought for supporting large folio is that, if the 'huge' 
option is enabled, to maintain backward compatibility, we only allow 2M 
PMD-sized order allocations. If the 'huge' option is 
disabled(huge=never), we still allow large folio allocations based on 
the write length.

Another choice is to allow the different sized large folio allocation 
based on the write length when the 'huge' option is enabled, rather than 
just the 2M PMD sized. But will force the huge orders off if 'huge' 
option is disabled.

Still need some discussions to determine which method is preferable.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-09-30  6:48                   ` Baolin Wang
@ 2024-10-09  7:09                     ` Kefeng Wang
  2024-10-09  8:52                       ` Baolin Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Kefeng Wang @ 2024-10-09  7:09 UTC (permalink / raw)
  To: Baolin Wang, Matthew Wilcox, Pankaj Raghav (Samsung)
  Cc: Andrew Morton, Hugh Dickins, Alexander Viro, Christian Brauner,
	Jan Kara, Anna Schumaker, linux-fsdevel, linux-mm



On 2024/9/30 14:48, Baolin Wang wrote:
> 
> 
> On 2024/9/30 11:15, Kefeng Wang wrote:
>>
>>
>> On 2024/9/30 10:52, Baolin Wang wrote:
>>>
>>>
>>> On 2024/9/30 10:30, Kefeng Wang wrote:
>>>>
>>>>
>>>> On 2024/9/30 10:02, Baolin Wang wrote:
>>>>>
>>>>>
>>>>> On 2024/9/26 21:52, Matthew Wilcox wrote:
>>>>>> On Thu, Sep 26, 2024 at 10:38:34AM +0200, Pankaj Raghav (Samsung) 
>>>>>> wrote:
>>>>>>>> So this is why I don't use mapping_set_folio_order_range() here, 
>>>>>>>> but
>>>>>>>> correct me if I am wrong.
>>>>>>>
>>>>>>> Yeah, the inode is active here as the max folio size is decided 
>>>>>>> based on
>>>>>>> the write size, so probably mapping_set_folio_order_range() will 
>>>>>>> not be
>>>>>>> a safe option.
>>>>>>
>>>>>> You really are all making too much of this.  Here's the patch I 
>>>>>> think we
>>>>>> need:
>>>>>>
>>>>>> +++ b/mm/shmem.c
>>>>>> @@ -2831,7 +2831,8 @@ static struct inode 
>>>>>> *__shmem_get_inode(struct mnt_idmap *idmap,
>>>>>>          cache_no_acl(inode);
>>>>>>          if (sbinfo->noswap)
>>>>>>                  mapping_set_unevictable(inode->i_mapping);
>>>>>> -       mapping_set_large_folios(inode->i_mapping);
>>>>>> +       if (sbinfo->huge)
>>>>>> +               mapping_set_large_folios(inode->i_mapping);
>>>>>>
>>>>>>          switch (mode & S_IFMT) {
>>>>>>          default:
>>>>>
>>>>> IMHO, we no longer need the the 'sbinfo->huge' validation after 
>>>>> adding support for large folios in the tmpfs write and fallocate 
>>>>> paths [1].
>>
>> Forget to mention, we still need to check sbinfo->huge, if mount with
>> huge=never, but we fault in large chunk, write is slower than without
>> 9aac777aaf94, the above changes or my patch could fix it.
> 
> My patch will allow allocating large folios in the tmpfs write and 
> fallocate paths though the 'huge' option is 'never'.

Yes, indeed after checking your patch,

The Writing intelligently from 'Bonnie -d /mnt/tmpfs/ -s 1024' based on 
next-20241008,

1) huge=never
    the base:                                    2016438 K/Sec
    my v1/v2 or Matthew's patch :                2874504 K/Sec
    your patch with filemap_get_order() fix:     6330604 K/Sec

2) huge=always
    the write performance:                       7168917 K/Sec

Since large folios supported in the tmpfs write, we do have better 
performance shown above, that's great.

> 
> My initial thought for supporting large folio is that, if the 'huge' 
> option is enabled, to maintain backward compatibility, we only allow 2M 
> PMD-sized order allocations. If the 'huge' option is 
> disabled(huge=never), we still allow large folio allocations based on 
> the write length.
> 
> Another choice is to allow the different sized large folio allocation 
> based on the write length when the 'huge' option is enabled, rather than 
> just the 2M PMD sized. But will force the huge orders off if 'huge' 
> option is disabled.
> 

"huge=never  Do not allocate huge pages. This is the default."
 From the document, it's better not to allocate large folio, but we need
some special handle for huge=never or runtime deny/force.

> Still need some discussions to determine which method is preferable.

Personally. I like your current implementation, but it does not match 
document.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
  2024-10-09  7:09                     ` Kefeng Wang
@ 2024-10-09  8:52                       ` Baolin Wang
  0 siblings, 0 replies; 23+ messages in thread
From: Baolin Wang @ 2024-10-09  8:52 UTC (permalink / raw)
  To: Kefeng Wang, Matthew Wilcox, Pankaj Raghav (Samsung)
  Cc: Andrew Morton, Hugh Dickins, Alexander Viro, Christian Brauner,
	Jan Kara, Anna Schumaker, linux-fsdevel, linux-mm



On 2024/10/9 15:09, Kefeng Wang wrote:
> 
> 
> On 2024/9/30 14:48, Baolin Wang wrote:
>>
>>
>> On 2024/9/30 11:15, Kefeng Wang wrote:
>>>
>>>
>>> On 2024/9/30 10:52, Baolin Wang wrote:
>>>>
>>>>
>>>> On 2024/9/30 10:30, Kefeng Wang wrote:
>>>>>
>>>>>
>>>>> On 2024/9/30 10:02, Baolin Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 2024/9/26 21:52, Matthew Wilcox wrote:
>>>>>>> On Thu, Sep 26, 2024 at 10:38:34AM +0200, Pankaj Raghav (Samsung) 
>>>>>>> wrote:
>>>>>>>>> So this is why I don't use mapping_set_folio_order_range() 
>>>>>>>>> here, but
>>>>>>>>> correct me if I am wrong.
>>>>>>>>
>>>>>>>> Yeah, the inode is active here as the max folio size is decided 
>>>>>>>> based on
>>>>>>>> the write size, so probably mapping_set_folio_order_range() will 
>>>>>>>> not be
>>>>>>>> a safe option.
>>>>>>>
>>>>>>> You really are all making too much of this.  Here's the patch I 
>>>>>>> think we
>>>>>>> need:
>>>>>>>
>>>>>>> +++ b/mm/shmem.c
>>>>>>> @@ -2831,7 +2831,8 @@ static struct inode 
>>>>>>> *__shmem_get_inode(struct mnt_idmap *idmap,
>>>>>>>          cache_no_acl(inode);
>>>>>>>          if (sbinfo->noswap)
>>>>>>>                  mapping_set_unevictable(inode->i_mapping);
>>>>>>> -       mapping_set_large_folios(inode->i_mapping);
>>>>>>> +       if (sbinfo->huge)
>>>>>>> +               mapping_set_large_folios(inode->i_mapping);
>>>>>>>
>>>>>>>          switch (mode & S_IFMT) {
>>>>>>>          default:
>>>>>>
>>>>>> IMHO, we no longer need the the 'sbinfo->huge' validation after 
>>>>>> adding support for large folios in the tmpfs write and fallocate 
>>>>>> paths [1].
>>>
>>> Forget to mention, we still need to check sbinfo->huge, if mount with
>>> huge=never, but we fault in large chunk, write is slower than without
>>> 9aac777aaf94, the above changes or my patch could fix it.
>>
>> My patch will allow allocating large folios in the tmpfs write and 
>> fallocate paths though the 'huge' option is 'never'.
> 
> Yes, indeed after checking your patch,
> 
> The Writing intelligently from 'Bonnie -d /mnt/tmpfs/ -s 1024' based on 
> next-20241008,
> 
> 1) huge=never
>     the base:                                    2016438 K/Sec
>     my v1/v2 or Matthew's patch :                2874504 K/Sec
>     your patch with filemap_get_order() fix:     6330604 K/Sec
> 
> 2) huge=always
>     the write performance:                       7168917 K/Sec
> 
> Since large folios supported in the tmpfs write, we do have better 
> performance shown above, that's great.

Great. Thanks for testing.

>> My initial thought for supporting large folio is that, if the 'huge' 
>> option is enabled, to maintain backward compatibility, we only allow 
>> 2M PMD-sized order allocations. If the 'huge' option is 
>> disabled(huge=never), we still allow large folio allocations based on 
>> the write length.
>>
>> Another choice is to allow the different sized large folio allocation 
>> based on the write length when the 'huge' option is enabled, rather 
>> than just the 2M PMD sized. But will force the huge orders off if 
>> 'huge' option is disabled.
>>
> 
> "huge=never  Do not allocate huge pages. This is the default."
>  From the document, it's better not to allocate large folio, but we need
> some special handle for huge=never or runtime deny/force.

Yes. I'm thinking of adding a new option (something like 'huge=mTHP') to 
allocate large folios based on the write size.

I will resend the patchset, and we can discuss it there.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3] tmpfs: don't enable large folios if not supported
  2024-09-20 14:36 ` [PATCH v2] " Kefeng Wang
  2024-09-22  0:35   ` Matthew Wilcox
@ 2024-10-11  6:59   ` Kefeng Wang
  2024-10-12  3:59     ` Baolin Wang
  2024-10-17 14:17   ` [PATCH v4] " Kefeng Wang
  2 siblings, 1 reply; 23+ messages in thread
From: Kefeng Wang @ 2024-10-11  6:59 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Matthew Wilcox,
	linux-fsdevel, linux-mm, Baolin Wang, David Hildenbrand,
	Kefeng Wang

The tmpfs could support large folio, but there is some configurable
options(mount options and runtime deny/force) to enable/disable large
folio allocation, so there is a performance issue when perform write
without large folio, the issue is similar to commit 4e527d5841e2
("iomap: fault in smaller chunks for non-large folio mappings").

Don't call mapping_set_large_folios() in __shmem_get_inode() when
large folio is disabled to fix it.

Fixes: 9aac777aaf94 ("filemap: Convert generic_perform_write() to support large folios")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---

v3:
- don't enable large folio suppport in __shmem_get_inode() if disabled,
  suggested by Matthew.

v2:
- Don't use IOCB flags

 mm/shmem.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 0a2f78c2b919..2b859ac4ddc5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2850,7 +2850,10 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
 	cache_no_acl(inode);
 	if (sbinfo->noswap)
 		mapping_set_unevictable(inode->i_mapping);
-	mapping_set_large_folios(inode->i_mapping);
+
+	if ((sbinfo->huge && shmem_huge != SHMEM_HUGE_DENY) ||
+	    shmem_huge == SHMEM_HUGE_FORCE)
+		mapping_set_large_folios(inode->i_mapping);
 
 	switch (mode & S_IFMT) {
 	default:
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3] tmpfs: don't enable large folios if not supported
  2024-10-11  6:59   ` [PATCH v3] tmpfs: don't enable large folios if not supported Kefeng Wang
@ 2024-10-12  3:59     ` Baolin Wang
  2024-10-14  2:36       ` Kefeng Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Baolin Wang @ 2024-10-12  3:59 UTC (permalink / raw)
  To: Kefeng Wang, Andrew Morton, Hugh Dickins
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Matthew Wilcox,
	linux-fsdevel, linux-mm, David Hildenbrand



On 2024/10/11 14:59, Kefeng Wang wrote:
> The tmpfs could support large folio, but there is some configurable
> options(mount options and runtime deny/force) to enable/disable large
> folio allocation, so there is a performance issue when perform write
> without large folio, the issue is similar to commit 4e527d5841e2
> ("iomap: fault in smaller chunks for non-large folio mappings").
> 
> Don't call mapping_set_large_folios() in __shmem_get_inode() when
> large folio is disabled to fix it.
> 
> Fixes: 9aac777aaf94 ("filemap: Convert generic_perform_write() to support large folios")
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> 
> v3:
> - don't enable large folio suppport in __shmem_get_inode() if disabled,
>    suggested by Matthew.
> 
> v2:
> - Don't use IOCB flags
> 
>   mm/shmem.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 0a2f78c2b919..2b859ac4ddc5 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2850,7 +2850,10 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
>   	cache_no_acl(inode);
>   	if (sbinfo->noswap)
>   		mapping_set_unevictable(inode->i_mapping);
> -	mapping_set_large_folios(inode->i_mapping);
> +
> +	if ((sbinfo->huge && shmem_huge != SHMEM_HUGE_DENY) ||
> +	    shmem_huge == SHMEM_HUGE_FORCE)
> +		mapping_set_large_folios(inode->i_mapping);

IMHO, I'm still a little concerned about the 'shmem_huge' validation. 
Since the 'shmem_huge' can be set at runtime, that means file mapping 
with 'huge=always' option might miss the opportunity to allocate large 
folios if the 'shmem_huge' is changed from 'deny' from 'always' at runtime.

So I'd like to drop the 'shmem_huge' validation and add some comments to 
indicate 'deny' and 'force' options are only for testing purpose and 
performence issue should not be a problem in the real production 
environments.

That's just my 2 cents:)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3] tmpfs: don't enable large folios if not supported
  2024-10-12  3:59     ` Baolin Wang
@ 2024-10-14  2:36       ` Kefeng Wang
  0 siblings, 0 replies; 23+ messages in thread
From: Kefeng Wang @ 2024-10-14  2:36 UTC (permalink / raw)
  To: Baolin Wang, Andrew Morton, Hugh Dickins
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Matthew Wilcox,
	linux-fsdevel, linux-mm, David Hildenbrand



On 2024/10/12 11:59, Baolin Wang wrote:
> 
> 
> On 2024/10/11 14:59, Kefeng Wang wrote:
>> The tmpfs could support large folio, but there is some configurable
>> options(mount options and runtime deny/force) to enable/disable large
>> folio allocation, so there is a performance issue when perform write
>> without large folio, the issue is similar to commit 4e527d5841e2
>> ("iomap: fault in smaller chunks for non-large folio mappings").
>>
>> Don't call mapping_set_large_folios() in __shmem_get_inode() when
>> large folio is disabled to fix it.
>>
>> Fixes: 9aac777aaf94 ("filemap: Convert generic_perform_write() to 
>> support large folios")
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>
>> v3:
>> - don't enable large folio suppport in __shmem_get_inode() if disabled,
>>    suggested by Matthew.
>>
>> v2:
>> - Don't use IOCB flags
>>
>>   mm/shmem.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 0a2f78c2b919..2b859ac4ddc5 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -2850,7 +2850,10 @@ static struct inode *__shmem_get_inode(struct 
>> mnt_idmap *idmap,
>>       cache_no_acl(inode);
>>       if (sbinfo->noswap)
>>           mapping_set_unevictable(inode->i_mapping);
>> -    mapping_set_large_folios(inode->i_mapping);
>> +
>> +    if ((sbinfo->huge && shmem_huge != SHMEM_HUGE_DENY) ||
>> +        shmem_huge == SHMEM_HUGE_FORCE)
>> +        mapping_set_large_folios(inode->i_mapping);
> 
> IMHO, I'm still a little concerned about the 'shmem_huge' validation. 
> Since the 'shmem_huge' can be set at runtime, that means file mapping 
> with 'huge=always' option might miss the opportunity to allocate large 
> folios if the 'shmem_huge' is changed from 'deny' from 'always' at runtime.
> 
> So I'd like to drop the 'shmem_huge' validation and add some comments to 
> indicate 'deny' and 'force' options are only for testing purpose and 
> performence issue should not be a problem in the real production 
> environments.

No strange opinion, the previous version could cover the runtime deny/
force, but it is a little complicated as Matthew pointed, if no other
comments, I will drop the shmem_huge check.

> 
> That's just my 2 cents:)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4] tmpfs: don't enable large folios if not supported
  2024-09-20 14:36 ` [PATCH v2] " Kefeng Wang
  2024-09-22  0:35   ` Matthew Wilcox
  2024-10-11  6:59   ` [PATCH v3] tmpfs: don't enable large folios if not supported Kefeng Wang
@ 2024-10-17 14:17   ` Kefeng Wang
  2024-10-18  1:48     ` Baolin Wang
  2 siblings, 1 reply; 23+ messages in thread
From: Kefeng Wang @ 2024-10-17 14:17 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Matthew Wilcox,
	linux-fsdevel, linux-mm, Baolin Wang, David Hildenbrand,
	Kefeng Wang

The tmpfs could support large folio, but there is some configurable
options(mount options and runtime deny/force) to enable/disable large
folio allocation, so there is a performance issue when perform write
without large folio, the issue is similar to commit 4e527d5841e2
("iomap: fault in smaller chunks for non-large folio mappings").

Since 'deny' for emergencies and 'force' for testing, performence issue
should not be a problem in the real production environments, so only
don't call mapping_set_large_folios() in __shmem_get_inode() when
large folio is disabled with mount huge=never option(default policy).

Fixes: 9aac777aaf94 ("filemap: Convert generic_perform_write() to support large folios")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
v4:
- only fix mount huge=never since runtime deny/force just for
  emergencies/testing, suggested by Baolin
v3:
- don't enable large folio suppport in __shmem_get_inode() if disabled,
  suggested by Matthew.
v2:
- Don't use IOCB flags

 mm/shmem.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index e933327d8dac..74ef214dc1a7 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2827,7 +2827,10 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
 	cache_no_acl(inode);
 	if (sbinfo->noswap)
 		mapping_set_unevictable(inode->i_mapping);
-	mapping_set_large_folios(inode->i_mapping);
+
+	/* Don't consider 'deny' for emergencies and 'force' for testing */
+	if (sbinfo->huge)
+		mapping_set_large_folios(inode->i_mapping);
 
 	switch (mode & S_IFMT) {
 	default:
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v4] tmpfs: don't enable large folios if not supported
  2024-10-17 14:17   ` [PATCH v4] " Kefeng Wang
@ 2024-10-18  1:48     ` Baolin Wang
  0 siblings, 0 replies; 23+ messages in thread
From: Baolin Wang @ 2024-10-18  1:48 UTC (permalink / raw)
  To: Kefeng Wang, Andrew Morton, Hugh Dickins
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Matthew Wilcox,
	linux-fsdevel, linux-mm, David Hildenbrand



On 2024/10/17 22:17, Kefeng Wang wrote:
> The tmpfs could support large folio, but there is some configurable
> options(mount options and runtime deny/force) to enable/disable large
> folio allocation, so there is a performance issue when perform write
> without large folio, the issue is similar to commit 4e527d5841e2
> ("iomap: fault in smaller chunks for non-large folio mappings").
> 
> Since 'deny' for emergencies and 'force' for testing, performence issue
> should not be a problem in the real production environments, so only
> don't call mapping_set_large_folios() in __shmem_get_inode() when
> large folio is disabled with mount huge=never option(default policy).
> 
> Fixes: 9aac777aaf94 ("filemap: Convert generic_perform_write() to support large folios")
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

LGTM. Thanks.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

> ---
> v4:
> - only fix mount huge=never since runtime deny/force just for
>    emergencies/testing, suggested by Baolin
> v3:
> - don't enable large folio suppport in __shmem_get_inode() if disabled,
>    suggested by Matthew.
> v2:
> - Don't use IOCB flags
> 
>   mm/shmem.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index e933327d8dac..74ef214dc1a7 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2827,7 +2827,10 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
>   	cache_no_acl(inode);
>   	if (sbinfo->noswap)
>   		mapping_set_unevictable(inode->i_mapping);
> -	mapping_set_large_folios(inode->i_mapping);
> +
> +	/* Don't consider 'deny' for emergencies and 'force' for testing */
> +	if (sbinfo->huge)
> +		mapping_set_large_folios(inode->i_mapping);
>   
>   	switch (mode & S_IFMT) {
>   	default:

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2024-10-18  1:48 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-14 14:06 [PATCH -next] tmpfs: fault in smaller chunks if large folio allocation not allowed Kefeng Wang
2024-09-15 10:40 ` Matthew Wilcox
2024-09-18  3:55   ` Kefeng Wang
2024-09-20 14:36 ` [PATCH v2] " Kefeng Wang
2024-09-22  0:35   ` Matthew Wilcox
2024-09-23  1:39     ` Kefeng Wang
2024-09-26  8:38       ` Pankaj Raghav (Samsung)
2024-09-26 13:52         ` Matthew Wilcox
2024-09-26 14:20           ` Kefeng Wang
2024-09-26 14:58             ` Matthew Wilcox
2024-09-30  1:27               ` Kefeng Wang
2024-09-30  2:02           ` Baolin Wang
2024-09-30  2:30             ` Kefeng Wang
2024-09-30  2:52               ` Baolin Wang
2024-09-30  3:15                 ` Kefeng Wang
2024-09-30  6:48                   ` Baolin Wang
2024-10-09  7:09                     ` Kefeng Wang
2024-10-09  8:52                       ` Baolin Wang
2024-10-11  6:59   ` [PATCH v3] tmpfs: don't enable large folios if not supported Kefeng Wang
2024-10-12  3:59     ` Baolin Wang
2024-10-14  2:36       ` Kefeng Wang
2024-10-17 14:17   ` [PATCH v4] " Kefeng Wang
2024-10-18  1:48     ` Baolin Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).