linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: yangerkun <yangerkun@huaweicloud.com>
To: Jan Kara <jack@suse.cz>, yangerkun <yangerkun@huawei.com>
Cc: hch@infradead.org, chuck.lever@oracle.com, brauner@kernel.org,
	viro@zeniv.linux.org.uk, hughd@google.com, zlang@kernel.org,
	fdmanana@suse.com, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH] libfs: fix infinite directory reads for offset dir
Date: Wed, 31 Jul 2024 20:51:05 +0800	[thread overview]
Message-ID: <57de6354-f53d-d106-aed8-9dff3e88efa6@huaweicloud.com> (raw)
In-Reply-To: <20240731115134.tkiklyu72lwnhbxg@quack3>

Hi!

在 2024/7/31 19:51, Jan Kara 写道:
> On Wed 31-07-24 12:38:35, yangerkun wrote:
>> After we switch tmpfs dir operations from simple_dir_operations to
>> simple_offset_dir_operations, every rename happened will fill new dentry
>> to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
>> key starting with octx->newx_offset, and then set newx_offset equals to
>> free key + 1. This will lead to infinite readdir combine with rename
>> happened at the same time, which fail generic/736 in xfstests(detail show
>> as below).
>>
>> 1. create 5000 files(1 2 3...) under one dir
>> 2. call readdir(man 3 readdir) once, and get one entry
>> 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)
>> 4. loop 2~3, until readdir return nothing or we loop too many
>>     times(tmpfs break test with the second condition)
>>
>> We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite
>> directory reads") to fix it, record the last_index when we open dir, and
>> do not emit the entry which index >= last_index. The file->private_data
>> now used in offset dir can use directly to do this, and we also update
>> the last_index when we llseek the dir file.
> 
> The patch looks good! Just I'm not sure about the llseek part. As far as I
> understand it was added due to this sentence in the standard:
> 
> "If a file is removed from or added to the directory after the most recent
> call to opendir() or rewinddir(), whether a subsequent call to readdir()
> returns an entry for that file is unspecified."
> 
> So if the offset used in offset_dir_llseek() is 0, then we should update
> last_index. But otherwise I'd leave it alone because IMHO it would do more
> harm than good.

IIUC, what you means is that we should only reset the private_data to
new last_index when we call rewinddir(which will call lseek to set
offset of dir file to 0)?

Yeah, I prefer the logic you describle! Besides, we may also change
btrfs that do the same(e60aa5da14d0 ("btrfs: refresh dir last index
during a rewinddir(3) call")). Filipe, how do you think?

Thanks,
Erkun.

> 								Honza
> 
>>
>> Fixes: a2e459555c5f ("shmem: stable directory offsets")
>> Signed-off-by: yangerkun <yangerkun@huawei.com>
>> ---
>>   fs/libfs.c | 34 +++++++++++++++++++++++-----------
>>   1 file changed, 23 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/libfs.c b/fs/libfs.c
>> index 8aa34870449f..38b306738c00 100644
>> --- a/fs/libfs.c
>> +++ b/fs/libfs.c
>> @@ -450,6 +450,14 @@ void simple_offset_destroy(struct offset_ctx *octx)
>>   	mtree_destroy(&octx->mt);
>>   }
>>   
>> +static int offset_dir_open(struct inode *inode, struct file *file)
>> +{
>> +	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
>> +
>> +	file->private_data = (void *)ctx->next_offset;
>> +	return 0;
>> +}
>> +
>>   /**
>>    * offset_dir_llseek - Advance the read position of a directory descriptor
>>    * @file: an open directory whose position is to be updated
>> @@ -463,6 +471,9 @@ void simple_offset_destroy(struct offset_ctx *octx)
>>    */
>>   static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
>>   {
>> +	struct inode *inode = file->f_inode;
>> +	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
>> +
>>   	switch (whence) {
>>   	case SEEK_CUR:
>>   		offset += file->f_pos;
>> @@ -476,7 +487,7 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
>>   	}
>>   
>>   	/* In this case, ->private_data is protected by f_pos_lock */
>> -	file->private_data = NULL;
>> +	file->private_data = (void *)ctx->next_offset;
>>   	return vfs_setpos(file, offset, LONG_MAX);
>>   }
>>   
>> @@ -507,7 +518,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
>>   			  inode->i_ino, fs_umode_to_dtype(inode->i_mode));
>>   }
>>   
>> -static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>> +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index)
>>   {
>>   	struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode);
>>   	struct dentry *dentry;
>> @@ -515,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>>   	while (true) {
>>   		dentry = offset_find_next(octx, ctx->pos);
>>   		if (!dentry)
>> -			return ERR_PTR(-ENOENT);
>> +			return;
>> +
>> +		if (dentry2offset(dentry) >= last_index) {
>> +			dput(dentry);
>> +			return;
>> +		}
>>   
>>   		if (!offset_dir_emit(ctx, dentry)) {
>>   			dput(dentry);
>> -			break;
>> +			return;
>>   		}
>>   
>>   		ctx->pos = dentry2offset(dentry) + 1;
>>   		dput(dentry);
>>   	}
>> -	return NULL;
>>   }
>>   
>>   /**
>> @@ -552,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>>   static int offset_readdir(struct file *file, struct dir_context *ctx)
>>   {
>>   	struct dentry *dir = file->f_path.dentry;
>> +	long last_index = (long)file->private_data;
>>   
>>   	lockdep_assert_held(&d_inode(dir)->i_rwsem);
>>   
>>   	if (!dir_emit_dots(file, ctx))
>>   		return 0;
>>   
>> -	/* In this case, ->private_data is protected by f_pos_lock */
>> -	if (ctx->pos == DIR_OFFSET_MIN)
>> -		file->private_data = NULL;
>> -	else if (file->private_data == ERR_PTR(-ENOENT))
>> -		return 0;
>> -	file->private_data = offset_iterate_dir(d_inode(dir), ctx);
>> +	offset_iterate_dir(d_inode(dir), ctx, last_index);
>>   	return 0;
>>   }
>>   
>>   const struct file_operations simple_offset_dir_operations = {
>> +	.open		= offset_dir_open,
>>   	.llseek		= offset_dir_llseek,
>>   	.iterate_shared	= offset_readdir,
>>   	.read		= generic_read_dir,
>> -- 
>> 2.39.2
>>



  reply	other threads:[~2024-07-31 12:51 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-31  4:38 [PATCH] libfs: fix infinite directory reads for offset dir yangerkun
2024-07-31 11:51 ` Jan Kara
2024-07-31 12:51   ` yangerkun [this message]
2024-07-31 13:04     ` Jan Kara
2024-07-31 13:10     ` Filipe Manana
2024-08-01  3:15       ` yangerkun
2024-07-31 13:44 ` Chuck Lever
2024-07-31 14:16 ` Christian Brauner
2024-07-31 15:37   ` Jan Kara
2024-08-01  3:32   ` yangerkun
2024-08-01 13:30     ` Jan Kara
2024-08-01 13:38       ` yangerkun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57de6354-f53d-d106-aed8-9dff3e88efa6@huaweicloud.com \
    --to=yangerkun@huaweicloud.com \
    --cc=brauner@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=fdmanana@suse.com \
    --cc=hch@infradead.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=yangerkun@huawei.com \
    --cc=zlang@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).