All of lore.kernel.org
 help / color / mirror / Atom feed
From: Zheng Liu <gnehzuil.liu@gmail.com>
To: Li Zefan <lizefan@huawei.com>
Cc: linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Jan Kara <jack@suse.cz>, Theodore Ts'o <tytso@mit.edu>,
	Andrew Morton <akpm@linux-foundation.org>,
	andi@firstfloor.org, Wuqixuan <wuqixuan@huawei.com>,
	Al Viro <viro@ZenIV.linux.org.uk>,
	gregkh@linuxfoundation.org
Subject: Re: [RFC][PATCH] vfs: always protect diretory file->fpos with inode mutex
Date: Tue, 19 Feb 2013 20:33:44 +0800	[thread overview]
Message-ID: <20130219123344.GA18350@gmail.com> (raw)
In-Reply-To: <5122D3E0.6070800@huawei.com>

On Tue, Feb 19, 2013 at 09:22:40AM +0800, Li Zefan wrote:
> There's a long long-standing bug...As long as I don't know when it dates
> from.
> 
> I've written and attached a simple program to reproduce this bug, and it can
> immediately trigger the bug in my box. It uses two threads, one keeps calling
> read(), and the other calling readdir(), both on the same directory fd.

Hi Zefan,

Out of curiosity, why do you call read(2) on a directory fd?  I only
open(2) a directory in order to execute a flush operation to make sure
that a file is really created.

Regards,
                                                - Zheng

> 
> When I ran it on ext3 (can be replaced with ext2/ext4) which has _dir_index_
> feature disabled, I got this:
> 
> EXT3-fs error (device loop1): ext3_readdir: bad entry in directory #34817: rec_len is smaller than minimal - offset=993, inode=0, rec_len=0, name_len=0
> EXT3-fs error (device loop1): ext3_readdir: bad entry in directory #34817: rec_len is smaller than minimal - offset=1009, inode=0, rec_len=0, name_len=0
> EXT3-fs error (device loop1): ext3_readdir: bad entry in directory #34817: rec_len is smaller than minimal - offset=993, inode=0, rec_len=0, name_len=0
> EXT3-fs error (device loop1): ext3_readdir: bad entry in directory #34817: rec_len is smaller than minimal - offset=1009, inode=0, rec_len=0, name_len=0
> ...
> 
> If we configured errors=remount-ro, the filesystem will become read-only.
> 
> SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
> {
> 	...
> 		loff_t pos = file_pos_read(file);
> 		ret = vfs_read(file, buf, count, &pos);
> 		file_pos_write(file, pos);
> 		fput_light(file, fput_needed);
> 	...
> }
> 
> While readdir() is protected with i_mutex, f_pos can be changed without any locking
> in various read()/write() syscalls, which leads to this bug.
> 
> What makes things worse is Andi removed i_mutex from generic_file_llseek, so you
> can trigger the same bug by replacing read() with lseek() in the test program.
> 
> commit ef3d0fd27e90f67e35da516dafc1482c82939a60
> Author: Andi Kleen <ak@linux.intel.com>
> Date:   Thu Sep 15 16:06:48 2011 -0700
> 
>     vfs: do (nearly) lockless generic_file_llseek
> 
> I've tested ext3 with dir_index enabled and btrfs, nothing bad happened, but there
> should be some other vulnerabilities. For example, running the test program on /sys
> for a few minutes triggered this warning:
> 
> [  917.994600] ------------[ cut here ]------------
> [  917.994614] WARNING: at fs/sysfs/sysfs.h:195 sysfs_readdir+0x24c/0x260()
> [  917.994621] Hardware name: Tecal RH2285
> ...
> [  917.994725] Pid: 8754, comm: a.out Not tainted 3.8.0-rc2-tj-0.7-default+ #69
> [  917.994731] Call Trace:
> [  917.994736]  [<ffffffff81205c6c>] ? sysfs_readdir+0x24c/0x260
> [  917.994743]  [<ffffffff81205c6c>] ? sysfs_readdir+0x24c/0x260
> [  917.994752]  [<ffffffff81041fff>] warn_slowpath_common+0x7f/0xc0
> [  917.994759]  [<ffffffff8104205a>] warn_slowpath_null+0x1a/0x20
> [  917.994766]  [<ffffffff81205c6c>] sysfs_readdir+0x24c/0x260
> [  917.994774]  [<ffffffff8119cbd0>] ? sys_ioctl+0x90/0x90
> [  917.994780]  [<ffffffff8119cbd0>] ? sys_ioctl+0x90/0x90
> [  917.994787]  [<ffffffff8119cfc1>] vfs_readdir+0xb1/0xd0
> [  917.994794]  [<ffffffff8119d07b>] sys_getdents64+0x9b/0x110
> [  917.994803]  [<ffffffff814a45d9>] system_call_fastpath+0x16/0x1b
> [  917.994809] ---[ end trace 6efe15a65b89022a ]---
> [  917.994816] ida_remove called for id=13073 which is not allocated.
> 
> 
> We can fix this bug in each filesystem, but can't we just make sure i_mutex is
> acquired in lseek(), read(), write() and readdir() for directory file operations?
> 
> (the patch is for demonstration only)
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index bb34af3..41f76e5 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -218,14 +218,25 @@ EXPORT_SYMBOL(default_llseek);
>  
>  loff_t vfs_llseek(struct file *file, loff_t offset, int whence)
>  {
> +	struct inode *inode = file->f_path.dentry->d_inode;
>  	loff_t (*fn)(struct file *, loff_t, int);
> +	int ret;
>  
>  	fn = no_llseek;
>  	if (file->f_mode & FMODE_LSEEK) {
>  		if (file->f_op && file->f_op->llseek)
>  			fn = file->f_op->llseek;
>  	}
> -	return fn(file, offset, whence);
> +
> +	if (S_ISDIR(inode->i_mode)) {
> +		mutex_lock(&inode->i_mutex);
> +		ret = fn(file, offset, whence);
> +		mutex_unlock(&inode->i_mutex);
> +	} else {
> +		ret = fn(file, offset, whence);
> +	}
> +
> +	return ret;
>  }
>  EXPORT_SYMBOL(vfs_llseek);
>  
> @@ -442,12 +453,32 @@ EXPORT_SYMBOL(vfs_write);
>  
>  static inline loff_t file_pos_read(struct file *file)
>  {
> -	return file->f_pos;
> +	struct inode *inode = file->f_path.dentry->d_inode;
> +	loff_t pos;
> +
> +	if (S_ISDIR(inode->i_mode)) {
> +		mutex_lock(&inode->i_mutex);
> +		pos = file->f_pos;
> +		mutex_unlock(&inode->i_mutex);
> +	} else {
> +		pos = file->f_pos;
> +	}
> +
> +	return pos;
>  }
>  
>  static inline void file_pos_write(struct file *file, loff_t pos)
>  {
> -	file->f_pos = pos;
> +	struct inode *inode = file->f_path.dentry->d_inode;
> +
> +	if (S_ISDIR(inode->i_mode)) {
> +		mutex_lock(&inode->i_mutex);
> +		file->f_pos = pos;
> +		file->f_version = 0;
> +		mutex_unlock(&inode->i_mutex);
> +	} else {
> +		file->f_pos = pos;
> +	}
>  }
>  
>  SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
> 
> 

> #include <unistd.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <dirent.h>
> #include <fcntl.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> 
> int main(int argc, char *argv[])
> {
> 	int fd;
> 	int ret;
> 	DIR *dir;
> 	struct dirent *ptr;
> 
> 	if (argc != 2)
> 		errx(1, "Please specify a directory");
> 
> 	dir = opendir(argv[1]);
> 	if (!dir)
> 		err(1, "Failed to open directory %s", argv[1]);
> 
> 	fd = dirfd(dir);
> 	if (fd < 0)
> 		err(1, "Failed to get dirfd");
> 
> 	ret = fork();
> 	if (ret == 0) {
> 		char buf[100];
> 
> 		while (1)
> 			read(fd, buf, 100);
> 	} else {
> 		int ret2;
> 
> 		while (1) {
> 			ret2 = lseek(fd, 0, SEEK_SET);
> 			if (ret2 < -1)
> 				err(1, "seek failed");
> 
> 			while (ptr = readdir(dir))
> 				;
> 		}
> 	}	
> 
> 	closedir(dir);
> 	return 0;
> }

  parent reply	other threads:[~2013-02-19 12:33 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-19  1:22 [RFC][PATCH] vfs: always protect diretory file->fpos with inode mutex Li Zefan
2013-02-19  4:06 ` Miao Xie
2013-02-19  9:19 ` Jan Kara
2013-02-19 11:47   ` Li Zefan
2013-02-19 12:59     ` Jan Kara
2013-02-20  1:49       ` Li Zefan
2013-02-19 11:48   ` Li Zefan
2013-02-19 12:33 ` Zheng Liu [this message]
2013-02-19 12:43   ` Li Zefan
2013-02-23 17:35 ` [RFC] f_pos in readdir() (was Re: [RFC][PATCH] vfs: always protect diretory file->fpos with inode mutex) Al Viro
2013-02-25  6:09   ` Li Zefan
2013-02-25 18:25   ` Zach Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130219123344.GA18350@gmail.com \
    --to=gnehzuil.liu@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=tytso@mit.edu \
    --cc=viro@ZenIV.linux.org.uk \
    --cc=wuqixuan@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.