linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] ext4: Return proper offset for '..' if inline_data enabled.
@ 2013-03-28 10:34 Tao Ma
  2013-03-28 10:34 ` [PATCH 2/2] ext4: Handle readdir when a file is converted from inline to block based Tao Ma
  2013-03-28 18:33 ` [PATCH 1/2] ext4: Return proper offset for '..' if inline_data enabled Zach Brown
  0 siblings, 2 replies; 7+ messages in thread
From: Tao Ma @ 2013-03-28 10:34 UTC (permalink / raw)
  To: linux-ext4; +Cc: zab

From: Tao Ma <boyu.mt@taobao.com>

Zach reported a problem that if inline data is enabled, we don't
tell the difference between the offset of '.' and '..'. And a
getdents will fail if the user only want to get '.'.

This patch adds a new offset EXT4_INLINE_DOTDOT_OFFSET which
indicates the offset of inline "..", and now 0 is for the "."
and EXT4_INLINE_DOTDOT_OFFSET is for "..".

Reported-by: Zach Brown <zab@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/inline.c |   18 +++++++++++++++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index c0fd1a1..9c09dd5 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -19,7 +19,8 @@
 
 #define EXT4_XATTR_SYSTEM_DATA	"data"
 #define EXT4_MIN_INLINE_DATA_SIZE	((sizeof(__le32) * EXT4_N_BLOCKS))
-#define EXT4_INLINE_DOTDOT_SIZE	4
+#define EXT4_INLINE_DOTDOT_SIZE		4
+#define EXT4_INLINE_DOTDOT_OFFSET	2
 
 int ext4_get_inline_size(struct inode *inode)
 {
@@ -1330,6 +1331,7 @@ int ext4_read_inline_dir(struct file *filp,
 	sb = inode->i_sb;
 	stored = 0;
 	parent_ino = le32_to_cpu(((struct ext4_dir_entry_2 *)dir_buf)->inode);
+	offset = filp->f_pos;
 
 	while (!error && !stored && filp->f_pos < inode->i_size) {
 revalidate:
@@ -1342,9 +1344,15 @@ revalidate:
 		if (filp->f_version != inode->i_version) {
 			for (i = 0;
 			     i < inode->i_size && i < offset;) {
+				/*
+				 * "." is with offset 0 and
+				 * ".." is EXT4_INLINE_DOTDOT_OFFSET.
+				 */
 				if (!i) {
-					/* skip "." and ".." if needed. */
-					i += EXT4_INLINE_DOTDOT_SIZE;
+					i = EXT4_INLINE_DOTDOT_OFFSET;
+					continue;
+				} else if (i == EXT4_INLINE_DOTDOT_OFFSET) {
+					i = EXT4_INLINE_DOTDOT_SIZE;
 					continue;
 				}
 				de = (struct ext4_dir_entry_2 *)
@@ -1373,7 +1381,11 @@ revalidate:
 				if (error)
 					break;
 				stored++;
+				filp->f_pos = EXT4_INLINE_DOTDOT_OFFSET;
+				continue;
+			}
 
+			if (filp->f_pos == EXT4_INLINE_DOTDOT_OFFSET) {
 				error = filldir(dirent, "..", 2, 0, parent_ino,
 						DT_DIR);
 				if (error)
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] ext4: Handle readdir when a file is converted from inline to block based.
  2013-03-28 10:34 [PATCH 1/2] ext4: Return proper offset for '..' if inline_data enabled Tao Ma
@ 2013-03-28 10:34 ` Tao Ma
  2013-03-28 18:44   ` Zach Brown
  2013-03-28 18:33 ` [PATCH 1/2] ext4: Return proper offset for '..' if inline_data enabled Zach Brown
  1 sibling, 1 reply; 7+ messages in thread
From: Tao Ma @ 2013-03-28 10:34 UTC (permalink / raw)
  To: linux-ext4; +Cc: zab

From: Tao Ma <boyu.mt@taobao.com>

Zach reported that if a dir is inlined, the offset is within the inode, while
if we have done the conversion, the dir now will have a block offset or even
a hashed pos. The good thing is that ext4 is also prepared to handle some
situation that the dir is changed during many calls of getdents.

This patch just increased the inode->i_version in dir conversion so that normal
ext4 readdir codes can work properly to handle this issue.

Reported-by: Zach Brown <zab@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/inline.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 9c09dd5..b1379fc 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1130,6 +1130,7 @@ static int ext4_finish_convert_inline_dir(handle_t *handle,
 				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
 		csum_size = sizeof(struct ext4_dir_entry_tail);
 
+	inode->i_version++;
 	inode->i_size = inode->i_sb->s_blocksize;
 	i_size_write(inode, inode->i_sb->s_blocksize);
 	EXT4_I(inode)->i_disksize = inode->i_sb->s_blocksize;
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] ext4: Return proper offset for '..' if inline_data enabled.
  2013-03-28 10:34 [PATCH 1/2] ext4: Return proper offset for '..' if inline_data enabled Tao Ma
  2013-03-28 10:34 ` [PATCH 2/2] ext4: Handle readdir when a file is converted from inline to block based Tao Ma
@ 2013-03-28 18:33 ` Zach Brown
  2013-03-29  1:34   ` Tao Ma
  1 sibling, 1 reply; 7+ messages in thread
From: Zach Brown @ 2013-03-28 18:33 UTC (permalink / raw)
  To: Tao Ma; +Cc: linux-ext4

On Thu, Mar 28, 2013 at 06:34:58PM +0800, Tao Ma wrote:
> From: Tao Ma <boyu.mt@taobao.com>
> 
> Zach reported a problem that if inline data is enabled, we don't
> tell the difference between the offset of '.' and '..'. And a
> getdents will fail if the user only want to get '.'.
> 
> This patch adds a new offset EXT4_INLINE_DOTDOT_OFFSET which
> indicates the offset of inline "..", and now 0 is for the "."
> and EXT4_INLINE_DOTDOT_OFFSET is for "..".

Yeah, this fixes the problem.  I confirmed that my little test that got
a single dirent from getdents() now properly sees . and .. and exits
rather than spinning.

Tested-by: Zach Brown <zab@redhat.com>

>  
> +			if (filp->f_pos == EXT4_INLINE_DOTDOT_OFFSET) {
>  				error = filldir(dirent, "..", 2, 0, parent_ino,
>  						DT_DIR);
>  				if (error)
> -- 
> 1.7.0.4

Though I think you should change the fourth argument (offset) of the
second flildir() from 0 to EXT4_INLINE_DOTDOT_OFFSET.

- z

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] ext4: Handle readdir when a file is converted from inline to block based.
  2013-03-28 10:34 ` [PATCH 2/2] ext4: Handle readdir when a file is converted from inline to block based Tao Ma
@ 2013-03-28 18:44   ` Zach Brown
  2013-03-29  2:03     ` Tao Ma
  0 siblings, 1 reply; 7+ messages in thread
From: Zach Brown @ 2013-03-28 18:44 UTC (permalink / raw)
  To: Tao Ma; +Cc: linux-ext4

> Zach reported that if a dir is inlined, the offset is within the inode, while
> if we have done the conversion, the dir now will have a block offset or even
> a hashed pos. The good thing is that ext4 is also prepared to handle some
> situation that the dir is changed during many calls of getdents.

This doesn't fix the problem.  The problem isn't using the right code
path within ext4 for either inline or normal block directories.

The problem is that offsets for existing files change.  Yeah, ext4 also
has this problem when it converts from classic linear dirents to hashed
dirents, but I bet that basically doesn't happen any more.  Inline dirs
are making the problem happen for every single directory as it grows.

There's two ways to experience the bug:

1) nfs clients getting the wrong entry because the offset has changed
from the time that they got it from the server

2) more worryingly: a concurrent readdir() can see duplicate entries
from simply advancing f_pos as it does normally

Here's a quick little demonstration of the second case:

d_off: 2 d_name: ., f_pos 2
d_off: 4 d_name: .., f_pos 4
d_off: 16 d_name: a, f_pos 16
d_off: 28 d_name: b, f_pos 28
d_off: 40 d_name: c, f_pos 40
d_off: 371778706554281332 d_name: .., f_pos 18446744071750344052
d_off: 1068979911240654558 d_name: b, f_pos 18446744072795659998
d_off: 6187216788877381273 d_name: c, f_pos 1633586841
d_off: 6280769109141524706 d_name: e, f_pos 1386254562

Run the following in a newly created empty dir with inline_data:

#include <stdlib.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <sys/syscall.h>

struct linux_dirent {
	long           d_ino;
	off_t          d_off;
	unsigned short d_reclen;
	char           d_name[];
};

int main(int argc, char **argv)
{
	struct linux_dirent dent;
	char name[2] = {0,};
	int i;
	int ret;
	int fd;

	fd = open(".", O_RDONLY | O_DIRECTORY);
	if (fd < 0) {
		printf("open(\".\", O_RDONLY|O_DIRECTORY) failed: %u (%s)\n",
		       errno, strerror(errno));
		exit(1);
	}
	
	for (i = 0; i < 26; i++) {
		name[0] = 'a' + i;
		mknod(name, S_IFREG|0755, 0);
		ret = syscall(SYS_getdents, fd, &dent, sizeof(dent));
		if (ret < 1)
			break;
		printf("d_off: %llu d_name: %s, f_pos %llu\n",
		       (unsigned long long)dent.d_off,
		       dent.d_name,
		       (unsigned long long)lseek(fd, 0, SEEK_CUR));
	}

	return 0;
}

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] ext4: Return proper offset for '..' if inline_data enabled.
  2013-03-28 18:33 ` [PATCH 1/2] ext4: Return proper offset for '..' if inline_data enabled Zach Brown
@ 2013-03-29  1:34   ` Tao Ma
  2013-04-08 17:11     ` Theodore Ts'o
  0 siblings, 1 reply; 7+ messages in thread
From: Tao Ma @ 2013-03-29  1:34 UTC (permalink / raw)
  To: Zach Brown; +Cc: linux-ext4

On 03/29/2013 02:33 AM, Zach Brown wrote:
> On Thu, Mar 28, 2013 at 06:34:58PM +0800, Tao Ma wrote:
>> From: Tao Ma <boyu.mt@taobao.com>
>>
>> Zach reported a problem that if inline data is enabled, we don't
>> tell the difference between the offset of '.' and '..'. And a
>> getdents will fail if the user only want to get '.'.
>>
>> This patch adds a new offset EXT4_INLINE_DOTDOT_OFFSET which
>> indicates the offset of inline "..", and now 0 is for the "."
>> and EXT4_INLINE_DOTDOT_OFFSET is for "..".
> 
> Yeah, this fixes the problem.  I confirmed that my little test that got
> a single dirent from getdents() now properly sees . and .. and exits
> rather than spinning.
> 
> Tested-by: Zach Brown <zab@redhat.com>
> 
>>  
>> +			if (filp->f_pos == EXT4_INLINE_DOTDOT_OFFSET) {
>>  				error = filldir(dirent, "..", 2, 0, parent_ino,
>>  						DT_DIR);
>>  				if (error)
>> -- 
>> 1.7.0.4
> 
> Though I think you should change the fourth argument (offset) of the
> second flildir() from 0 to EXT4_INLINE_DOTDOT_OFFSET.
My fault, will change it in the next version.

Thanks,
Tao

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] ext4: Handle readdir when a file is converted from inline to block based.
  2013-03-28 18:44   ` Zach Brown
@ 2013-03-29  2:03     ` Tao Ma
  0 siblings, 0 replies; 7+ messages in thread
From: Tao Ma @ 2013-03-29  2:03 UTC (permalink / raw)
  To: Zach Brown; +Cc: linux-ext4

On 03/29/2013 02:44 AM, Zach Brown wrote:
>> Zach reported that if a dir is inlined, the offset is within the inode, while
>> if we have done the conversion, the dir now will have a block offset or even
>> a hashed pos. The good thing is that ext4 is also prepared to handle some
>> situation that the dir is changed during many calls of getdents.
> 
> This doesn't fix the problem.  The problem isn't using the right code
> path within ext4 for either inline or normal block directories.
> 
> The problem is that offsets for existing files change.  Yeah, ext4 also
> has this problem when it converts from classic linear dirents to hashed
> dirents, but I bet that basically doesn't happen any more.  Inline dirs
> are making the problem happen for every single directory as it grows.
Thanks for the explanation. I just looked deep into the problem and yes,
the code is really tricky for an old linear dir. Now it also uses the
ext4_dx_readdir, so the situation you described doesn't happen...

Maybe I will also need to pretend as if inline dir is hashed like the
normal linear dir and return the hash value as the pos.

Thanks,
Tao
> 
> There's two ways to experience the bug:
> 
> 1) nfs clients getting the wrong entry because the offset has changed
> from the time that they got it from the server
> 
> 2) more worryingly: a concurrent readdir() can see duplicate entries
> from simply advancing f_pos as it does normally
> 
> Here's a quick little demonstration of the second case:
> 
> d_off: 2 d_name: ., f_pos 2
> d_off: 4 d_name: .., f_pos 4
> d_off: 16 d_name: a, f_pos 16
> d_off: 28 d_name: b, f_pos 28
> d_off: 40 d_name: c, f_pos 40
> d_off: 371778706554281332 d_name: .., f_pos 18446744071750344052
> d_off: 1068979911240654558 d_name: b, f_pos 18446744072795659998
> d_off: 6187216788877381273 d_name: c, f_pos 1633586841
> d_off: 6280769109141524706 d_name: e, f_pos 1386254562
> 
> Run the following in a newly created empty dir with inline_data:
> 
> #include <stdlib.h>
> #include <dirent.h>
> #include <stdio.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <errno.h>
> #include <string.h>
> #include <sys/syscall.h>
> 
> struct linux_dirent {
> 	long           d_ino;
> 	off_t          d_off;
> 	unsigned short d_reclen;
> 	char           d_name[];
> };
> 
> int main(int argc, char **argv)
> {
> 	struct linux_dirent dent;
> 	char name[2] = {0,};
> 	int i;
> 	int ret;
> 	int fd;
> 
> 	fd = open(".", O_RDONLY | O_DIRECTORY);
> 	if (fd < 0) {
> 		printf("open(\".\", O_RDONLY|O_DIRECTORY) failed: %u (%s)\n",
> 		       errno, strerror(errno));
> 		exit(1);
> 	}
> 	
> 	for (i = 0; i < 26; i++) {
> 		name[0] = 'a' + i;
> 		mknod(name, S_IFREG|0755, 0);
> 		ret = syscall(SYS_getdents, fd, &dent, sizeof(dent));
> 		if (ret < 1)
> 			break;
> 		printf("d_off: %llu d_name: %s, f_pos %llu\n",
> 		       (unsigned long long)dent.d_off,
> 		       dent.d_name,
> 		       (unsigned long long)lseek(fd, 0, SEEK_CUR));
> 	}
> 
> 	return 0;
> }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] ext4: Return proper offset for '..' if inline_data enabled.
  2013-03-29  1:34   ` Tao Ma
@ 2013-04-08 17:11     ` Theodore Ts'o
  0 siblings, 0 replies; 7+ messages in thread
From: Theodore Ts'o @ 2013-04-08 17:11 UTC (permalink / raw)
  To: Tao Ma; +Cc: Zach Brown, linux-ext4

On Fri, Mar 29, 2013 at 09:34:58AM +0800, Tao Ma wrote:
> My fault, will change it in the next version.

Hi Tao,

Will you be able to send out a new version of this patch series in the
next few days?  The next merge window will be coming fairly quickly.

Thanks,

					- Ted

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-04-08 17:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-28 10:34 [PATCH 1/2] ext4: Return proper offset for '..' if inline_data enabled Tao Ma
2013-03-28 10:34 ` [PATCH 2/2] ext4: Handle readdir when a file is converted from inline to block based Tao Ma
2013-03-28 18:44   ` Zach Brown
2013-03-29  2:03     ` Tao Ma
2013-03-28 18:33 ` [PATCH 1/2] ext4: Return proper offset for '..' if inline_data enabled Zach Brown
2013-03-29  1:34   ` Tao Ma
2013-04-08 17:11     ` Theodore Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).