* [RFC PATCH 0/3] copy-on-write extents mapping
@ 2013-02-20  3:59 Jeff Liu
  2013-02-21 15:25 ` Jan Kara
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Liu @ 2013-02-20  3:59 UTC
  To: linux-fsdevel@vger.kernel.org
  Cc: Alexander Viro, Andreas Dilger, Dave Chinner, Mark Fasheh,
      Joel Becker, Jan Kara, Chris Mason, Christoph Hellwig, ocfs2-devel

Hello,

We have user requests to show the real disk usage for OCFS2/Btrfs with
reflinked/cloned files. AFAICS, integrating the existing fiemap interface
into du(1) is enough to solve this issue, because OCFS2 can return an
extent in the FIEMAP_EXTENT_SHARED state, which indicates that the extent
is reflinked, and Btrfs can be improved in a similar way in the future.

Another issue is performance when calling the fiemap ioctl(2) against a
large file (like a virtual disk image). Assume we created a 20GB reflinked
file, the first 19GB has been written (COWed), and the remaining 1GB is
still shared. User space has to call fiemap multiple times to reach the
trailing shared extents, which is not good if the target disk has many
reflinked files in this situation.

I'd like to introduce a new flag, FIEMAP_FLAG_COW, to the fiemap interface.
If this flag is set, the kernel will return only the mapped extents in the
shared state, so we can reduce the overhead of calling fiemap again and
again.

Test program to verify the FIEMAP_FLAG_COW flag:
https://github.com/pibroch/fiemap_cow/blob/master/cow_test.c

Program to create a reflinked file on OCFS2:
https://github.com/pibroch/fiemap_cow/blob/master/ocfs2_reflink.c

Any comments are appreciated, thanks!

-Jeff
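For readers unfamiliar with the interface, the batching cost described above can be sketched roughly as follows. This is a minimal illustration, not the cow_test.c program linked in the message: it uses only the existing FS_IOC_FIEMAP ioctl and the FIEMAP_EXTENT_SHARED flag from the kernel headers, and the function names, the BATCH size, and the `calls` counter are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_EXTENT_* */

#define BATCH 64  /* extents fetched per ioctl; arbitrary value for this sketch */

/* Count the extents in one batch that carry FIEMAP_EXTENT_SHARED. */
unsigned count_shared(const struct fiemap_extent *ext, unsigned n)
{
    unsigned shared = 0;

    for (unsigned i = 0; i < n; i++)
        if (ext[i].fe_flags & FIEMAP_EXTENT_SHARED)
            shared++;
    return shared;
}

/*
 * Walk the whole file in batches of BATCH extents.
 * Returns the number of shared extents, or -1 if allocation or the
 * ioctl fails. *calls receives the number of successful ioctls.
 */
long count_shared_extents(int fd, unsigned *calls)
{
    size_t sz = sizeof(struct fiemap) + BATCH * sizeof(struct fiemap_extent);
    struct fiemap *fm;
    uint64_t start = 0;
    long total = 0;
    int last = 0;

    assert(calls != NULL);
    *calls = 0;
    fm = malloc(sz);
    if (!fm)
        return -1;

    while (!last) {
        memset(fm, 0, sz);
        fm->fm_start = start;
        fm->fm_length = FIEMAP_MAX_OFFSET - start;
        fm->fm_flags = FIEMAP_FLAG_SYNC;
        fm->fm_extent_count = BATCH;
        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
            free(fm);
            return -1;
        }
        (*calls)++;

        unsigned n = fm->fm_mapped_extents;
        if (n == 0)
            break;  /* nothing mapped past 'start' */
        total += count_shared(fm->fm_extents, n);

        /* Resume after the last extent returned by this call. */
        const struct fiemap_extent *e = &fm->fm_extents[n - 1];
        last = (e->fe_flags & FIEMAP_EXTENT_LAST) != 0;
        start = e->fe_logical + e->fe_length;
    }
    free(fm);
    return total;
}
```

A caller would open the file read-only and pass the descriptor to count_shared_extents(); the value left in `calls` is the number of round trips that the proposed flag aims to reduce when most extents are unshared.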
* Re: [RFC PATCH 0/3] copy-on-write extents mapping
  2013-02-20  3:59 [RFC PATCH 0/3] copy-on-write extents mapping Jeff Liu
@ 2013-02-21 15:25 ` Jan Kara
  2013-02-21 18:00   ` Zach Brown
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kara @ 2013-02-21 15:25 UTC
  To: Jeff Liu
  Cc: linux-fsdevel@vger.kernel.org, Alexander Viro, Andreas Dilger,
      Dave Chinner, Mark Fasheh, Joel Becker, Jan Kara, Chris Mason,
      Christoph Hellwig, ocfs2-devel

  Hello,

On Wed 20-02-13 11:59:17, Jeff Liu wrote:
> We have the user requests to show the real disk usage for OCFS2/Btrfs
> with reflinked/cloned files. AFAICS, integrate the existing fiemap
> interface to du(1) is fine to solve this issue because OCFS2 can return
> an extent in FIEMAP_EXTENT_SHARED state which is used to indicate the
> extent is reflinked, and Btrfs can be improved in the similar approach in
> the future.
>
> Now another issue is regarding the performance when call fiemap ioctl(2)
> against a large file (like virtual disk images). Assuming we created a
> 20Gb reflinked file, the first 19Gb has been written(COWed), and the left
> 1Gb is still in shared status, the user space has to call fiemap for
> multiple times to fetch the ending shared extents, that is not good if
> the target disk have many reflinked files in such situations.
  Can you gather some performance numbers please - i.e. how long does it
take to map such a file without FIEMAP_FLAG_COW and how long with it? I'm
not completely convinced it will make such a huge difference in practice
(given du(1) isn't a very performance-critical application).

> I'd like to introduce a new flag FIEMAP_FLAG_COW to the fiemap interface,
> if this flag is set, the kernel space will only return the mapped extents
> in shared state, as a result, we can reduce the overheads for calling
> fiemap again an again.
  I'm a bit uneasy about this 'filtering' function of flags. But I guess
there aren't so many extent types that flags couldn't accommodate them. So
if you show that something like this is necessary to make the du(1)
application practical, then I guess I can bear it.

								Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [RFC PATCH 0/3] copy-on-write extents mapping
  2013-02-21 15:25 ` Jan Kara
@ 2013-02-21 18:00   ` Zach Brown
  2013-02-24 13:42     ` Jeff Liu
  0 siblings, 1 reply; 8+ messages in thread
From: Zach Brown @ 2013-02-21 18:00 UTC
  To: Jan Kara
  Cc: Jeff Liu, linux-fsdevel@vger.kernel.org, Alexander Viro,
      Andreas Dilger, Dave Chinner, Mark Fasheh, Joel Becker, Chris Mason,
      Christoph Hellwig, ocfs2-devel

>   Can you gather some performance numbers please - i.e. how long does it take
> to map such file without FIEMAP_FLAG_COW and how long with it? I'm not
> completely convinced it will make such a huge difference in practice (given
> du(1) isn't very performance critical application).

Seconded.

I'd like to see measurements (wall time, cpu, ios) of the time it takes
to find shared extents on a giant file *on a fresh uncached mount*.

Because this interface doesn't help the file system do the work more
efficiently, the kernel still has to walk everything to see if it's
shared. It just saves some syscalls and copying.

That's noise compared to the io/cache footprint of the operation.

- z
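One rough way to collect the three quantities asked for above (wall time, CPU time, and I/O) from inside a test program is sketched below. It relies only on the standard clock_gettime(2) and getrusage(2) calls; the struct and function names are invented for this illustration, and ru_inblock is only an approximation of "ios" (it counts block input operations charged to the process). Getting a fresh uncached mount still has to be arranged outside the program, for example by unmounting and remounting the file system before each run.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Snapshot taken before the measured operation. */
struct probe {
    struct timespec wall;
    struct rusage ru;
};

/* Deltas between probe_start() and probe_stop(). */
struct sample {
    double wall_s;   /* elapsed wall-clock seconds */
    double user_s;   /* user CPU seconds */
    double sys_s;    /* system CPU seconds */
    long in_blocks;  /* block input operations */
};

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

void probe_start(struct probe *p)
{
    assert(p != NULL);
    clock_gettime(CLOCK_MONOTONIC, &p->wall);
    getrusage(RUSAGE_SELF, &p->ru);
}

struct sample probe_stop(const struct probe *p)
{
    struct timespec now;
    struct rusage ru;
    struct sample s;

    assert(p != NULL);
    clock_gettime(CLOCK_MONOTONIC, &now);
    getrusage(RUSAGE_SELF, &ru);

    s.wall_s = (double)(now.tv_sec - p->wall.tv_sec)
             + (double)(now.tv_nsec - p->wall.tv_nsec) / 1e9;
    s.user_s = tv_seconds(ru.ru_utime) - tv_seconds(p->ru.ru_utime);
    s.sys_s = tv_seconds(ru.ru_stime) - tv_seconds(p->ru.ru_stime);
    s.in_blocks = ru.ru_inblock - p->ru.ru_inblock;
    return s;
}
```

Calling probe_start() immediately before the extent walk and probe_stop() immediately after it yields one sample per run; comparing samples from patched and unpatched kernels on a cold cache is the comparison requested here.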
* Re: [RFC PATCH 0/3] copy-on-write extents mapping
  2013-02-21 18:00   ` Zach Brown
@ 2013-02-24 13:42     ` Jeff Liu
  2013-02-25 13:28       ` Jan Kara
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Liu @ 2013-02-24 13:42 UTC
  To: Jan Kara
  Cc: zab, linux-fsdevel@vger.kernel.org, Alexander Viro, Andreas Dilger,
      Dave Chinner, Mark Fasheh, Joel Becker, Chris Mason,
      Christoph Hellwig, ocfs2-devel

Hi Jan and Zach,

Thanks to both of you for your comments, and sorry for the late response;
I had to think it over and run tests to gather the performance statistics.

On 02/22/2013 02:00 AM, Zach Brown wrote:
>>   Can you gather some performance numbers please - i.e. how long does it take
>> to map such file without FIEMAP_FLAG_COW and how long with it? I'm not
>> completely convinced it will make such a huge difference in practice (given
>> du(1) isn't very performance critical application).
>
> Seconded.
>
> I'd like to see measurements (wall time, cpu, ios) of the time it takes
> to find shared extents on a giant file *on a fresh uncached mount*.
>
> Because this interface doesn't help the file system do the work more
> efficiently, the kernel still has to walk everything to see if its
> shared. It just saves some syscalls and copying.
>
> That's noise compared to the io/cache footprint of the operation.
Firstly, the results are really frustrating to me: there is basically no
performance improvement for a 50GB file on OCFS2.

The results were collected on a single-node OCFS2:
/dev/sda5 on /ocfs2 type ocfs2 (rw,sync,_netdev,heartbeat=local)

Create a 50GB file, and create a reflinked file from it:
$ dd if=/dev/zero of=testfile bs=1M count=50000
$ ./ocfs2_reflink testfile testfile_reflinked

Make the first 48GB COWed:
$ dd if=/dev/zero of=testfile_reflinked bs=1M count=46000 seek=0 conv=notrunc
46000+0 records in
46000+0 records out
48234496000 bytes (48 GB) copied, 1593.44 s, 30.3 MB/s

The original file has 968 shared extents:
$ ./cow_test testfile
Find 968 COW extents

After the COW, the target reflinked file has 101 extents left in shared
state (the last 101 extents):
$ ./cow_test testfile_reflinked
Find 101 COW extents

Whether the kernel is patched or not, there is basically no performance
improvement, although 12 fiemap ioctl(2) calls are saved:

Kernel non-patched:
$ time ./cow_test testfile_reflinked
Find 101 COW extents

real	0m0.006s
user	0m0.000s
sys	0m0.004s

Kernel patched:
$ time ./cow_test testfile_reflinked
Find 101 COW extents

real	0m0.006s
user	0m0.000s
sys	0m0.000s

Kernel non-patched:
$ strace -c ./cow_test testfile
Find 101 COW extents
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 74.36    0.000174          58         3           open
 25.64    0.000060          20         3           fstat
  0.00    0.000000           0         1           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         3           close
  0.00    0.000000           0         9           mmap
  0.00    0.000000           0         4           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         1           brk
  0.00    0.000000           0        16           ioctl
  0.00    0.000000           0         3         3 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000234                    47         3 total

Kernel patched:
$ strace -c ./cow_test testfile
Find 101 COW extents
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.002727        1364         2           ioctl
  0.00    0.000000           0         1           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         3           open
  0.00    0.000000           0         3           close
  0.00    0.000000           0         3           fstat
  0.00    0.000000           0         9           mmap
  0.00    0.000000           0         4           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         1           brk
  0.00    0.000000           0         3         3 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.002727                    33         3 total

But I have another idea regarding performance in practical situations.
Generally, the end user would run du(1) against a partition that holds not
only reflinked files but also normal files which do not contain any shared
extents. Or the user may check the shared extents of a file that was
reflinked earlier but has since been completely COWed, so that it no longer
contains any shared extent at all.

In either case, du(1) has to call fiemap to walk the extents of such files
whether or not they contain shared extents, and that is overhead (yes,
du(1) is not a very performance-critical application).

But with a prejudgement approach, we can bypass the normal files and look
up shared extents in the COW files only. On OCFS2, a reflinked file is
indicated by the OCFS2_HAS_REFCOUNT_FL flag in the inode. Here is a
proof-of-concept patch for OCFS2 on top of my previous patches; it was
written for quick demo purposes only:

/*
 * Don't try to look up shared extents for a non-reflinked file.
 */
diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
index d75a731..a381041 100644
--- a/fs/ocfs2/extent_map.c
+++ b/fs/ocfs2/extent_map.c
@@ -774,6 +774,12 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 
 	down_read(&OCFS2_I(inode)->ip_alloc_sem);
 
+	if ((fieinfo->fi_flags & FIEMAP_FLAG_COW) &&
+	    !(OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_REFCOUNT_FL)) {
+		ret = -ENODATA;
+		goto out_unlock;
+	}
+

On a 100GB OCFS2 partition (the largest I can create on my laptop):
$ ls -lh /ocfs2/
total 99G
-rwxrwxr-x+ 1 jeff jeff  13K Feb 24 16:54 cow_test_after
-rwxrwxr-x+ 1 jeff jeff  13K Feb 24 18:38 cow_test_default
-rwxrwxr-x+ 1 jeff jeff 459K Feb 24 20:14 du_non_patched
-rwxrwxr-x+ 1 jeff jeff 459K Feb 24 20:14 du_patched
drwxr-xr-x  2 jeff jeff 3.9K Feb 22 17:10 lost+found
-rw-rw-r--+ 1 jeff jeff  30G Feb 24 17:10 testfile
-rw-rw-r--+ 1 jeff jeff 9.8G Feb 24 19:03 testfile_02
-rw-rw-r--+ 1 jeff jeff 9.8G Feb 24 19:06 testfile_03
-rw-rw-r--+ 1 jeff jeff 9.8G Feb 24 19:10 testfile_04
-rw-rw-r--+ 1 jeff jeff 9.8G Feb 24 19:16 testfile_05
-rw-rw-r--  1 jeff jeff  30G Feb 24 20:02 testfile_reflinked

Before patching du(1) to be aware of FIEMAP_FLAG_COW:
$ perf stat ./src/du_non_patched -E -sh /ocfs2/
99G	(59G)	/ocfs2/
70G	footprint

 Performance counter stats for './src/du_patched -E -sh /ocfs2/':

          7.443270 task-clock                #    0.042 CPUs utilized
                32 context-switches          #    0.004 M/sec
                 2 cpu-migrations            #    0.269 K/sec
               321 page-faults               #    0.043 M/sec
        16,314,337 cycles                    #    2.192 GHz
         9,659,617 stalled-cycles-frontend   #   59.21% frontend cycles idle
   <not supported> stalled-cycles-backend
        14,734,763 instructions              #    0.90  insns per cycle
                                             #    0.66  stalled cycles per insn
         3,256,351 branches                  #  437.489 M/sec
            38,433 branch-misses             #    1.18% of all branches

       0.175917908 seconds time elapsed

After patching du(1):
$ perf stat ./src/du_patched -E -sh /ocfs2/
99G	(59G)	/ocfs2/
70G	footprint

 Performance counter stats for './src/du_patched -E -sh /ocfs2/':

          8.935251 task-clock                #    0.095 CPUs utilized
                16 context-switches          #    0.002 M/sec
                 0 cpu-migrations            #    0.000 K/sec
               320 page-faults               #    0.036 M/sec
        11,661,240 cycles                    #    1.305 GHz
         6,007,876 stalled-cycles-frontend   #   51.52% frontend cycles idle
   <not supported> stalled-cycles-backend
        12,848,387 instructions              #    1.10  insns per cycle
                                             #    0.47  stalled cycles per insn
         2,944,853 branches                  #  329.577 M/sec
            35,148 branch-misses             #    1.19% of all branches

       0.093799219 seconds time elapsed

For individual files, both testfile_02 and testfile_03 are 10GB normal
files without shared extents:
$ ls -l testfile_02 testfile_03
-rw-rw-r--+ 1 jeff jeff 10485760000 Feb 24 19:03 testfile_02
-rw-rw-r--+ 1 jeff jeff 10485760000 Feb 24 19:06 testfile_03

Before patching du(1):
$ perf stat ./du_non_patched testfile_02
10240000	testfile_02

 Performance counter stats for './du_non_patched testfile_02':

          2.154475 task-clock                #    0.035 CPUs utilized
                 7 context-switches          #    0.003 M/sec
                 0 cpu-migrations            #    0.000 K/sec
               297 page-faults               #    0.138 M/sec
         4,889,482 cycles                    #    2.269 GHz
         3,448,039 stalled-cycles-frontend   #   70.52% frontend cycles idle
   <not supported> stalled-cycles-backend
         2,811,093 instructions              #    0.57  insns per cycle
                                             #    1.23  stalled cycles per insn
           500,471 branches                  #  232.294 M/sec
            13,712 branch-misses             #    2.74% of all branches

       0.061926381 seconds time elapsed

After patching du(1):
$ perf stat ./du_patched testfile_03
10240000	testfile_03

 Performance counter stats for './du_patched testfile_03':

          2.321336 task-clock                #    0.059 CPUs utilized
                 7 context-switches          #    0.003 M/sec
                 0 cpu-migrations            #    0.000 K/sec
               297 page-faults               #    0.128 M/sec
         5,044,049 cycles                    #    2.173 GHz
         3,596,109 stalled-cycles-frontend   #   71.29% frontend cycles idle
   <not supported> stalled-cycles-backend
         2,810,123 instructions              #    0.56  insns per cycle
                                             #    1.28  stalled cycles per insn
           500,889 branches                  #  215.776 M/sec
            13,713 branch-misses             #    2.74% of all branches

       0.039634019 seconds time elapsed

Do the results above make sense? If yes, I still feel that this is not a
proper way to detect reflinked files. IMHO, if we could improve
stat(2)->getattr() to set a flag in the mode member indicating whether a
file is reflinked/COW, it would be more convenient to check with something
like S_ISREFLINK(stat.st_mode) from user space, since du(1) always fetches
the per-file stat information for disk space accounting.

Thanks,
-Jeff
* Re: [RFC PATCH 0/3] copy-on-write extents mapping
  2013-02-24 13:42     ` Jeff Liu
@ 2013-02-25 13:28       ` Jan Kara
  2013-02-25 14:19         ` Jeff Liu
                           ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jan Kara @ 2013-02-25 13:28 UTC
  To: Jeff Liu
  Cc: Jan Kara, zab, linux-fsdevel@vger.kernel.org, Alexander Viro,
      Andreas Dilger, Dave Chinner, Mark Fasheh, Joel Becker, Chris Mason,
      Christoph Hellwig, ocfs2-devel

  Hi Jeff,

On Sun 24-02-13 21:42:30, Jeff Liu wrote:
> Thanks for both of your comments and sorry for my too late response since
> I have to think it over and run tests to gather the performance
> statistics.
  Sure, no problem.

> On 02/22/2013 02:00 AM, Zach Brown wrote:
> >>   Can you gather some performance numbers please - i.e. how long does it take
> >> to map such file without FIEMAP_FLAG_COW and how long with it? I'm not
> >> completely convinced it will make such a huge difference in practice (given
> >> du(1) isn't very performance critical application).
> >
> > Seconded.
> >
> > I'd like to see measurements (wall time, cpu, ios) of the time it takes
> > to find shared extents on a giant file *on a fresh uncached mount*.
> >
> > Because this interface doesn't help the file system do the work more
> > efficiently, the kernel still has to walk everything to see if its
> > shared. It just saves some syscalls and copying.
> >
> > That's noise compared to the io/cache footprint of the operation.
> Firstly, the results is really frustrating to me as there basically has no performance
> improved against a 50GB file on OCFS2.
>
> The result collected on a single node OCFS2:
> /dev/sda5 on /ocfs2 type ocfs2 (rw,sync,_netdev,heartbeat=local)
>
> Create a 50GB file, and create a reflinked file from it:
> $ dd if=/dev/zero of=testfile bs=1M count=50000
> $ ./ocfs2_reflink testfile testfile_reflinked
>
> Make the first 48GB COWed:
> $ dd if=/dev/zero of=testfile_reflinked bs=1M count=46000 seek=0 conv=notrunc
> 46000+0 records in
> 46000+0 records out
> 48234496000 bytes (48 GB) copied, 1593.44 s, 30.3 MB/s
>
> The original file has 968 shared extents:
> $ ./cow_test testfile
> Find 968 COW extents
>
> After COWed, the target reflinked file has 101 extents in shared state:
> The latest 101 extents are in shared state:
> $ ./cow_test testfile_reflinked
> Find 101 COW extents
>
> No matter kernel is patched or not, there basically no performance
> improvements although 12 times fiemap ioctl(2) are reduced
<snip>
  Yeah, I suspected that. As Zach said, the kernel has to do all the work
anyway, so you just save the small overhead of the additional syscalls. But
those are rather cheap compared to the other stuff you need to do.

> But I have another idea regarding the performance if considering the
> practical situations. Generally, the end user would run du(1) against a
> partition with not only the reflinked files but also includes normal
> files which are not contains any shared extents, or if the user check up
> the shared extents for a previous reflinked file, but maybe this file has
> already totally COWed, that is, now it does not contains any shared
> extent at all.
>
> In either case, du(1) has to call fiemap to look through the extents
> against this kind of files no matter it contains shared extents or not,
> that's would be an overhead(Yes, du(1) is not a very performance critical
> application).
>
> But with a prejudegement approach, we can bypass the normal files and
> lookup shared extents against the COW file only.
  Yes, that would be useful, and as you showed it can bring a noticeable
speedup.

> Does the results above looks make sense? If yes, I still felt that it's
> not a formal approach to detect reflinked files. IMHO, if we can improve
> the stat(2)->getattr() to fill the mode member with a flag to indicate
> that a file is reflinked/cow or not, it would be more convenient to check
> as like S_ISREFLINK(stat.st_mode) from the user space since du(1) always
> fetching the statistics per file disk space accounting.
  I agree that adding filtering to FIEMAP just to accommodate the only
practical use case, checking whether a file has any shared extent, is
really overkill. But changing stat(2) the way you describe is an ugly hack.
st_mode logically has nothing to do with whether a file has shared extents
or not. If anything, you could use the FS_IOC_GETFLAGS ioctl for that. I'm
not 100% sure that's the right interface, but at least it isn't that ugly.

								Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
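The pre-check suggested above could look roughly like the sketch below from user space. FS_IOC_GETFLAGS is the existing inode-flags ioctl, but no flag bit meaning "this file has shared extents" is assumed to exist: the mask is passed in by the caller and is purely hypothetical, as are the function and enum names. Treat this as an illustration of the control flow, not as a description of any kernel's behaviour.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS */

enum shared_hint {
    HINT_NO_SHARED = 0,     /* flag clear: the fiemap walk can be skipped */
    HINT_MAYBE_SHARED = 1,  /* flag set: walk the extents with fiemap */
    HINT_UNKNOWN = 2        /* ioctl failed: fall back to the fiemap walk */
};

/* Pure helper: does the flag word contain the (hypothetical) shared bit? */
int flags_indicate_shared(int flags, int shared_mask)
{
    return (flags & shared_mask) != 0;
}

/*
 * Ask the file system for the inode flags and classify the file.
 * 'shared_mask' is a HYPOTHETICAL flag bit supplied by the caller;
 * the kernel headers define no such flag.
 */
enum shared_hint shared_extent_hint(int fd, int shared_mask)
{
    int flags = 0;

    assert(shared_mask != 0);
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
        return HINT_UNKNOWN;
    return flags_indicate_shared(flags, shared_mask)
               ? HINT_MAYBE_SHARED : HINT_NO_SHARED;
}
```

With a check like this, a du(1)-style tool would issue one cheap ioctl per file and run the extent walk only for HINT_MAYBE_SHARED or HINT_UNKNOWN, which is the same saving the -ENODATA proof-of-concept obtains inside ocfs2_fiemap().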
* Re: [RFC PATCH 0/3] copy-on-write extents mapping
  2013-02-25 13:28       ` Jan Kara
@ 2013-02-25 14:19         ` Jeff Liu
  2013-02-25 17:14         ` Zach Brown
  2013-03-02 10:46         ` Joel Becker
  2 siblings, 0 replies; 8+ messages in thread
From: Jeff Liu @ 2013-02-25 14:19 UTC
  To: Jan Kara
  Cc: zab, linux-fsdevel@vger.kernel.org, Alexander Viro, Andreas Dilger,
      Dave Chinner, Mark Fasheh, Joel Becker, Chris Mason,
      Christoph Hellwig, ocfs2-devel

On 02/25/2013 09:28 PM, Jan Kara wrote:
>   Hi Jeff,
>
> On Sun 24-02-13 21:42:30, Jeff Liu wrote:
>> Thanks for both of your comments and sorry for my too late response since
>> I have to think it over and run tests to gather the performance
>> statistics.
>   Sure, no problem.
>
>>
>> No matter kernel is patched or not, there basically no performance
>> improvements although 12 times fiemap ioctl(2) are reduced
> <snip>
>   Yeah, I suspected that. As Zach said, kernel has to do all the work
> anyway so you just save some small overhead of additional syscalls. But
> those are rather cheap compared to other stuff you need to do.
>
>> But I have another idea regarding the performance if considering the
>> practical situations. Generally, the end user would run du(1) against a
>> partition with not only the reflinked files but also includes normal
>> files which are not contains any shared extents, or if the user check up
>> the shared extents for a previous reflinked file, but maybe this file has
>> already totally COWed, that is, now it does not contains any shared
>> extent at all.
>>
>> In either case, du(1) has to call fiemap to look through the extents
>> against this kind of files no matter it contains shared extents or not,
>> that's would be an overhead(Yes, du(1) is not a very performance critical
>> application).
>>
>> But with a prejudegement approach, we can bypass the normal files and
>> lookup shared extents against the COW file only.
>   Yes, that would be useful and as you showed it can bring noticeable
> speedup.
>
>> Does the results above looks make sense? If yes, I still felt that it's
>> not a formal approach to detect reflinked files. IMHO, if we can improve
>> the stat(2)->getattr() to fill the mode member with a flag to indicate
>> that a file is reflinked/cow or not, it would be more convenient to check
>> as like S_ISREFLINK(stat.st_mode) from the user space since du(1) always
>> fetching the statistics per file disk space accounting.
>   I agree that adding filtering to FIEMAP just to accomodate the only
> practical use case of checking whether a file has any shared extent is
> really an overkill. But changing stat(2) the way you describe is ugly hack.
> st_mode has logically nothing to do with whether file has shared extents or
> not. If anything you could use ioctl IOC_GETFLAGS for that. I'm not 100%
> sure that's the right interface but at least it isn't that ugly.
Hi Jan,

Thanks for your quick response and for pointing this out. I was not aware
of this interface before; it looks like a very good fit for this situation.
I'll try it out. :)

Regards,
-Jeff
* Re: [RFC PATCH 0/3] copy-on-write extents mapping
  2013-02-25 13:28       ` Jan Kara
  2013-02-25 14:19         ` Jeff Liu
@ 2013-02-25 17:14         ` Zach Brown
  2013-03-02 10:46         ` Joel Becker
  2 siblings, 0 replies; 8+ messages in thread
From: Zach Brown @ 2013-02-25 17:14 UTC
  To: Jan Kara
  Cc: Jeff Liu, linux-fsdevel@vger.kernel.org, Alexander Viro,
      Andreas Dilger, Dave Chinner, Mark Fasheh, Joel Becker, Chris Mason,
      Christoph Hellwig, ocfs2-devel

> > Does the results above looks make sense? If yes, I still felt that it's
> > not a formal approach to detect reflinked files. IMHO, if we can improve
> > the stat(2)->getattr() to fill the mode member with a flag to indicate
> > that a file is reflinked/cow or not, it would be more convenient to check
> > as like S_ISREFLINK(stat.st_mode) from the user space since du(1) always
> > fetching the statistics per file disk space accounting.
>   I agree that adding filtering to FIEMAP just to accomodate the only
> practical use case of checking whether a file has any shared extent is
> really an overkill. But changing stat(2) the way you describe is ugly hack.
> st_mode has logically nothing to do with whether file has shared extents or
> not. If anything you could use ioctl IOC_GETFLAGS for that. I'm not 100%
> sure that's the right interface but at least it isn't that ugly.

Agreed: avoiding the fiemap extent walk entirely is reasonable, and
st_mode is a strange place to put a flag that indicates that some
extents might have the SHARED bit :). GETFLAGS doesn't seem so bad.

It seems like the real fix, though, is to have the fs track shared
logical file offset regions with a counter in the inode like it does for
size and blocks. Extending stat() to report this to userspace would
probably be very annoying; maybe a synthetic xattr would work.

- z
* Re: [RFC PATCH 0/3] copy-on-write extents mapping
  2013-02-25 13:28       ` Jan Kara
  2013-02-25 14:19         ` Jeff Liu
  2013-02-25 17:14         ` Zach Brown
@ 2013-03-02 10:46         ` Joel Becker
  2 siblings, 0 replies; 8+ messages in thread
From: Joel Becker @ 2013-03-02 10:46 UTC
  To: Jan Kara
  Cc: Jeff Liu, zab, linux-fsdevel@vger.kernel.org, Alexander Viro,
      Andreas Dilger, Dave Chinner, Mark Fasheh, Chris Mason,
      Christoph Hellwig, ocfs2-devel

On Mon, Feb 25, 2013 at 02:28:44PM +0100, Jan Kara wrote:
>   Hi Jeff,
>
> On Sun 24-02-13 21:42:30, Jeff Liu wrote:
> > Thanks for both of your comments and sorry for my too late response since
> > I have to think it over and run tests to gather the performance
> > statistics.
>   Sure, no problem.
>
> > On 02/22/2013 02:00 AM, Zach Brown wrote:
> > >>   Can you gather some performance numbers please - i.e. how long does it take
> > >> to map such file without FIEMAP_FLAG_COW and how long with it? I'm not
> > >> completely convinced it will make such a huge difference in practice (given
> > >> du(1) isn't very performance critical application).
> > >
> > > Seconded.
> > >
> > > I'd like to see measurements (wall time, cpu, ios) of the time it takes
> > > to find shared extents on a giant file *on a fresh uncached mount*.
> > >
> > > Because this interface doesn't help the file system do the work more
> > > efficiently, the kernel still has to walk everything to see if its
> > > shared. It just saves some syscalls and copying.
> > >
> > > That's noise compared to the io/cache footprint of the operation.
> > Firstly, the results is really frustrating to me as there basically has no performance
> > improved against a 50GB file on OCFS2.
> >
> > The result collected on a single node OCFS2:
> > /dev/sda5 on /ocfs2 type ocfs2 (rw,sync,_netdev,heartbeat=local)
> >
> > Create a 50GB file, and create a reflinked file from it:
> > $ dd if=/dev/zero of=testfile bs=1M count=50000
> > $ ./ocfs2_reflink testfile testfile_reflinked
> >
> > Make the first 48GB COWed:
> > $ dd if=/dev/zero of=testfile_reflinked bs=1M count=46000 seek=0 conv=notrunc
> > 46000+0 records in
> > 46000+0 records out
> > 48234496000 bytes (48 GB) copied, 1593.44 s, 30.3 MB/s
> >
> > The original file has 968 shared extents:
> > $ ./cow_test testfile
> > Find 968 COW extents
> >
> > After COWed, the target reflinked file has 101 extents in shared state:
> > The latest 101 extents are in shared state:
> > $ ./cow_test testfile_reflinked
> > Find 101 COW extents
> >
> > No matter kernel is patched or not, there basically no performance
> > improvements although 12 times fiemap ioctl(2) are reduced
> <snip>
>   Yeah, I suspected that. As Zach said, kernel has to do all the work
> anyway so you just save some small overhead of additional syscalls. But
> those are rather cheap compared to other stuff you need to do.
>
> > But I have another idea regarding the performance if considering the
> > practical situations. Generally, the end user would run du(1) against a
> > partition with not only the reflinked files but also includes normal
> > files which are not contains any shared extents, or if the user check up
> > the shared extents for a previous reflinked file, but maybe this file has
> > already totally COWed, that is, now it does not contains any shared
> > extent at all.
> >
> > In either case, du(1) has to call fiemap to look through the extents
> > against this kind of files no matter it contains shared extents or not,
> > that's would be an overhead(Yes, du(1) is not a very performance critical
> > application).
> >
> > But with a prejudegement approach, we can bypass the normal files and
> > lookup shared extents against the COW file only.
>   Yes, that would be useful and as you showed it can bring noticeable
> speedup.
>
> > Does the results above looks make sense? If yes, I still felt that it's
> > not a formal approach to detect reflinked files. IMHO, if we can improve
> > the stat(2)->getattr() to fill the mode member with a flag to indicate
> > that a file is reflinked/cow or not, it would be more convenient to check
> > as like S_ISREFLINK(stat.st_mode) from the user space since du(1) always
> > fetching the statistics per file disk space accounting.
>   I agree that adding filtering to FIEMAP just to accomodate the only
> practical use case of checking whether a file has any shared extent is
> really an overkill. But changing stat(2) the way you describe is ugly hack.
> st_mode has logically nothing to do with whether file has shared extents or
> not. If anything you could use ioctl IOC_GETFLAGS for that. I'm not 100%
> sure that's the right interface but at least it isn't that ugly.

Jumping in, because I'm now back in town and paying attention. I'm
going to respond to a bunch of points in the thread.

- If we were going to filter, I'd like to see something more generic.
  There can be shared extents that are not COW. FIEMAP_FLAG_COW doesn't
  fit this. FIEMAP_FLAG_SHARED is more aligned with how we describe the
  results in the response structure.
- Specific filter flags in FIEMAP strike me as a bad idea. We all seem
  to agree on that.
- The right thing is for du(1) and similar programs to just ignore files
  that have no shared extents. The kernel shouldn't be trying to be
  smart about this.
- Whatever way we present userspace with "this file has shared extents"
  should be generic so that all filesystems supporting shared extents
  report the same thing. btrfs' handling of FS_IOC_GETFLAGS kind of
  works like this.
- The more I think about it, though, I'm liking zab's synthetic xattr.
  Why not feature flags ala processors? Imagine the xattr
  "fs:file-features" reporting "shared-extents,immutable" or somesuch.
  Free-form strings allow us to add things without the header hoops.

Joel

--

Life's Little Instruction Book #197

	"Don't forget, a person's greatest emotional need is to
	 feel appreciated."

			http://www.jlbec.org/
			jlbec@evilplan.org
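To make the feature-string idea from the last bullet concrete, a consumer could parse the comma-separated list roughly as follows. The attribute name "fs:file-features" and the values "shared-extents" and "immutable" are taken from the message above as illustrations only; no kernel is assumed to expose such an attribute, so on a real system the getxattr(2) call below is expected to fail and the function to return -1. The helper names are invented for this sketch.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Return 1 if 'feature' is a whole comma-separated token of 'list'. */
int has_feature(const char *list, const char *feature)
{
    assert(list != NULL && feature != NULL);

    size_t flen = strlen(feature);
    const char *p = list;

    if (flen == 0)
        return 0;
    while (*p) {
        size_t tlen = strcspn(p, ",");  /* length of the current token */

        if (tlen == flen && memcmp(p, feature, flen) == 0)
            return 1;
        p += tlen;
        if (*p == ',')
            p++;  /* skip the separator */
    }
    return 0;
}

/*
 * Returns 1 if the file advertises "shared-extents", 0 if it does not,
 * and -1 if the (hypothetical) attribute cannot be read.
 */
int file_has_shared_extents(const char *path)
{
    char buf[256];
    ssize_t n = getxattr(path, "fs:file-features", buf, sizeof(buf) - 1);

    if (n < 0)
        return -1;
    buf[n] = '\0';
    return has_feature(buf, "shared-extents");
}
```

Matching whole tokens rather than substrings keeps a later feature name such as "not-shared-extents" from being mistaken for "shared-extents", which matters if the string is meant to grow new entries without header changes.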