* Re: After unlinking a large file on ext4, the process stalls for a long time
From: Andreas Dilger @ 2014-07-17 3:37 UTC
To: Mason; +Cc: John Stoffel, Ext4 Developers List, linux-fsdevel

On Jul 16, 2014, at 11:16 AM, Mason <mpeg.blue@free.fr> wrote:
> (I hope you'll forgive me for reformatting the quote characters
> to my taste.)

Thank you.

> On 16/07/2014 17:16, John Stoffel wrote:
>> Mason wrote:
>>> I'm using Linux (3.1.10 at the moment) on a embedded system
>>> similar in spec to a desktop PC from 15 years ago (256 MB RAM,
>>> 800-MHz CPU, USB).
>>
>> Sounds like a Raspberry Pi... And have you investigated using
>> something like XFS as your filesystem instead?
>
> The system is a set-top box (DVB-S2 receiver). The system CPU is
> MIPS 74K, not ARM (not that it matters, in this case).
>
> No, I have not investigated other file systems (yet).
>
>>> I need to be able to create large files (50-1000 GB) "as fast
>>> as possible". These files are created on an external hard disk
>>> drive, connected over Hi-Speed USB (typical throughput 30 MB/s).
>>
>> Really... so you just need to create allocations of space as quickly
>> as possible,
>
> I may not have been clear. The creation needs to be fast (in UX terms,
> so less than 5-10 seconds), but it only occurs a few times during the
> lifetime of the system.
>
>> which will then be filled in later with actual data?
>
> Yes. In fact, I use the loopback device to format the file as an
> ext4 partition.
>
> The use case is
> - allocate a large file
> - stick a file system on it
> - store stuff (typically video files) inside this "private" FS
> - when the user decides he doesn't need it anymore, unmount and unlink
> (I also have a resize operation in there, but I wanted to get the
> basics before taking the hard stuff head on.)
>
> So, in the limit, we don't store anything at all: just create and
> immediately delete. This was my test.

I would agree that LVM is the real solution that you want to use.
It is specifically designed for this, and has much less overhead than
a filesystem on a loopback device on a file on another filesystem.
The amount of space overhead is tuneable, but typically the volumes
are allocated in multiples of 4MB chunks.

That said, I think you've found some kind of strange performance problem,
and it is worthwhile to figure this out.

>>> /tmp # time ./foo /mnt/hdd/xxx 5
>>> posix_fallocate(fd, 0, size_in_GiB << 30): 0 [68 ms]
>>> unlink(filename): 0 [0 ms]
>>> 0.00user 1.86system 0:01.92elapsed 97%CPU (0avgtext+0avgdata 528maxresident)k
>>> 0inputs+0outputs (0major+168minor)pagefaults 0swaps
>>>
>>> /tmp # time ./foo /mnt/hdd/xxx 10
>>> posix_fallocate(fd, 0, size_in_GiB << 30): 0 [141 ms]
>>> unlink(filename): 0 [0 ms]
>>> 0.00user 3.71system 0:03.83elapsed 96%CPU (0avgtext+0avgdata 528maxresident)k
>>> 0inputs+0outputs (0major+168minor)pagefaults 0swaps
>>>
>>> /tmp # time ./foo /mnt/hdd/xxx 100
>>> posix_fallocate(fd, 0, size_in_GiB << 30): 0 [1882 ms]
>>> unlink(filename): 0 [0 ms]
>>> 0.00user 37.12system 0:38.93elapsed 95%CPU (0avgtext+0avgdata 528maxresident)k
>>> 0inputs+0outputs (0major+168minor)pagefaults 0swaps
>>>
>>> /tmp # time ./foo /mnt/hdd/xxx 300
>>> posix_fallocate(fd, 0, size_in_GiB << 30): 0 [3883 ms]
>>> unlink(filename): 0 [0 ms]
>>> 0.00user 111.38system 1:55.04elapsed 96%CPU (0avgtext+0avgdata 528maxresident)k
>>> 0inputs+0outputs (0major+168minor)pagefaults 0swaps

Firstly, have you tried using "fallocate()" directly, instead of
posix_fallocate()? It may be (depending on your userspace) that
posix_fallocate() is writing zeroes to the file instead of using
the fallocate() syscall, and the kernel is busy cleaning up all
of the dirty pages when the file is unlinked. You could try using
strace to see what system calls are actually being used.

Secondly, where is the process actually stuck? From your output
above, the unlink() call takes no measurable time before returning,
so I don't see where it is actually stuck. Again, running your
test with "strace -tt -T ./foo /mnt/hdd/xxx 300" will show which
syscall is actually taking so much time to complete. I don't
think it is unlink().

Cheers, Andreas
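Andreas's first suggestion can be tried with a few lines of C. The
sketch below is an illustration, not code from the thread: it assumes a
glibc recent enough to ship the fallocate() wrapper, and the helper name
preallocate_file() is made up for this example. Unlike
posix_fallocate(), which glibc may emulate by writing to the file when
the file system cannot preallocate, fallocate() with mode 0 fails with
EOPNOTSUPP in that case, so a slow fallback cannot go unnoticed.

```c
#define _GNU_SOURCE             /* exposes fallocate() in <fcntl.h> */
#define _FILE_OFFSET_BITS 64    /* 64-bit off_t on 32-bit targets */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): create `path` and
 * preallocate `size` bytes with the Linux-specific fallocate() call.
 *
 * Returns an open descriptor, or -1 with errno set. EOPNOTSUPP means
 * the file system cannot preallocate; nothing is written as a fallback
 * and the empty file is removed again.
 */
int preallocate_file(const char *path, off_t size)
{
    assert(path != NULL && size > 0);

    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    /* mode 0: allocate the range and extend the file size to cover it */
    if (fallocate(fd, 0, 0, size) != 0) {
        int saved = errno;
        close(fd);
        unlink(path);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A caller would check the return value and treat EOPNOTSUPP as "this
file system has no fast preallocation" instead of silently falling
back to writing data.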
* Re: After unlinking a large file on ext4, the process stalls for a long time
From: Mason @ 2014-07-17 10:30 UTC
To: Andreas Dilger; +Cc: Ext4 Developers List, linux-fsdevel

Hello,

Andreas Dilger wrote:

> Mason wrote:
>
>> The use case is
>> - allocate a large file
>> - stick a file system on it
>> - store stuff (typically video files) inside this "private" FS
>> - when the user decides he doesn't need it anymore, unmount and unlink
>> (I also have a resize operation in there, but I wanted to get the
>> basics before taking the hard stuff head on.)
>>
>> So, in the limit, we don't store anything at all: just create and
>> immediately delete. This was my test.
>
> I would agree that LVM is the real solution that you want to use.
> It is specifically designed for this, and has much less overhead than
> a filesystem on a loopback device on a file on another filesystem.
> The amount of space overhead is tuneable, but typically the volumes
> are allocated in multiples of 4MB chunks.

I'll take a look at LVM. (But, at this point, it's too late to change
the architecture of the system.)

> That said, I think you've found some kind of strange performance problem,
> and it is worthwhile to figure this out.
>
>>>> /tmp # time ./foo /mnt/hdd/xxx 5
>>>> posix_fallocate(fd, 0, size_in_GiB << 30): 0 [68 ms]
>>>> unlink(filename): 0 [0 ms]
>>>> 0.00user 1.86system 0:01.92elapsed 97%CPU (0avgtext+0avgdata 528maxresident)k
>>>> 0inputs+0outputs (0major+168minor)pagefaults 0swaps
>>>>
>>>> /tmp # time ./foo /mnt/hdd/xxx 10
>>>> posix_fallocate(fd, 0, size_in_GiB << 30): 0 [141 ms]
>>>> unlink(filename): 0 [0 ms]
>>>> 0.00user 3.71system 0:03.83elapsed 96%CPU (0avgtext+0avgdata 528maxresident)k
>>>> 0inputs+0outputs (0major+168minor)pagefaults 0swaps
>>>>
>>>> /tmp # time ./foo /mnt/hdd/xxx 100
>>>> posix_fallocate(fd, 0, size_in_GiB << 30): 0 [1882 ms]
>>>> unlink(filename): 0 [0 ms]
>>>> 0.00user 37.12system 0:38.93elapsed 95%CPU (0avgtext+0avgdata 528maxresident)k
>>>> 0inputs+0outputs (0major+168minor)pagefaults 0swaps
>>>>
>>>> /tmp # time ./foo /mnt/hdd/xxx 300
>>>> posix_fallocate(fd, 0, size_in_GiB << 30): 0 [3883 ms]
>>>> unlink(filename): 0 [0 ms]
>>>> 0.00user 111.38system 1:55.04elapsed 96%CPU (0avgtext+0avgdata 528maxresident)k
>>>> 0inputs+0outputs (0major+168minor)pagefaults 0swaps

Preliminary info:

The partition was created/mounted with
$ mkfs.ext4 -m 0 -i 1024000 -L ZOZO -O ^has_journal,^huge_file /dev/sda1
$ mount -t ext4 /dev/sda1 /mnt/hdd -o noexec,noatime
(mount is busybox, in case it matters)

mke2fs 1.42.10 (18-May-2014)
/dev/sda1 contains a ext4 file system labelled 'ZOZO'
        last mounted on /mnt/hdd on Wed Jul 16 15:40:40 2014
Proceed anyway? (y,n) y
Creating filesystem with 104857600 4k blocks and 460800 inodes
Filesystem UUID: 8c12c8fe-6ab8-4888-b9a3-6f28c86020eb
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000

Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done

/dev/sda1 on /mnt/hdd type ext4 (rw,noexec,noatime,barrier=1)
/* No support for xattr in this kernel */

# dumpe2fs -h /dev/sda1
dumpe2fs 1.42.10 (18-May-2014)
Filesystem volume name:   ZOZO
Last mounted on:          <not available>
Filesystem UUID:          8c12c8fe-6ab8-4888-b9a3-6f28c86020eb
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         not clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              460800
Block count:              104857600
Reserved block count:     0
Free blocks:              104803944
Free inodes:              460789
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      999
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         144
Inode blocks per group:   9
Flex block group size:    16
Filesystem created:       Thu Jul 17 11:14:27 2014
Last mount time:          Thu Jul 17 11:14:29 2014
Last write time:          Thu Jul 17 11:14:29 2014
Mount count:              1
Maximum mount count:      -1
Last checked:             Thu Jul 17 11:14:27 2014
Check interval:           0 (<none>)
Lifetime writes:          4883 kB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group unknown)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4
Directory Hash Seed:      157f2107-76fc-417b-9a07-491951c873b7

> Firstly, have you tried using "fallocate()" directly, instead of
> posix_fallocate()? It may be (depending on your userspace) that
> posix_fallocate() is writing zeroes to the file instead of using
> the fallocate() syscall, and the kernel is busy cleaning up all
> of the dirty pages when the file is unlinked. You could try using
> strace to see what system calls are actually being used.

Unfortunately, I'm using a prehistoric version of glibc (2.8)
that doesn't support the fallocate wrapper (imported in 2.10).

I'm 70% sure that posix_fallocate() is not actually writing zeros
to the file, because when I tested it on ext2, creating a 300-GB
file took hours, literally (approx. 3 hours). The same operation
on ext4 takes a few seconds. (Although, now that I think of it,
it could be working asynchronously, or defer some operation, that
I eventually have to pay for on deletion.)

# time strace -tt -T ./foo /mnt/hdd/xxx 300 2> strace.out
posix_fallocate(fd, 0, size_in_GiB << 30): 0 [414 ms]
unlink(filename): 0 [1 ms]

12:23:27.218838 open("/mnt/hdd/xxx", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 3 <0.000486>
12:23:27.220121 clock_gettime(CLOCK_MONOTONIC, {79879, 926227018}) = 0 <0.000105>
12:23:27.221029 SYS_4320() = 0 <0.412013>
12:23:27.633673 clock_gettime(CLOCK_MONOTONIC, {79880, 339646593}) = 0 <0.000104>
12:23:27.634657 fstat64(1, {st_mode=S_IFCHR|0755, st_rdev=makedev(4, 64), ...}) = 0 <0.000116>
12:23:27.636187 ioctl(1, TIOCNXCL, {B115200 opost isig icanon echo ...}) = 0 <0.000146>
12:23:27.637509 old_mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x77248000 <0.000143>
12:23:27.638306 write(1, "posix_fallocate(fd, 0, size_in_G"..., 54) = 54 <0.000237>
12:23:27.639496 clock_gettime(CLOCK_MONOTONIC, {79880, 345448452}) = 0 <0.000102>
12:23:27.640168 unlink("/mnt/hdd/xxx") = 0 <0.000231>
12:23:27.641174 clock_gettime(CLOCK_MONOTONIC, {79880, 347202581}) = 0 <0.000100>
12:23:27.641984 write(1, "unlink(filename): 0 [1 ms]\n", 27) = 27 <0.000157>
12:23:27.643056 exit_group(0) = ?
0.02user 111.51system 1:51.99elapsed 99%CPU (0avgtext+0avgdata 864maxresident)k
0inputs+0outputs (0major+459minor)pagefaults 0swaps

AFAICT, SYS_4320() is fallocate.

/*
 * Linux o32 style syscalls are in the range from 4000 to 4999.
 */
#define __NR_Linux      4000
#define __NR_fallocate  (__NR_Linux + 320)

Where is the process stalling? That is a mystery. Seems it's stuck
in exit_group(), waiting for the kernel to clean up on its behalf?
Maybe I need ftrace, or something to profile the kernel?

> Secondly, where is the process actually stuck? From your output
> above, the unlink() call takes no measurable time before returning,
> so I don't see where it is actually stuck. Again, running your
> test with "strace -tt -T ./foo /mnt/hdd/xxx 300" will show which
> syscall is actually taking so much time to complete. I don't
> think it is unlink().

See above, the process is stalled, but I don't know where!

-- 
Regards.
* Re: After unlinking a large file on ext4, the process stalls for a long time
From: Lukáš Czerner @ 2014-07-17 10:40 UTC
To: Mason; +Cc: Andreas Dilger, Ext4 Developers List, linux-fsdevel

On Thu, 17 Jul 2014, Mason wrote:

[...]

> # time strace -tt -T ./foo /mnt/hdd/xxx 300 2> strace.out
> posix_fallocate(fd, 0, size_in_GiB << 30): 0 [414 ms]
> unlink(filename): 0 [1 ms]
>
> 12:23:27.218838 open("/mnt/hdd/xxx", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 3 <0.000486>
> 12:23:27.220121 clock_gettime(CLOCK_MONOTONIC, {79879, 926227018}) = 0 <0.000105>
> 12:23:27.221029 SYS_4320() = 0 <0.412013>
> 12:23:27.633673 clock_gettime(CLOCK_MONOTONIC, {79880, 339646593}) = 0 <0.000104>
> 12:23:27.634657 fstat64(1, {st_mode=S_IFCHR|0755, st_rdev=makedev(4, 64), ...}) = 0 <0.000116>
> 12:23:27.636187 ioctl(1, TIOCNXCL, {B115200 opost isig icanon echo ...}) = 0 <0.000146>
> 12:23:27.637509 old_mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x77248000 <0.000143>
> 12:23:27.638306 write(1, "posix_fallocate(fd, 0, size_in_G"..., 54) = 54 <0.000237>
> 12:23:27.639496 clock_gettime(CLOCK_MONOTONIC, {79880, 345448452}) = 0 <0.000102>
> 12:23:27.640168 unlink("/mnt/hdd/xxx") = 0 <0.000231>
> 12:23:27.641174 clock_gettime(CLOCK_MONOTONIC, {79880, 347202581}) = 0 <0.000100>
> 12:23:27.641984 write(1, "unlink(filename): 0 [1 ms]\n", 27) = 27 <0.000157>
> 12:23:27.643056 exit_group(0) = ?
> 0.02user 111.51system 1:51.99elapsed 99%CPU (0avgtext+0avgdata 864maxresident)k
> 0inputs+0outputs (0major+459minor)pagefaults 0swaps

So it really does not seem to be stalling in fallocate, nor unlink.
Can you add close() before unlink, just to be sure what's happening
there?

Thanks!
-Lukas

[...]
* Re: After unlinking a large file on ext4, the process stalls for a long time
From: Mason @ 2014-07-17 11:17 UTC
To: Lukáš Czerner; +Cc: Andreas Dilger, Ext4 Developers List, linux-fsdevel

Lukáš Czerner wrote:

> So it really does not seem to be stalling in fallocate, nor unlink.
> Can you add close() before unlink, just to be sure what's happening
> there ?

Doh! Good catch! Unlinking was fast because the ref count didn't drop
to 0 on unlink, it did so on the implicit close done on exit, which
would explain why the process stalled "at the end".

If I unlink a closed file, it is indeed unlink that stalls.

[BTW, some of the e2fsprogs devs may be reading this. I suppose you
already know, but the cross-compile build was broken in 1.42.10.
I wrote a trivial patch to fix it (cf. the end of this message)
although I'm not sure I did it the canonical way.]

# time strace -T ./foo /mnt/hdd/xxx 300 2> strace.out
posix_fallocate(fd, 0, size_in_GiB << 30): 0 [412 ms]
close(fd): 0 [0 ms]
unlink(filename): 0 [111481 ms]

open("/mnt/hdd/xxx", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 3 <0.000456>
clock_gettime(CLOCK_MONOTONIC, {82152, 251657385}) = 0 <0.000085>
SYS_4320() = 0 <0.411628>
clock_gettime(CLOCK_MONOTONIC, {82152, 664179762}) = 0 <0.000089>
fstat64(1, {st_mode=S_IFCHR|0755, st_rdev=makedev(4, 64), ...}) = 0 <0.000094>
ioctl(1, TIOCNXCL, {B115200 opost isig icanon echo ...}) = 0 <0.000128>
old_mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x773e4000 <0.000195>
write(1, "posix_fallocate(fd, 0, size_in_G"..., 54) = 54 <0.000281>
clock_gettime(CLOCK_MONOTONIC, {82152, 668413115}) = 0 <0.000077>
close(3) = 0 <0.000119>
clock_gettime(CLOCK_MONOTONIC, {82152, 669249479}) = 0 <0.000129>
write(1, "close(fd): 0 [0 ms]\n", 20) = 20 <0.000145>
clock_gettime(CLOCK_MONOTONIC, {82152, 670361133}) = 0 <0.000078>
unlink("/mnt/hdd/xxx") = 0 <111.479283>
clock_gettime(CLOCK_MONOTONIC, {82264, 150551496}) = 0 <0.000080>
write(1, "unlink(filename): 0 [111481 ms]\n", 32) = 32 <0.000225>
exit_group(0) = ?
0.01user 111.48system 1:51.99elapsed 99%CPU (0avgtext+0avgdata 772maxresident)k
0inputs+0outputs (0major+434minor)pagefaults 0swaps

For reference, here's my minimal test case:

#define _FILE_OFFSET_BITS 64
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <time.h>

#define BENCH(op) do { \
  struct timespec t0; clock_gettime(CLOCK_MONOTONIC, &t0); \
  int err = op; \
  struct timespec t1; clock_gettime(CLOCK_MONOTONIC, &t1); \
  int ms = (t1.tv_sec-t0.tv_sec)*1000 + (t1.tv_nsec-t0.tv_nsec)/1000000; \
  printf("%s: %d [%d ms]\n", #op, err, ms); } while(0)

int main(int argc, char **argv)
{
  if (argc != 3) { puts("Usage: prog filename size"); return 42; }

  char *filename = argv[1];
  int fd = open(filename, O_CREAT | O_EXCL | O_WRONLY, 0600);
  if (fd < 0) { perror("open"); return 1; }

  long long size_in_GiB = atoi(argv[2]);
  BENCH(posix_fallocate(fd, 0, size_in_GiB << 30));
  BENCH(close(fd));
  BENCH(unlink(filename));
  return 0;
}

$ cat e2fsprogs-1.42.10.patch
diff -ur a/util/Makefile.in b/util/Makefile.in
--- a/util/Makefile.in 2014-05-15 19:04:08.000000000 +0200
+++ b/util/Makefile.in 2014-07-10 15:31:04.819352596 +0200
@@ -15,7 +15,7 @@
 
 .c.o:
 	$(E) " CC $<"
-	$(Q) $(BUILD_CC) -c $(BUILD_CFLAGS) $< -o $@
+	$(Q) $(BUILD_CC) $(CPPFLAGS) -c $(BUILD_CFLAGS) $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
 
 PROGS= subst symlinks

-- 
Regards.
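The behaviour Mason describes (unlink() returning at once while the
file is still open, with the real work postponed to the last close) is
ordinary POSIX semantics and can be observed directly. The following is
a minimal sketch, not part of the original test case; the function name
nlink_after_unlink() is invented for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Illustration only: create `path`, unlink it while the descriptor is
 * still open, and return the link count seen through the descriptor.
 * Returns -1 on error.
 */
long nlink_after_unlink(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    struct stat st;
    long result = -1;

    /* unlink() only removes the name; the inode stays alive because the
     * open descriptor still holds a reference to it. */
    if (unlink(path) == 0 &&
        write(fd, "x", 1) == 1 &&       /* the file is still usable */
        fstat(fd, &st) == 0)
        result = (long)st.st_nlink;     /* no names are left */

    /* This close() drops the last reference. Only now can the file
     * system release the inode and its blocks. */
    close(fd);
    return result;
}
```

In the first version of the test the descriptor was never closed
explicitly, so that final release happened during exit_group(), which is
where the process appeared to hang.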
* Re: After unlinking a large file on ext4, the process stalls for a long time
From: Theodore Ts'o @ 2014-07-17 13:37 UTC
To: Mason; +Cc: Lukáš Czerner, Andreas Dilger, Ext4 Developers List, linux-fsdevel

On Thu, Jul 17, 2014 at 01:17:11PM +0200, Mason wrote:
> unlink("/mnt/hdd/xxx") = 0 <111.479283>
>
> 0.01user 111.48system 1:51.99elapsed 99%CPU (0avgtext+0avgdata 772maxresident)k
> 0inputs+0outputs (0major+434minor)pagefaults 0swaps

... and we're CPU bound inside the kernel.

Can you run perf so we can see exactly where we're spending the CPU?
You're not using a journal, so I'm pretty sure what you will find is
that we're spending all of our time in mb_free_blocks(), when it is
updating the internal mballoc buddy bitmaps.

With a journal, this work done by mb_free_blocks() is hidden in the
kjournal thread, and happens after the commit is completed, so it
won't block other file system operations (other than burning some
extra CPU on one of the multiple cores available on a typical x86
CPU).

Also, I suspect the CPU overhead is *much* less on an x86 CPU, which
has native bit test/set/clear instructions, whereas the MIPS
architecture was designed by Prof. Hennessy at Stanford, who was a
doctrinaire RISC fanatic, so there would be no bitop instructions.

Even though I'm pretty sure what we'll find, knowing exactly *where*
in mb_free_blocks() or the functions it calls the time goes would be
helpful in knowing what we need to optimize. So if you could try using
perf (assuming that perf is supported on MIPS; I'm not sure that it is)
that would be really helpful.

Thanks,

- Ted
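The CPU-bound diagnosis is consistent with the timings posted earlier
in the thread: system time grows linearly with file size. A rough
back-of-the-envelope check, written as a small C sketch. The function
names are made up for this note, the 4096-byte block size comes from
the dumpe2fs output, and charging all system time to block freeing is
an assumption (it also includes the fallocate() call), so the result is
an upper bound.

```c
#include <assert.h>
#include <stdint.h>

/* Number of file-system blocks backing `gib` GiB of data. */
uint64_t blocks_for_gib(uint64_t gib, uint32_t block_size)
{
    assert(block_size > 0);
    return (gib << 30) / block_size;
}

/* Average system time per block, in nanoseconds, assuming all of
 * `sys_seconds` was spent freeing `blocks` blocks. */
double ns_per_block(double sys_seconds, uint64_t blocks)
{
    assert(blocks > 0);
    return sys_seconds * 1e9 / (double)blocks;
}
```

With the reported figures, ns_per_block(1.86, blocks_for_gib(5, 4096))
and ns_per_block(111.38, blocks_for_gib(300, 4096)) both come out a
little above 1.4 microseconds per block. At 800 MHz that is on the
order of a thousand CPU cycles for every 4 KiB block released, whatever
the file size.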
* Re: After unlinking a large file on ext4, the process stalls for a long time
From: Mason @ 2014-07-17 16:07 UTC
To: Theodore Ts'o; +Cc: Lukáš Czerner, Andreas Dilger, Ext4 Developers List, linux-fsdevel

Theodore Ts'o wrote:

> Mason wrote:
>
>> unlink("/mnt/hdd/xxx") = 0 <111.479283>
>>
>> 0.01user 111.48system 1:51.99elapsed 99%CPU (0avgtext+0avgdata 772maxresident)k
>> 0inputs+0outputs (0major+434minor)pagefaults 0swaps
>
> ... and we're CPU bound inside the kernel.
>
> Can you run perf so we can see exactly where we're spending the CPU?
> You're not using a journal, so I'm pretty sure what you will find is
> that we're spending all of our time in mb_free_blocks(), when it is
> updating the internal mballoc buddy bitmaps.
>
> With a journal, this work done by mb_free_blocks() is hidden in the
> kjournal thread, and happens after the commit is completed, so it
> won't block other file system operations (other than burning some
> extra CPU on one of the multiple cores available on a typical x86
> CPU).
>
> Also, I suspect the CPU overhead is *much* less on an x86 CPU, which
> has native bit test/set/clear instructions, whereas the MIPS
> architecture was designed by Prof. Hennessy at Stanford, who was a
> doctrinaire RISC fanatic, so there would be no bitop instructions.
>
> Even though I'm pretty sure what we'll find, knowing exactly *where*
> in mb_free_blocks() or the function it calls would be helpful in
> knowing what we need to optimize. So if you could try using perf
> (assuming that the perf is supported MIPS; not sure if it does) that
> would be really helpful.

Is perf "better" than oprofile? (For some metric)

I have enabled:

CONFIG_PERF_EVENTS=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_OPROFILE=y
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_KRETPROBES=y

What command-line do you suggest I run to get the output you expect?
(I'll try to get it done, but I might have to wait two weeks before
I can run these tests.)

-- 
Regards.
* Re: After unlinking a large file on ext4, the process stalls for a long time
From: Mason @ 2014-07-17 16:32 UTC
To: Theodore Ts'o; +Cc: Lukáš Czerner, Andreas Dilger, Ext4 Developers List, linux-fsdevel

On 17/07/2014 18:07, Mason wrote:

> Theodore Ts'o wrote:
>
>> Mason wrote:
>>
>>> unlink("/mnt/hdd/xxx") = 0 <111.479283>
>>>
>>> 0.01user 111.48system 1:51.99elapsed 99%CPU (0avgtext+0avgdata 772maxresident)k
>>> 0inputs+0outputs (0major+434minor)pagefaults 0swaps
>>
>> ... and we're CPU bound inside the kernel.
>>
>> Can you run perf so we can see exactly where we're spending the CPU?
>> You're not using a journal, so I'm pretty sure what you will find is
>> that we're spending all of our time in mb_free_blocks(), when it is
>> updating the internal mballoc buddy bitmaps.
>>
>> With a journal, this work done by mb_free_blocks() is hidden in the
>> kjournal thread, and happens after the commit is completed, so it
>> won't block other file system operations (other than burning some
>> extra CPU on one of the multiple cores available on a typical x86
>> CPU).
>>
>> Also, I suspect the CPU overhead is *much* less on an x86 CPU, which
>> has native bit test/set/clear instructions, whereas the MIPS
>> architecture was designed by Prof. Hennessy at Stanford, who was a
>> doctrinaire RISC fanatic, so there would be no bitop instructions.
>>
>> Even though I'm pretty sure what we'll find, knowing exactly *where*
>> in mb_free_blocks() or the function it calls would be helpful in
>> knowing what we need to optimize. So if you could try using perf
>> (assuming that the perf is supported MIPS; not sure if it does) that
>> would be really helpful.
>
> Is perf "better" than oprofile? (For some metric)
>
> I have enabled:
>
> CONFIG_PERF_EVENTS=y
> CONFIG_PROFILING=y
> CONFIG_TRACEPOINTS=y
> CONFIG_OPROFILE=y
> CONFIG_HAVE_OPROFILE=y
> CONFIG_KPROBES=y
> CONFIG_KRETPROBES=y
>
> What command-line do you suggest I run to get the output you expect?
> (I'll try to get it done, but I might have to wait two weeks before
> I can run these tests.)

So much for oprofile...

  CC      arch/mips/oprofile/../../../drivers/oprofile/oprof.o
arch/mips/oprofile/../../../drivers/oprofile/oprof.c: In function 'oprofile_init':
arch/mips/oprofile/../../../drivers/oprofile/oprof.c:316: error: 'timer' undeclared (first use in this function)
arch/mips/oprofile/../../../drivers/oprofile/oprof.c:316: error: (Each undeclared identifier is reported only once
arch/mips/oprofile/../../../drivers/oprofile/oprof.c:316: error: for each function it appears in.)
arch/mips/oprofile/../../../drivers/oprofile/oprof.c: In function '__check_timer':
arch/mips/oprofile/../../../drivers/oprofile/oprof.c:373: error: 'timer' undeclared (first use in this function)
arch/mips/oprofile/../../../drivers/oprofile/oprof.c: At top level:
arch/mips/oprofile/../../../drivers/oprofile/oprof.c:373: error: 'timer' undeclared here (not in a function)
cc1: warnings being treated as errors
arch/mips/oprofile/../../../drivers/oprofile/oprof.c:373: error: type defaults to 'int' in declaration of 'type name'
make[1]: *** [arch/mips/oprofile/../../../drivers/oprofile/oprof.o] Error 1
make: *** [arch/mips/oprofile] Error 2

Dunno if this happens on vanilla kernels, or if the ODM messed
something up (again).

$ ll tools/perf/arch/
drwxrwxr-x 4 bob bob 4096 Mar 27 17:12 arm/
drwxrwxr-x 4 bob bob 4096 Mar 27 17:12 powerpc/
drwxrwxr-x 4 bob bob 4096 Mar 27 17:12 s390/
drwxrwxr-x 4 bob bob 4096 Mar 27 17:12 sh/
drwxrwxr-x 4 bob bob 4096 Mar 27 17:12 sparc/
drwxrwxr-x 4 bob bob 4096 Mar 27 17:12 x86/

I'm not sure perf supports MIPS...

Or maybe it does

$ g -rni mips .
./Makefile:45: -e s/ppc.*/powerpc/ -e s/mips.*/mips/ \
Binary file ./.Makefile.swp matches
./perf.h:76:#ifdef __mips__
./perf.h:77:#include "../../arch/mips/include/asm/unistd.h"
./perf.h:79: ".set mips2\n\t" \
./perf.h:81: ".set mips0" \

-- 
Regards.

* Re: After unlinking a large file on ext4, the process stalls for a long time
From: Lukáš Czerner @ 2014-07-18 9:29 UTC
To: Mason
Cc: Theodore Ts'o, Andreas Dilger, Ext4 Developers List, linux-fsdevel

On Thu, 17 Jul 2014, Mason wrote:

> Date: Thu, 17 Jul 2014 18:07:30 +0200
> From: Mason <mpeg.blue@free.fr>
> To: Theodore Ts'o <tytso@mit.edu>
> Cc: Lukáš Czerner <lczerner@redhat.com>, Andreas Dilger <adilger@dilger.ca>,
>     Ext4 Developers List <linux-ext4@vger.kernel.org>,
>     linux-fsdevel <linux-fsdevel@vger.kernel.org>
> Subject: Re: After unlinking a large file on ext4,
>     the process stalls for a long time
>
> Theodore Ts'o wrote:
>
> > Mason wrote:
> >
> >> unlink("/mnt/hdd/xxx") = 0 <111.479283>
> >>
> >> 0.01user 111.48system 1:51.99elapsed 99%CPU (0avgtext+0avgdata 772maxresident)k
> >> 0inputs+0outputs (0major+434minor)pagefaults 0swaps
> >
> > ... and we're CPU bound inside the kernel.
> >
> > Can you run perf so we can see exactly where we're spending the CPU?
> > You're not using a journal, so I'm pretty sure what you will find is
> > that we're spending all of our time in mb_free_blocks(), when it is
> > updating the internal mballoc buddy bitmaps.
> >
> > With a journal, this work done by mb_free_blocks() is hidden in the
> > kjournal thread, and happens after the commit is completed, so it
> > won't block other file system operations (other than burning some
> > extra CPU on one of the multiple cores available on a typical x86
> > CPU).
> >
> > Also, I suspect the CPU overhead is *much* less on an x86 CPU, which
> > has native bit test/set/clear instructions, whereas the MIPS
> > architecture was designed by Prof. Hennessy at Stanford, who was a
> > doctrinaire RISC fanatic, so there would be no bitop instructions.
> >
> > Even though I'm pretty sure what we'll find, knowing exactly *where*
> > in mb_free_blocks() or the function it calls would be helpful in
> > knowing what we need to optimize. So if you could try using perf
> > (assuming that the perf is supported MIPS; not sure if it does) that
> > would be really helpful.
>
> Is perf "better" than oprofile? (For some metric)
>
> I have enabled:
>
> CONFIG_PERF_EVENTS=y
> CONFIG_PROFILING=y
> CONFIG_TRACEPOINTS=y
> CONFIG_OPROFILE=y
> CONFIG_HAVE_OPROFILE=y
> CONFIG_KPROBES=y
> CONFIG_KRETPROBES=y
>
> What command-line do you suggest I run to get the output you expect?
> (I'll try to get it done, but I might have to wait two weeks before
> I can run these tests.)

If perf works on your system you can record data with

	perf record -g ./test file <size>

and then report with

	perf report --stdio

That should yield some interesting information about where we spend
the most time in kernel.

Thanks!
-Lukas
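
The test binary being profiled is not shown in this part of the thread. A minimal sketch of such a test, assuming it simply preallocates a file with posix_fallocate() and then unlinks it, could look like the following. The names time_fallocate_unlink() and elapsed_ms() are hypothetical and chosen for illustration only; this is not Mason's program. The descriptor is closed before unlink() so that the cost of freeing the blocks is charged to the unlink() call, as in the strace output quoted above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC readings. */
long elapsed_ms(const struct timespec *t0, const struct timespec *t1)
{
    return (long)(t1->tv_sec - t0->tv_sec) * 1000
         + (t1->tv_nsec - t0->tv_nsec) / 1000000;
}

/*
 * Hypothetical helper: create `path`, preallocate `size` bytes, close the
 * descriptor, then unlink the file, timing the two interesting calls.
 * Returns 0 on success or an errno-style code from the failing step.
 */
int time_fallocate_unlink(const char *path, off_t size,
                          long *alloc_ms, long *unlink_ms)
{
    struct timespec t0, t1;

    assert(path != NULL && alloc_ms != NULL && unlink_ms != NULL);

    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return errno;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* posix_fallocate() returns an error number directly, not -1/errno. */
    int err = posix_fallocate(fd, 0, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *alloc_ms = elapsed_ms(&t0, &t1);

    close(fd);
    if (err != 0) {
        unlink(path);   /* best-effort cleanup of the partial file */
        return err;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int rc = unlink(path);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *unlink_ms = elapsed_ms(&t0, &t1);

    return rc == 0 ? 0 : errno;
}
```

A small main() that parses a path and a size in GiB, calls this helper and prints the two timings would give a binary that can be run under "perf record -g" as suggested above.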

* Re: After unlinking a large file on ext4, the process stalls for a long time
From: Andreas Dilger @ 2014-08-04 22:55 UTC
To: Mason
Cc: Lukáš Czerner, Theodore Ts'o, Ext4 Developers List

It would be possible to optimize mb_free_blocks() by having it
clear a whole word at a time instead of a series of bits.

I thought that was done already, but it doesn't appear to be the case.
Also, it isn't clear that the bit "normalization" is needed anymore.
This was done back in the ancient times when the buddy bitmaps were
stored on disk instead of being regenerated only at mount time.

Cheers, Andreas

> On Aug 4, 2014, at 16:30, Mason <mpeg.blue@free.fr> wrote:
>
>> On 18/07/2014 11:29, Lukáš Czerner wrote:
>>
>> Mason wrote:
>>
>>> Theodore Ts'o wrote:
>>>
>>>> Mason wrote:
>>>>
>>>>> unlink("/mnt/hdd/xxx") = 0 <111.479283>
>>>>>
>>>>> 0.01user 111.48system 1:51.99elapsed 99%CPU (0avgtext+0avgdata 772maxresident)k
>>>>> 0inputs+0outputs (0major+434minor)pagefaults 0swaps
>>>>
>>>> ... and we're CPU bound inside the kernel.
>>>>
>>>> Can you run perf so we can see exactly where we're spending the CPU?
>>>> You're not using a journal, so I'm pretty sure what you will find is
>>>> that we're spending all of our time in mb_free_blocks(), when it is
>>>> updating the internal mballoc buddy bitmaps.
>>>>
>>>> With a journal, this work done by mb_free_blocks() is hidden in the
>>>> kjournal thread, and happens after the commit is completed, so it
>>>> won't block other file system operations (other than burning some
>>>> extra CPU on one of the multiple cores available on a typical x86
>>>> CPU).
>>>>
>>>> Also, I suspect the CPU overhead is *much* less on an x86 CPU, which
>>>> has native bit test/set/clear instructions, whereas the MIPS
>>>> architecture was designed by Prof. Hennessy at Stanford, who was a
>>>> doctrinaire RISC fanatic, so there would be no bitop instructions.
>
> I've attached the output of "mips-linux-gnu-objdump -xd mballoc.o"
> in case someone wants to peek at the generated code.
>
>>>> Even though I'm pretty sure what we'll find, knowing exactly *where*
>>>> in mb_free_blocks() or the function it calls would be helpful in
>>>> knowing what we need to optimize. So if you could try using perf
>>>> (assuming that the perf is supported MIPS; not sure if it does) that
>>>> would be really helpful.
>
> How do you get perf to tell you where in mb_free_blocks we are spending
> the most time?
>
>>> What command-line do you suggest I run to get the output you expect?
>>
>> If perf works on your system you can record data with
>>
>> perf record -g ./test file <size>
>>
>> and then report with
>>
>> perf report --stdio
>>
>> That should yield some interesting information about where we spend
>> the most time in kernel.
>
> I've no idea why, but the unlink operation, which used to take
> 111 seconds to run, now only takes 53...
>
> Anyway, here is the requested output.
>
> # time perf record -g foo /mnt/hdd/xxx 300
> [ perf record: Woken up 8 times to write data ]
> [ perf record: Captured and wrote 1.909 MB perf.data (~83406 samples) ]
> 0.04user 0.08system 0:53.54elapsed 0%CPU (0avgtext+0avgdata 3616maxresident)k
> 0inputs+0outputs (0major+984minor)pagefaults 0swaps
>
> # perf report --stdio > report.txt
> (Complete report attached as report.txt.xz)
>
> What can I do to improve the latency of unlinking large files?
> Would sparse_super2 help at all?
>
>
> # Events: 14K cycles
> #
> # Overhead  Command      Shared Object  Symbol
> # ........  .......  .................  ............................
> #
>     33.94%      foo  [kernel.kallsyms]  [k] mb_free_blocks
>                 |
>                 --- mb_free_blocks
>                     ext4_free_blocks
>                     ext4_ext_rm_leaf
>                     ext4_ext_truncate
>                     ext4_truncate
>                     ext4_evict_inode
>                     evict
>                     do_unlinkat
>                     stack_done
>
>     21.11%      foo  [kernel.kallsyms]  [k] __find_get_block
>                 |
>                 --- __find_get_block
>                    |
>                    |--99.94%-- ext4_free_blocks
>                    |          ext4_ext_rm_leaf
>                    |          ext4_ext_truncate
>                    |          ext4_truncate
>                    |          ext4_evict_inode
>                    |          evict
>                    |          do_unlinkat
>                    |          stack_done
>                     --0.06%-- [...]
>
>      8.33%      foo  [kernel.kallsyms]  [k] radix_tree_lookup_slot
>                 |
>                 --- radix_tree_lookup_slot
>                     find_get_page
>                     __find_get_block_slow
>                     __find_get_block
>                     ext4_free_blocks
>                     ext4_ext_rm_leaf
>                     ext4_ext_truncate
>                     ext4_truncate
>                     ext4_evict_inode
>                     evict
>                     do_unlinkat
>                     stack_done
>
>      6.99%      foo  [kernel.kallsyms]  [k] mb_find_buddy
>                 |
>                 --- mb_find_buddy
>                     mb_free_blocks
>                     ext4_free_blocks
>                     ext4_ext_rm_leaf
>                     ext4_ext_truncate
>                     ext4_truncate
>                     ext4_evict_inode
>                     evict
>                     do_unlinkat
>                     stack_done
>
>      4.21%      foo  [kernel.kallsyms]  [k] trace_preempt_off
>                 |
>                 --- trace_preempt_off
>                    |
>                    |--99.99%-- __find_get_block
>                    |          ext4_free_blocks
>                    |          ext4_ext_rm_leaf
>                    |          ext4_ext_truncate
>                    |          ext4_truncate
>                    |          ext4_evict_inode
>                    |          evict
>                    |          do_unlinkat
>                    |          stack_done
>                     --0.01%-- [...]
>
>      4.19%      foo  [kernel.kallsyms]  [k] ext4_free_blocks
>                 |
>                 --- ext4_free_blocks
>                     ext4_ext_rm_leaf
>                     ext4_ext_truncate
>                     ext4_truncate
>                     ext4_evict_inode
>                     evict
>                     do_unlinkat
>                     stack_done
>
>      4.14%      foo  [kernel.kallsyms]  [k] sub_preempt_count
>                 |
>                 --- sub_preempt_count
>                    |
>                    |--99.69%-- __find_get_block
>                    |          ext4_free_blocks
>                    |          ext4_ext_rm_leaf
>                    |          ext4_ext_truncate
>                    |          ext4_truncate
>                    |          ext4_evict_inode
>                    |          evict
>                    |          do_unlinkat
>                    |          stack_done
>                     --0.31%-- [...]
>
>      3.97%      foo  [kernel.kallsyms]  [k] __find_get_block_slow
>                 |
>                 --- __find_get_block_slow
>                     __find_get_block
>                     ext4_free_blocks
>                     ext4_ext_rm_leaf
>                     ext4_ext_truncate
>                     ext4_truncate
>                     ext4_evict_inode
>                     evict
>                     do_unlinkat
>                     stack_done
>
>      3.53%      foo  [kernel.kallsyms]  [k] __rcu_read_unlock
>                 |
>                 --- __rcu_read_unlock
>                    |
>                    |--100.00%-- find_get_page
>                    |          __find_get_block_slow
>                    |          __find_get_block
>                    |          ext4_free_blocks
>                    |          ext4_ext_rm_leaf
>                    |          ext4_ext_truncate
>                    |          ext4_truncate
>                    |          ext4_evict_inode
>                    |          evict
>                    |          do_unlinkat
>                    |          stack_done
>                     --0.00%-- [...]
>
>      3.26%      foo  [kernel.kallsyms]  [k] trace_preempt_on
>                 |
>                 --- trace_preempt_on
>                     sub_preempt_count
>                    |
>                    |--100.00%-- __find_get_block
>                    |          ext4_free_blocks
>                    |          ext4_ext_rm_leaf
>                    |          ext4_ext_truncate
>                    |          ext4_truncate
>                    |          ext4_evict_inode
>                    |          evict
>                    |          do_unlinkat
>                    |          stack_done
>                     --0.00%-- [...]
>
>      2.06%      foo  [kernel.kallsyms]  [k] find_get_page
>                 |
>                 --- find_get_page
>                    |
>                    |--100.00%-- __find_get_block_slow
>                    |          __find_get_block
>                    |          ext4_free_blocks
>                    |          ext4_ext_rm_leaf
>                    |          ext4_ext_truncate
>                    |          ext4_truncate
>                    |          ext4_evict_inode
>                    |          evict
>                    |          do_unlinkat
>                    |          stack_done
>                     --0.00%-- [...]
>
>      1.39%      foo  [kernel.kallsyms]  [k] add_preempt_count
>                 |
>                 --- add_preempt_count
>                    |
>                    |--99.99%-- __find_get_block
>                    |          ext4_free_blocks
>                    |          ext4_ext_rm_leaf
>                    |          ext4_ext_truncate
>                    |          ext4_truncate
>                    |          ext4_evict_inode
>                    |          evict
>                    |          do_unlinkat
>                    |          stack_done
>                     --0.01%-- [...]
>
>      1.26%      foo  [kernel.kallsyms]  [k] __rcu_read_lock
>                 |
>                 --- __rcu_read_lock
>                     find_get_page
>                     __find_get_block_slow
>                     __find_get_block
>                     ext4_free_blocks
>                     ext4_ext_rm_leaf
>                     ext4_ext_truncate
>                     ext4_truncate
>                     ext4_evict_inode
>                     evict
>                     do_unlinkat
>                     stack_done
>
> --
> Regards.
>
> <report.txt.xz>
> <mballoc.dump.xz>

* Re: After unlinking a large file on ext4, the process stalls for a long time
From: Theodore Ts'o @ 2014-08-05 2:33 UTC
To: Andreas Dilger
Cc: Mason, Lukáš Czerner, Ext4 Developers List

On Tue, Aug 05, 2014 at 12:55:14AM +0200, Andreas Dilger wrote:
> It would be possible to optimize mb_free_blocks() by having it
> clear a whole word at a time instead of a series of bits.

It looks like we're doing this already in mb_test_and_clear_bits(),
aren't we?

> I thought that was done already, but it doesn't appear to be the case.
> Also, it isn't clear that the bit "normalization" is needed anymore.
> This was done back in the ancient times when the buddy bitmaps were
> stored on disk instead of being regenerated only at mount time.

I'm not sure what you mean by this; the only reference I can find
to normalization is with normalizing requests?

- Ted

* Re: After unlinking a large file on ext4, the process stalls for a long time
From: Andreas Dilger @ 2014-08-05 21:54 UTC
To: Theodore Ts'o
Cc: Mason, Lukáš Czerner, Ext4 Developers List

On Aug 5, 2014, at 4:33, Theodore Ts'o <tytso@mit.edu> wrote:
>
>> On Tue, Aug 05, 2014 at 12:55:14AM +0200, Andreas Dilger wrote:
>> It would be possible to optimize mb_free_blocks() by having it
>> clear a whole word at a time instead of a series of bits.
>
> It looks like we're doing this already in mb_test_and_clear_bits(),
> aren't we?

Sorry, I didn't see mb_test_and_clear_bits(), I was only looking at
mb_clear_bit() to see if it had the multi-bit optimization.

>> I thought that was done already, but it doesn't appear to be the case.
>> Also, it isn't clear that the bit "normalization" is needed anymore.
>> This was done back in the ancient times when the buddy bitmaps were
>> stored on disk instead of being regenerated only at mount time.
>
> I'm not sure what you mean by this; the only reference I can find
> to normalization is with normalizing requests?

I meant mb_correct_addr_and_bit().

Cheers, Andreas
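
The word-at-a-time idea discussed in this exchange can be illustrated with a small user-space sketch in C. It is not the ext4 code and does not reproduce mb_clear_bit() or mb_test_and_clear_bits(); the names clear_bits_slow(), clear_bits_fast() and BITS_PER_WORD are hypothetical, and the bitmap is assumed to be a plain array of unsigned long with bit 0 in the least significant position of word 0.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Clear `len` bits starting at bit `start`, one bit per iteration. */
void clear_bits_slow(unsigned long *map, size_t start, size_t len)
{
    assert(map != NULL);
    for (size_t i = start; i < start + len; i++)
        map[i / BITS_PER_WORD] &= ~(1UL << (i % BITS_PER_WORD));
}

/* Same result, but whole words in the middle of the range are zeroed
 * with a single store instead of BITS_PER_WORD read-modify-write steps. */
void clear_bits_fast(unsigned long *map, size_t start, size_t len)
{
    size_t end = start + len;   /* one past the last bit to clear */

    assert(map != NULL);

    /* Leading bits, up to the next word boundary. */
    while (start < end && start % BITS_PER_WORD != 0) {
        map[start / BITS_PER_WORD] &= ~(1UL << (start % BITS_PER_WORD));
        start++;
    }

    /* Fully covered words. */
    while (end - start >= BITS_PER_WORD) {
        map[start / BITS_PER_WORD] = 0;
        start += BITS_PER_WORD;
    }

    /* Trailing bits in the last, partially covered word. */
    while (start < end) {
        map[start / BITS_PER_WORD] &= ~(1UL << (start % BITS_PER_WORD));
        start++;
    }
}
```

For a long extent, nearly all of the work falls in the middle loop, which is why the per-bit cost matters less once whole words are handled at once. Whether this would help the profile quoted earlier is not established in the thread.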

* Re: After unlinking a large file on ext4, the process stalls for a long time
From: Mason @ 2014-08-05 12:06 UTC
To: Andreas Dilger
Cc: Lukáš Czerner, Theodore Ts'o, Ext4 Developers List

On 05/08/2014 00:55, Andreas Dilger wrote:

> It would be possible to optimize mb_free_blocks() by having it
> clear a whole word at a time instead of a series of bits.
>
> I thought that was done already, but it doesn't appear to be the case.
> Also, it isn't clear that the bit "normalization" is needed anymore.
> This was done back in the ancient times when the buddy bitmaps were
> stored on disk instead of being regenerated only at mount time.

Are there any other tests you'd like me to run?
(I will be permanently losing access to this platform in a few days.)

--
Regards.