* discard and data=writeback
@ 2020-12-18 18:40 Matteo Croce
  2020-12-21  3:04 ` Theodore Y. Ts'o
  0 siblings, 1 reply; 13+ messages in thread
From: Matteo Croce @ 2020-12-18 18:40 UTC (permalink / raw)
To: linux-ext4

Hi,

I noticed a big slowdown on file removal, so I tried to remove the
discard option, and it helped a lot.
Obviously discarding blocks will have an overhead, but the strange
thing is that it only does so when using data=writeback:

Ordered:

$ dmesg |grep EXT4
[    0.243372] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: (null)
$ grep -w / /proc/mounts
/dev/root / ext4 rw,noatime 0 0
$ time rm -rf linux-5.10

real    0m0.454s
user    0m0.029s
sys     0m0.409s

$ grep -w / /proc/mounts
/dev/root / ext4 rw,noatime,discard 0 0
$ time rm -rf linux-5.10

real    0m0.554s
user    0m0.051s
sys     0m0.403s

Writeback:

$ dmesg |grep EXT4
[    0.243909] EXT4-fs (vda1): mounted filesystem with writeback data mode. Opts: (null)
$ grep -w / /proc/mounts
/dev/root / ext4 rw,noatime 0 0
$ time rm -rf linux-5.10

real    0m0.440s
user    0m0.030s
sys     0m0.407s

$ grep -w / /proc/mounts
/dev/root / ext4 rw,noatime,discard 0 0
$ time rm -rf linux-5.10

real    0m3.763s
user    0m0.030s
sys     0m0.876s

It seems that ext4_issue_discard() is called ~300 times with data=ordered
and ~50k times with data=writeback.

I'm using a vanilla 5.10.1 kernel. Any thoughts?

Regards,
-- 
per aspera ad upstream

^ permalink raw reply	[flat|nested] 13+ messages in thread
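(An illustrative way to obtain call counts like the ~300 vs ~50k above is the ftrace function profiler; this sketch is not from the original report. It assumes debugfs is mounted at /sys/kernel/debug, CONFIG_FUNCTION_PROFILER is enabled, and that ext4_issue_discard has not been inlined away — check available_filter_functions first.)

  cd /sys/kernel/debug/tracing
  grep -w ext4_issue_discard available_filter_functions   # symbol must be traceable
  echo ext4_issue_discard > set_ftrace_filter
  echo 1 > function_profile_enabled
  rm -rf linux-5.10
  sync                                           # force journal commits (and discards) out
  echo 0 > function_profile_enabled
  grep ext4_issue_discard trace_stat/function*   # per-CPU hit counts
  echo > set_ftrace_filter                       # clear the filter again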
* Re: discard and data=writeback
  2020-12-18 18:40 discard and data=writeback Matteo Croce
@ 2020-12-21  3:04 ` Theodore Y. Ts'o
  2020-12-22 14:59   ` Matteo Croce
  0 siblings, 1 reply; 13+ messages in thread
From: Theodore Y. Ts'o @ 2020-12-21 3:04 UTC (permalink / raw)
To: Matteo Croce; +Cc: linux-ext4

On Fri, Dec 18, 2020 at 07:40:09PM +0100, Matteo Croce wrote:
>
> I noticed a big slowdown on file removal, so I tried to remove the
> discard option, and it helped
> a lot.
> Obviously discarding blocks will have an overhead, but the strange
> thing is that it only
> does so when using data=writeback:

If the data=ordered mount option is enabled, when you have allocating
buffered writes pending, the data block writes are forced out *before*
we write out the journal blocks, followed by a cache flush, followed by
the commit block (which is either written with the Forced Unit
Attention bit set if the storage device supports this, or the commit
block is followed by another cache flush).  After the journal commit
block is written out, if the discard mount option is enabled, all
blocks that were released during the last journal transaction are then
discarded.

If data=writeback is enabled, then we do *not* flush out any dirty
pages in the page cache that were allocated during the previous
transaction.  This means that if you crash, it is possible that
freshly created inodes containing freshly allocated blocks may have
stale data in those newly allocated blocks.  Those blocks might include
some other users' e-mails, medical records, cryptographic keys, or
other PII.  Which is why data=ordered is the default.

So if data=ordered vs. data=writeback makes any difference, the first
question I'd have to ask is whether there were any dirty pages in the
page cache, or any background writes happening in parallel with the
rm -rf command.

> It seems that ext4_issue_discard() is called ~300 times with data=ordered
> and ~50k times with data=writeback.

ext4_issue_discard() gets called for each contiguous set of blocks that
were released in a particular jbd2 transaction.  So if you are deleting
100 files, and all of those files are unlinked in a single transaction,
and all of the blocks belonging to those files form a single contiguous
block region, then ext4_issue_discard() will be called only once.  If
you delete a single file, but all of its blocks are heavily fragmented,
then ext4_issue_discard() might be called a thousand times.  If you
delete 100 files, all of which are contiguous, but each file is in a
different part of the disk, then ext4_issue_discard() might be called
100 times.

So that implies that your experiment may not be repeatable; did you
make sure the file system was freshly reformatted before you wrote out
the files in the directory you are deleting?  And was the directory
written out in exactly the same way?  And did you make sure all of the
writes were flushed out to disk before you tried timing the "rm -rf"
command?  And did you make sure that there weren't any other processes
running that might be issuing other file system operations (either data
or metadata heavy) that might be interfering with the "rm -rf"
operation?  What kind of storage device were you using?  (An SSD; a USB
thumb drive; some kind of Cloud emulated block device?)

Note that benchmarking file system operations is *hard*.
When I worked with a graduate student working on a paper describing a
prototype of a file system enhancement to ext4 to optimize ext4 for
drive-managed SMR drives[1], the graduate student spent *way* more time
getting reliable, repeatable benchmarks than making changes to ext4 for
the prototype.  (It turns out the SMR GC operations caused variations
in write speeds, which meant the writeback throughput measurements
would fluctuate wildly, which then influenced the writeback cache
ratio, which in turn massively influenced how aggressively the
writeback threads would behave, which in turn massively influenced the
filebench and postmark numbers.)

[1] https://www.usenix.org/conference/fast17/technical-sessions/presentation/aghayev

So there can be variability caused by how blocks are allocated at the
file system level; how the SSD is assigning blocks to flash erase
blocks; how the SSD's GC operation influences its write speed, which
can in turn influence the kernel's measured writeback throughput;
different SSD's or Cloud block devices can have very different discard
performance that can vary based on past write history, yadda, yadda,
yadda.

Cheers,

					- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread
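(A minimal sketch of the kind of controlled, repeatable run described above: identical fresh format, identical extraction, everything flushed before timing. The device path, mount point and tarball name are placeholders, and the mount options are what would be varied between runs.)

  DEV=/dev/vdb            # dedicated scratch device (placeholder)
  MNT=/mnt/test
  MODE=data=writeback     # or data=ordered

  umount "$MNT" 2>/dev/null
  mkfs.ext4 -F "$DEV"                     # identical, freshly formatted starting state
  mount -o noatime,discard,"$MODE" "$DEV" "$MNT"
  tar -C "$MNT" -xf linux-5.10.tar        # write the data set the same way every run
  sync
  sleep 10                                # let writeback and journal commits settle
  time rm -rf "$MNT"/linux-5.10
  umount "$MNT"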
* Re: discard and data=writeback
  2020-12-21  3:04 ` Theodore Y. Ts'o
@ 2020-12-22 14:59   ` Matteo Croce
  2020-12-22 16:34     ` Theodore Y. Ts'o
  0 siblings, 1 reply; 13+ messages in thread
From: Matteo Croce @ 2020-12-22 14:59 UTC (permalink / raw)
To: Theodore Y. Ts'o; +Cc: linux-ext4

On Mon, Dec 21, 2020 at 4:04 AM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> So that implies that your experiment may not be repeatable; did you
> make sure the file system was freshly reformatted before you wrote out
> the files in the directory you are deleting?  And was the directory
> written out in exactly the same way?  And did you make sure all of the
> writes were flushed out to disk before you tried timing the "rm -rf"
> command?  And did you make sure that there weren't any other processes
> running that might be issuing other file system operations (either
> data or metadata heavy) that might be interfering with the "rm -rf"
> operation?  What kind of storage device were you using?  (An SSD; a
> USB thumb drive; some kind of Cloud emulated block device?)
>

I got another machine with a faster NVMe disk. I discarded the whole
drive before partitioning it; this drive is very fast at discarding
blocks:

# time blkdiscard -f /dev/nvme0n1p1

real    0m1.356s
user    0m0.003s
sys     0m0.000s

Also, the drive is pretty big compared to the dataset size, so it's
unlikely to be fragmented:

# lsblk /dev/nvme0n1
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1     259:0    0  1.7T  0 disk
└─nvme0n1p1 259:1    0  1.7T  0 part /media

# df -h /media
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p1  1.8T  1.2G  1.7T   1% /media

# du -sh /media/linux-5.10/
1.1G    /media/linux-5.10/

I'm issuing sync + sleep(10) after the extraction, so the writes should
all be flushed.
Also, I repeated the test three times, with very similar results:

# dmesg |grep EXT4-fs
[12807.847559] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: data=ordered,discard
# tar xf ~/linux-5.10.tar ; sync ; sleep 10
# time rm -rf linux-5.10/

real    0m1.607s
user    0m0.048s
sys     0m1.559s

# tar xf ~/linux-5.10.tar ; sync ; sleep 10
# time rm -rf linux-5.10/

real    0m1.634s
user    0m0.080s
sys     0m1.553s

# tar xf ~/linux-5.10.tar ; sync ; sleep 10
# time rm -rf linux-5.10/

real    0m1.604s
user    0m0.052s
sys     0m1.552s

# dmesg |grep EXT4-fs
[13133.953978] EXT4-fs (nvme0n1p1): mounted filesystem with writeback data mode. Opts: data=writeback,discard
# tar xf ~/linux-5.10.tar ; sync ; sleep 10
# time rm -rf linux-5.10/

real    1m29.443s
user    0m0.073s
sys     0m2.520s

# tar xf ~/linux-5.10.tar ; sync ; sleep 10
# time rm -rf linux-5.10/

real    1m29.409s
user    0m0.081s
sys     0m2.518s

# tar xf ~/linux-5.10.tar ; sync ; sleep 10
# time rm -rf linux-5.10/

real    1m19.283s
user    0m0.068s
sys     0m2.505s

> Note that benchmarking file system operations is *hard*.  When I
> worked with a graduate student working on a paper describing a
> prototype of a file system enhancement to ext4 to optimize ext4 for
> drive-managed SMR drives[1], the graduate student spent *way* more
> time getting reliable, repeatable benchmarks than making changes to
> ext4 for the prototype.  (It turns out the SMR GC operations caused
> variations in write speeds, which meant the writeback throughput
> measurements would fluctuate wildly, which then influenced the
> writeback cache ratio, which in turn massively influenced how
> aggressively the writeback threads would behave, which in turn
> massively influenced the filebench and postmark numbers.)
> > [1] https://www.usenix.org/conference/fast17/technical-sessions/presentation/aghayev > Interesting! Cheers, -- per aspera ad upstream ^ permalink raw reply [flat|nested] 13+ messages in thread
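(As an aside to the blkdiscard timing above: how a device handles discards depends on what it advertises. One generic way to inspect that — not part of the original exchange — is lsblk's discard columns and the block queue attributes in sysfs; the device name is a placeholder.)

  # Discard alignment, granularity and maximum discard size
  lsblk -D /dev/nvme0n1

  # The same values straight from sysfs
  grep . /sys/block/nvme0n1/queue/discard_granularity \
         /sys/block/nvme0n1/queue/discard_max_bytes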
* Re: discard and data=writeback
  2020-12-22 14:59 ` Matteo Croce
@ 2020-12-22 16:34   ` Theodore Y. Ts'o
  2020-12-22 22:53     ` Andreas Dilger
  2020-12-23  0:47     ` Matteo Croce
  0 siblings, 2 replies; 13+ messages in thread
From: Theodore Y. Ts'o @ 2020-12-22 16:34 UTC (permalink / raw)
To: Matteo Croce; +Cc: linux-ext4

On Tue, Dec 22, 2020 at 03:59:29PM +0100, Matteo Croce wrote:
>
> I'm issuing sync + sleep(10) after the extraction, so the writes
> should all be flushed.
> Also, I repeated the test three times, with very similar results:

So that means the problem is not due to page cache writeback
interfering with the discards.  So it's most likely that the problem is
due to how the blocks are allocated and laid out when using
data=ordered vs data=writeback.

Some experiments to try next.  After extracting the files with
data=ordered and data=writeback on a freshly formatted file system, use
"e2freefrag" to see how the free space is fragmented.  This will tell
us how the file system is doing from a holistic perspective, in terms
of blocks allocated to the extracted files.  (E2freefrag is showing you
the blocks *not* allocated, of course, but that's a mirror image dual
of the blocks that *are* allocated, especially if you start from an
identical known state; hence the use of a freshly formatted file
system.)

Next, we can see what individual files look like with respect to
fragmentation.  This can be done by running filefrag on all of the
files, e.g.:

    find . -type f -print0 | xargs -0 filefrag

Another way to get similar (although not identical) information is by
running "e2fsck -E fragcheck" on a file system.  The difference between
the two matters mostly on ext3 file systems without extents and
flex_bg, since filefrag tries to take into account metadata blocks such
as indirect blocks and extent tree blocks, and e2fsck -E fragcheck does
not; but either is good enough for getting a gestalt of the files'
overall fragmentation --- and note that as long as the average fragment
size is at least a megabyte or two, some fragmentation really isn't
that much of a problem from a real-world performance perspective.
People can get way too invested in trying to achieve 100%
fragmentation-free files.  The problem with doing this at the expense
of all else is that you can end up making the overall free space
fragmentation worse as the file system ages, at which point file system
performance really dives through the floor as the file system
approaches 100%, or even 80-90% full, especially on HDD's.  For SSD's
fragmentation doesn't matter quite so much, unless the average fragment
size is *really* small, and when you are discarding freed blocks.

Even if the files are showing no substantial difference in
fragmentation, and the free space is equally A-OK with respect to
fragmentation, the other possibility is that the *layout* of the blocks
is such that the order in which they are deleted using rm -rf ends up
being less friendly from a discard perspective.  This can happen if the
directory hierarchy is big enough, and/or the journal size is small
enough, that the rm -rf requires multiple journal transactions to
complete.  That's because with mount -o discard, we do the discards
after each transaction commit, and it might be that even though the
used blocks are perfectly contiguous, because of the order in which the
files end up getting deleted, we end up needing to discard them in
smaller chunks.
For example, one could imagine a case where you have a million 4k
files, and they are allocated contiguously, but if you get
super-unlucky, such that in the first transaction you delete all of the
odd-numbered files, and in the second transaction you delete all of the
even-numbered files, you might need to do a million 4k discards --- but
if all of the deletes could fit into a single transaction, you would
only need to do a single million-block discard operation.

Finally, you may want to consider whether or not mount -o discard
really makes sense.  For most SSD's, especially high-end SSD's, it
probably doesn't make that much difference.  That's because when you
overwrite a sector, the SSD knows (or should know; this might not be
true for some really cheap, crappy low-end flash devices; but on those
devices, discard might not be making much of a difference anyway) that
the old contents of the sector are no longer needed.  Hence an
overwrite effectively is an "implied discard".  So long as there is a
sufficient number of free erase blocks, the SSD might be able to keep
up doing the GC for those "implied discards", and so accelerating the
process by sending explicit discards after every journal transaction
might not be necessary.  Or maybe it's sufficient to run "fstrim" every
week on Sunday at 3am local time; or maybe even fstrim once a night, or
fstrim once a month --- your mileage may vary.

It's going to vary from SSD to SSD and from workload to workload, but
you might find that mount -o discard isn't buying you all that much ---
if you run a random write workload, and you don't notice any
performance degradation, and you don't notice an increase in the SSD's
write amplification numbers (if they are provided by your SSD), then
you might very well find that it's not worth it to use mount -o
discard.

I personally don't bother using mount -o discard, and instead
periodically run fstrim, on my personal machines.  Part of that is
because I'm mostly just reading and replying to emails, building
kernels and editing text files, and that is not nearly as stressful on
the FTL as a full-blown random write workload (for example, if you were
running a database supporting a transaction processing workload).

Cheers,

					- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread
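(For reference, the periodic-fstrim alternative mentioned above is normally just a scheduled job; a couple of equivalent ways to set it up, assuming util-linux's fstrim and, for the timer, a distribution that ships fstrim.timer.)

  # One-off: trim all mounted filesystems that support discard
  fstrim -av

  # util-linux ships a weekly timer for exactly this purpose
  systemctl enable --now fstrim.timer

  # ...or an old-fashioned cron entry, e.g. Sundays at 3am
  echo '0 3 * * 0 root /sbin/fstrim -a' > /etc/cron.d/fstrim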
* Re: discard and data=writeback 2020-12-22 16:34 ` Theodore Y. Ts'o @ 2020-12-22 22:53 ` Andreas Dilger 2020-12-23 1:25 ` Matteo Croce 2020-12-23 0:47 ` Matteo Croce 1 sibling, 1 reply; 13+ messages in thread From: Andreas Dilger @ 2020-12-22 22:53 UTC (permalink / raw) To: Matteo Croce; +Cc: Ext4, Wang Shilong, Theodore Y. Ts'o [-- Attachment #1: Type: text/plain, Size: 7171 bytes --] On Dec 22, 2020, at 9:34 AM, Theodore Y. Ts'o <tytso@MIT.EDU> wrote: > > On Tue, Dec 22, 2020 at 03:59:29PM +0100, Matteo Croce wrote: >> >> I'm issuing sync + sleep(10) after the extraction, so the writes >> should all be flushed. >> Also, I repeated the test three times, with very similar results: > > So that means the problem is not due to page cache writeback > interfering with the discards. So it's most likely that the problem > is due to how the blocks are allocated and laid out when using > data=ordered vs data=writeback. > > Some experiments to try next. After extracting the files with > data=ordered and data=writeback on a freshly formatted file system, > use "e2freefrag" to see how the free space is fragmented. This will > tell us how the file system is doing from a holistic perspective, in > terms of blocks allocated to the extracted files. (E2freefrag is > showing you the blocks *not* allocated, of course, but that's a mirror > image dual of the blocks that *are* allocated, especially if you start > from an identical known state; hence the use of a freshly formatted > file system.) > > Next, we can see how individual files look like with respect to > fragmentation. This can be done via using filefrag on all of the > files, e.g: > > find . -type f -print0 | xargs -0 filefrag > > Another way to get similar (although not identical) information is via > running "e2fsck -E fragcheck" on a file system. How they differ is > especially more of a big deal on ext3 file systems without extents and > flex_bg, since filefrag tries to take into account metadata blocks > such as indirect blocks and extent tree blocks, and e2fsck -E > fragcheck does not; but it's good enough for getting a good gestalt > for the files' overall fragmentation --- and note that as long as the > average fragment size is at least a megabyte or two, some > fragmentation really isn't that much of a problem from a real-world > performance perspective. People can get way too invested in trying to > get to perfection with 100% fragmentation-free files. The problem > with doing this at the expense of all else is that you can end up > making the overall free space fragmentation worse as the file system > ages, at which point the file system performance really dives through > the floor as the file system approaches 100%, or even 80-90% full, > especially on HDD's. For SSD's fragmentation doesn't matter quite so > much, unless the average fragment size is *really* small, and when you > are discarded freed blocks. > > Even if the files are showing no substantial difference in > fragmentation, and the free space is equally A-OK with respect to > fragmentation, the other possibility is the *layout* of the blocks are > such that the order in which they are deleted using rm -rf ends up > being less friendly from a discard perspective. This can happen if > the directory hierarchy is big enough, and/or the journal size is > small enough, that the rm -rf requires multiple journal transactions > to complete. 
That's because with mount -o discard, we do the discards > after each transaction commit, and it might be that even though the > used blocks are perfectly contiguous, because of the order in which > the files end up getting deleted, we end up needing to discard them in > smaller chunks. > > For example, one could imagine a case where you have a million 4k > files, and they are allocated contiguously, but if you get > super-unlucky, such that in the first transaction you delete all of > the odd-numbered files, and in second transaction you delete all of > the even-numbered files, you might need to do a million 4k discards > --- but if all of the deletes could fit into a single transaction, you > would only need to do a single million block discard operation. > > Finally, you may want to consider whether or not mount -o discard > really makes sense or not. For most SSD's, especially high-end SSD's, > it probably doesn't make that much difference. That's because when > you overwrite a sector, the SSD knows (or should know; this might not > be some really cheap, crappy low-end flash devices; but on those > devices, discard might not be making uch of a difference anyway), that > the old contents of the sector is no longer needed. Hence an > overwrite effectively is an "implied discard". So long as there is a > sufficient number of free erase blocks, the SSD might be able to keep > up doing the GC for those "implied discards", and so accelerating the > process by sending explicit discards after every journal transaction > might not be necessary. Or, maybe it's sufficient to run "fstrim" > every week at Sunday 3am local time; or maybe even fstrim once a night > or fstrim once a month --- your mileage may vary. > > It's going to vary from SSD to SSD and from workload to workload, but > you might find that mount -o discard isn't buying you all that much > --- if you run a random write workload, and you don't notice any > performance degradation, and you don't notice an increase in the SSD's > write amplification numbers (if they are provided by your SSD), then > you might very well find that it's not worth it to use mount -o > discard. > > I personally don't bother using mount -o discard, and instead > periodically run fstrim, on my personal machines. Part of that is > because I'm mostly just reading and replying to emails, building > kernels and editing text files, and that is not nearly as stressful on > the FTL as a full-blown random write workload (for example, if you > were running a database supporting a transaction processing workload). The problem (IMHO) with "-o discard" is that if it is only trimming *blocks* that were deleted, these may be too small to effectively be processed by the underlying device (e.g. the "super-unlucky" example above where interleaved 4KB file deletes result in 1M separate 4KB trim requests to the device, even when the *space* that is freed by the unlinks could be handled with far fewer large trim requests. There was a discussion previously ("introduce EXT4_BG_WAS_TRIMMED ...") https://patchwork.ozlabs.org/project/linux-ext4/patch/1592831677-13945-1-git-send-email-wangshilong1991@gmail.com/ about leveraging the persistent EXT4_BG_WAS_TRIMMED flag in the group descriptors, and having "-o discard" only track trim on a per-group basis rather than its current mode of doing trim on a per-block basis, and then use the same code internally as fstrim to do a trim of free blocks in that block group. 
Using EXT4_BG_WAS_TRIMMED and tracking *groups* to be trimmed would be a bit more lazy than the current "-o discard" implementation, but would be more memory efficient, and also more efficient for the device (fewer, larger trim requests submitted). It would only need to track groups that have at least a reasonable amount of free space to be trimmed. If the group doesn't have enough free blocks to trim now, it will be checked again in the future when more blocks are freed. Cheers, Andreas [-- Attachment #2: Message signed with OpenPGP --] [-- Type: application/pgp-signature, Size: 873 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
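(The batched, region-sized trims described above can already be approximated from userspace with fstrim's range options — shown here only to illustrate the idea, not the proposed kernel change; the offsets and thresholds are arbitrary examples.)

  # Trim only the free space in the first 1 GiB of the filesystem,
  # skipping free extents smaller than 1 MiB
  fstrim --offset 0 --length 1G --minimum 1M -v /media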
* Re: discard and data=writeback 2020-12-22 22:53 ` Andreas Dilger @ 2020-12-23 1:25 ` Matteo Croce 0 siblings, 0 replies; 13+ messages in thread From: Matteo Croce @ 2020-12-23 1:25 UTC (permalink / raw) To: Andreas Dilger; +Cc: Ext4, Wang Shilong, Theodore Y. Ts'o On Tue, Dec 22, 2020 at 11:53 PM Andreas Dilger <adilger@dilger.ca> wrote: > > On Dec 22, 2020, at 9:34 AM, Theodore Y. Ts'o <tytso@MIT.EDU> wrote: > > > > On Tue, Dec 22, 2020 at 03:59:29PM +0100, Matteo Croce wrote: > >> > >> I'm issuing sync + sleep(10) after the extraction, so the writes > >> should all be flushed. > >> Also, I repeated the test three times, with very similar results: > > > > So that means the problem is not due to page cache writeback > > interfering with the discards. So it's most likely that the problem > > is due to how the blocks are allocated and laid out when using > > data=ordered vs data=writeback. > > > > Some experiments to try next. After extracting the files with > > data=ordered and data=writeback on a freshly formatted file system, > > use "e2freefrag" to see how the free space is fragmented. This will > > tell us how the file system is doing from a holistic perspective, in > > terms of blocks allocated to the extracted files. (E2freefrag is > > showing you the blocks *not* allocated, of course, but that's a mirror > > image dual of the blocks that *are* allocated, especially if you start > > from an identical known state; hence the use of a freshly formatted > > file system.) > > > > Next, we can see how individual files look like with respect to > > fragmentation. This can be done via using filefrag on all of the > > files, e.g: > > > > find . -type f -print0 | xargs -0 filefrag > > > > Another way to get similar (although not identical) information is via > > running "e2fsck -E fragcheck" on a file system. How they differ is > > especially more of a big deal on ext3 file systems without extents and > > flex_bg, since filefrag tries to take into account metadata blocks > > such as indirect blocks and extent tree blocks, and e2fsck -E > > fragcheck does not; but it's good enough for getting a good gestalt > > for the files' overall fragmentation --- and note that as long as the > > average fragment size is at least a megabyte or two, some > > fragmentation really isn't that much of a problem from a real-world > > performance perspective. People can get way too invested in trying to > > get to perfection with 100% fragmentation-free files. The problem > > with doing this at the expense of all else is that you can end up > > making the overall free space fragmentation worse as the file system > > ages, at which point the file system performance really dives through > > the floor as the file system approaches 100%, or even 80-90% full, > > especially on HDD's. For SSD's fragmentation doesn't matter quite so > > much, unless the average fragment size is *really* small, and when you > > are discarded freed blocks. > > > > Even if the files are showing no substantial difference in > > fragmentation, and the free space is equally A-OK with respect to > > fragmentation, the other possibility is the *layout* of the blocks are > > such that the order in which they are deleted using rm -rf ends up > > being less friendly from a discard perspective. This can happen if > > the directory hierarchy is big enough, and/or the journal size is > > small enough, that the rm -rf requires multiple journal transactions > > to complete. 
That's because with mount -o discard, we do the discards > > after each transaction commit, and it might be that even though the > > used blocks are perfectly contiguous, because of the order in which > > the files end up getting deleted, we end up needing to discard them in > > smaller chunks. > > > > For example, one could imagine a case where you have a million 4k > > files, and they are allocated contiguously, but if you get > > super-unlucky, such that in the first transaction you delete all of > > the odd-numbered files, and in second transaction you delete all of > > the even-numbered files, you might need to do a million 4k discards > > --- but if all of the deletes could fit into a single transaction, you > > would only need to do a single million block discard operation. > > > > Finally, you may want to consider whether or not mount -o discard > > really makes sense or not. For most SSD's, especially high-end SSD's, > > it probably doesn't make that much difference. That's because when > > you overwrite a sector, the SSD knows (or should know; this might not > > be some really cheap, crappy low-end flash devices; but on those > > devices, discard might not be making uch of a difference anyway), that > > the old contents of the sector is no longer needed. Hence an > > overwrite effectively is an "implied discard". So long as there is a > > sufficient number of free erase blocks, the SSD might be able to keep > > up doing the GC for those "implied discards", and so accelerating the > > process by sending explicit discards after every journal transaction > > might not be necessary. Or, maybe it's sufficient to run "fstrim" > > every week at Sunday 3am local time; or maybe even fstrim once a night > > or fstrim once a month --- your mileage may vary. > > > > It's going to vary from SSD to SSD and from workload to workload, but > > you might find that mount -o discard isn't buying you all that much > > --- if you run a random write workload, and you don't notice any > > performance degradation, and you don't notice an increase in the SSD's > > write amplification numbers (if they are provided by your SSD), then > > you might very well find that it's not worth it to use mount -o > > discard. > > > > I personally don't bother using mount -o discard, and instead > > periodically run fstrim, on my personal machines. Part of that is > > because I'm mostly just reading and replying to emails, building > > kernels and editing text files, and that is not nearly as stressful on > > the FTL as a full-blown random write workload (for example, if you > > were running a database supporting a transaction processing workload). > > The problem (IMHO) with "-o discard" is that if it is only trimming > *blocks* that were deleted, these may be too small to effectively be > processed by the underlying device (e.g. the "super-unlucky" example > above where interleaved 4KB file deletes result in 1M separate 4KB > trim requests to the device, even when the *space* that is freed by > the unlinks could be handled with far fewer large trim requests. 
> > There was a discussion previously ("introduce EXT4_BG_WAS_TRIMMED ...") > > https://patchwork.ozlabs.org/project/linux-ext4/patch/1592831677-13945-1-git-send-email-wangshilong1991@gmail.com/ > > about leveraging the persistent EXT4_BG_WAS_TRIMMED flag in the group > descriptors, and having "-o discard" only track trim on a per-group > basis rather than its current mode of doing trim on a per-block basis, > and then use the same code internally as fstrim to do a trim of free > blocks in that block group. > > Using EXT4_BG_WAS_TRIMMED and tracking *groups* to be trimmed would be > a bit more lazy than the current "-o discard" implementation, but would > be more memory efficient, and also more efficient for the device (fewer, > larger trim requests submitted). It would only need to track groups > that have at least a reasonable amount of free space to be trimmed. If > the group doesn't have enough free blocks to trim now, it will be checked > again in the future when more blocks are freed. > Hi, I gave it a quick run refreshing it for 5.10, but it doesn't seem to help. Are there actions needed other than the patch itself? Regards, -- per aspera ad upstream ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: discard and data=writeback
  2020-12-22 16:34 ` Theodore Y. Ts'o
  2020-12-22 22:53   ` Andreas Dilger
@ 2020-12-23  0:47   ` Matteo Croce
  2020-12-23 18:12     ` Theodore Y. Ts'o
  1 sibling, 1 reply; 13+ messages in thread
From: Matteo Croce @ 2020-12-23 0:47 UTC (permalink / raw)
To: Theodore Y. Ts'o; +Cc: linux-ext4

On Tue, Dec 22, 2020 at 5:34 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> On Tue, Dec 22, 2020 at 03:59:29PM +0100, Matteo Croce wrote:
> >
> > I'm issuing sync + sleep(10) after the extraction, so the writes
> > should all be flushed.
> > Also, I repeated the test three times, with very similar results:
>
> So that means the problem is not due to page cache writeback
> interfering with the discards.  So it's most likely that the problem
> is due to how the blocks are allocated and laid out when using
> data=ordered vs data=writeback.
>
> Some experiments to try next.  After extracting the files with
> data=ordered and data=writeback on a freshly formatted file system,
> use "e2freefrag" to see how the free space is fragmented.  This will
> tell us how the file system is doing from a holistic perspective, in
> terms of blocks allocated to the extracted files.  (E2freefrag is
> showing you the blocks *not* allocated, of course, but that's a mirror
> image dual of the blocks that *are* allocated, especially if you start
> from an identical known state; hence the use of a freshly formatted
> file system.)
>

This is with data=ordered:

# e2freefrag /dev/nvme0n1p1
Device: /dev/nvme0n1p1
Blocksize: 4096 bytes
Total blocks: 468843350
Free blocks: 460922366 (98.3%)

Min. free extent: 4 KB
Max. free extent: 2064256 KB
Avg. free extent: 1976084 KB
Num. free extent: 933

# e2freefrag /dev/nvme0n1p1
Device: /dev/nvme0n1p1
Blocksize: 4096 bytes
Total blocks: 468843350
Free blocks: 460922365 (98.3%)

Min. free extent: 4 KB
Max. free extent: 2064256 KB
Avg. free extent: 1976084 KB
Num. free extent: 933

HISTOGRAM OF FREE EXTENT SIZES:
Extent Size Range :  Free extents   Free Blocks  Percent
    4K...    8K-  :             1             1    0.00%
    8K...   16K-  :             2             5    0.00%
   16K...   32K-  :             1             7    0.00%
    2M...    4M-  :             3          2400    0.00%
   32M...   64M-  :             2         16384    0.00%
   64M...  128M-  :            11        267085    0.06%
  128M...  256M-  :            11        650037    0.14%
  256M...  512M-  :             3        314957    0.07%
  512M... 1024M-  :             7       1387580    0.30%
    1G...    2G-  :           892     458283909   99.43%

and this is with data=writeback:

# e2freefrag /dev/nvme0n1p1
Device: /dev/nvme0n1p1
Blocksize: 4096 bytes
Total blocks: 468843350
Free blocks: 460922366 (98.3%)

Min. free extent: 4 KB
Max. free extent: 2064256 KB
Avg. free extent: 1976084 KB
Num. free extent: 933

HISTOGRAM OF FREE EXTENT SIZES:
Extent Size Range :  Free extents   Free Blocks  Percent
    4K...    8K-  :             1             1    0.00%
    8K...   16K-  :             2             5    0.00%
   16K...   32K-  :             1             7    0.00%
    2M...    4M-  :             3          2400    0.00%
   32M...   64M-  :             2         16384    0.00%
   64M...  128M-  :            11        267085    0.06%
  128M...  256M-  :            11        650038    0.14%
  256M...  512M-  :             3        314957    0.07%
  512M... 1024M-  :             7       1387580    0.30%
    1G...    2G-  :           892     458283909   99.43%

> Next, we can see what individual files look like with respect to
> fragmentation.  This can be done by running filefrag on all of the
> files, e.g.:
>
> find . -type f -print0 | xargs -0 filefrag
>

data=ordered:

# find /media -type f -print0 | xargs -0 filefrag |awk -F: '{print$2}' |sort |uniq -c
     32  0 extents found
  70570  1 extent found

data=writeback:

# find /media -type f -print0 | xargs -0 filefrag |awk -F: '{print$2}' |sort |uniq -c
     32  0 extents found
  70570  1 extent found

> Another way to get similar (although not identical) information is via
> running "e2fsck -E fragcheck" on a file system.
How they differ is > especially more of a big deal on ext3 file systems without extents and > flex_bg, since filefrag tries to take into account metadata blocks > such as indirect blocks and extent tree blocks, and e2fsck -E > fragcheck does not; but it's good enough for getting a good gestalt > for the files' overall fragmentation > data=ordered: # e2fsck -fE fragcheck /dev/nvme0n1p1 e2fsck 1.45.6 (20-Mar-2020) Pass 1: Checking inodes, blocks, and sizes 69341844(d): expecting 277356746 actual extent phys 277356748 log 1 len 2 69342337(d): expecting 277356766 actual extent phys 277356768 log 1 len 2 69346374(d): expecting 277357037 actual extent phys 277357094 log 1 len 2 69469890(d): expecting 277880969 actual extent phys 277880975 log 1 len 2 69473971(d): expecting 277881215 actual extent phys 277881219 log 1 len 2 69606373(d): expecting 278405580 actual extent phys 278405581 log 1 len 2 69732356(d): expecting 278929541 actual extent phys 278929543 log 1 len 2 69868308(d): expecting 279454129 actual extent phys 279454245 log 1 len 2 69999150(d): expecting 279978430 actual extent phys 279978439 log 1 len 2 69999150(d): expecting 279978441 actual extent phys 279978457 log 3 len 1 69999150(d): expecting 279978458 actual extent phys 279978459 log 4 len 1 69999150(d): expecting 279978460 actual extent phys 279978502 log 5 len 1 69999150(d): expecting 279978503 actual extent phys 279978511 log 6 len 2 69999150(d): expecting 279978513 actual extent phys 279978517 log 8 len 1 70000685(d): expecting 279978520 actual extent phys 279978523 log 1 len 2 70124788(d): expecting 280502371 actual extent phys 280502381 log 1 len 2 70124788(d): expecting 280502383 actual extent phys 280502394 log 3 len 1 70124788(d): expecting 280502395 actual extent phys 280502399 log 4 len 1 70126301(d): expecting 280502445 actual extent phys 280502459 log 1 len 2 70127963(d): expecting 280502526 actual extent phys 280502528 log 1 len 2 70256678(d): expecting 281026905 actual extent phys 281026913 log 1 len 2 Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information /dev/nvme0n1p1: 75365/117211136 files (0.0% non-contiguous), 7920985/468843350 blocks data=writeback: # e2fsck -fE fragcheck /dev/nvme0n1p1 e2fsck 1.45.6 (20-Mar-2020) Pass 1: Checking inodes, blocks, and sizes 91755156(d): expecting 367009992 actual extent phys 367009994 log 1 len 2 91755649(d): expecting 367010012 actual extent phys 367010014 log 1 len 2 91759686(d): expecting 367010283 actual extent phys 367010340 log 1 len 2 91883202(d): expecting 367534217 actual extent phys 367534223 log 1 len 2 91887283(d): expecting 367534463 actual extent phys 367534467 log 1 len 2 92019685(d): expecting 368058828 actual extent phys 368058829 log 1 len 2 92145668(d): expecting 368582789 actual extent phys 368582791 log 1 len 2 92281620(d): expecting 369107377 actual extent phys 369107493 log 1 len 2 92412462(d): expecting 369631678 actual extent phys 369631687 log 1 len 2 92412462(d): expecting 369631689 actual extent phys 369631705 log 3 len 1 92412462(d): expecting 369631706 actual extent phys 369631707 log 4 len 1 92412462(d): expecting 369631708 actual extent phys 369631757 log 5 len 1 92412462(d): expecting 369631758 actual extent phys 369631759 log 6 len 2 92412462(d): expecting 369631761 actual extent phys 369631766 log 8 len 1 92413997(d): expecting 369631768 actual extent phys 369631771 log 1 len 2 92538100(d): expecting 370155619 actual extent phys 370155629 log 
1 len 2 92538100(d): expecting 370155631 actual extent phys 370155642 log 3 len 1 92538100(d): expecting 370155643 actual extent phys 370155647 log 4 len 1 92539613(d): expecting 370155693 actual extent phys 370155707 log 1 len 2 92541275(d): expecting 370155774 actual extent phys 370155776 log 1 len 2 92669990(d): expecting 370680153 actual extent phys 370680161 log 1 len 2 Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information /dev/nvme0n1p1: 75365/117211136 files (0.0% non-contiguous), 7920984/468843350 blocks As an extra test I extracted the archive with data=ordered, remounted with data=writeback and timed the rm -rf and viceversa. The mount option is the one that counts, the one using during extraction doesn't matter. As extra extra test I also tried data=journal, which is as fast as ordered. > Even if the files are showing no substantial difference in > fragmentation, and the free space is equally A-OK with respect to > fragmentation, the other possibility is the *layout* of the blocks are > such that the order in which they are deleted using rm -rf ends up > being less friendly from a discard perspective. This can happen if > the directory hierarchy is big enough, and/or the journal size is > small enough, that the rm -rf requires multiple journal transactions > to complete. That's because with mount -o discard, we do the discards > after each transaction commit, and it might be that even though the > used blocks are perfectly contiguous, because of the order in which > the files end up getting deleted, we end up needing to discard them in > smaller chunks. > > For example, one could imagine a case where you have a million 4k > files, and they are allocated contiguously, but if you get > super-unlucky, such that in the first transaction you delete all of > the odd-numbered files, and in second transaction you delete all of > the even-numbered files, you might need to do a million 4k discards > --- but if all of the deletes could fit into a single transaction, you > would only need to do a single million block discard operation. > > Finally, you may want to consider whether or not mount -o discard > really makes sense or not. For most SSD's, especially high-end SSD's, > it probably doesn't make that much difference. That's because when > you overwrite a sector, the SSD knows (or should know; this might not > be some really cheap, crappy low-end flash devices; but on those > devices, discard might not be making uch of a difference anyway), that > the old contents of the sector is no longer needed. Hence an > overwrite effectively is an "implied discard". So long as there is a > sufficient number of free erase blocks, the SSD might be able to keep > up doing the GC for those "implied discards", and so accelerating the > process by sending explicit discards after every journal transaction > might not be necessary. Or, maybe it's sufficient to run "fstrim" > every week at Sunday 3am local time; or maybe even fstrim once a night > or fstrim once a month --- your mileage may vary. 
> > It's going to vary from SSD to SSD and from workload to workload, but > you might find that mount -o discard isn't buying you all that much > --- if you run a random write workload, and you don't notice any > performance degradation, and you don't notice an increase in the SSD's > write amplification numbers (if they are provided by your SSD), then > you might very well find that it's not worth it to use mount -o > discard. > > I personally don't bother using mount -o discard, and instead > periodically run fstrim, on my personal machines. Part of that is > because I'm mostly just reading and replying to emails, building > kernels and editing text files, and that is not nearly as stressful on > the FTL as a full-blown random write workload (for example, if you > were running a database supporting a transaction processing workload). > That's what I'm doing locally, I issue a fstrim from time to time. But I found discard useful in QEMU guests because latest virtio-blk will punch holes in the host and save space. Cheers, -- per aspera ad upstream ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: discard and data=writeback 2020-12-23 0:47 ` Matteo Croce @ 2020-12-23 18:12 ` Theodore Y. Ts'o 2020-12-23 18:59 ` Matteo Croce 0 siblings, 1 reply; 13+ messages in thread From: Theodore Y. Ts'o @ 2020-12-23 18:12 UTC (permalink / raw) To: Matteo Croce; +Cc: linux-ext4 On Wed, Dec 23, 2020 at 01:47:33AM +0100, Matteo Croce wrote: > As an extra test I extracted the archive with data=ordered, remounted > with data=writeback and timed the rm -rf and viceversa. > The mount option is the one that counts, the one using during > extraction doesn't matter. Hmm... that's really surprising. At this point, the only thing I can suggest is to try using blktrace to see what's going on at the block layer when the I/O's and discard requests are being submitted. If there are no dirty blocks in the page cache, I don't see how data=ordered vs data=writeback would make a difference to how mount -o discard processing would take place. Cheers, - Ted ^ permalink raw reply [flat|nested] 13+ messages in thread
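(A sketch of how such a trace can be captured and skimmed for discards, assuming the blktrace/blkparse utilities; the awk field positions correspond to blkparse's default output, where field 6 is the action and field 7 the RWBS flags.)

  # Record while the rm runs, then stop the trace
  blktrace -d /dev/nvme0n1 -o rmtrace &
  time rm -rf /media/linux-5.10
  sync
  kill %1; wait

  # Convert the per-CPU binary files to text
  blkparse -i rmtrace -o rmtrace.txt

  # Count discard requests issued to the driver (action D, RWBS containing D)
  awk '$6 == "D" && $7 ~ /D/' rmtrace.txt | wc -l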
* Re: discard and data=writeback 2020-12-23 18:12 ` Theodore Y. Ts'o @ 2020-12-23 18:59 ` Matteo Croce 2020-12-24 3:16 ` Theodore Y. Ts'o 0 siblings, 1 reply; 13+ messages in thread From: Matteo Croce @ 2020-12-23 18:59 UTC (permalink / raw) To: Theodore Y. Ts'o; +Cc: Ext4 On Wed, Dec 23, 2020 at 7:12 PM Theodore Y. Ts'o <tytso@mit.edu> wrote: > > On Wed, Dec 23, 2020 at 01:47:33AM +0100, Matteo Croce wrote: > > As an extra test I extracted the archive with data=ordered, remounted > > with data=writeback and timed the rm -rf and viceversa. > > The mount option is the one that counts, the one using during > > extraction doesn't matter. > > Hmm... that's really surprising. At this point, the only thing I can > suggest is to try using blktrace to see what's going on at the block > layer when the I/O's and discard requests are being submitted. If > there are no dirty blocks in the page cache, I don't see how > data=ordered vs data=writeback would make a difference to how mount -o > discard processing would take place. > Hi, these are the blktrace outputs for both journaling modes: # dmesg |grep EXT4-fs |tail -1 [ 1594.829833] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: data=ordered,discard # blktrace /dev/nvme0n1 & sleep 1 ; time rm -rf /media/linux-5.10/ ; kill $! [1] 3032 real 0m1.328s user 0m0.063s sys 0m1.231s # === nvme0n1 === CPU 0: 0 events, 0 KiB data CPU 1: 0 events, 0 KiB data CPU 2: 0 events, 0 KiB data CPU 3: 1461 events, 69 KiB data CPU 4: 1 events, 1 KiB data CPU 5: 0 events, 0 KiB data CPU 6: 0 events, 0 KiB data CPU 7: 0 events, 0 KiB data Total: 1462 events (dropped 0), 69 KiB data # dmesg |grep EXT4-fs |tail -1 [ 1734.837651] EXT4-fs (nvme0n1p1): mounted filesystem with writeback data mode. Opts: data=writeback,discard # blktrace /dev/nvme0n1 & sleep 1 ; time rm -rf /media/linux-5.10/ ; kill $! [1] 3069 real 1m30.273s user 0m0.139s sys 0m3.084s # === nvme0n1 === CPU 0: 133830 events, 6274 KiB data CPU 1: 21878 events, 1026 KiB data CPU 2: 46365 events, 2174 KiB data CPU 3: 98116 events, 4600 KiB data CPU 4: 290902 events, 13637 KiB data CPU 5: 10926 events, 513 KiB data CPU 6: 76861 events, 3603 KiB data CPU 7: 17855 events, 837 KiB data Total: 696733 events (dropped 0), 32660 KiB data Cheers, -- per aspera ad upstream ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: discard and data=writeback 2020-12-23 18:59 ` Matteo Croce @ 2020-12-24 3:16 ` Theodore Y. Ts'o 2020-12-24 10:53 ` Matteo Croce 0 siblings, 1 reply; 13+ messages in thread From: Theodore Y. Ts'o @ 2020-12-24 3:16 UTC (permalink / raw) To: Matteo Croce; +Cc: Ext4 On Wed, Dec 23, 2020 at 07:59:13PM +0100, Matteo Croce wrote: > > Hi, > > these are the blktrace outputs for both journaling modes: Can you send me full trace files (or the outputs of blkparse) so we can see what's going on at a somewhat more granular detail? They'll be huge, so you may need to make them available for download from a web server; certainly the vger.kernel.org list server isn't going to let an attachment that large through. Thanks, - Ted ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: discard and data=writeback 2020-12-24 3:16 ` Theodore Y. Ts'o @ 2020-12-24 10:53 ` Matteo Croce 0 siblings, 0 replies; 13+ messages in thread From: Matteo Croce @ 2020-12-24 10:53 UTC (permalink / raw) To: Theodore Y. Ts'o; +Cc: Ext4 On Thu, Dec 24, 2020 at 4:16 AM Theodore Y. Ts'o <tytso@mit.edu> wrote: > > On Wed, Dec 23, 2020 at 07:59:13PM +0100, Matteo Croce wrote: > > > > Hi, > > > > these are the blktrace outputs for both journaling modes: > > Can you send me full trace files (or the outputs of blkparse) so we > can see what's going on at a somewhat more granular detail? > > They'll be huge, so you may need to make them available for download > from a web server; certainly the vger.kernel.org list server isn't > going to let an attachment that large through. > Hi, I've created a GDrive link, it should work for everyone: https://drive.google.com/file/d/1b35hzgUMSnNBZeMNhooFk4rACpNvCZuQ/view?usp=sharing Cheers, -- per aspera ad upstream ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <CGME20201229054143epcms2p15ae3cce43bb3c503adf94528f354ba78@epcms2p1>]
* Re: discard and data=writeback
       [not found] <CGME20201229054143epcms2p15ae3cce43bb3c503adf94528f354ba78@epcms2p1>
@ 2020-12-29  5:41 ` Daejun Park
  2020-12-29 13:42   ` Matteo Croce
  0 siblings, 1 reply; 13+ messages in thread
From: Daejun Park @ 2020-12-29 5:41 UTC (permalink / raw)
To: mcroce@linux.microsoft.com, tytso@mit.edu; +Cc: linux-ext4@vger.kernel.org

Hi,

> # dmesg |grep EXT4-fs |tail -1
> [ 1594.829833] EXT4-fs (nvme0n1p1): mounted filesystem with ordered
> data mode. Opts: data=ordered,discard
> # blktrace /dev/nvme0n1 & sleep 1 ; time rm -rf /media/linux-5.10/ ; kill $!
> [1] 3032
>
> real    0m1.328s
> user    0m0.063s
> sys     0m1.231s
> # === nvme0n1 ===
> CPU  0:      0 events,     0 KiB data
> CPU  1:      0 events,     0 KiB data
> CPU  2:      0 events,     0 KiB data
> CPU  3:   1461 events,    69 KiB data
> CPU  4:      1 events,     1 KiB data
> CPU  5:      0 events,     0 KiB data
> CPU  6:      0 events,     0 KiB data
> CPU  7:      0 events,     0 KiB data
> Total:    1462 events (dropped 0),    69 KiB data
>
>
> # dmesg |grep EXT4-fs |tail -1
> [ 1734.837651] EXT4-fs (nvme0n1p1): mounted filesystem with writeback
> data mode. Opts: data=writeback,discard
> # blktrace /dev/nvme0n1 & sleep 1 ; time rm -rf /media/linux-5.10/ ; kill $!
> [1] 3069
>
> real    1m30.273s
> user    0m0.139s
> sys     0m3.084s
> # === nvme0n1 ===
> CPU  0: 133830 events,  6274 KiB data
> CPU  1:  21878 events,  1026 KiB data
> CPU  2:  46365 events,  2174 KiB data
> CPU  3:  98116 events,  4600 KiB data
> CPU  4: 290902 events, 13637 KiB data
> CPU  5:  10926 events,   513 KiB data
> CPU  6:  76861 events,  3603 KiB data
> CPU  7:  17855 events,   837 KiB data
> Total:  696733 events (dropped 0), 32660 KiB data
>

In these results, there is very little I/O in ordered mode.

As I understand it (please correct me if I am wrong), with writeback +
discard, ext4_issue_discard is called immediately for each rm command.
However, with ordered mode, ext4_issue_discard is called at the end of
the transaction commit, so that the discards pace with the
corresponding transaction.  That means the blocks have not been
discarded yet at the time of the measurement.

Even with ordered mode, if sync is called after the rm command,
ext4_issue_discard can be called because of the transaction commit.
So I think you will get results similar to writeback mode if you add a
sync after the rm.

Thanks,
Daejun

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: discard and data=writeback 2020-12-29 5:41 ` Daejun Park @ 2020-12-29 13:42 ` Matteo Croce 0 siblings, 0 replies; 13+ messages in thread From: Matteo Croce @ 2020-12-29 13:42 UTC (permalink / raw) To: daejun7.park; +Cc: tytso@mit.edu, linux-ext4@vger.kernel.org On Tue, Dec 29, 2020 at 6:41 AM Daejun Park <daejun7.park@samsung.com> wrote: > > Hi, > > > # dmesg |grep EXT4-fs |tail -1 > > [ 1594.829833] EXT4-fs (nvme0n1p1): mounted filesystem with ordered > > data mode. Opts: data=ordered,discard > > # blktrace /dev/nvme0n1 & sleep 1 ; time rm -rf /media/linux-5.10/ ; kill $! > > [1] 3032 > > > > real 0m1.328s > > user 0m0.063s > > sys 0m1.231s > > # === nvme0n1 === > > CPU 0: 0 events, 0 KiB data > > CPU 1: 0 events, 0 KiB data > > CPU 2: 0 events, 0 KiB data > > CPU 3: 1461 events, 69 KiB data > > CPU 4: 1 events, 1 KiB data > > CPU 5: 0 events, 0 KiB data > > CPU 6: 0 events, 0 KiB data > > CPU 7: 0 events, 0 KiB data > > Total: 1462 events (dropped 0), 69 KiB data > > > > > > # dmesg |grep EXT4-fs |tail -1 > > [ 1734.837651] EXT4-fs (nvme0n1p1): mounted filesystem with writeback > > data mode. Opts: data=writeback,discard > > # blktrace /dev/nvme0n1 & sleep 1 ; time rm -rf /media/linux-5.10/ ; kill $! > > [1] 3069 > > > > real 1m30.273s > > user 0m0.139s > > sys 0m3.084s > > # === nvme0n1 === > > CPU 0: 133830 events, 6274 KiB data > > CPU 1: 21878 events, 1026 KiB data > > CPU 2: 46365 events, 2174 KiB data > > CPU 3: 98116 events, 4600 KiB data > > CPU 4: 290902 events, 13637 KiB data > > CPU 5: 10926 events, 513 KiB data > > CPU 6: 76861 events, 3603 KiB data > > CPU 7: 17855 events, 837 KiB data > > Total: 696733 events (dropped 0), 32660 KiB data > > > > In this result, there is few IO in ordered mode. > > As I understand (please correct this if I am wrong), with writeback + > discard, ext4_issue_discard is called immediately at each rm command. > However, with ordered mode, ext4_issue_discard is called when end of > committing a transaction to pace with the corresponding transaction. > It means, they are not discarded yet. > > Even with ordered mode, if sync is called after rm command, > ext4_issue_discard can be called due to transaction commit. > So, I think you will get similar results form writeback mode with sync > command. > Hi, that's what I get with data=ordered if I issue a sync after the removal: # time rm -rf /media/linux-5.10/ ; sync ; kill $! real 0m1.569s user 0m0.044s sys 0m1.508s # === nvme0n1 === CPU 0: 10980 events, 515 KiB data CPU 1: 0 events, 0 KiB data CPU 2: 0 events, 0 KiB data CPU 3: 26 events, 2 KiB data CPU 4: 3601 events, 169 KiB data CPU 5: 0 events, 0 KiB data CPU 6: 21786 events, 1022 KiB data CPU 7: 0 events, 0 KiB data Total: 36393 events (dropped 0), 1706 KiB data Still way less transactions than writeback. Cheers, -- per aspera ad upstream ^ permalink raw reply [flat|nested] 13+ messages in thread
Thread overview: 13+ messages
2020-12-18 18:40 discard and data=writeback Matteo Croce
2020-12-21 3:04 ` Theodore Y. Ts'o
2020-12-22 14:59 ` Matteo Croce
2020-12-22 16:34 ` Theodore Y. Ts'o
2020-12-22 22:53 ` Andreas Dilger
2020-12-23 1:25 ` Matteo Croce
2020-12-23 0:47 ` Matteo Croce
2020-12-23 18:12 ` Theodore Y. Ts'o
2020-12-23 18:59 ` Matteo Croce
2020-12-24 3:16 ` Theodore Y. Ts'o
2020-12-24 10:53 ` Matteo Croce
[not found] <CGME20201229054143epcms2p15ae3cce43bb3c503adf94528f354ba78@epcms2p1>
2020-12-29 5:41 ` Daejun Park
2020-12-29 13:42 ` Matteo Croce