On 2022/3/7 10:39, Qu Wenruo wrote:
>
>
> On 2022/3/7 10:23, Jan Ziak wrote:
>> On Mon, Mar 7, 2022 at 1:48 AM Qu Wenruo wrote:
>>> On 2022/3/6 23:59, Jan Ziak wrote:
>>>> I would like to report that btrfs in Linux kernel 5.16.12 mounted with
>>>> the autodefrag option wrote 5TB in a single day to a 1TB SSD that is
>>>> about 50% full.
>>>>
>>>> Defragmenting 0.5TB on a drive that is 50% full should write far
>>>> less than 5TB.
>>>
>>> If using defrag ioctl, that's a good and solid expectation.
>>>
>>> Autodefrag will mark any file which got smaller writes (<64K) for scan.
>>> For smaller extents than 64K, they will be re-dirtied for writeback.
>>
>> The NVMe device has 512-byte sectors, but has another namespace with
>> 4K sectors. Will it help btrfs-autodefrag to reformat the drive to 4K
>> sectors? I expect that it won't help - I am asking just in case my
>> expectation is wrong.
>
> The minimal sector size of btrfs is 4K, so I don't believe it would
> cause any difference.
>
>>
>>> So in theory, if the cleaner is triggered very frequently to do
>>> autodefrag, it can indeed easily amplify the writes.
>>
>> According to usr/bin/glances, the sqlite app is writing less than 1 MB
>> per second to the NVMe device. btrfs's autodefrag write amplification
>> is from the 1 MB/s to approximately 200 MB/s.
>
> This is definitely something wrong.
>
> Autodefrag by default should only get triggered every 300s, thus even
> all new bytes are re-dirtied, it should only cause a less than 300M
> write burst every 300s, not a consistent write.
>
>>
>>> Are you using commit= mount option? Which would reduce the commit
>>> interval thus trigger autodefrag more frequently.
>>
>> I am not using commit= mount option.
>>
>>>> CPU utilization on an otherwise idle machine is approximately 600% all
>>>> the time: btrfs-cleaner 100%, kworkers...btrfs 500%.
>>>
>>> The problem is why the CPU usage is at 100% for cleaner.
>>>
>>> Would you please apply this patch on your kernel?
>>> https://patchwork.kernel.org/project/linux-btrfs/patch/bf2635d213e0c85251c4cd0391d8fbf274d7d637.1645705266.git.wqu@suse.com/
>>>
>>>
>>> Then enable the following trace events...
>>
>> I will try to apply the patch, collect the events and post the
>> results. First, I will wait for the sqlite file to gain about 1
>> million extents, which shouldn't take too long.
>
> Thank you very much for the future trace events log.
>
> That would be the determining data for us to solve it.

Forgot to mention that the patch itself relies on refactors in the
previous patches, so you may want to apply the whole patchset.

Or use the attached diff, which I manually backported for v5.16.12.

Thanks,
Qu

>
>>
>> ----
>>
>> BTW: "compsize file-with-million-extents" finishes in 0.2 seconds
>> (uses BTRFS_IOC_TREE_SEARCH_V2 ioctl), but "filefrag
>> file-with-million-extents" doesn't finish even after several minutes
>> of time (uses FS_IOC_FIEMAP ioctl - manages to perform only about 5
>> ioctl syscalls per second - and appears to be slowing down as the
>> value of the "fm_start" ioctl argument grows; e2fsprogs version
>> 1.46.5). It would be nice if filefrag was faster than just a few
>> ioctls per second.
>
> This is mostly a race with autodefrag.
>
> Both are using file extent map, thus if autodefrag is still trying to
> redirty the file again and again, it would definitely cause problems for
> anything also using file extent map.
>
> Thanks,
> Qu
>>
>> ----
>>
>> Sincerely
>> Jan
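The thread reasons that, if autodefrag runs at most once every 300s and re-dirties at most everything written since the previous run, the extra write traffic is bounded by what the application itself wrote. A minimal back-of-envelope sketch of that bound, assuming the figures quoted above (an application rate of under 1 MB/s and roughly 200 MB/s observed at the device); the function and constant names are hypothetical and only illustrate the arithmetic, not btrfs internals:

```python
# Back-of-envelope bound on autodefrag write traffic.
# Assumptions (taken from the thread, not from btrfs source code):
#   - autodefrag is triggered at most once per 300 s
#   - in the worst case it re-dirties every byte written since the last run
APP_WRITE_MB_PER_S = 1.0      # upper bound on the sqlite app's write rate
AUTODEFRAG_INTERVAL_S = 300   # trigger interval stated in the thread
OBSERVED_MB_PER_S = 200.0     # approximate device write rate reported


def worst_case_burst_mb(app_rate_mb_s: float, interval_s: float) -> float:
    """Largest burst autodefrag should emit per interval if it rewrites
    everything the application wrote since the previous run."""
    return app_rate_mb_s * interval_s


def expected_ceiling_mb_s(app_rate_mb_s: float, interval_s: float) -> float:
    """Average device write rate: the application's own writes plus one
    worst-case autodefrag burst amortized over the interval."""
    return app_rate_mb_s + worst_case_burst_mb(app_rate_mb_s, interval_s) / interval_s


if __name__ == "__main__":
    burst = worst_case_burst_mb(APP_WRITE_MB_PER_S, AUTODEFRAG_INTERVAL_S)
    ceiling = expected_ceiling_mb_s(APP_WRITE_MB_PER_S, AUTODEFRAG_INTERVAL_S)
    print(f"worst-case burst every {AUTODEFRAG_INTERVAL_S} s: {burst:.0f} MB")
    print(f"expected average ceiling: {ceiling:.1f} MB/s")
    print(f"observed / ceiling: {OBSERVED_MB_PER_S / ceiling:.0f}x")
```

Under these assumptions the device should see at most about 2 MB/s on average (a 300 MB burst every 300s on top of about 1 MB/s), so a sustained ~200 MB/s is roughly two orders of magnitude above the bound, which is consistent with Qu's reading that something is wrong rather than merely expensive.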