* s_bmap and flags explanation
@ 2022-08-03 14:56 Emmanouil Vamvakopoulos

From: Emmanouil Vamvakopoulos
To: linux-xfs

Hello developers,

Is it possible to explain the FLAGS field in the xfs_bmap output of a file?

 EXT: FILE-OFFSET           BLOCK-RANGE                 AG AG-OFFSET                 TOTAL  FLAGS
   0: [0..7]:               49700520968..49700520975    30 (8..15)                       8  001111
   1: [8..4175871]:         49708756480..49712932343    30 (8235520..12411383)     4175864  000111
   2: [4175872..19976191]:  49715788288..49731588607    30 (15267328..31067647)   15800320  000011
   3: [19976192..25153535]: 49731588608..49736765951    30 (31067648..36244991)    5177344  000011
   4: [25153536..41930743]: 49767625216..49784402423    30 (67104256..83881463)   16777208  000111
   5: [41930744..58707951]: 49784402424..49801179631    30 (83881464..100658671)  16777208  001111
   6: [58707952..58959935]: 49801179632..49801431615    30 (100658672..100910655)   251984  001111
   7: [58959936..75485159]: 49801431616..49817956839    30 (100910656..117435879) 16525224  001111

with

[disk06]# du -sh ./00000869/014886f4
36G     ./00000869/014886f4
[disk06]# du -sh --apparent-size ./00000869/014886f4
29G     ./00000869/014886f4

I am trying to understand whether this file contains unused extents, and how files end up laid out like this (assuming the free space was not fragmented).

We are running CentOS Stream release 8 with kernel 4.18.0-383.el8.x86_64.

If I defragment the file above, the difference between the apparent size and the size reported by du disappears!

Thank you in advance,
best
e.v.
* Re: s_bmap and flags explanation
@ 2022-08-03 15:54 Carlos Maiolino

From: Carlos Maiolino
To: Emmanouil Vamvakopoulos
Cc: linux-xfs

On Wed, Aug 03, 2022 at 04:56:43PM +0200, Emmanouil Vamvakopoulos wrote:
>
> Hello developers,
>
> Is it possible to explain the FLAGS field in the xfs_bmap output of a file?

Flag bits for each extent:

FLG_SHARED  0100000  /* shared extent */
FLG_PRE     0010000  /* unwritten extent */
FLG_BSU     0001000  /* not at beginning of stripe unit */
FLG_ESU     0000100  /* not at end of stripe unit */
FLG_BSW     0000010  /* not at beginning of stripe width */
FLG_ESW     0000001  /* not at end of stripe width */

>  EXT: FILE-OFFSET           BLOCK-RANGE                 AG AG-OFFSET                 TOTAL  FLAGS
>    0: [0..7]:               49700520968..49700520975    30 (8..15)                       8  001111
>    1: [8..4175871]:         49708756480..49712932343    30 (8235520..12411383)     4175864  000111
>    2: [4175872..19976191]:  49715788288..49731588607    30 (15267328..31067647)   15800320  000011
>    3: [19976192..25153535]: 49731588608..49736765951    30 (31067648..36244991)    5177344  000011
>    4: [25153536..41930743]: 49767625216..49784402423    30 (67104256..83881463)   16777208  000111
>    5: [41930744..58707951]: 49784402424..49801179631    30 (83881464..100658671)  16777208  001111
>    6: [58707952..58959935]: 49801179632..49801431615    30 (100658672..100910655)   251984  001111
>    7: [58959936..75485159]: 49801431616..49817956839    30 (100910656..117435879) 16525224  001111

Disclaimer: I am not sure exactly how du accounts for --apparent-size. That said, xfs_bmap shows you the current block mapping of the file you mentioned, in 512-byte blocks. According to the mapping above, this file is mapped into 75485160 512-byte blocks, so:

(75485160 * 512) / (1024**3) = 35.99

> [disk06]# du -sh ./00000869/014886f4
> 36G     ./00000869/014886f4

Matching the size here.
> [disk06]# du -sh --apparent-size ./00000869/014886f4
> 29G     ./00000869/014886f4

According to du's man page:

    --apparent-size
        print apparent sizes rather than device usage; although the
        apparent size is usually smaller, it may be larger due to holes
        in ('sparse') files, internal fragmentation, indirect blocks,
        and the like

Given the stripe misalignment flags set on all the extents, I'd say this is the main reason why --apparent-size differs so much: if the writes being done to the file are not stripe aligned, each stripe unit might be wasting some space.

> I am trying to understand whether this file contains unused extents,
> and how files end up laid out like this (assuming the free space was
> not fragmented).

Maybe your FS is on top of a striped volume and the FS itself is not configured with the correct unit/width? This is a guess, by the way; I may very well be wrong and it may be related to something else :)

> If I defragment the file above, the difference between the apparent
> size and the size reported by du disappears!

-- 
Carlos Maiolino
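[Editor's sketch: the size arithmetic in Carlos's reply can be checked mechanically. The extent totals below are copied from the bmap output quoted in this thread; nothing else is assumed.]

```python
# Sanity-check the size arithmetic: sum the TOTAL column (512-byte
# blocks) of the quoted xfs_bmap output and convert to GiB.
extent_totals = [8, 4175864, 15800320, 5177344,
                 16777208, 16777208, 251984, 16525224]

blocks = sum(extent_totals)            # total 512-byte blocks mapped
size_gib = blocks * 512 / 1024**3      # bytes -> GiB

print(blocks)              # 75485160
print(round(size_gib, 2))  # 35.99, matching du's "36G"
```

Note that the last extent's end offset (75485159) agrees with the block sum, so the mapping is dense in file offset space.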
* Re: s_bmap and flags explanation
@ 2022-08-03 21:59 Dave Chinner

From: Dave Chinner
To: Emmanouil Vamvakopoulos
Cc: linux-xfs

On Wed, Aug 03, 2022 at 04:56:43PM +0200, Emmanouil Vamvakopoulos wrote:
>
> Hello developers,
>
> Is it possible to explain the FLAGS field in the xfs_bmap output of a file?
>
>  EXT: FILE-OFFSET           BLOCK-RANGE                 AG AG-OFFSET                 TOTAL  FLAGS
>    0: [0..7]:               49700520968..49700520975    30 (8..15)                       8  001111
>    1: [8..4175871]:         49708756480..49712932343    30 (8235520..12411383)     4175864  000111
>    2: [4175872..19976191]:  49715788288..49731588607    30 (15267328..31067647)   15800320  000011
>    3: [19976192..25153535]: 49731588608..49736765951    30 (31067648..36244991)    5177344  000011
>    4: [25153536..41930743]: 49767625216..49784402423    30 (67104256..83881463)   16777208  000111
>    5: [41930744..58707951]: 49784402424..49801179631    30 (83881464..100658671)  16777208  001111
>    6: [58707952..58959935]: 49801179632..49801431615    30 (100658672..100910655)   251984  001111
>    7: [58959936..75485159]: 49801431616..49817956839    30 (100910656..117435879) 16525224  001111

$ man xfs_bmap
.....
   -v   Shows verbose information. When this flag is specified,
        additional AG specific information is appended to each
        line in the following form:

            agno (startagoffset..endagoffset) nblocks flags

        A second -v option will print out the flags legend.
.....

So:

$ xfs_bmap -vvp foo
foo:
 EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET         TOTAL  FLAGS
   0: [0..7]:          440138672..440138679    4 (687024..687031)      8  000000
 FLAG Values:
    0100000 Shared extent
    0010000 Unwritten preallocated extent
    0001000 Doesn't begin on stripe unit
    0000100 Doesn't end on stripe unit
    0000010 Doesn't begin on stripe width
    0000001 Doesn't end on stripe width

And there's what the flags mean.
> with
>
> [disk06]# du -sh ./00000869/014886f4
> 36G     ./00000869/014886f4
> [disk06]# du -sh --apparent-size ./00000869/014886f4
> 29G     ./00000869/014886f4
>
> I am trying to understand whether this file contains unused extents,
> and how files end up laid out like this (assuming the free space was
> not fragmented).
>
> We are running CentOS Stream release 8 with kernel 4.18.0-383.el8.x86_64.
>
> If I defragment the file above, the difference between the apparent
> size and the size reported by du disappears!

It will be a result of speculative preallocation beyond EOF as the file is grown, to ensure it doesn't get badly fragmented. Files in the size range of tens of GB or larger will have preallocation extend out to 8GB beyond EOF. It will get removed when the inode is reclaimed from memory (i.e. no longer in active use).

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
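[Editor's sketch: the legend above maps directly onto bit tests of the octal FLAGS column. The flag names below are taken verbatim from the xfs_bmap legend quoted in this thread.]

```python
# Decode the octal FLAGS column printed by `xfs_bmap -vv` into the
# flag names from the legend above.
FLAG_NAMES = [
    (0o100000, "shared extent"),
    (0o010000, "unwritten preallocated extent"),
    (0o001000, "doesn't begin on stripe unit"),
    (0o000100, "doesn't end on stripe unit"),
    (0o000010, "doesn't begin on stripe width"),
    (0o000001, "doesn't end on stripe width"),
]

def decode_flags(field):
    value = int(field, 8)  # the FLAGS column is printed in octal
    return [name for bit, name in FLAG_NAMES if value & bit]

# Extent 0 of the original report: all four stripe-misalignment
# flags are set, but the extent is neither shared nor unwritten.
print(decode_flags("001111"))
```

Applied to the original report, every extent carries at least the two stripe-width misalignment bits, which is what prompted Carlos's stripe-geometry guess.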
* Re: s_bmap and flags explanation
@ 2022-08-04 10:25 Emmanouil Vamvakopoulos

From: Emmanouil Vamvakopoulos
To: Dave Chinner
Cc: linux-xfs

Hello Carlos and Dave,

Thank you for the replies.

a) For the mismatch in alignment between XFS and the underlying RAID volume, I have to re-check. But from preliminary tests, when I mount the partition with a static allocsize (e.g. allocsize=256k), we get large files with a large number of extents (up to 40), yet the sizes from du were comparable.

b) For the speculative preallocation beyond EOF of my files, as I understood it, I have to run xfs_fsr to get the space back.

But why do the inodes of those files remain dirty for at least 300 sec after the file is closed, and miss the automatic removal of the preallocation?

We are running CentOS Stream release 8 with 4.18.0-383.el8.x86_64, but we never saw anything similar on CentOS Linux release 7.9.2009 (Core) with 3.10.0-1160.45.1.el7.x86_64 (for a similar pattern of file sizes, though admittedly with a different distributed storage application).

Thank you in advance,
best
e.v.
----- Original Message -----
From: "Dave Chinner" <david@fromorbit.com>
To: "emmanouil vamvakopoulos" <emmanouil.vamvakopoulos@ijclab.in2p3.fr>
Cc: "linux-xfs" <linux-xfs@vger.kernel.org>
Sent: Wednesday, 3 August, 2022 23:59:09
Subject: Re: s_bmap and flags explanation

[...]
* Re: s_bmap and flags explanation
@ 2022-08-04 13:30 Carlos Maiolino

From: Carlos Maiolino
To: Emmanouil Vamvakopoulos
Cc: Dave Chinner, linux-xfs

Hi.

On Thu, Aug 04, 2022 at 12:25:31PM +0200, Emmanouil Vamvakopoulos wrote:
> Hello Carlos and Dave,
>
> Thank you for the replies.
>
> a) For the mismatch in alignment between XFS and the underlying RAID
> volume, I have to re-check. But from preliminary tests, when I mount
> the partition with a static allocsize (e.g. allocsize=256k), we get
> large files with a large number of extents (up to 40), yet the sizes
> from du were comparable.

The allocsize mount option controls the EOF preallocation size, which by default is dynamic, so you just fixed it to a small size, and that may well be why you ended up with so many extents: the main goal of speculative preallocation is to reduce fragmentation by creating bigger extents, and as Dave mentioned, the extra space is removed after the file is closed. I'm not the best person to explain the details of speculative preallocation, but I suppose you're seeing closer size reports from the two du modes because of the smaller preallocated space: even though you have more extents, the extra preallocated space is still very small.

> b) For the speculative preallocation beyond EOF of my files, as I
> understood it, I have to run xfs_fsr to get the space back.

No, speculative preallocation is dynamically removed.

> But why do the inodes of those files remain dirty for at least 300 sec
> after the file is closed, and miss the automatic removal of the
> preallocation?

IIRC, speculatively preallocated blocks can be kept around even after the file is closed; I believe append-only files are one example where the speculatively preallocated blocks are kept after the file is closed.
But I don't have deep enough knowledge of the speculative prealloc algorithm to give more details. I'm pretty sure it's tied to the file's write patterns, though; maybe you can describe in more detail how this file is written to?

> We are running CentOS Stream release 8 with 4.18.0-383.el8.x86_64, but
> we never saw anything similar on CentOS Linux release 7.9.2009 (Core)
> with 3.10.0-1160.45.1.el7.x86_64 (for a similar pattern of file sizes,
> though with a different distributed storage application).

That's a question more for the distribution, not for the upstream project =/ It is unlikely anybody will remember what changed between 3.10 and 4.18, and also what the distribution backported (or not).

> ----- Original Message -----
> From: "Dave Chinner" <david@fromorbit.com>
> [...]

-- 
Carlos Maiolino
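[Editor's sketch: for anyone scripting around reports like the ones quoted in this thread, one way to pull the fields out of a verbose bmap line is shown below. The regex assumes the `xfs_bmap -v` column layout quoted above; adjust it if your xfsprogs version prints differently.]

```python
import re

# Parse one `xfs_bmap -v` extent line into its fields. Offsets and
# block ranges are in 512-byte units; FLAGS stays an octal string.
LINE_RE = re.compile(
    r"\s*(?P<ext>\d+):\s+"
    r"\[(?P<off_start>\d+)\.\.(?P<off_end>\d+)\]:\s+"
    r"(?P<blk_start>\d+)\.\.(?P<blk_end>\d+)\s+"
    r"(?P<ag>\d+)\s+"
    r"\((?P<ag_start>\d+)\.\.(?P<ag_end>\d+)\)\s+"
    r"(?P<total>\d+)\s+"
    r"(?P<flags>[0-7]{6})"
)

def parse_extent(line):
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"not an extent line: {line!r}")
    # Convert every numeric field, but keep the flags field octal-as-text.
    return {k: (v if k == "flags" else int(v)) for k, v in m.groupdict().items()}

ext = parse_extent("1: [8..4175871]: 49708756480..49712932343 30 (8235520..12411383) 4175864 000111")
print(ext["ag"], ext["total"], ext["flags"])  # 30 4175864 000111
```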
* Re: s_bmap and flags explanation
@ 2022-08-04 22:55 Dave Chinner

From: Dave Chinner
To: Emmanouil Vamvakopoulos
Cc: linux-xfs

On Thu, Aug 04, 2022 at 12:25:31PM +0200, Emmanouil Vamvakopoulos wrote:
> Hello Carlos and Dave,
>
> Thank you for the replies.
>
> a) For the mismatch in alignment between XFS and the underlying RAID
> volume, I have to re-check. But from preliminary tests, when I mount
> the partition with a static allocsize (e.g. allocsize=256k), we get
> large files with a large number of extents (up to 40), yet the sizes
> from du were comparable.

As expected - fixing the post-EOF speculative preallocation to 256kB means almost no consumed space beyond EOF, so they will always be close (but not identical) for a non-sparse, non-shared file.

But that begs the question: why are you concerned about large files consuming slightly more space than expected for a short period of time? We've been doing this since commit 055388a3188f ("xfs: dynamic speculative EOF preallocation"), which was committed in January 2011 - over a decade ago - and it had been well known for a couple of decades before that that ls and du cannot be relied on to match on any filesystem that supports sparse files. And these days, with deduplication/reflink sharing extents between files, it's even less useful, because du can be correct for every individual file and yet still report that more blocks are in use than the filesystem has the capacity to store, because it counts shared blocks multiple times...

So why do you care that du and ls are different?

> b) For the speculative preallocation beyond EOF of my files, as I
> understood it, I have to run xfs_fsr to get the space back.

No, you don't need to do anything, and you *most definitely* do *not* want to run xfs_fsr to remove it.
If you really must remove speculative prealloc, then run:

# xfs_spaceman -c "prealloc -m 0" <mntpt>

and that will remove all speculative preallocation currently on all in-memory inodes via an immediate blockgc pass.

If you just want to remove post-EOF blocks on a single file, then find out the file size with stat and truncate it to the same size. The truncate won't change the file size, but it will remove all blocks beyond EOF.

*However*

You should not ever need to be doing this, as there are several automated triggers to remove it, all firing when the filesystem detects there is no active modification of the file being performed. One trigger is the last close of a file descriptor, another is the periodic background blockgc worker, and another is memory reclaim removing the inode from memory. In all cases, these are triggers that indicate the file is not currently being written to, and hence the speculative prealloc is not needed anymore and so can be removed. So you should never have to remove it manually.

> But why do the inodes of those files remain dirty for at least 300 sec
> after the file is closed, and miss the automatic removal of the
> preallocation?

What do you mean by "dirty"? A file with post-EOF preallocation is not dirty in any way once the data in the file has been written back (usually within 30s).

> We are running CentOS Stream release 8 with 4.18.0-383.el8.x86_64, but
> we never saw anything similar on CentOS Linux release 7.9.2009 (Core)
> with 3.10.0-1160.45.1.el7.x86_64 (for a similar pattern of file sizes,
> though with a different distributed storage application).

RHEL 7/CentOS 7 had this same behaviour - it was introduced in 2.6.38. All your observation means is that the application running on RHEL 7 was writing the files in a way that didn't trigger speculative prealloc beyond EOF, not that speculative prealloc beyond EOF didn't exist...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
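[Editor's sketch: the stat-then-truncate trick Dave describes can be exercised as follows. This runs on a throwaway file so it is safe anywhere; on a file with no post-EOF preallocation the truncate is simply a no-op. `stat -c` and `truncate` are GNU coreutils tools.]

```shell
# Drop post-EOF speculative preallocation from a single file by
# truncating it to its current apparent size, as described above.
# Demonstrated on a throwaway file; point f at the real file instead.
f=./demo-file
printf 'hello world' > "$f"

size=$(stat -c %s "$f")    # current apparent size in bytes
truncate -s "$size" "$f"   # same size: data untouched, post-EOF blocks freed

stat -c %s "$f"            # prints 11: the file size is unchanged
```

On XFS this is equivalent to what the automated blockgc triggers do for you, so in normal operation it should never be necessary.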
Thread overview: 6 messages (thread ends 2022-08-04 22:59 UTC)
[not found] <1586129076.70820212.1659538177737.JavaMail.zimbra@ijclab.in2p3.fr>
2022-08-03 14:56 ` s_bmap and flags explanation Emmanouil Vamvakopoulos
2022-08-03 15:54 ` Carlos Maiolino
2022-08-03 21:59 ` Dave Chinner
2022-08-04 10:25 ` Emmanouil Vamvakopoulos
2022-08-04 13:30 ` Carlos Maiolino
2022-08-04 22:55 ` Dave Chinner