From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: kreijack@inwind.it
Cc: Chris Murphy <lists@colorremedies.com>,
Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: is BTRFS_IOC_DEFRAG behavior optimal?
Date: Mon, 8 Feb 2021 17:21:54 -0500
Message-ID: <20210208222154.GD32440@hungrycats.org>
In-Reply-To: <3897f126-e977-d842-f91d-b48b74958f3d@libero.it>
On Mon, Feb 08, 2021 at 11:11:47PM +0100, Goffredo Baroncelli wrote:
> On 2/7/21 11:06 PM, Chris Murphy wrote:
> > systemd-journald journals on Btrfs default to nodatacow, upon log
> > rotation it's submitted for defragmenting with BTRFS_IOC_DEFRAG. The
> > result looks curious. I can't tell what the logic is from the results.
> >
> > The journal file starts out being fallocated with a size of 8MB, and
> > as it grows there is an append of 8MB increments, also fallocated.
> > This leads to a filefrag -v that looks like this (ext4 and btrfs
> > nodatacow follow the same behavior, both are provided for reference):
> >
> > ext4
> > https://pastebin.com/6vuufwXt
> >
> > btrfs
> > https://pastebin.com/Y18B2m4h
> >
> > Following defragment with BTRFS_IOC_DEFRAG it looks like this:
> > https://pastebin.com/1ufErVMs
> >
> > It appears at first glance to be significantly more fragmented. Closer
> > inspection shows that most of the extents weren't relocated. But
> > what's up with the peculiar interleaving? Is this an improvement over
> > the original allocation?
>
> I am not sure how to read the filefrag output: I see several lines like
> [...]
> 5: 1691.. 1693: 125477.. 125479: 3:
> 6: 1694.. 1694: 125480.. 125480: 1: unwritten
> [...]
>
> What does "unwritten" mean? The kernel documentation [*] says:
> [...]
> * FIEMAP_EXTENT_UNWRITTEN
> Unwritten extent - the extent is allocated but its data has not been
> initialized. This indicates the extent's data will be all zero if read
> through the filesystem but the contents are undefined if read directly from
> the device.
> [..]
> So it seems that the data didn't touch the platters (!)
>
> My educated guess is that there is something strange in the sequence:
> - write
> - sync
> - close log
> - move log
> - defrag log
>
> Maybe the defrag starts before all the data reaches the platters?

Defrag puts the file's contents back into delalloc, and the new extents
won't be allocated until a flush (fsync, sync, or the commit interval).
In btrfs, defrag is roughly equivalent to copying the data to a new file,
except that the logical extents are atomically updated to point to the
new location.

FIEMAP has an option flag (FIEMAP_FLAG_SYNC) to sync the data before
returning a map. The DEFRAG_RANGE ioctl has a flag
(BTRFS_DEFRAG_RANGE_START_IO) to start IO immediately, so it will
presumably be done by the time you look at the extents with FIEMAP.
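
As a rough sketch of those two flags in use (not taken from any existing
tool): the Python below asks BTRFS_IOC_DEFRAG_RANGE to start IO right away
and then calls FS_IOC_FIEMAP with the sync flag set. The struct layouts and
flag values follow linux/btrfs.h and linux/fiemap.h; the ioctl numbers
assume the generic _IOC encoding (x86-64, arm64), and the helper names
defrag_start_io() and fiemap_synced() are made up for illustration.

```python
import fcntl
import struct

# Flag values from linux/btrfs.h and linux/fiemap.h.
BTRFS_DEFRAG_RANGE_START_IO = 2
FIEMAP_FLAG_SYNC = 0x1
FIEMAP_EXTENT_UNWRITTEN = 0x800

# struct btrfs_ioctl_defrag_range_args: start, len, flags (u64),
# extent_thresh, compress_type (u32), unused[4] (u32).
DEFRAG_ARGS = struct.Struct("=QQQII4I")
# struct fiemap header: fm_start, fm_length (u64), fm_flags,
# fm_mapped_extents, fm_extent_count, fm_reserved (u32).
FIEMAP_HDR = struct.Struct("=QQIIII")
# struct fiemap_extent: fe_logical, fe_physical, fe_length,
# fe_reserved64[2] (u64), fe_flags, fe_reserved[3] (u32).
FIEMAP_EXT = struct.Struct("=QQQ2QI3I")


def _ioc(direction, magic, nr, size):
    # Generic _IOC() encoding; an assumption, some architectures differ.
    return (direction << 30) | (size << 16) | (magic << 8) | nr


BTRFS_IOC_DEFRAG_RANGE = _ioc(1, 0x94, 16, DEFRAG_ARGS.size)  # _IOW
FS_IOC_FIEMAP = _ioc(3, ord("f"), 11, FIEMAP_HDR.size)        # _IOWR


def defrag_start_io(fd, extent_thresh=0):
    """Defragment the whole file and ask btrfs to start writeback now."""
    args = DEFRAG_ARGS.pack(0, 2**64 - 1, BTRFS_DEFRAG_RANGE_START_IO,
                            extent_thresh, 0, 0, 0, 0, 0)
    fcntl.ioctl(fd, BTRFS_IOC_DEFRAG_RANGE, args)


def fiemap_synced(fd, max_extents=512):
    """Return (logical, physical, length, flags) tuples after a sync."""
    buf = bytearray(FIEMAP_HDR.size + max_extents * FIEMAP_EXT.size)
    FIEMAP_HDR.pack_into(buf, 0, 0, 2**64 - 1, FIEMAP_FLAG_SYNC,
                         0, max_extents, 0)
    fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)  # kernel fills buf in place
    mapped = FIEMAP_HDR.unpack_from(buf, 0)[3]
    extents = []
    for i in range(mapped):
        f = FIEMAP_EXT.unpack_from(buf, FIEMAP_HDR.size + i * FIEMAP_EXT.size)
        extents.append((f[0], f[1], f[2], f[5]))
    return extents
```

Calling defrag_start_io(fd) followed by fiemap_synced(fd) on an open file
descriptor should show the post-defrag layout rather than whatever was on
disk before the flush; an extent whose flags include
FIEMAP_EXTENT_UNWRITTEN is the one filefrag prints as "unwritten".
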
> For what it's worth, I created a file with the same fragmentation as yours:
>
> $ sudo filefrag -v data.txt
> Filesystem type is: 9123683e
> File size of data.txt is 25165824 (6144 blocks of 4096 bytes)
> ext: logical_offset: physical_offset: length: expected: flags:
> 0: 0.. 0: 1597171.. 1597171: 1:
> 1: 1.. 1599: 163433285.. 163434883: 1599: 1597172:
> 2: 1600.. 1607: 1601255.. 1601262: 8: 163434884:
> 3: 1608.. 1689: 1604137.. 1604218: 82: 1601263:
> 4: 1690.. 1690: 1597484.. 1597484: 1: 1604219:
> 5: 1691.. 1693: 1597465.. 1597467: 3: 1597485:
> 6: 1694.. 1694: 1597966.. 1597966: 1: 1597468:
> 7: 1695.. 1722: 1599557.. 1599584: 28: 1597967:
> 8: 1723.. 1723: 1599211.. 1599211: 1: 1599585:
> 9: 1724.. 1955: 1648394.. 1648625: 232: 1599212:
> 10: 1956.. 1956: 1599695.. 1599695: 1: 1648626:
> 11: 1957.. 2047: 1625881.. 1625971: 91: 1599696:
> 12: 2048.. 2417: 1648804.. 1649173: 370: 1625972:
> 13: 2418.. 2420: 1597468.. 1597470: 3: 1649174:
> 14: 2421.. 2478: 1624667.. 1624724: 58: 1597471:
> 15: 2479.. 2479: 1596416.. 1596416: 1: 1624725:
> 16: 2480.. 2482: 1601045.. 1601047: 3: 1596417:
> 17: 2483.. 2483: 1596854.. 1596854: 1: 1601048:
> 18: 2484.. 2523: 1602715.. 1602754: 40: 1596855:
> 19: 2524.. 2527: 1597471.. 1597474: 4: 1602755:
> 20: 2528.. 2598: 1624725.. 1624795: 71: 1597475:
> 21: 2599.. 2599: 1596858.. 1596858: 1: 1624796:
> 22: 2600.. 2607: 1601263.. 1601270: 8: 1596859:
> 23: 2608.. 2608: 1596863.. 1596863: 1: 1601271:
> 24: 2609.. 2611: 1601271.. 1601273: 3: 1596864:
> 25: 2612.. 2612: 1596864.. 1596864: 1: 1601274:
> 26: 2613.. 2615: 1601274.. 1601276: 3: 1596865:
> 27: 2616.. 2616: 1596981.. 1596981: 1: 1601277:
> 28: 2617.. 2691: 1649174.. 1649248: 75: 1596982:
> 29: 2692.. 2696: 1597475.. 1597479: 5: 1649249:
> 30: 2697.. 2756: 1634995.. 1635054: 60: 1597480:
> 31: 2757.. 2758: 1597480.. 1597481: 2: 1635055:
> 32: 2759.. 2762: 1601351.. 1601354: 4: 1597482:
> 33: 2763.. 2764: 1597482.. 1597483: 2: 1601355:
> 34: 2765.. 2837: 1649249.. 1649321: 73: 1597484:
> 35: 2838.. 2838: 1597038.. 1597038: 1: 1649322:
> 36: 2839.. 2855: 1601538.. 1601554: 17: 1597039:
> 37: 2856.. 2856: 1597045.. 1597045: 1: 1601555:
> 38: 2857.. 2904: 1624547.. 1624594: 48: 1597046:
> 39: 2905.. 2926: 1600795.. 1600816: 22: 1624595:
> 40: 2927.. 2942: 1602034.. 1602049: 16: 1600817:
> 41: 2943.. 2963: 1600817.. 1600837: 21: 1602050:
> 42: 2964.. 2979: 1602183.. 1602198: 16: 1600838:
> 43: 2980.. 3001: 1600927.. 1600948: 22: 1602199:
> 44: 3002.. 3043: 1621164.. 1621205: 42: 1600949:
> 45: 3044.. 3053: 1599231.. 1599240: 10: 1621206:
> 46: 3054.. 3066: 1601952.. 1601964: 13: 1599241:
> 47: 3067.. 3067: 1597056.. 1597056: 1: 1601965:
> 48: 3068.. 3084: 1602375.. 1602391: 17: 1597057:
> 49: 3085.. 3094: 1599290.. 1599299: 10: 1602392:
> 50: 3095.. 3096: 1601355.. 1601356: 2: 1599300:
> 51: 3097.. 3107: 1600717.. 1600727: 11: 1601357:
> 52: 3108.. 3156: 1642892.. 1642940: 49: 1600728:
> 53: 3157.. 3157: 1597059.. 1597059: 1: 1642941:
> 54: 3158.. 3251: 1649322.. 1649415: 94: 1597060:
> 55: 3252.. 3254: 1599241.. 1599243: 3: 1649416:
> 56: 3255.. 3304: 1645466.. 1645515: 50: 1599244:
> 57: 3305.. 3305: 1597100.. 1597100: 1: 1645516:
> 58: 3306.. 3312: 1601357.. 1601363: 7: 1597101:
> 59: 3313.. 3319: 1599300.. 1599306: 7: 1601364:
> 60: 3320.. 3331: 1601611.. 1601622: 12: 1599307:
> 61: 3332.. 3339: 1600838.. 1600845: 8: 1601623:
> 62: 3340.. 3343: 1601419.. 1601422: 4: 1600846:
> 63: 3344.. 3351: 1600846.. 1600853: 8: 1601423:
> 64: 3352.. 3432: 1649416.. 1649496: 81: 1600854:
> 65: 3433.. 3433: 1597109.. 1597109: 1: 1649497:
> 66: 3434.. 3489: 1649497.. 1649552: 56: 1597110:
> 67: 3490.. 3491: 1599227.. 1599228: 2: 1649553:
> 68: 3492.. 3521: 1619348.. 1619377: 30: 1599229:
> 69: 3522.. 3523: 1599307.. 1599308: 2: 1619378:
> 70: 3524.. 3530: 1601688.. 1601694: 7: 1599309:
> 71: 3531.. 3539: 1600949.. 1600957: 9: 1601695:
> 72: 3540.. 3579: 1629356.. 1629395: 40: 1600958:
> 73: 3580.. 3580: 1597124.. 1597124: 1: 1629396:
> 74: 3581.. 3601: 1604219.. 1604239: 21: 1597125:
> 75: 3602.. 3603: 1599585.. 1599586: 2: 1604240:
> 76: 3604.. 3614: 1602636.. 1602646: 11: 1599587:
> 77: 3615.. 3616: 1599587.. 1599588: 2: 1602647:
> 78: 3617.. 3677: 1649553.. 1649613: 61: 1599589:
> 79: 3678.. 3680: 1599692.. 1599694: 3: 1649614:
> 80: 3681.. 3723: 1647818.. 1647860: 43: 1599695:
> 81: 3724.. 3726: 1599821.. 1599823: 3: 1647861:
> 82: 3727.. 3756: 1622218.. 1622247: 30: 1599824:
> 83: 3757.. 3759: 1600630.. 1600632: 3: 1622248:
> 84: 3760.. 3766: 1603288.. 1603294: 7: 1600633:
> 85: 3767.. 3768: 1600633.. 1600634: 2: 1603295:
> 86: 3769.. 3950: 76053306.. 76053487: 182: 1600635:
> 87: 3951.. 3958: 1600958.. 1600965: 8: 76053488:
> 88: 3959.. 3986: 1619921.. 1619948: 28: 1600966:
> 89: 3987.. 3995: 1600966.. 1600974: 9: 1619949:
> 90: 3996.. 4036: 1649614.. 1649654: 41: 1600975:
> 91: 4037.. 4045: 1600975.. 1600983: 9: 1649655:
> 92: 4046.. 4050: 1601423.. 1601427: 5: 1600984:
> 93: 4051.. 4052: 1600854.. 1600855: 2: 1601428:
> 94: 4053.. 4055: 1601555.. 1601557: 3: 1600856:
> 95: 4056.. 4056: 1597129.. 1597129: 1: 1601558:
> 96: 4057.. 4059: 1601745.. 1601747: 3: 1597130:
> 97: 4060.. 4060: 1597134.. 1597134: 1: 1601748:
> 98: 4061.. 4063: 1602050.. 1602052: 3: 1597135:
> 99: 4064.. 4064: 1597137.. 1597137: 1: 1602053:
> 100: 4065.. 4079: 1604297.. 1604311: 15: 1597138:
> 101: 4080.. 4088: 1600987.. 1600995: 9: 1604312:
> 102: 4089.. 4095: 1603295.. 1603301: 7: 1600996:
> 103: 4096.. 4106: 1600996.. 1601006: 11: 1603302:
> 104: 4107.. 4117: 1622600.. 1622610: 11: 1601007:
> 105: 4118.. 4119: 1601007.. 1601008: 2: 1622611:
> 106: 4120.. 4129: 1622611.. 1622620: 10: 1601009:
> 107: 4130.. 4131: 1601009.. 1601010: 2: 1622621:
> 108: 4132.. 4141: 1622621.. 1622630: 10: 1601011:
> 109: 4142.. 4145: 1601011.. 1601014: 4: 1622631:
> 110: 4146.. 4155: 1622986.. 1622995: 10: 1601015:
> 111: 4156.. 4157: 1601015.. 1601016: 2: 1622996:
> 112: 4158.. 4168: 1622996.. 1623006: 11: 1601017:
> 113: 4169.. 4170: 1601017.. 1601018: 2: 1623007:
> 114: 4171.. 4180: 1623007.. 1623016: 10: 1601019:
> 115: 4181.. 4182: 1601019.. 1601020: 2: 1623017:
> 116: 4183.. 4192: 1624473.. 1624482: 10: 1601021:
> 117: 4193.. 4195: 1601021.. 1601023: 3: 1624483:
> 118: 4196.. 4205: 1624796.. 1624805: 10: 1601024:
> 119: 4206.. 4207: 1601024.. 1601025: 2: 1624806:
> 120: 4208.. 4217: 1624806.. 1624815: 10: 1601026:
> 121: 4218.. 4220: 1601026.. 1601028: 3: 1624816:
> 122: 4221.. 4230: 1625972.. 1625981: 10: 1601029:
> 123: 4231.. 4408: 1648626.. 1648803: 178: 1625982:
> 124: 4409.. 4411: 1602199.. 1602201: 3: 1648804:
> 125: 4412.. 4434: 1601328.. 1601350: 23: 1602202:
> 126: 4435.. 4437: 1602647.. 1602649: 3: 1601351:
> 127: 4438.. 4439: 1601029.. 1601030: 2: 1602650:
> 128: 4440.. 4442: 1602755.. 1602757: 3: 1601031:
> 129: 4443.. 4480: 1601650.. 1601687: 38: 1602758:
> 130: 4481.. 4491: 1629530.. 1629540: 11: 1601688:
> 131: 4492.. 4560: 1624404.. 1624472: 69: 1629541:
> 132: 4561.. 4571: 1629541.. 1629551: 11: 1624473:
> 133: 4572.. 4582: 1601031.. 1601041: 11: 1629552:
> 134: 4583.. 4586: 1603302.. 1603305: 4: 1601042:
> 135: 4587.. 4620: 1602537.. 1602570: 34: 1603306:
> 136: 4621.. 4631: 1629716.. 1629726: 11: 1602571:
> 137: 4632.. 4634: 1601042.. 1601044: 3: 1629727:
> 138: 4635.. 6143: 156004864.. 156006372: 1509: 1601045: last,eof
> data.txt: 139 extents found
>
> Then I tried to defrag it:
>
> $ btrfs fi defra data.txt
> $ sudo filefrag -v data.txt
> Filesystem type is: 9123683e
> File size of data.txt is 25165824 (6144 blocks of 4096 bytes)
> ext: logical_offset: physical_offset: length: expected: flags:
> 0: 0.. 6143: 164002967.. 164009110: 6144: last,eof
> data.txt: 1 extent found
>
> So it seems that the defrag works
Be very careful how you set up this test case. If you use fallocate on
a file, it has a _permanent_ effect on the inode, and alters a lot of
normal btrfs behavior downstream. You won't see these effects if you
just write some data to a file without using prealloc.
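
To make the difference concrete, here is a rough sketch of the two ways
to build a test file. prealloc_file() imitates the pattern Chris described
(fallocate in 8MB steps, then write into the preallocated space), while
plain_file() just writes the same bytes. The function names and the STEP
constant are made up for illustration, and os.posix_fallocate() is only a
stand-in for whatever call journald really makes; the real journal is also
written in many small appends, which this does not reproduce.

```python
import os

STEP = 8 * 1024 * 1024  # 8MB growth increment, as in the journald case


def prealloc_file(path, data, step=STEP):
    """Write data into space that is fallocated `step` bytes at a time."""
    with open(path, "wb") as f:
        allocated = 0
        for off in range(0, len(data), step):
            # Extend the preallocated region, then write into it.
            os.posix_fallocate(f.fileno(), allocated, step)
            allocated += step
            f.write(data[off:off + step])
            f.flush()
        os.fsync(f.fileno())
    return allocated


def plain_file(path, data):
    """Write the same data with no preallocation at all."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
```

Running both against the btrfs under test and comparing filefrag -v
before and after a defrag should show whether a result depends on the
prealloc history of the inode.
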
> [*] https://www.kernel.org/doc/Documentation/filesystems/fiemap.txt
> >
> > https://pastebin.com/1ufErVMs
> >
> > If I unwind the interleaving, it looks like all the extents fall into
> > two localities and within each locality the extents aren't that far
> > apart - so my guess is that this file is also not meaningfully
> > fragmented, in practice. Surely the drive firmware will reorder the
> > reads to arrive at the least amount of seeks?
> >
>
>
> --
> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5