From: Axel Burri <axel@tty0.ch>
To: unlisted-recipients:; (no To-header on input)
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 4/4] btrfs-progs: change -t option for subvolume list to print a simple space-separated table (making it machine-readable)
Date: Mon, 5 Oct 2015 17:08:48 +0200
Message-ID: <56129280.1010800@tty0.ch>
In-Reply-To: <561138F4.5090105@inwind.it>
On 2015-10-04 16:34, Goffredo Baroncelli wrote:
> On 2015-10-04 05:37, Duncan wrote:
>> Goffredo Baroncelli posted on Sat, 03 Oct 2015 19:41:33 +0200 as
>> excerpted:
>>
>>> On 2015-10-03 12:09, Axel Burri wrote:
>>>>
>>>>
>>>> On 2015-10-03 11:56, Goffredo Baroncelli wrote:
>>>>> On 2015-10-02 18:41, axel@tty0.ch wrote:
>>>>>> Old implementation used tabs "\t", and tried to work around problems
>>>>>> by guessing amount of tabs needed (e.g. "\t\t" after "top level", with
>>>>>> buggy output as soon as empty uuids are printed). This will never
>>>>>> work correctly, as tab width is a user-defined setting in the
>>>>>> terminal.
>>>>>
>>>>>
>>>>> Why not use string_table() and table_*() functions ?
>>>>
>>>> string_table(), as well as all table functions by nature, needs to know
>>>> the maximum size of all cells in a row before printing, and therefore
>>>> buffers all the output before printing. It would eat up a lot of memory
>>>> for large tables (it is not unusual to have 1000+ subvolumes in btrfs
>>>> if you make heavy use of snapshotting). Furthermore, it would slow down
>>>> things by not printing the output linewise.
>>>
>>>
>>> Assuming 200bytes per row (== subvolume) x 1000 subvolumes = 200kB... I
>>> don't think that this could be a problem, nor in terms of memory used
>>> nor in terms of speed: if you have 1000+ subvolumes the most time
>>> consuming activity is traversing the filesystem looking at the
>>> snapshot...
>>
>> Perhaps unfortunately, scaling to millions of snapshots/subvolumes really
>> *is* expected by some people. You'd be surprised at the number of folks
>> that setup automated per-minute snapshotting with no automated thinning,
>> and expect to be able to keep several years' worth of snapshots, and then
>> wonder why btrfs maintenance commands such as balance take weeks/months...
> [...]
>> Obviously btrfs doesn't scale to that level now, and if it did, someone
>> making the mistake of trying to get a listing of millions of snapshots
>> would very likely change their mind before even hitting 10%...
>>
>> But that's why actually processing line-by-line is important, so they'll
>> actually /see/ what happened and ctrl-C it, instead of the program
>> aborting as it runs into (for example) the 32-bit user/kernel memory
>> barrier, without printing anything useful...
>
> Please, Duncan, read the code.
>
> *TODAY* "btrfs list" scans all subvolumes and stores the information in memory!
>
> This is the most time-consuming activity (think of traversing a filesystem with millions of snapshots).
>
> *Then* "btrfs list" prints the info.
>
> So you are already blocked at the screen until all subvolumes are read. And I repeat: (re)processing the information takes less time than reading it from the disk.
>
> [....]
>
A quick look at the code shows me that Goffredo is right here:
__list_subvol_search() always fetches ALL data from
BTRFS_IOC_TREE_SEARCH and puts it into an rbtree for later processing
(assembling full paths, sorting).

While there is certainly room for improvement here (assuming that
BTRFS_IOC_TREE_SEARCH returns objectids in sorted order, it would
definitely be possible to produce line-by-line output), the code looks
pretty elegant the way it is.
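
To illustrate the difference under discussion (this is not the
btrfs-progs code, just a sketch with made-up rows): a delimiter-separated
listing can be produced one row at a time, while an aligned table has to
hold every row until the widest cell of each column is known. The
function names below are hypothetical.

```shell
#!/bin/sh
# Made-up, tab-separated rows standing in for subvolume records.
rows() {
    printf '257\t1042\thome\n'
    printf '12345\t7\tsnapshots/home.20151005\n'
}

# Delimiter-separated output: each row is formatted and printed on its
# own; nothing is retained between rows.
stream_rows() {
    awk -F '\t' '{ print $1, $2, $3 }'
}

# Aligned output: column widths depend on the widest cell, so all rows
# are kept in memory and printed only in the END block.
align_rows() {
    awk -F '\t' '
        {
            for (i = 1; i <= NF; i++) {
                cell[NR, i] = $i
                if (length($i) > width[i]) width[i] = length($i)
            }
        }
        END {
            # Build the format once the final widths are known.
            fmt = "%-" width[1] "s %-" width[2] "s %s\n"
            for (r = 1; r <= NR; r++)
                printf fmt, cell[r, 1], cell[r, 2], cell[r, 3]
        }'
}

rows | stream_rows
rows | align_rows
```

Memory use of the aligned variant grows with the number of rows, while
the streaming variant stays constant. As noted above, the current code
already collects everything in an rbtree first, so this only matters if
that part is changed as well.
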
I still don't think it is wise to bloat things further just for printing
nice tables. My impression is that "btrfs subvolume list" is
human-readable enough without the '-t' flag, while the output with the
'-t' flag is much more machine-friendly and should therefore have the
highest possible performance, e.g.:

    btrfs sub list -t / | (read th; while read $th; do echo "$gen"; done)
    btrfs sub list -t / | column -t
Again, this is just my opinion as a "unix purist". A good compromise
might be to use a single "\t" instead of " " as the column delimiter.
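
A minimal sketch of how the header-driven loop from above could look
with a single tab as delimiter. The column names and rows are made up
for illustration and only stand in for real "btrfs sub list -t" output;
sample_table and print_gen_and_path are hypothetical helper names.

```shell
#!/bin/sh
# Stand-in for "btrfs sub list -t /". Column names and rows are
# assumptions for illustration; the real header may differ.
sample_table() {
    printf 'id\tgen\ttop_level\tpath\n'
    printf '257\t1042\t5\thome\n'
    printf '258\t1040\t5\tsnapshots/my home\n'
}

# Read the header into $th, then let "read" assign the fields of each
# row to variables named after the header columns.
print_gen_and_path() {
    tab=$(printf '\t')
    IFS=$tab read -r th
    # $th is deliberately unquoted so that it splits into one variable
    # name per column (tab is part of the default IFS).
    while IFS=$tab read -r $th; do
        printf '%s %s\n' "$gen" "$path"
    done
}

sample_table | print_gen_and_path
```

With a tab as the only delimiter, a path containing spaces stays in one
field. Two caveats: the header names must be valid shell variable names,
and since tab counts as IFS whitespace, "read" collapses consecutive
tabs, so empty cells would need a placeholder such as "-" to keep the
columns from shifting.
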