From: Axel Burri <axel@tty0.ch>
To: unlisted-recipients:; (no To-header on input)
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 4/4] btrfs-progs: change -t option for subvolume list to print a simple space-separated table (making it machine-readable)
Date: Mon, 5 Oct 2015 17:08:48 +0200
Message-ID: <56129280.1010800@tty0.ch>
In-Reply-To: <561138F4.5090105@inwind.it>



On 2015-10-04 16:34, Goffredo Baroncelli wrote:
> On 2015-10-04 05:37, Duncan wrote:
>> Goffredo Baroncelli posted on Sat, 03 Oct 2015 19:41:33 +0200 as
>> excerpted:
>>
>>> On 2015-10-03 12:09, Axel Burri wrote:
>>>>
>>>>
>>>> On 2015-10-03 11:56, Goffredo Baroncelli wrote:
>>>>> On 2015-10-02 18:41, axel@tty0.ch wrote:
>>>>>> Old implementation used tabs "\t", and tried to work around problems
>>>>>> by guessing amount of tabs needed (e.g. "\t\t" after "top level", with
>>>>>> buggy output as soon as empty uuids are printed). This will never
>>>>>> work correctly, as tab width is a user-defined setting in the
>>>>>> terminal.
>>>>>
>>>>>
>>>>> Why not use string_table() and table_*() functions  ?
>>>>
>>>> string_table(), as well as all table functions by nature, needs to know
>>>> the maximum size of all cells in a row before printing, and therefore
>>>> buffers all the output before printing. It would eat up a lot of memory
>>>> for large tables (it is not unusual to have 1000+ subvolumes in btrfs
>>>> if you make heavy use of snapshotting). Furthermore, it would slow down
>>>> things by not printing the output linewise.
>>>
>>>
>>> Assuming 200 bytes per row (== subvolume) x 1000 subvolumes = 200 kB... I
>>> don't think that this could be a problem, either in terms of memory
>>> used or in terms of speed: if you have 1000+ subvolumes the most
>>> time-consuming activity is traversing the filesystem looking at the
>>> snapshots...
>>
>> Perhaps unfortunately, scaling to millions of snapshots/subvolumes really 
>> *is* expected by some people.  You'd be surprised at the number of folks 
>> that set up automated per-minute snapshotting with no automated thinning, 
>> and expect to be able to keep several years' worth of snapshots, and then 
>> wonder why btrfs maintenance commands such as balance take weeks/months...
> [...]
>> Obviously btrfs doesn't scale to that level now, and if it did, someone 
>> making the mistake of trying to get a listing of millions of snapshots 
>> would very likely change their mind before even hitting 10%...
>>
>> But that's why actually processing line-by-line is important, so they'll 
>> actually /see/ what happened and ctrl-C it, instead of the program 
>> aborting as it runs into (for example) the 32-bit user/kernel memory 
>> barrier, without printing anything useful...
> 
> Please, Duncan, read the code.
> 
> *TODAY* "btrfs list" scans all subvolumes and stores the information in memory!
> 
> This is the most time-consuming activity (think of traversing a filesystem with millions of snapshots).
> 
> *Then* "btrfs list" prints the info.
> 
> So you are already blocked at the screen until all subvolumes are read. And I repeat: (re)processing the information takes less time than reading it from the disk.
> 
> [....]
> 

A quick look at the code shows me that Goffredo is right here:
__list_subvol_search() always fetches ALL data from
BTRFS_IOC_TREE_SEARCH and puts it into an rbtree for later processing
(assembling full paths, sorting).

While there is certainly room for improvement here (assuming that
BTRFS_IOC_TREE_SEARCH returns objectids in sorted order, it would
definitely be possible to produce line-by-line output), the code looks
pretty elegant the way it is.

I still don't think it is wise to bloat things further just for printing
nice tables. My impression is that "btrfs subvolume list" is
human-readable enough without the '-t' flag, while the output with the
'-t' flag is much more machine-friendly and should therefore perform as
well as possible, e.g.:

  btrfs sub list -t / | (read th; while read $th; do echo "$gen"; done)
  btrfs sub list -t / | column -t
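
To illustrate why aligned output is best left to a separate filter, here
is a rough sketch of what any column aligner has to do (and, as far as I
understand it, what "column -t" does as well): it must see every row
before it knows the column widths, so nothing can be printed until the
input ends. The function name align_table is hypothetical, and the sketch
assumes tab-separated input:

```shell
# Hypothetical helper: align tab-separated input into padded columns.
# Assumes awk is available and that cells contain no tab characters.
align_table() {
    awk -F '\t' '
        {
            # Pass 1: store every cell and track the widest one per column.
            for (i = 1; i <= NF; i++) {
                cell[NR, i] = $i
                if (length($i) > width[i]) width[i] = length($i)
            }
            if (NF > ncols) ncols = NF
        }
        END {
            # Pass 2: only now, with all rows seen, can rows be padded.
            for (r = 1; r <= NR; r++) {
                line = ""
                for (i = 1; i <= ncols; i++)
                    line = line sprintf("%-" width[i] "s  ", cell[r, i])
                sub(/ +$/, "", line)   # drop padding after the last column
                print line
            }
        }
    '
}
```

Used as a filter (some-command | align_table), the buffering cost is paid
only by those who actually want the aligned view.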

Again, this is just my opinion, being a "unix-purist". Maybe a good
compromise would be to use a single "\t" instead of " " as column delimiter.
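
A minimal sketch of how a script could consume such tab-delimited output,
using the same "header row as variable names" trick as above. The sample
data, its column names (id, gen, path) and the function names are made up
for illustration; they are not actual btrfs-progs output:

```shell
# Hypothetical sample of tab-separated output (not real btrfs output).
sample_table() {
    printf 'id\tgen\tpath\n'
    printf '257\t1042\thome\n'
    printf '258\t1043\tsnapshots/home.20151005\n'
}

# Print the "gen" column of a tab-separated table read from stdin.
print_gen() {
    tab=$(printf '\t')
    # The first line is the header; keep it as one string.
    IFS=$tab read -r th || return 1
    # $th is deliberately unquoted: it splits into one variable name per
    # column, so each row is read into variables named after the header.
    while IFS=$tab read -r $th; do
        printf '%s\n' "$gen"
    done
}

sample_table | print_gen
```

One caveat with this approach: tab counts as IFS whitespace, so "read"
collapses consecutive tabs, and an empty cell would shift the following
columns. Empty values would need a placeholder such as "-" for this to
stay reliable.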
