From: Taylor Blau <ttaylorr@github.com>
To: Lars Schneider <larsxschneider@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com, peff@peff.net,
tboegi@web.de, e@80x24.org
Subject: Re: [PATCH v3 4/4] convert: add "status=delayed" to filter process protocol
Date: Tue, 18 Apr 2017 11:42:09 -0600
Message-ID: <20170418174209.GA92973@Ida>
In-Reply-To: <1D510C6F-A830-48BE-880B-62F4212F4A7F@gmail.com>
On Tue, Apr 18, 2017 at 06:14:36PM +0200, Lars Schneider wrote:
> > Both Git and the filter are going to have to keep these paths in memory
> > somewhere, be that in-process, or on disk. That being said, I can see potential
> > troubles with a large number of long paths that exceed the memory available to
> > Git or the filter when stored in a hashmap/set.
> >
> > On Git's side, I think trading that for some CPU time might make sense. If Git
> > were to SHA1 each path and store that in a hashmap, it would consume more CPU
> > time, but less memory to store each path. Git and the filter could then exchange
> > path names, and Git would simply SHA1 the pathname each time it needed to refer
> > back to memory associated with that entry in a hashmap.
>
> I would be surprised if this would be necessary. If we filter delay 50,000
> files (= a lot!) with a path length of 1000 characters (= very long!) then we
> would use 50MB plus some hashmap data structures. Modern machines should have
> enough RAM I would think...
I agree, and thanks for correcting my thinking here. I ran a simple command to
list the pathname lengths in a large repository, longest first:
$ find . -type f | awk '{ print length($0) }' | sort -r -n | uniq -c
The longest pathnames in the repository turned out to be a few files close to
the 200-character mark. I think 50k files at 1k bytes per pathname is quite
enough head-room :-).
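
For anyone who wants to redo the arithmetic, here is a rough sketch in Python.
It is not part of the patch series and says nothing about how Git or the
filter would actually store delayed paths. The function names (path_lengths,
raw_path_bytes, sha1_key_bytes) are made up for illustration, and the
estimates count only the key bytes, ignoring any hashmap overhead.

```python
import hashlib
import os


def path_lengths(root):
    """Yield the byte length of every file path under root, relative to root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # Measure the whole path, including any spaces it contains.
            yield len(os.fsencode(rel))


def raw_path_bytes(num_paths, path_len):
    """Key bytes needed to hold num_paths paths of path_len bytes each."""
    return num_paths * path_len


def sha1_key_bytes(num_paths):
    """Key bytes needed if each path is replaced by its 20-byte SHA-1 digest."""
    return num_paths * hashlib.sha1().digest_size
```

With the numbers from this thread, raw_path_bytes(50_000, 1_000) gives
50,000,000 bytes (the 50MB figure above), while sha1_key_bytes(50_000) gives
1,000,000 bytes. max(path_lengths(".")) reports the longest path in the
current tree.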
--
Thanks,
Taylor Blau