* unexplained behavior/issue with git archive?
@ 2015-07-23 15:23 Jan Vales
2015-07-23 15:59 ` John Keeping
0 siblings, 1 reply; 3+ messages in thread
From: Jan Vales @ 2015-07-23 15:23 UTC (permalink / raw)
To: git
Hi,
I seem to trigger behavior I do not understand with git archive.
I have this little three-liner (vmdiff.sh):
#!/bin/bash
git diff --name-status "$2" "$3" > "$1.files"
git diff --name-only "$2" "$3" |xargs -d'\n' git archive -o "$1" "$3" --
For testing purposes, let's assume this call:
# ./vmdiff.sh latest.zip HEAD^1 HEAD
# cat latest.zip.files | wc -l
149021
# cat latest.zip.files | egrep "^D" | wc -l
159
# mkdir empty; cd empty; unzip latest.zip ; find * | wc -l
1090
My goal is basically to diff (parts of) filesystems against each other
and create an archive with all changed files plus a file list, so I know
which files were deleted. (I currently do not care about file
permissions and ownership, and they do not really matter for the current
problem. Also, don't ask why one would store a root filesystem in git :)
What I do not understand: why does the zip file contain only 1090
files and dirs when wc -l shows about 149k changed files, of which only
159 were deleted?
There should be about 149k files in that archive.
Also, the few files that are there all come from "var", and none from etc
or srv, where files definitely changed too (and show up in latest.zip.files)!
Is there a limit on the number of files git archive can process?
lg
Jan Vales
--
I only read plaintext emails.
Someone @ irc://irc.fsinf.at:6667/tuwien
webIRC: https://frost.fsinf.at/iris/
* Re: unexplained behavior/issue with git archive?
From: John Keeping @ 2015-07-23 15:59 UTC (permalink / raw)
To: Jan Vales; +Cc: git
On Thu, Jul 23, 2015 at 05:23:49PM +0200, Jan Vales wrote:
> I seem to trigger behavior I do not understand with git archive.
>
> I have this little three-liner (vmdiff.sh):
> #!/bin/bash
> git diff --name-status "$2" "$3" > "$1.files"
> git diff --name-only "$2" "$3" |xargs -d'\n' git archive -o "$1" "$3" --
>
>
> For testing purposes, let's assume this call:
> # ./vmdiff.sh latest.zip HEAD^1 HEAD
>
> # cat latest.zip.files | wc -l
> 149021
>
> # cat latest.zip.files | egrep "^D" | wc -l
> 159
>
> # mkdir empty; cd empty; unzip latest.zip ; find * | wc -l
> 1090
>
> My goal is basically to diff (parts of) filesystems against each other
> and create an archive with all changed files plus a file list, so I know
> which files were deleted. (I currently do not care about file
> permissions and ownership, and they do not really matter for the current
> problem. Also, don't ask why one would store a root filesystem in git :)
>
> What I do not understand: why does the zip file contain only 1090
> files and dirs when wc -l shows about 149k changed files, of which only
> 159 were deleted?
> There should be about 149k files in that archive.
>
> Also, the few files that are there all come from "var", and none from etc
> or srv, where files definitely changed too (and show up in latest.zip.files)!
>
> Is there a limit on the number of files git archive can process?
Not explicitly, but there is a limit on the size of command lines and
xargs will invoke the command multiple times if enough arguments are
given.
What happens if you do:
git diff --name-only HEAD^ HEAD | xargs -d'\n' echo | wc -l
?
With a small number of items, there should only be one output line, but
if xargs invokes the command multiple times there will be multiple
lines. For example (using -L2 to force a maximum of two arguments per
invocation):
$ printf '%s\n' a b c | xargs -d'\n' echo | wc -l
1
$ printf '%s\n' a b c | xargs -d'\n' -L2 echo | wc -l
2
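Splitting alone does not explain the missing files. The likely second
half is that every invocation of "git archive -o" rewrites the same
output file from scratch, so only the last batch survives; that would
also fit the archive containing only paths under "var", which sort last.
A minimal sketch of the effect, using a stand-in command instead of git
archive (the sh -c one-liner, the temporary file and -n 2 are
illustrative assumptions, not part of the original script):

```shell
#!/bin/sh
# Stand-in for "git archive -o OUT ...": each run truncates OUT and
# writes only the arguments it was given.
out=$(mktemp)

# -n 2 forces xargs to split five items into three invocations:
# (a b), (c d), (e). "$0" inside the one-liner is the output file.
printf '%s\n' a b c d e |
    xargs -n 2 sh -c 'printf "%s\n" "$@" > "$0"' "$out"

# Only the last batch is left in the file.
result=$(cat "$out")
echo "$result"   # prints: e
```

With ">>" in place of ">" all five items would be kept, which is the
"append" behavior that git archive does not provide here.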
* Re: unexplained behavior/issue with git archive?
From: Junio C Hamano @ 2015-07-23 17:21 UTC (permalink / raw)
To: John Keeping; +Cc: Jan Vales, git
John Keeping <john@keeping.me.uk> writes:
> With a small number of items, there should only be one output line, but
> if xargs invokes the command multiple times there will be multiple
> lines. For example (using -L2 to force a maximum of two arguments per
> invocation):
>
> $ printf '%s\n' a b c | xargs -d'\n' echo | wc -l
> 1
> $ printf '%s\n' a b c | xargs -d'\n' -L2 echo | wc -l
> 2
Yup, I think this thread is mistitled; it looks like an "unexpected
behaviour with xargs". Or "common pitfalls with xargs", perhaps.
Now, what would be a reasonable workaround? To work around command
line length limits (not necessarily for xargs, but the exact same
issue would arise if you are trying to specify too many pathspecs on
the command line), many of our commands take paths from their
standard input. Would it be reasonable to teach "git archive" to
also do so?
Or would it make sense to teach "git archive -o" a new mode to
append to an existing archive, so that repeated invocations of "git
archive" via such a use of "xargs" would create in the first
invocation and then keep appending to the same archive in the
subsequent invocations?
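Until something like that exists, one possible workaround keeps the
path list off the command line entirely: feed the changed entries into
a scratch index with "git update-index --index-info" (which reads its
input from standard input), write that index out as a tree, and archive
the tree in a single invocation. This is a sketch under assumptions, not
a tested recipe from this thread: the function name archive_changed, its
argument order and the scratch directory are hypothetical, and deletions
are filtered out because they have no blob in the new commit.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): archive_changed OUT OLD NEW
archive_changed() {
    out=$1 old=$2 new=$3
    # Scratch index, so the repository's real index is left untouched.
    scratch=$(mktemp -d) || return 1

    # Raw diff lines look like
    #   :<oldmode> <newmode> <oldsha> <newsha> <status>TAB<path>
    # and are rewritten to "<newmode> <newsha>TAB<path>" for --index-info.
    # --diff-filter=AMT keeps added, modified and type-changed entries.
    git diff-tree -r --no-renames --diff-filter=AMT "$old" "$new" |
        awk -F'\t' '{ split($1, f, " ")
                      print f[2] " " f[4] "\t" substr($0, index($0, "\t") + 1) }' |
        GIT_INDEX_FILE="$scratch/index" git update-index --index-info ||
        { rm -rf "$scratch"; return 1; }

    tree=$(GIT_INDEX_FILE="$scratch/index" git write-tree)
    status=$?
    rm -rf "$scratch"
    [ "$status" -eq 0 ] || return 1

    # One invocation, no pathspecs: the tree holds only the changed files.
    git archive -o "$out" "$tree"
}
```

For the call from the original report this would be
"archive_changed latest.zip HEAD^ HEAD". Because a tree rather than a
commit is archived, the entries carry the current time as their
modification time.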