git.vger.kernel.org archive mirror
From: John Keeping <john@keeping.me.uk>
To: Jan Vales <jan@jvales.net>
Cc: git@vger.kernel.org
Subject: Re: unexplained behavior/issue with git archive?
Date: Thu, 23 Jul 2015 16:59:36 +0100
Message-ID: <20150723155936.GC14935@serenity.lan>
In-Reply-To: <55B10705.6090303@jvales.net>

On Thu, Jul 23, 2015 at 05:23:49PM +0200, Jan Vales wrote:
> I seem to trigger behavior I do not understand with git archive.
> 
> I have this little three-liner (vmdiff.sh):
> #!/bin/bash
> git diff --name-status "$2" "$3" > "$1.files"
> git diff --name-only "$2" "$3" |xargs -d'\n' git archive -o "$1" "$3" --
> 
> 
> For testing purposes, let's assume this call:
> # ./vmdiff.sh latest.zip HEAD^1 HEAD
> 
> # cat latest.zip.files | wc -l
> 149021
> 
> # cat latest.zip.files | egrep "^D" | wc -l
> 159
> 
> # mkdir empty; cd empty; unzip latest.zip ; find * | wc -l
> 1090
> 
> My goal is basically to diff (parts of) filesystems against each other
> and create an archive with all changed files + a file list to know which
> files were deleted. (I currently do not care about the files'
> permissions+ownership, and they don't really matter for the current
> problem. Also, don't ask why one would store a root filesystem in git :)
> 
> What I do not understand: why does the zip file only contain 1090
> files+dirs if wc -l shows about 150k files and only 159 were
> deleted?
> There should be about 149k files in that archive.
> 
> Also, the few files are all from "var" and none from etc or srv,
> where files definitely changed too! (and they show up in latest.zip.files)
> 
> Is there a limit of files git archive can process?

Not explicitly, but there is a limit on the size of command lines and
xargs will invoke the command multiple times if enough arguments are
given.
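To see the limit that xargs is working against on a particular system, getconf reports the kernel's ARG_MAX and GNU xargs can print the limits it will actually apply. A minimal sketch, assuming a Linux system with GNU findutils (the --show-limits option is GNU-specific); the numbers printed vary from system to system:

```shell
#!/bin/sh
# Kernel limit on the combined size of arguments and environment
# passed to exec(); xargs keeps each command line below this.
getconf ARG_MAX

# GNU xargs reports the limits it will use on stderr. With no input
# it still runs its default command (echo) once, printing a blank line.
xargs --show-limits < /dev/null
```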

What happens if you do:

	git diff --name-only HEAD^ HEAD | xargs -d'\n' echo | wc -l

?

With a small number of items, there should only be one output line, but
if xargs invokes the command multiple times there will be multiple
lines.  For example (using -L2 to force a maximum of two arguments per
invocation):

	$ printf '%s\n' a b c | xargs -d'\n' echo | wc -l
	1
	$ printf '%s\n' a b c | xargs -d'\n' -L2 echo | wc -l
	2
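That splitting would also explain the symptoms: each invocation of `git archive -o "$1"` writes a new archive over the previous one, so only the files from the last batch survive. Since `git diff --name-only` lists paths in sorted order, the last batch would hold the paths that sort last, which fits only "var" files turning up. A minimal sketch of the overwrite effect without git, assuming GNU xargs (for -d); out.txt is a hypothetical stand-in for the archive and -n2 forces small batches:

```shell
#!/bin/sh
# Start from a clean slate so the result only reflects this run.
rm -f out.txt
# -n2 makes xargs run the command once per batch of at most two items.
# Each run redirects with '>', truncating out.txt, much as each
# `git archive -o FILE` run replaces FILE.
printf '%s\n' a b c | xargs -d'\n' -n2 sh -c 'printf "%s\n" "$@" > out.txt' sh
# Only the last batch survives: this prints "c".
cat out.txt
```

Dropping -n2 leaves all three items in out.txt, because xargs then runs the command only once for such a short list.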

Thread overview: 3+ messages
2015-07-23 15:23 unexplained behavior/issue with git archive? Jan Vales
2015-07-23 15:59 ` John Keeping [this message]
2015-07-23 17:21   ` Junio C Hamano
