* zip files created with git archive flags text files as binaries @ 2015-02-23 13:58 Ulrike Fischer 2015-02-23 19:30 ` René Scharfe 2015-03-04 21:13 ` [PATCH] archive-zip: add --text parameter René Scharfe 0 siblings, 2 replies; 8+ messages in thread From: Ulrike Fischer @ 2015-02-23 13:58 UTC (permalink / raw) To: git I'm using git on windows 7. $ git --version git version 1.9.4.msysgit.0 Some days ago I uploaded a latex package to CTAN (www.ctan.org). I created the zip-file with git archive --format=zip --prefix=citeall/ --output=zip/citeall_2015-02-20.zip HEAD The zip contained four text files and a pdf. The CTAN maintainers informed me that all files in the zip are flagged as binaries and this makes it difficult for them to process them further (they want to correct line feeds of the text files: http://mirror.ctan.org/tex-archive/help/ctan/CTAN-upload-addendum.html#crlf) unzip -Z reports for my zip: $ unzip -Z citeall_2015_02_20.zip Archive: citeall_2015_02_20.zip Zip file size: 105509 bytes, number of entries: 6 drwx--- 0.0 fat 0 bx stor 15-Feb-20 17:07 citeall/ -rw---- 0.0 fat 458 bx defN 15-Feb-20 17:07 citeall/README -rw---- 0.0 fat 102244 bx defN 15-Feb-20 17:07 citeall/citeall.pdf -rw---- 0.0 fat 3431 bx defN 15-Feb-20 17:07 citeall/citeall.sty -rw---- 0.0 fat 3971 bx defN 15-Feb-20 17:07 citeall/citeall.tex -rw---- 0.0 fat 557 bx defN 15-Feb-20 17:07 citeall/examples-citeall.bib 6 files, 110661 bytes uncompressed, 104669 bytes compressed: 5.4% The problem are all the "bx" entries. When I zip all the files with the standard windows zip-tool I get this: $ unzip -Z citeall-win.zip Archive: citeall-win.zip Zip file size: 105275 bytes, number of entries: 5 -rw---- 2.0 fat 102244 b- defN 15-Feb-20 17:07 citeall/citeall.pdf -rw---- 2.0 fat 3431 t- defN 15-Feb-20 17:07 citeall/citeall.sty -rw---- 2.0 fat 3971 t- defN 15-Feb-20 17:07 citeall/citeall.tex -rw---- 2.0 fat 557 t- defN 15-Feb-20 17:07 citeall/examples-citeall.bib -rw---- 2.0 fat 458 t- defN 15-Feb-20 17:07 citeall/README 5 files, 110661 bytes uncompressed, 104675 bytes compressed: 5.4% Here the text files have a correct t flag. I don't know if it the problem exists also with zips created with git archive on non-windows OS. Would it be possible to correct the zip-backend so that it flags text files correctly? Or alternativly could one configure git archive to use another zip programm? (I tried to sent this already some hours ago, but the message seems to be lost ...) -- Ulrike Fischer http://www.troubleshooting-tex.de/ ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: zip files created with git archive flags text files as binaries 2015-02-23 13:58 zip files created with git archive flags text files as binaries Ulrike Fischer @ 2015-02-23 19:30 ` René Scharfe 2015-03-04 21:13 ` René Scharfe 2015-03-04 21:13 ` [PATCH] archive-zip: add --text parameter René Scharfe 1 sibling, 1 reply; 8+ messages in thread From: René Scharfe @ 2015-02-23 19:30 UTC (permalink / raw) To: luatex, git Am 23.02.2015 um 14:58 schrieb Ulrike Fischer: > I'm using git on windows 7. > > $ git --version > git version 1.9.4.msysgit.0 Git's code for ZIP file creation hasn't changed since then. > Some days ago I uploaded a latex package to CTAN (www.ctan.org). > I created the zip-file with > > git archive --format=zip --prefix=citeall/ > --output=zip/citeall_2015-02-20.zip HEAD > > The zip contained four text files and a pdf. > > The CTAN maintainers informed me that all files in the zip are > flagged as binaries and this makes it difficult for them to process > them further (they want to correct line feeds of the text files: > http://mirror.ctan.org/tex-archive/help/ctan/CTAN-upload-addendum.html#crlf) I wouldn't have expected that this ZIP file attribute is actually used in the wild. > unzip -Z reports for my zip: > > $ unzip -Z citeall_2015_02_20.zip > Archive: citeall_2015_02_20.zip > Zip file size: 105509 bytes, number of entries: 6 > drwx--- 0.0 fat 0 bx stor 15-Feb-20 17:07 citeall/ > -rw---- 0.0 fat 458 bx defN 15-Feb-20 17:07 citeall/README > -rw---- 0.0 fat 102244 bx defN 15-Feb-20 17:07 > citeall/citeall.pdf > -rw---- 0.0 fat 3431 bx defN 15-Feb-20 17:07 > citeall/citeall.sty > -rw---- 0.0 fat 3971 bx defN 15-Feb-20 17:07 > citeall/citeall.tex > -rw---- 0.0 fat 557 bx defN 15-Feb-20 17:07 > citeall/examples-citeall.bib > 6 files, 110661 bytes uncompressed, 104669 bytes compressed: 5.4% > > The problem are all the "bx" entries. The "x" is due to the extra mtime header and unrelated to your issue. "b" (binary) is set unconditionally to ensure that line endings are kept intact by unpackers on all platforms. "fat" is used as lowest common denominator for regular files and directories; "unx" is used for symbolic links. > When I zip all the files with the standard windows zip-tool I get > this: > > $ unzip -Z citeall-win.zip > Archive: citeall-win.zip > Zip file size: 105275 bytes, number of entries: 5 > -rw---- 2.0 fat 102244 b- defN 15-Feb-20 17:07 > citeall/citeall.pdf > -rw---- 2.0 fat 3431 t- defN 15-Feb-20 17:07 > citeall/citeall.sty > -rw---- 2.0 fat 3971 t- defN 15-Feb-20 17:07 > citeall/citeall.tex > -rw---- 2.0 fat 557 t- defN 15-Feb-20 17:07 > citeall/examples-citeall.bib > -rw---- 2.0 fat 458 t- defN 15-Feb-20 17:07 citeall/README > 5 files, 110661 bytes uncompressed, 104675 bytes compressed: 5.4% > > Here the text files have a correct t flag. > > I don't know if it the problem exists also with zips created with > git archive on non-windows OS. Yes, git archive creates the same ZIP files everywhere. > Would it be possible to correct the zip-backend so that it flags > text files correctly? Or alternativly could one configure git > archive to use another zip programm? We would have to detect the line ending format (DOS, Unix, Macintosh, etc.) of each file, then set the attribute "t" (text) and the host system. The detection would slow down archive creation a bit and the resulting files would be different, of course, so this feature should only be enabled by a new command line option. I'll take a look. René ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: zip files created with git archive flags text files as binaries 2015-02-23 19:30 ` René Scharfe @ 2015-03-04 21:13 ` René Scharfe 0 siblings, 0 replies; 8+ messages in thread From: René Scharfe @ 2015-03-04 21:13 UTC (permalink / raw) To: luatex, git Am 23.02.2015 um 20:30 schrieb René Scharfe: > Am 23.02.2015 um 14:58 schrieb Ulrike Fischer: >> The zip contained four text files and a pdf. >> >> The CTAN maintainers informed me that all files in the zip are >> flagged as binaries and this makes it difficult for them to process >> them further (they want to correct line feeds of the text files: >> http://mirror.ctan.org/tex-archive/help/ctan/CTAN-upload-addendum.html#crlf) By the way, a workaround for CTAN could be to extract the files using unzip and zip them again using Info-ZIP zip (one of the "good zip programs" mentioned on that website). The result will be a ZIP file with text files flagged appropriately. Just saying. >> Would it be possible to correct the zip-backend so that it flags >> text files correctly? Or alternativly could one configure git >> archive to use another zip programm? > > We would have to detect the line ending format (DOS, Unix, Macintosh, > etc.) of each file, then set the attribute "t" (text) and the host > system. The detection would slow down archive creation a bit and the > resulting files would be different, of course, so this feature should > only be enabled by a new command line option. I'll take a look. Actually its easier than that. unzip -a (with end-of-line conversion) doesn't care for the actual format, it simply converts all occurrences of CR, CRLF and LF to the appropriate newline chars for the platform it is running on if the text flag is set. Files with mixed line endings are normalized, i.e. they have a consistent end-of-line style afterward. I'll send a first patch shortly. René ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] archive-zip: add --text parameter 2015-02-23 13:58 zip files created with git archive flags text files as binaries Ulrike Fischer 2015-02-23 19:30 ` René Scharfe @ 2015-03-04 21:13 ` René Scharfe 2015-03-05 2:16 ` Junio C Hamano ` (2 more replies) 1 sibling, 3 replies; 8+ messages in thread From: René Scharfe @ 2015-03-04 21:13 UTC (permalink / raw) To: luatex, git Entries in a ZIP file can be marked as text files. Extractors can use that flag to apply end-of-line conversions. An example is unzip -a. git archive currently marks all ZIP file entries as binary files. This patch adds the new option --text that can be used to mark non-binary files or all files as text files, thus enabling the use of unzip -a. No sign-off, yet, because I'm not sure we really need another option. E.g. --text=all doesn't seem to be actually useful, but it was easy to implement. Info-ZIP's zip always creates archives like --text=auto does, so perhaps we should make that our default behavior as well? Changing the default behavior would cause newer versions of git archive to create different ZIP files than older ones, of course. This can break caching and signature checking. The last time we did that was in 2012 when we added an extended mtime field (227bf5980), I think. I don't remember any fallout from that change, but there was a recent discussion about the stability of generated tar files, so I'm a bit cautious: http://thread.gmane.org/gmane.comp.version-control.git/258516 --- Documentation/git-archive.txt | 5 ++++ archive-zip.c | 23 ++++++++++++++---- archive.c | 18 ++++++++++++++ archive.h | 7 ++++++ t/t5003-archive-zip.sh | 56 +++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 105 insertions(+), 4 deletions(-) diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt index cfa1e4e..684ca36 100644 --- a/Documentation/git-archive.txt +++ b/Documentation/git-archive.txt @@ -93,6 +93,11 @@ zip Highest and slowest compression level. You can specify any number from 1 to 9 to adjust compression speed and ratio. +--text=<which>:: + Mark the specfied entries as text files so that `unzip -a` + converts end-of-line characters while extracting. The value + must be either 'all', 'auto', or 'none' (the default). + CONFIGURATION ------------- diff --git a/archive-zip.c b/archive-zip.c index 4bde019..3767940 100644 --- a/archive-zip.c +++ b/archive-zip.c @@ -5,6 +5,7 @@ #include "archive.h" #include "streaming.h" #include "utf8.h" +#include "xdiff-interface.h" static int zip_date; static int zip_time; @@ -210,6 +211,7 @@ static int write_zip_entry(struct archiver_args *args, struct git_istream *stream = NULL; unsigned long flags = 0; unsigned long size; + int is_binary = -1; crc = crc32(0, NULL, 0); @@ -238,8 +240,14 @@ static int write_zip_entry(struct archiver_args *args, method = 0; attr2 = S_ISLNK(mode) ? ((mode | 0777) << 16) : (mode & 0111) ? ((mode) << 16) : 0; - if (S_ISREG(mode) && args->compression_level != 0 && size > 0) - method = 8; + if (S_ISREG(mode)) { + if (args->compression_level != 0 && size > 0) + method = 8; + if (args->text == ARCHIVE_TEXT_ALL) + is_binary = 0; + else if (args->text == ARCHIVE_TEXT_NONE) + is_binary = 1; + } if (S_ISREG(mode) && type == OBJ_BLOB && !args->convert && size > big_file_threshold) { @@ -256,6 +264,8 @@ static int write_zip_entry(struct archiver_args *args, return error("cannot read %s", sha1_to_hex(sha1)); crc = crc32(crc, buffer, size); + if (is_binary < 0) + is_binary = buffer_is_binary(buffer, size); out = buffer; } compressed_size = (method == 0) ? size : 0; @@ -300,7 +310,6 @@ static int write_zip_entry(struct archiver_args *args, copy_le16(dirent.extra_length, ZIP_EXTRA_MTIME_SIZE); copy_le16(dirent.comment_length, 0); copy_le16(dirent.disk, 0); - copy_le16(dirent.attr1, 0); copy_le32(dirent.attr2, attr2); copy_le32(dirent.offset, zip_offset); @@ -328,6 +337,8 @@ static int write_zip_entry(struct archiver_args *args, if (readlen <= 0) break; crc = crc32(crc, buf, readlen); + if (is_binary < 0) + is_binary = buffer_is_binary(buffer, size); write_or_die(1, buf, readlen); } close_istream(stream); @@ -361,6 +372,8 @@ static int write_zip_entry(struct archiver_args *args, if (readlen <= 0) break; crc = crc32(crc, buf, readlen); + if (is_binary < 0) + is_binary = buffer_is_binary(buffer, size); zstream.next_in = buf; zstream.avail_in = readlen; @@ -405,6 +418,8 @@ static int write_zip_entry(struct archiver_args *args, free(deflated); free(buffer); + copy_le16(dirent.attr1, !is_binary); + memcpy(zip_dir + zip_dir_offset, &dirent, ZIP_DIR_HEADER_SIZE); zip_dir_offset += ZIP_DIR_HEADER_SIZE; memcpy(zip_dir + zip_dir_offset, path, pathlen); @@ -466,7 +481,7 @@ static int write_zip_archive(const struct archiver *ar, static struct archiver zip_archiver = { "zip", write_zip_archive, - ARCHIVER_WANT_COMPRESSION_LEVELS|ARCHIVER_REMOTE + ARCHIVER_WANT_COMPRESSION_LEVELS|ARCHIVER_REMOTE|ARCHIVER_TEXT_ATTRIBUTE }; void init_zip_archiver(void) diff --git a/archive.c b/archive.c index 96057ed..89bd23d 100644 --- a/archive.c +++ b/archive.c @@ -417,6 +417,7 @@ static int parse_archive_args(int argc, const char **argv, const char *remote = NULL; const char *exec = NULL; const char *output = NULL; + const char *text = NULL; int compression_level = -1; int verbose = 0; int i; @@ -442,6 +443,8 @@ static int parse_archive_args(int argc, const char **argv, OPT__COMPR_HIDDEN('7', &compression_level, 7), OPT__COMPR_HIDDEN('8', &compression_level, 8), OPT__COMPR('9', &compression_level, N_("compress better"), 9), + OPT_STRING(0, "text", &text, N_("which"), + N_("specify which files contain text")), OPT_GROUP(""), OPT_BOOL('l', "list", &list, N_("list supported archive formats")), @@ -493,6 +496,21 @@ static int parse_archive_args(int argc, const char **argv, format, compression_level); } } + args->text = ARCHIVE_TEXT_NONE; + if (text) { + if (!strcmp(text, "auto")) + args->text = ARCHIVE_TEXT_AUTO; + else if (!strcmp(text, "all")) + args->text = ARCHIVE_TEXT_ALL; + else if (!strcmp(text, "none")) + args->text = ARCHIVE_TEXT_NONE; + else + die("Unknown argument: --text=%s", text); + if (args->text != ARCHIVE_TEXT_NONE && + !((*ar)->flags & ARCHIVER_TEXT_ATTRIBUTE)) + die("Argument not supported for format '%s': --text=%s", + format, text); + } args->verbose = verbose; args->base = base; args->baselen = strlen(base); diff --git a/archive.h b/archive.h index 4a791e1..eabcd11 100644 --- a/archive.h +++ b/archive.h @@ -14,11 +14,18 @@ struct archiver_args { unsigned int verbose : 1; unsigned int worktree_attributes : 1; unsigned int convert : 1; + unsigned int text : 2; int compression_level; }; #define ARCHIVER_WANT_COMPRESSION_LEVELS 1 #define ARCHIVER_REMOTE 2 +#define ARCHIVER_TEXT_ATTRIBUTE 4 + +#define ARCHIVE_TEXT_NONE 0 +#define ARCHIVE_TEXT_ALL 1 +#define ARCHIVE_TEXT_AUTO 2 + struct archiver { const char *name; int (*write_archive)(const struct archiver *, struct archiver_args *); diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh index c929db5..4e49aad 100755 --- a/t/t5003-archive-zip.sh +++ b/t/t5003-archive-zip.sh @@ -35,12 +35,56 @@ check_zip() { " } +zip_text() { + option=$1 + zipfile=text_$1.zip + dir=text_$1 + + test_expect_success "git archive --format-zip --text=$option" " + git archive --format=zip --text=$option HEAD >$zipfile + " + + test_expect_success UNZIP " extract ZIP archive with EOL conversion" ' + (mkdir $dir && cd $dir && "$GIT_UNZIP" -a ../$zipfile) + ' +} + +check_text_converted() { + dir=text_$1 + filetype=$2 + extracted=$dir/a/$filetype + + test_expect_success " validate that $filetype files are converted" " + test_cmp_bin $extracted.cr $extracted.crlf && + test_cmp_bin $extracted.cr $extracted.lf + " +} + +check_text_verbatim() { + dir=text_$1 + filetype=$2 + extracted=$dir/a/$filetype + original=a/$filetype + + test_expect_success " validate that $filetype files are unchanged" " + test_cmp_bin $original.cr $extracted.cr && + test_cmp_bin $original.crlf $extracted.crlf && + test_cmp_bin $original.lf $extracted.lf + " +} + test_expect_success \ 'populate workdir' \ 'mkdir a && echo simple textfile >a/a && mkdir a/bin && cp /bin/sh a/bin && + printf "text\r" >a/text.cr && + printf "text\r\n" >a/text.crlf && + printf "text\n" >a/text.lf && + printf "\0\r" >a/binary.cr && + printf "\0\r\n" >a/binary.crlf && + printf "\0\n" >a/binary.lf && printf "A\$Format:%s\$O" "$SUBSTFORMAT" >a/substfile1 && printf "A not substituted O" >a/substfile2 && (p=long_path_to_a_file && cd a && @@ -124,4 +168,16 @@ test_expect_success 'git archive --format=zip on large files' ' check_zip large-compressed +zip_text all +check_text_converted all text +check_text_converted all binary + +zip_text auto +check_text_converted auto text +check_text_verbatim auto binary + +zip_text none +check_text_verbatim none text +check_text_verbatim none binary + test_done -- 2.3.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] archive-zip: add --text parameter 2015-03-04 21:13 ` [PATCH] archive-zip: add --text parameter René Scharfe @ 2015-03-05 2:16 ` Junio C Hamano 2015-03-05 15:27 ` René Scharfe 2015-03-05 15:27 ` René Scharfe 2015-03-05 19:06 ` [PATCH v2] archive-zip: mark text files in archives René Scharfe 2 siblings, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2015-03-05 2:16 UTC (permalink / raw) To: René Scharfe; +Cc: luatex, git René Scharfe <l.s.r@web.de> writes: > No sign-off, yet, because I'm not sure we really need another option. > E.g. --text=all doesn't seem to be actually useful, but it was easy to > implement. Info-ZIP's zip always creates archives like --text=auto > does, so perhaps we should make that our default behavior as well? My knee-jerk reaction is "yeah, why not? what are the downsides, other than the result will not be bit-for-bit identical to the output from older Git". I am sure I am missing something as I do not regularly use this format. > @@ -256,6 +264,8 @@ static int write_zip_entry(struct archiver_args *args, > return error("cannot read %s", > sha1_to_hex(sha1)); > crc = crc32(crc, buffer, size); > + if (is_binary < 0) > + is_binary = buffer_is_binary(buffer, size); In this codepath, do you have the path of the thing the buffer contents came from? I am wondering if consulting the attributes system is a better idea. Anything that is explicitly marked as "binary" or "-diff" is definitely binary, and anything that is not marked as "binary" is text to us for all practical purposes, no? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] archive-zip: add --text parameter 2015-03-05 2:16 ` Junio C Hamano @ 2015-03-05 15:27 ` René Scharfe 0 siblings, 0 replies; 8+ messages in thread From: René Scharfe @ 2015-03-05 15:27 UTC (permalink / raw) To: Junio C Hamano; +Cc: luatex, git Am 05.03.2015 um 03:16 schrieb Junio C Hamano: > René Scharfe <l.s.r@web.de> writes: > >> No sign-off, yet, because I'm not sure we really need another option. >> E.g. --text=all doesn't seem to be actually useful, but it was easy to >> implement. Info-ZIP's zip always creates archives like --text=auto >> does, so perhaps we should make that our default behavior as well? > > My knee-jerk reaction is "yeah, why not? what are the downsides, > other than the result will not be bit-for-bit identical to the > output from older Git". I am sure I am missing something as I do > not regularly use this format. AFAICS there won't be any other downsides. And archive stability is harder to achieve for ZIP anyway because it depends on compression level and (more fundamentally) on libz version. >> @@ -256,6 +264,8 @@ static int write_zip_entry(struct archiver_args *args, >> return error("cannot read %s", >> sha1_to_hex(sha1)); >> crc = crc32(crc, buffer, size); >> + if (is_binary < 0) >> + is_binary = buffer_is_binary(buffer, size); > > In this codepath, do you have the path of the thing the buffer > contents came from? I am wondering if consulting the attributes > system is a better idea. Anything that is explicitly marked as > "binary" or "-diff" is definitely binary, and anything that is not > marked as "binary" is text to us for all practical purposes, no? Yes, attributes can help, especially to allow users to correct wrong guesses of the heuristic. Offering automatic detection of binary files by default like git diff and git grep is still a good idea, I think. buffer_is_binary() doesn't add a lot of overhead since it only looks at the first few bytes of the buffer. René ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] archive-zip: add --text parameter 2015-03-04 21:13 ` [PATCH] archive-zip: add --text parameter René Scharfe 2015-03-05 2:16 ` Junio C Hamano @ 2015-03-05 15:27 ` René Scharfe 2015-03-05 19:06 ` [PATCH v2] archive-zip: mark text files in archives René Scharfe 2 siblings, 0 replies; 8+ messages in thread From: René Scharfe @ 2015-03-05 15:27 UTC (permalink / raw) To: luatex, git Am 04.03.2015 um 22:13 schrieb René Scharfe: > diff --git a/archive-zip.c b/archive-zip.c > index 4bde019..3767940 100644 > --- a/archive-zip.c > +++ b/archive-zip.c > @@ -5,6 +5,7 @@ > #include "archive.h" > #include "streaming.h" > #include "utf8.h" > +#include "xdiff-interface.h" > > static int zip_date; > static int zip_time; > @@ -210,6 +211,7 @@ static int write_zip_entry(struct archiver_args *args, > struct git_istream *stream = NULL; > unsigned long flags = 0; > unsigned long size; > + int is_binary = -1; > > crc = crc32(0, NULL, 0); > > @@ -238,8 +240,14 @@ static int write_zip_entry(struct archiver_args *args, > method = 0; > attr2 = S_ISLNK(mode) ? ((mode | 0777) << 16) : > (mode & 0111) ? ((mode) << 16) : 0; > - if (S_ISREG(mode) && args->compression_level != 0 && size > 0) > - method = 8; > + if (S_ISREG(mode)) { > + if (args->compression_level != 0 && size > 0) > + method = 8; > + if (args->text == ARCHIVE_TEXT_ALL) > + is_binary = 0; > + else if (args->text == ARCHIVE_TEXT_NONE) > + is_binary = 1; > + } > > if (S_ISREG(mode) && type == OBJ_BLOB && !args->convert && > size > big_file_threshold) { > @@ -256,6 +264,8 @@ static int write_zip_entry(struct archiver_args *args, > return error("cannot read %s", > sha1_to_hex(sha1)); > crc = crc32(crc, buffer, size); > + if (is_binary < 0) > + is_binary = buffer_is_binary(buffer, size); > out = buffer; > } > compressed_size = (method == 0) ? size : 0; > @@ -300,7 +310,6 @@ static int write_zip_entry(struct archiver_args *args, > copy_le16(dirent.extra_length, ZIP_EXTRA_MTIME_SIZE); > copy_le16(dirent.comment_length, 0); > copy_le16(dirent.disk, 0); > - copy_le16(dirent.attr1, 0); > copy_le32(dirent.attr2, attr2); > copy_le32(dirent.offset, zip_offset); > > @@ -328,6 +337,8 @@ static int write_zip_entry(struct archiver_args *args, > if (readlen <= 0) > break; > crc = crc32(crc, buf, readlen); > + if (is_binary < 0) > + is_binary = buffer_is_binary(buffer, size); buffer is NULL here, so this crashes. buf and readlen have to be used instead. This code path is only used for entries that are too big to be compressed in one go. > write_or_die(1, buf, readlen); > } > close_istream(stream); > @@ -361,6 +372,8 @@ static int write_zip_entry(struct archiver_args *args, > if (readlen <= 0) > break; > crc = crc32(crc, buf, readlen); > + if (is_binary < 0) > + is_binary = buffer_is_binary(buffer, size); Same here. > zstream.next_in = buf; > zstream.avail_in = readlen; > @@ -405,6 +418,8 @@ static int write_zip_entry(struct archiver_args *args, > free(deflated); > free(buffer); > > + copy_le16(dirent.attr1, !is_binary); > + > memcpy(zip_dir + zip_dir_offset, &dirent, ZIP_DIR_HEADER_SIZE); > zip_dir_offset += ZIP_DIR_HEADER_SIZE; > memcpy(zip_dir + zip_dir_offset, path, pathlen); > @@ -466,7 +481,7 @@ static int write_zip_archive(const struct archiver *ar, > static struct archiver zip_archiver = { > "zip", > write_zip_archive, > - ARCHIVER_WANT_COMPRESSION_LEVELS|ARCHIVER_REMOTE > + ARCHIVER_WANT_COMPRESSION_LEVELS|ARCHIVER_REMOTE|ARCHIVER_TEXT_ATTRIBUTE > }; > > void init_zip_archiver(void) ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2] archive-zip: mark text files in archives 2015-03-04 21:13 ` [PATCH] archive-zip: add --text parameter René Scharfe 2015-03-05 2:16 ` Junio C Hamano 2015-03-05 15:27 ` René Scharfe @ 2015-03-05 19:06 ` René Scharfe 2 siblings, 0 replies; 8+ messages in thread From: René Scharfe @ 2015-03-05 19:06 UTC (permalink / raw) To: Ulrike Fischer, git; +Cc: Junio C Hamano Set the text flag for ZIP archive entries that look like text files so that unzip -a can be used to perform end-of-line conversions. Info-ZIP zip does the same. Detect binary files the same way as git diff and git grep do, namely by checking for the attribute "diff" and its negation "-diff", and if none is found by falling back to checking for the presence of NUL bytes in the first few bytes of the file contents. 7-Zip, Windows' built-in ZIP functionality and Info-ZIP unzip without the switch -a are not affected by the change and still extract text files without doing any end-of-line conversions. NB: The actual end-of-line style used in the archive entries doesn't matter to unzip -a, as it converts any CR, CRLF and LF to the line end characters appropriate for the platform it is running on. Suggested-by: Ulrike Fischer <luatex@nililand.de> Signed-off-by: Rene Scharfe <l.s.r@web.de> --- archive-zip.c | 25 ++++++++++++++++++++++++- t/t5003-archive-zip.sh | 47 ++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 70 insertions(+), 2 deletions(-) diff --git a/archive-zip.c b/archive-zip.c index 4bde019..0f9e87f 100644 --- a/archive-zip.c +++ b/archive-zip.c @@ -5,6 +5,8 @@ #include "archive.h" #include "streaming.h" #include "utf8.h" +#include "userdiff.h" +#include "xdiff-interface.h" static int zip_date; static int zip_time; @@ -189,6 +191,16 @@ static int has_only_ascii(const char *s) } } +static int entry_is_binary(const char *path, const void *buffer, size_t size) +{ + struct userdiff_driver *driver = userdiff_find_by_path(path); + if (!driver) + driver = userdiff_find_by_name("default"); + if (driver->binary != -1) + return driver->binary; + return buffer_is_binary(buffer, size); +} + #define STREAM_BUFFER_SIZE (1024 * 16) static int write_zip_entry(struct archiver_args *args, @@ -210,6 +222,8 @@ static int write_zip_entry(struct archiver_args *args, struct git_istream *stream = NULL; unsigned long flags = 0; unsigned long size; + int is_binary = -1; + const char *path_without_prefix = path + args->baselen; crc = crc32(0, NULL, 0); @@ -256,6 +270,8 @@ static int write_zip_entry(struct archiver_args *args, return error("cannot read %s", sha1_to_hex(sha1)); crc = crc32(crc, buffer, size); + is_binary = entry_is_binary(path_without_prefix, + buffer, size); out = buffer; } compressed_size = (method == 0) ? size : 0; @@ -300,7 +316,6 @@ static int write_zip_entry(struct archiver_args *args, copy_le16(dirent.extra_length, ZIP_EXTRA_MTIME_SIZE); copy_le16(dirent.comment_length, 0); copy_le16(dirent.disk, 0); - copy_le16(dirent.attr1, 0); copy_le32(dirent.attr2, attr2); copy_le32(dirent.offset, zip_offset); @@ -328,6 +343,9 @@ static int write_zip_entry(struct archiver_args *args, if (readlen <= 0) break; crc = crc32(crc, buf, readlen); + if (is_binary == -1) + is_binary = entry_is_binary(path_without_prefix, + buf, readlen); write_or_die(1, buf, readlen); } close_istream(stream); @@ -361,6 +379,9 @@ static int write_zip_entry(struct archiver_args *args, if (readlen <= 0) break; crc = crc32(crc, buf, readlen); + if (is_binary == -1) + is_binary = entry_is_binary(path_without_prefix, + buf, readlen); zstream.next_in = buf; zstream.avail_in = readlen; @@ -405,6 +426,8 @@ static int write_zip_entry(struct archiver_args *args, free(deflated); free(buffer); + copy_le16(dirent.attr1, !is_binary); + memcpy(zip_dir + zip_dir_offset, &dirent, ZIP_DIR_HEADER_SIZE); zip_dir_offset += ZIP_DIR_HEADER_SIZE; memcpy(zip_dir + zip_dir_offset, path, pathlen); diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh index c929db5..14744b2 100755 --- a/t/t5003-archive-zip.sh +++ b/t/t5003-archive-zip.sh @@ -33,6 +33,37 @@ check_zip() { test_expect_success UNZIP " validate file contents" " diff -r a ${dir_with_prefix}a " + + dir=eol_$1 + dir_with_prefix=$dir/$2 + extracted=${dir_with_prefix}a + original=a + + test_expect_success UNZIP " extract ZIP archive with EOL conversion" ' + (mkdir $dir && cd $dir && "$GIT_UNZIP" -a ../$zipfile) + ' + + test_expect_success UNZIP " validate that text files are converted" " + test_cmp_bin $extracted/text.cr $extracted/text.crlf && + test_cmp_bin $extracted/text.cr $extracted/text.lf + " + + test_expect_success UNZIP " validate that binary files are unchanged" " + test_cmp_bin $original/binary.cr $extracted/binary.cr && + test_cmp_bin $original/binary.crlf $extracted/binary.crlf && + test_cmp_bin $original/binary.lf $extracted/binary.lf + " + + test_expect_success UNZIP " validate that diff files are converted" " + test_cmp_bin $extracted/diff.cr $extracted/diff.crlf && + test_cmp_bin $extracted/diff.cr $extracted/diff.lf + " + + test_expect_success UNZIP " validate that -diff files are unchanged" " + test_cmp_bin $original/nodiff.cr $extracted/nodiff.cr && + test_cmp_bin $original/nodiff.crlf $extracted/nodiff.crlf && + test_cmp_bin $original/nodiff.lf $extracted/nodiff.lf + " } test_expect_success \ @@ -41,6 +72,18 @@ test_expect_success \ echo simple textfile >a/a && mkdir a/bin && cp /bin/sh a/bin && + printf "text\r" >a/text.cr && + printf "text\r\n" >a/text.crlf && + printf "text\n" >a/text.lf && + printf "text\r" >a/nodiff.cr && + printf "text\r\n" >a/nodiff.crlf && + printf "text\n" >a/nodiff.lf && + printf "\0\r" >a/binary.cr && + printf "\0\r\n" >a/binary.crlf && + printf "\0\n" >a/binary.lf && + printf "\0\r" >a/diff.cr && + printf "\0\r\n" >a/diff.crlf && + printf "\0\n" >a/diff.lf && printf "A\$Format:%s\$O" "$SUBSTFORMAT" >a/substfile1 && printf "A not substituted O" >a/substfile2 && (p=long_path_to_a_file && cd a && @@ -66,7 +109,9 @@ test_expect_success 'add files to repository' ' GIT_COMMITTER_DATE="2005-05-27 22:00" git commit -m initial ' -test_expect_success 'setup export-subst' ' +test_expect_success 'setup export-subst and diff attributes' ' + echo "a/nodiff.* -diff" >>.git/info/attributes && + echo "a/diff.* diff" >>.git/info/attributes && echo "substfile?" export-subst >>.git/info/attributes && git log --max-count=1 "--pretty=format:A${SUBSTFORMAT}O" HEAD \ >a/substfile1 -- 2.3.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2015-03-05 19:06 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-02-23 13:58 zip files created with git archive flags text files as binaries Ulrike Fischer 2015-02-23 19:30 ` René Scharfe 2015-03-04 21:13 ` René Scharfe 2015-03-04 21:13 ` [PATCH] archive-zip: add --text parameter René Scharfe 2015-03-05 2:16 ` Junio C Hamano 2015-03-05 15:27 ` René Scharfe 2015-03-05 15:27 ` René Scharfe 2015-03-05 19:06 ` [PATCH v2] archive-zip: mark text files in archives René Scharfe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).