All of lore.kernel.org
 help / color / mirror / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>, git@vger.kernel.org
Subject: Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
Date: Tue, 21 Jun 2016 00:54:11 +0200	[thread overview]
Message-ID: <57687413.3030609@web.de> (raw)
In-Reply-To: <20160616043733.GA18323@sigill.intra.peff.net>

Am 16.06.2016 um 06:37 schrieb Jeff King:
> The ustar format has a fixed-length field for the size of
> each file entry which is supposed to contain up to 11 bytes
> of octal-formatted data plus a NUL or space terminator.
>
> These means that the largest size we can represent is
> 077777777777, or 1 byte short of 8GB. The correct solution
> for a larger file, according to POSIX.1-2001, is to add an
> extended pax header, similar to how we handle long
> filenames. This patch does that, and writes zero for the
> size field in the ustar header (the last bit is not
> mentioned by POSIX, but it matches how GNU tar behaves with
> --format=pax).
>
> This should be a strict improvement over the current
> behavior, which is to die in xsnprintf with a "BUG".
> However, there's some interesting history here.
>
> Prior to f2f0267 (archive-tar: use xsnprintf for trivial
> formatting, 2015-09-24), we silently overflowed the "size"
> field. The extra bytes ended up in the "mtime" field of the
> header, which was then immediately written itself,
> overwriting our extra bytes. What that means depends on how
> many bytes we wrote.
>
> If the size was 64GB or greater, then we actually overflowed
> digits into the mtime field, meaning our value was was
> effectively right-shifted by those lost octal digits. And
> this patch is again a strict improvement over that.
>
> But if the size was between 8GB and 64GB, then our 12-byte
> field held all of the actual digits, and only our NUL
> terminator overflowed. According to POSIX, there should be a
> NUL or space at the end of the field. However, GNU tar seems
> to be lenient here, and will correctly parse a size up 64GB
> (minus one) from the field. So sizes in this range might
> have just worked, depending on the implementation reading
> the tarfile.
>
> This patch is mostly still an improvement there, as the 8GB
> limit is specifically mentioned in POSIX as the correct
> limit. But it's possible that it could be a regression
> (versus the pre-f2f0267 state) if all of the following are
> true:
>
>    1. You have a file between 8GB and 64GB.
>
>    2. Your tar implementation _doesn't_ know about pax
>       extended headers.
>
>    3. Your tar implementation _does_ parse 12-byte sizes from
>       the ustar header without a delimiter.
>
> It's probably not worth worrying about such an obscure set
> of conditions, but I'm documenting it here just in case.
>
> There's no test included here. I did confirm that this works
> (using GNU tar) with:
>
>    $ dd if=/dev/zero seek=64G bs=1 count=1 of=huge
>    $ git add huge
>    $ git commit -q -m foo
>    $ git archive HEAD | head -c 10000 | tar tvf - 2>/dev/null
>    -rw-rw-r-- root/root 68719476737 2016-06-15 21:07 huge
>
> Pre-f2f0267, this would yield a bogus size of 8GB, and
> post-f2f0267, git-archive simply dies.
>
> Unfortunately, it's quite an expensive test to run. For one
> thing, unless your filesystem supports files with holes, it
> takes 64GB of disk space (you might think piping straight to
> `hash-object --stdin` would be better, but it's not; that
> tries to buffer all 64GB in RAM!). Furthermore, hashing and
> compressing the object takes several minutes of CPU time.
>
> We could ship just the resulting compressed object data as a
> loose object, but even that takes 64MB. So sadly, this code
> path remains untested in the test suite.

If we could set the limit to a lower value than 8GB for testing then we 
could at least check if the extended header is written, e.g. if 
ustar_size() could be convinced to return 0 every time using a hidden 
command line parameter or an environment variable or something better.

>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>   archive-tar.c | 28 +++++++++++++++++++++++++++-
>   1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/archive-tar.c b/archive-tar.c
> index cb99df2..7340b64 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -137,6 +137,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword,
>   	strbuf_addch(sb, '\n');
>   }
>
> +/*
> + * Like strbuf_append_ext_header, but for numeric values.
> + */
> +static void strbuf_append_ext_header_uint(struct strbuf *sb,
> +					  const char *keyword,
> +					  uintmax_t value)
> +{
> +	char buf[40]; /* big enough for 2^128 in decimal, plus NUL */
> +	int len;
> +
> +	len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value);
> +	strbuf_append_ext_header(sb, keyword, buf, len);
> +}
> +
>   static unsigned int ustar_header_chksum(const struct ustar_header *header)
>   {
>   	const unsigned char *p = (const unsigned char *)header;
> @@ -163,12 +177,21 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen)
>   	return i;
>   }
>
> +static inline unsigned long ustar_size(uintmax_t size)
> +{
> +	if (size < 077777777777UL)

Shouldn't that be less-or-equal?

> +		return size;
> +	else
> +		return 0;
> +}
> +
>   static void prepare_header(struct archiver_args *args,
>   			   struct ustar_header *header,
>   			   unsigned int mode, unsigned long size)
>   {
>   	xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777);
> -	xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? size : 0);
> +	xsnprintf(header->size, sizeof(header->size), "%011lo",
> +		  S_ISREG(mode) ? ustar_size(size) : 0);
>   	xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time);
>
>   	xsnprintf(header->uid, sizeof(header->uid), "%07o", 0);
> @@ -267,6 +290,9 @@ static int write_tar_entry(struct archiver_args *args,
>   			memcpy(header.linkname, buffer, size);
>   	}
>
> +	if (ustar_size(size) != size)
> +		strbuf_append_ext_header_uint(&ext_header, "size", size);

It needs "S_ISREG(mode) && " as well, no?  In practice it probably 
doesn't matter (until someone stores a 8GB long symlink target), but the 
size field should only be set for regular files.

> +
>   	prepare_header(args, &header, mode, size);
>
>   	if (ext_header.len > 0) {
>

  reply	other threads:[~2016-06-20 22:54 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-16  4:35 [PATCH 0/2] friendlier handling of overflows in archive-tar Jeff King
2016-06-16  4:37 ` [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King
2016-06-20 22:54   ` René Scharfe [this message]
2016-06-21 15:59     ` Jeff King
2016-06-21 16:02       ` Jeff King
2016-06-21 20:42       ` René Scharfe
2016-06-21 20:57         ` René Scharfe
2016-06-21 21:04           ` Jeff King
2016-06-22  5:46             ` René Scharfe
2016-06-21 21:02         ` Jeff King
2016-06-22  5:46           ` René Scharfe
2016-06-23 19:21             ` Jeff King
2016-06-21 20:54       ` René Scharfe
2016-06-21 19:44   ` Robin H. Johnson
2016-06-21 20:57     ` Jeff King
2016-06-16  4:37 ` [PATCH 2/2] archive-tar: write extended headers for far-future mtime Jeff King
2016-06-20 22:54   ` René Scharfe
2016-06-22  5:46     ` René Scharfe
2016-06-23 19:22       ` Jeff King
2016-06-23 21:38         ` René Scharfe
2016-06-23 21:39           ` Jeff King
2016-06-16 17:55 ` [PATCH 0/2] friendlier handling of overflows in archive-tar Junio C Hamano
2016-06-21 16:16 ` Jeff King
2016-06-21 16:16   ` [PATCH v2 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King
2016-06-21 16:17   ` [PATCH v2 2/2] archive-tar: write extended headers for far-future mtime Jeff King
2016-06-21 18:43   ` [PATCH 0/2] friendlier handling of overflows in archive-tar Junio C Hamano
2016-06-23 23:15   ` [PATCH v3] " Jeff King
2016-06-23 23:20     ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Jeff King
2016-06-23 23:31       ` Jeff King
2016-06-24 16:38       ` Johannes Sixt
2016-06-24 16:46         ` Jeff King
2016-06-24 17:05           ` Johannes Sixt
2016-06-24 19:39             ` [PATCH 0/4] portable signal-checking in tests Jeff King
2016-06-24 19:43               ` [PATCH 1/4] tests: factor portable signal check out of t0005 Jeff King
2016-06-24 20:52                 ` Johannes Sixt
2016-06-24 21:05                   ` Jeff King
2016-06-24 21:32                     ` Johannes Sixt
2016-06-24 19:44               ` [PATCH 2/4] t0005: use test_match_signal as appropriate Jeff King
2016-06-24 19:45               ` [PATCH 3/4] test_must_fail: use test_match_signal Jeff King
2016-06-24 19:45               ` [PATCH 4/4] t/lib-git-daemon: " Jeff King
2016-06-24 19:48               ` [PATCH 0/4] portable signal-checking in tests Jeff King
2016-06-24 18:56       ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Junio C Hamano
2016-06-24 19:07         ` Jeff King
2016-06-24 19:44           ` Junio C Hamano
2016-06-24 20:58           ` Eric Sunshine
2016-06-24 21:09             ` Jeff King
2016-06-24 20:58           ` Jeff King
2016-06-24 22:41             ` Junio C Hamano
2016-06-24 23:22               ` Jeff King
2016-06-23 23:21     ` [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB Jeff King
2016-06-24 19:01       ` Junio C Hamano
2016-06-24 19:10         ` Jeff King
2016-06-24 19:45           ` Junio C Hamano
2016-06-24 19:46             ` Jeff King
2016-06-23 23:21     ` [PATCH v3 3/4] archive-tar: write extended headers for far-future mtime Jeff King
2016-06-24 19:06       ` Junio C Hamano
2016-06-24 19:16         ` Jeff King
2016-06-23 23:21     ` [PATCH v3 4/4] archive-tar: drop return value Jeff King
2016-06-24 11:49       ` Remi Galan Alfonso
2016-06-24 13:13         ` Jeff King
2016-06-24 19:10           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57687413.3030609@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.