From: Junio C Hamano <gitster@pobox.com>
To: Nicolas Pitre <nico@fluxnic.net>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects
Date: Sun, 21 Feb 2010 21:30:41 -0800 [thread overview]
Message-ID: <7v635p4z26.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <alpine.LFD.2.00.1002211950250.1946@xanadu.home> (Nicolas Pitre's message of "Sun\, 21 Feb 2010 20\:35\:48 -0500 \(EST\)")
Nicolas Pitre <nico@fluxnic.net> writes:
> The whole point is to detect data incoherencyes.
Yes. We want to make sure that the SHA-1 we compute is over what we fed
deflate().
> So current sequence of events is as follows:
>
> T0 write_sha1_file_prepare() is called
> T1 start initial SHA1 computation on data buffer
> T2 in the middle of initial SHA1 computation
> T3 end of initial SHA1 computation -> object name is determined
> T4 write_loose_object() is called
> ... enter the write loop
> T5+n deflate() called on buffer n
> T6+n git_SHA1_Update(() called on the same buffer n
> T7+n deflated data written out
> ...
> Tend abort if result of T6+n doesn't match object name from T3
>
> So... what can happen:
>
> 1) Data is externally modified before T5+n: deflated data and its CRC32
> will be coherent with the SHA1 computed in T6+n, but incoherent with
> the SHA1 used for the object name. Wrong data is written to the
> object even if it will inflate OK. We really want to prevent that
> from happening. The test in Tend will fail.
>
> 2) Data is externally modified between T5+n and T6+n: the deflated data
> and CRC32 will be coherent with the object name but incoherent with
> the parano_sha1. Although written data will be OK, this is way too
> close from being wrong, and the test in Tend will fail. If there is
> more than one round into the loop and the external modifications are
> large enough then this becomes the same as case 1 above.
>
> 3) Data is externally modified in T2: again the test in Tend will fail.
>
> So in all possible cases I can think of, the write will abort.
There is one pathological case.
Immediately before T5+n (or between T5+n and T6+n), the external process
changes the data deflate() is working on, but before T6+n, the external
process changes the data back. Two SHA-1's computed may match, but it is
not a hash over what was deflated(); you won't be able to abort.
next prev parent reply other threads:[~2010-02-22 5:31 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-02-21 4:27 [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects Nicolas Pitre
2010-02-21 19:45 ` Junio C Hamano
2010-02-21 21:26 ` Nicolas Pitre
2010-02-21 22:22 ` Junio C Hamano
2010-02-21 22:30 ` Junio C Hamano
2010-02-22 1:35 ` Nicolas Pitre
2010-02-22 5:30 ` Junio C Hamano [this message]
2010-02-22 5:50 ` Nicolas Pitre
2010-02-22 6:17 ` Junio C Hamano
2010-02-22 6:31 ` Junio C Hamano
2010-02-22 17:36 ` Nicolas Pitre
2010-02-22 19:55 ` Junio C Hamano
2010-02-22 6:27 ` Dmitry Potapov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7v635p4z26.fsf@alter.siamese.dyndns.org \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=nico@fluxnic.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).