git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: mkoegler@auto.tuwien.ac.at (Martin Koegler)
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: Nicolas Pitre <nico@cam.org>, Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org
Subject: Re: [RFC Patch] Preventing corrupt objects from entering the repository
Date: Wed, 13 Feb 2008 08:39:59 +0100	[thread overview]
Message-ID: <20080213073959.GA27158@auto.tuwien.ac.at> (raw)
In-Reply-To: <20080213062015.GF24004@spearce.org>

On Wed, Feb 13, 2008 at 01:20:15AM -0500, Shawn O. Pearce wrote:
>   tree.c:
> 
>     This is inside track_tree_refs and its a file mode we do
>     not recognize.  We spit out a warning, but try to lookup
>     the unknown object anyway, even though we're looking at what
>     should be a bogus tree.  This occurs only in parse_tree_buffer,
>     only if track_objects_refs is on, and only if we see a tree
>     that is actually not understood by this version of Git.  Hmm,
>     shouldn't happen unless the tree itself is corrupt.
> 
>     This code looked fishy enough to me to dig up the blame history
>     for it:
> 
>       commit e2ac7cb5fbcf1407003aa07cdcd14141527ea2e3
>       Author: Sam Vilain <sam.vilain@catalyst.net.nz>
>       Date: Wed Jun  6 06:25:17 2007
> 
>       Don't assume tree entries that are not dirs are blobs
> 
>       When scanning the trees in track_tree_refs() there is a "lazy" test
>       that assumes that entries are either directories or files.  Don't do
>       that.
> 
>     Sounds like it was around the time of S_ISGITLINK being
>     introduced.  Looking at this code again I have to wonder, why
>     the hell are we looking up and tracking an object inside of a
>     tree when we don't understand the mode?
> 
>     Lets say a new form of S_ISGITLINK gets introduced in the future.
>     By this I mean the mode says "hey, this SHA-1 isn't really in my
>     object pool".  We're going to cram a dummy object into the refs
>     table for this tree.  Why?  We don't do this for S_ISGITLINK.
> 
>     Lets say its a new object type (not tree/tag/commit/blob) but
>     it is in our storage pool.  If this Git doesn't know the mode,
>     it sure as heck won't know what the hell the loose object header
>     (or pack header!) says about that object.
> 
>     One of the key places where you might expect tracking an unknown
>     (but referenced by a tree) SHA-1 type would be in reachability,
>     rev-list --objects, packfile generation.  But the process_tree()
>     function in reachable.c doesn't have Sam's change above, so it
>     will assume anything new looking is a blob.
>     
>     Oh, and what a rabbit hole I just fell down.  The only caller
>     that seems to set "track_object_refs = 1" (and thus get into this
>     odd lookup_unknown_object() call) is fsck.  Everyone else sets
>     it to 0, including its default declaration.
> 
>     So we've got this nice baggage, and differing implementation,
>     so fsck can be happy how?  We've also got a whole lot of apps
>     setting "track_object_refs = 0", which is what it defaults to,
>     unless you managed to run fsck first.  Hmmph.
> 
>     Is it just me or is track_object_refs perhaps outlived its
>     usefulness?

See the patch at the start of this topic. I'm working on replacing
track-object-refs.

It is missing some frees and I copied the lookup_unknown_object call
(This should be replaced with an error message in my patch, as the the
verification of the tree content would reject such an invalid mode).

mfg Martin Kögler

  reply	other threads:[~2008-02-13  7:40 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-10 17:58 [RFC Patch] Preventing corrupt objects from entering the repository Martin Koegler
2008-02-11  0:00 ` Junio C Hamano
2008-02-11  0:33   ` Nicolas Pitre
2008-02-11 19:56     ` Martin Koegler
2008-02-11 20:41       ` Nicolas Pitre
2008-02-11 21:58         ` Martin Koegler
2008-02-12 16:02           ` Nicolas Pitre
2008-02-12 19:04             ` Martin Koegler
2008-02-12 20:22               ` Nicolas Pitre
2008-02-12 21:38                 ` Martin Koegler
2008-02-12 21:51                   ` Nicolas Pitre
2008-02-13  6:20                     ` Shawn O. Pearce
2008-02-13  7:39                       ` Martin Koegler [this message]
2008-02-14  9:00                         ` [RFC PATCH] Remove object-refs from fsck Shawn O. Pearce
2008-02-14 19:07                           ` Martin Koegler
2008-02-13  7:42             ` [RFC Patch] Preventing corrupt objects from entering the repository Shawn O. Pearce
2008-02-13  8:11               ` Martin Koegler
2008-02-13 12:01                 ` Johannes Schindelin
2008-02-14  6:16                   ` Shawn O. Pearce
2008-02-14 19:04                   ` Martin Koegler
2008-02-15  0:06                     ` Johannes Schindelin
2008-02-15  7:18                       ` Martin Koegler
2008-02-12  7:20   ` Martin Koegler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080213073959.GA27158@auto.tuwien.ac.at \
    --to=mkoegler@auto.tuwien.ac.at \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=nico@cam.org \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).