git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: "Stefan Näwe" <stefan.naewe@atlas-elektronik.com>
Cc: Git list <git@vger.kernel.org>
Subject: Re: git gc gives "error: Could not read..."
Date: Mon, 1 Jun 2015 04:14:50 -0400
Message-ID: <20150601081450.GA32634@peff.net>
In-Reply-To: <556C0BAD.80106@atlas-elektronik.com>

On Mon, Jun 01, 2015 at 09:37:17AM +0200, Stefan Näwe wrote:

> One of my repos started giving an error on 'git gc' recently:
> 
>  $ git gc
>  error: Could not read 7713c3b1e9ea2dd9126244697389e4000bb39d85
>  Counting objects: 3052, done.
>  Delta compression using up to 4 threads.
>  Compressing objects: 100% (531/531), done.
>  Writing objects: 100% (3052/3052), done.
>  Total 3052 (delta 2504), reused 3052 (delta 2504)
>  error: Could not read 7713c3b1e9ea2dd9126244697389e4000bb39d85

The only error string that matches that is the one in parse_commit(),
when we fail to read the object. It happens twice here because
`git gc` runs several subcommands; you can see which ones are generating
the error if you run with GIT_TRACE=1.
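
One way to read that trace is to pair each error with the trace line that came just before it. The following is only a sketch: `blame_subcommand` is a hypothetical helper, not part of git, and it assumes that trace lines contain the text "trace: " and that each subcommand writes its trace line before its error. The exact trace format varies between git versions.

```shell
# Hypothetical helper (not part of git): reads combined trace and stderr
# output on stdin and, for every "Could not read" error, prints the most
# recent trace line seen before it, followed by the error itself.
blame_subcommand() {
  awk '
    /trace: /                { last = $0; next }
    /error: Could not read / { if (last != "") print last; print }
  '
}
```

It could be used as `GIT_TRACE=1 git gc 2>&1 | blame_subcommand`, since the trace goes to stderr along with the error messages.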

I am surprised that it doesn't cause the commands to abort, though. If
we are traversing the object graph to repack, for example, we would want
to abort if we are missing a reachable object (i.e., the repository is
corrupt).

> I tried:
> 
>  $ git cat-file -t 7713c3b1e9ea2dd9126244
>  fatal: Not a valid object name 7713c3b1e9ea2dd9126244

Not surprising, if we don't have the object. What is curious is why git
wants to look it up in the first place. I.e., who is referencing it?

Either:

  1. It is an object that we are OK to be missing (e.g., the
     UNINTERESTING side of a traversal), and the error should be
     suppressed.

  2. Your repository really is corrupted, and this is a case where we
     need to be paying attention to the return value of parse_commit but
     are not.

I'd love to see:

  - the output of "GIT_TRACE=1 git gc" (to see which subcommand is
    causing the error)

  - the output of "git fsck" (which should hopefully confirm whether or
    not there is a real problem)

  - any mentions of the sha1 in the refs or reflogs. Something like:

      sha1=7713c3b1e9ea2dd9126244697389e4000bb39d85
      cd .git
      grep $sha1 $(find packed-refs refs logs -type f)

  - If that doesn't turn up any hits, then presumably it's an object
    referencing the sha1. We can dig into the objects (all of them, not
    just reachable ones), like:

      {
        # loose objects
        (cd .git/objects && find ?? -type f | tr -d /)
        # packed objects
        for i in .git/objects/pack/*.idx; do
          git show-index <"$i"
        done | cut -d' ' -f2
      } |
      # omit blobs; they are expensive to access and cannot have
      # reachability pointers
      git cat-file --batch-check='%(objecttype) %(objectname)' |
      grep -v ^blob |
      cut -d' ' -f2 |
      # now get all of the contents, and look for our object; this is
      # going to be slow, since it's one process per object; but we
      # can't use --batch because we need to pretty-print the trees
      xargs -n1 git cat-file -p |
      less +/$sha1
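
The refs and reflogs search from the list above can also be wrapped so that it copes with a missing packed-refs file or logs directory, and so that it always prints the name of the file that matched. This is a sketch under assumptions: `find_ref_mentions` is a hypothetical name, and `grep -H` is assumed to be available (it is in GNU and BSD grep, but it is not required by POSIX).

```shell
# Hypothetical helper (not part of git): print every line under
# packed-refs, refs/ and logs/ in the given git directory that mentions
# the given object name. Missing files or directories are skipped.
find_ref_mentions() {
  gitdir=$1
  sha1=$2
  (
    cd "$gitdir" || exit 1
    for p in packed-refs refs logs; do
      if [ -e "$p" ]; then
        # -H prints the file name even when only one file is searched;
        # grep exits non-zero when nothing matches, which is not an
        # error for our purposes.
        find "$p" -type f -exec grep -H -e "$sha1" {} + || true
      fi
    done
    exit 0
  )
}
```

From the top of the working tree this would be run as `find_ref_mentions .git "$sha1"`. An empty result means no ref or reflog entry points at the object directly, which is when digging through the objects themselves becomes necessary.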

I would have guessed this was maybe caused by trying to traverse
unreachable recent objects for reachability. It fits case 1 (it is OK
for us to be missing these objects, but we might accidentally complain),
and it would probably happen twice during a gc (once for the repack, and
once for `git prune`).

But that code should not be present in older versions of msysgit, as it
came in v2.2.0 (and I assume "older msysgit" is v1.9.5). And if that is
the problem, it would follow a copy of the repo, but not a clone (though
I guess if your clone was on the local filesystem, we blindly hardlink
the objects, so it might follow there).

-Peff
