git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [JGIT PATCH 1/2] index-pack: Avoid disk corruption yielding a valid pack footer checksum
@ 2008-08-27 21:48 Shawn O. Pearce
  2008-08-27 21:48 ` [JGIT PATCH 2/2] index-pack: Use fsync to ensure received pack data is on disk Shawn O. Pearce
  2008-08-28  0:09 ` [JGIT PATCH 1/2] index-pack: Avoid disk corruption yielding a valid pack footer checksum Nicolas Pitre
  0 siblings, 2 replies; 5+ messages in thread
From: Shawn O. Pearce @ 2008-08-27 21:48 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

When we are processing a thin pack and making it non-thin we need
to update the header with a new object count.  That causes us to
recompute the footer checksum for the entire pack, and the only
way to do that is to re-read the data from disk.

If there was filesystem corruption in the process (e.g. a bad
disk sector, or a kernel bug) we don't want to produce a valid
pack at the end.  Instead we need to fail-fast with the error
so the user is aware of the corruption.

We now keep track of where the end of the original data is and
run two SHA-1 computations during the header-footer fixup.  If
the original data region doesn't match the original footer we
got over the network we know there was corruption and we just
cannot trust this pack file.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---

  This was inspired by the data corruption thread started
  by J. Bruce Fields.  At some point in the thread Linus
  pointed out the C Git index-pack isn't as safe as it can
  be, and offered a strategy to fix it.

  We weren't as safe as we could be either in jgit, even
  though we do verify CRC codes when we re-read a delta
  base from disk.  Not all objects are used as a base and
  thus we could miss some forms of corruption.

 .../src/org/spearce/jgit/transport/IndexPack.java  |   17 ++++++++++++++++-
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/transport/IndexPack.java b/org.spearce.jgit/src/org/spearce/jgit/transport/IndexPack.java
index 29d99db..db9268e 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/transport/IndexPack.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/transport/IndexPack.java
@@ -156,6 +156,9 @@ public static IndexPack create(final Repository db, final InputStream is)
 
 	private byte[] packcsum;
 
+	/** If {@link #fixThin} this is the last byte of the original checksum. */
+	private long originalEOF;
+
 	/**
 	 * Create a new pack indexer utility.
 	 * 
@@ -398,8 +401,9 @@ private void resolveChildDeltas(final long pos, int type, byte[] data,
 	private void fixThinPack(final ProgressMonitor progress) throws IOException {
 		growEntries();
 
+		originalEOF = packOut.length() - 20;
 		final Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION, false);
-		long end = packOut.length() - 20;
+		long end = originalEOF;
 		for (final ObjectId baseId : new ArrayList<ObjectId>(baseById.keySet())) {
 			final ObjectLoader ldr = repo.openObject(baseId);
 			if (ldr == null)
@@ -453,9 +457,13 @@ private void writeWhole(final Deflater def, final int typeCode,
 	}
 
 	private void fixHeaderFooter() throws IOException {
+		final MessageDigest origDigest = Constants.newMessageDigest();
+		long origRemaining = originalEOF - 12;
+
 		packOut.seek(0);
 		if (packOut.read(buf, 0, 12) != 12)
 			throw new IOException("Cannot re-read pack header to fix count");
+		origDigest.update(buf, 0, 12);
 		NB.encodeInt32(buf, 8, entryCount);
 		packOut.seek(0);
 		packOut.write(buf, 0, 12);
@@ -466,9 +474,16 @@ private void fixHeaderFooter() throws IOException {
 			final int n = packOut.read(buf);
 			if (n < 0)
 				break;
+			if (origRemaining > 0) {
+				origDigest.update(buf, 0, (int) Math.min(n, origRemaining));
+				origRemaining -= n;
+			}
 			packDigest.update(buf, 0, n);
 		}
 
+		if (!Arrays.equals(origDigest.digest(), packcsum))
+			throw new IOException("Pack corrupted while writing to filesystem");
+
 		packcsum = packDigest.digest();
 		packOut.write(packcsum);
 	}
-- 
1.6.0.174.gd789c

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [JGIT PATCH 2/2] index-pack: Use fsync to ensure received pack data is on disk
  2008-08-27 21:48 [JGIT PATCH 1/2] index-pack: Avoid disk corruption yielding a valid pack footer checksum Shawn O. Pearce
@ 2008-08-27 21:48 ` Shawn O. Pearce
  2008-08-28  0:09 ` [JGIT PATCH 1/2] index-pack: Avoid disk corruption yielding a valid pack footer checksum Nicolas Pitre
  1 sibling, 0 replies; 5+ messages in thread
From: Shawn O. Pearce @ 2008-08-27 21:48 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

C Git uses fsync() to ensure that a pack file is stable on disk before
it starts to delete any other copies of the data, or before it tries
to update refs to refer to data stored in the pack.  This makes it a
lot less likely that a repository will see corruption due to refs
pointing at unavailable objects.

We now force our pack and its associated .idx file to disk before
we close the file descriptors, but after we have finished reading
and writing all of the data.  This way we can be fairly certain
a future user of the data (like a RefUpdate) always has objects.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/transport/IndexPack.java  |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/transport/IndexPack.java b/org.spearce.jgit/src/org/spearce/jgit/transport/IndexPack.java
index db9268e..7b1f7ee 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/transport/IndexPack.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/transport/IndexPack.java
@@ -259,6 +259,8 @@ public void index(final ProgressMonitor progress) throws IOException {
 						fixThinPack(progress);
 					}
 				}
+				if (packOut != null)
+					packOut.getChannel().force(true);
 
 				packDigest = null;
 				baseById = null;
@@ -510,6 +512,7 @@ private void writeIdx() throws IOException {
 			else
 				iw = PackIndexWriter.createVersion(os, outputVersion);
 			iw.write(list, packcsum);
+			os.getChannel().force(true);
 		} finally {
 			os.close();
 		}
-- 
1.6.0.174.gd789c

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [JGIT PATCH 1/2] index-pack: Avoid disk corruption yielding a valid pack footer checksum
  2008-08-27 21:48 [JGIT PATCH 1/2] index-pack: Avoid disk corruption yielding a valid pack footer checksum Shawn O. Pearce
  2008-08-27 21:48 ` [JGIT PATCH 2/2] index-pack: Use fsync to ensure received pack data is on disk Shawn O. Pearce
@ 2008-08-28  0:09 ` Nicolas Pitre
  2008-08-28  0:14   ` Shawn O. Pearce
  1 sibling, 1 reply; 5+ messages in thread
From: Nicolas Pitre @ 2008-08-28  0:09 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Robin Rosenberg, git

On Wed, 27 Aug 2008, Shawn O. Pearce wrote:

> When we are processing a thin pack and making it non-thin we need
> to update the header with a new object count.  That causes us to
> recompute the footer checksum for the entire pack, and the only
> way to do that is to re-read the data from disk.
> 
> If there was filesystem corruption in the process (e.g. a bad
> disk sector, or a kernel bug) we don't want to produce a valid
> pack at the end.  Instead we need to fail-fast with the error
> so the user is aware of the corruption.
> 
> We now keep track of where the end of the original data is and
> run two SHA-1 computations during the header-footer fixup.  If
> the original data region doesn't match the original footer we
> got over the network we know there was corruption and we just
> cannot trust this pack file.
> 
> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
> ---
> 
>   This was inspired by the data corruption thread started
>   by J. Bruce Fields.  At some point in the thread Linus
>   pointed out the C Git index-pack isn't as safe as it can
>   be, and offered a strategy to fix it.

I thought it was Nicolas Pitre who offered that strategy?
Maybe I was mistaken.  ;-)


Nicolas

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [JGIT PATCH 1/2] index-pack: Avoid disk corruption yielding a valid pack footer checksum
  2008-08-28  0:09 ` [JGIT PATCH 1/2] index-pack: Avoid disk corruption yielding a valid pack footer checksum Nicolas Pitre
@ 2008-08-28  0:14   ` Shawn O. Pearce
  2008-08-28  0:26     ` Nicolas Pitre
  0 siblings, 1 reply; 5+ messages in thread
From: Shawn O. Pearce @ 2008-08-28  0:14 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Robin Rosenberg, git

Nicolas Pitre <nico@cam.org> wrote:
> On Wed, 27 Aug 2008, Shawn O. Pearce wrote:
> > 
> >   This was inspired by the data corruption thread started
> >   by J. Bruce Fields.  At some point in the thread Linus
> >   pointed out the C Git index-pack isn't as safe as it can
> >   be, and offered a strategy to fix it.
> 
> I thought it was Nicolas Pitre who offered that strategy?
> Maybe I was mistaken.  ;-)

Sorry about the bad attribution.  Going back through the
archives does say it was you that had the brilliant idea.

Heh.  Well at least my fuzzy memory of recent history is
only on the mailing list archives and not in a commit.

;-)

-- 
Shawn.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [JGIT PATCH 1/2] index-pack: Avoid disk corruption yielding a valid pack footer checksum
  2008-08-28  0:14   ` Shawn O. Pearce
@ 2008-08-28  0:26     ` Nicolas Pitre
  0 siblings, 0 replies; 5+ messages in thread
From: Nicolas Pitre @ 2008-08-28  0:26 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Robin Rosenberg, git

On Wed, 27 Aug 2008, Shawn O. Pearce wrote:

> Nicolas Pitre <nico@cam.org> wrote:
> > On Wed, 27 Aug 2008, Shawn O. Pearce wrote:
> > > 
> > >   This was inspired by the data corruption thread started
> > >   by J. Bruce Fields.  At some point in the thread Linus
> > >   pointed out the C Git index-pack isn't as safe as it can
> > >   be, and offered a strategy to fix it.
> > 
> > I thought it was Nicolas Pitre who offered that strategy?
> > Maybe I was mistaken.  ;-)
> 
> Sorry about the bad attribution.  Going back through the
> archives does say it was you that had the brilliant idea.

Thanks.  Now my ego is fully restored.  ;-)


Nicolas

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2008-08-28  0:27 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-27 21:48 [JGIT PATCH 1/2] index-pack: Avoid disk corruption yielding a valid pack footer checksum Shawn O. Pearce
2008-08-27 21:48 ` [JGIT PATCH 2/2] index-pack: Use fsync to ensure received pack data is on disk Shawn O. Pearce
2008-08-28  0:09 ` [JGIT PATCH 1/2] index-pack: Avoid disk corruption yielding a valid pack footer checksum Nicolas Pitre
2008-08-28  0:14   ` Shawn O. Pearce
2008-08-28  0:26     ` Nicolas Pitre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).