git.vger.kernel.org archive mirror
* fact-import: failed to apply delta
@ 2009-02-10  3:26 Daniel Barkalow
  2009-02-10 10:28 ` Johannes Schindelin
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Barkalow @ 2009-02-10  3:26 UTC (permalink / raw)
  To: git

I'm getting a "fatal: failed to apply delta" from fast-import. I'm using a 
lot of checkpoints, and I haven't had it happen without making progress, 
so I was eventually able to import what I was importing (bunch of stuff I 
can't distribute, imported from perforce with the latest version of my p4 
importer that I'm still working on). Also, everything that was saved by 
the checkpoints was valid and correct (at least after the fact).

I'm going to see if it's reproducible, and, if so, if I can get a test 
case that I can distribute, but I wanted to post to see if anyone had any 
special debugging advice for this error message and program combination.

	-Daniel
*This .sig left intentionally blank*


* Re: fact-import: failed to apply delta
  2009-02-10  3:26 fact-import: failed to apply delta Daniel Barkalow
@ 2009-02-10 10:28 ` Johannes Schindelin
  2009-02-10 15:56   ` Shawn O. Pearce
  0 siblings, 1 reply; 24+ messages in thread
From: Johannes Schindelin @ 2009-02-10 10:28 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

Hi,

[nice typo in the subject BTW]

On Mon, 9 Feb 2009, Daniel Barkalow wrote:

> I'm getting a "fatal: failed to apply delta" from fast-import. I'm using 
> a lot of checkpoints, and I haven't had it happen without making 
> progress, so I was eventually able to import what I was importing (bunch 
> of stuff I can't distribute, imported from perforce with the latest 
> version of my p4 importer that I'm still working on). Also, everything 
> that was saved by the checkpoints was valid and correct (at least after 
> the fact).
> 
> I'm going to see if it's reproducible, and, if so, if I can get a test 
> case that I can distribute, but I wanted to post to see if anyone had 
> any special debugging advice for this error message and program 
> combination.

I see three likely candidates: two in index-pack.c and one in sha1_file.c.  
My advice: instrument the code (IOW litter the code with debug output that 
tells you where it did what), and then run it on the same test case you 
had the problems.

Ciao,
Dscho


* Re: fact-import: failed to apply delta
  2009-02-10 10:28 ` Johannes Schindelin
@ 2009-02-10 15:56   ` Shawn O. Pearce
  2009-02-10 17:15     ` Daniel Barkalow
  0 siblings, 1 reply; 24+ messages in thread
From: Shawn O. Pearce @ 2009-02-10 15:56 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Johannes Schindelin, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Mon, 9 Feb 2009, Daniel Barkalow wrote:
> 
> > I'm getting a "fatal: failed to apply delta" from fast-import. I'm using 
> > a lot of checkpoints, and I haven't had it happen without making 
> > progress, so I was eventually able to import what I was importing (bunch 
> > of stuff I can't distribute, imported from perforce with the latest 
> > version of my p4 importer that I'm still working on). Also, everything 
> > that was saved by the checkpoints was valid and correct (at least after 
> > the fact).
> > 
> > I'm going to see if it's reproducible, and, if so, if I can get a test 
> > case that I can distribute, but I wanted to post to see if anyone had 
> > any special debugging advice for this error message and program 
> > combination.
> 
> I see three likely candidates: two in index-pack.c and one in sha1_file.c.  

It has to be the one in sha1_file.c.  fast-import never calls into the
code in index-pack.c.

> My advice: instrument the code (IOW litter the code with debug output that 
> tells you where it did what), and then run it on the same test case you 
> had the problems.

My initial guess is, we're probably having trouble reading from
the pack we are writing.  This shouldn't be possible; fast-import
uses an index of all objects in memory to locate the byte offset
and then reads from that block.

I wonder if this is a write(2) vs. mmap(2) inconsistency in the
VM system of your OS.  fast-import plays some games here where
it is actively writing into areas that might have been mmap'd.
We should be closing the existing mmap's and opening them again
when we switch from writing to reading (see gfi_unpack_entry),
but if that's failing for some reason then we might be trying to
read a VM page which isn't correctly initialized, and the delta
isn't valid, and we can't inflate it.

-- 
Shawn.


* Re: fact-import: failed to apply delta
  2009-02-10 15:56   ` Shawn O. Pearce
@ 2009-02-10 17:15     ` Daniel Barkalow
  2009-02-10 17:22       ` Shawn O. Pearce
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Barkalow @ 2009-02-10 17:15 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Johannes Schindelin, git

On Tue, 10 Feb 2009, Shawn O. Pearce wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > On Mon, 9 Feb 2009, Daniel Barkalow wrote:
> > 
> > > I'm getting a "fatal: failed to apply delta" from fast-import. I'm using 
> > > a lot of checkpoints, and I haven't had it happen without making 
> > > progress, so I was eventually able to import what I was importing (bunch 
> > > of stuff I can't distribute, imported from perforce with the latest 
> > > version of my p4 importer that I'm still working on). Also, everything 
> > > that was saved by the checkpoints was valid and correct (at least after 
> > > the fact).
> > > 
> > > I'm going to see if it's reproducible, and, if so, if I can get a test 
> > > case that I can distribute, but I wanted to post to see if anyone had 
> > > any special debugging advice for this error message and program 
> > > combination.
> > 
> > I see three likely candidates: two in index-pack.c and one in sha1_file.c.  
> 
> It has to be the one in sha1_file.c.  fast-import never calls into the
> code in index-pack.c.

Yup, it's that one. I added some sizes to the message and got them.

> > My advice: instrument the code (IOW litter the code with debug output that 
> > tells you where it did what), and then run it on the same test case you 
> > had the problems.
> 
> My initial guess is, we're probably having trouble reading from
> the pack we are writing.  This shouldn't be possible; fast-import
> uses an index of all objects in memory to locate the byte offset
> and then reads from that block.
>
> I wonder if this is a write(2) vs. mmap(2) inconsistency in the
> VM system of your OS. 

This is Linux 2.6.22 (ubuntu generic x86), so I don't think it does 
anything we don't expect.

> fast-import plays some games here where it is actively writing into 
> areas that might have been mmap'd.  We should be closing the existing 
> mmap's and opening them again when we switch from writing to reading 
> (see gfi_unpack_entry), but if that's failing for some reason then we 
> might be trying to read a VM page which isn't correctly initialized, and 
> the delta isn't valid, and we can't inflate it.

I wonder if we're somehow getting the wrong object. I'm getting a 
base_size of 584 but the object pointed to is a tree of size 1882. This 
seems to me like correctly initialized memory that just isn't what we 
wanted.

Is there some way to see if the pack it was writing is actually corrupt 
(beyond not having the hash set)?

	-Daniel
*This .sig left intentionally blank*


* Re: fact-import: failed to apply delta
  2009-02-10 17:15     ` Daniel Barkalow
@ 2009-02-10 17:22       ` Shawn O. Pearce
  2009-02-10 17:47         ` Daniel Barkalow
  0 siblings, 1 reply; 24+ messages in thread
From: Shawn O. Pearce @ 2009-02-10 17:22 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Johannes Schindelin, git

Daniel Barkalow <barkalow@iabervon.org> wrote:
> 
> Is there some way to see if the pack it was writing is actually corrupt 
> (beyond not having the hash set)?

Hmm.  If you have the pack fragment, it's going to take some editing
to get it through the existing validation tools.

First you need to know how many objects are in the pack just so you
can update the object count, which is a 4 byte network byte order
field starting at offset 8.  Then you need the SHA-1 checksum of
the entire pack appended onto the end, as the last 20 bytes.

Once the pack is "closed" (by applying those fixes), you can run it
through both index-pack and verify-pack.  They pick up most errors,
especially a delta apply sort of error.  They won't find corrupt
tree modes though, or object connectivity errors.
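
The object-count half of that fix can be sketched in C.  This is only
a minimal sketch, assuming the whole pack fragment is already in
memory; set_pack_object_count() and get_pack_object_count() are
hypothetical helper names, not functions from git.  The trailing
SHA-1 checksum is not computed here and would still have to be
appended separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The count sits after the 4-byte "PACK" signature and 4-byte version. */
#define PACK_COUNT_OFFSET 8

/* Hypothetical helper: store nr_objects in network (big-endian) byte order. */
static void set_pack_object_count(unsigned char *pack, size_t len,
				  uint32_t nr_objects)
{
	assert(len >= PACK_COUNT_OFFSET + 4);
	pack[PACK_COUNT_OFFSET + 0] = (unsigned char)(nr_objects >> 24);
	pack[PACK_COUNT_OFFSET + 1] = (unsigned char)(nr_objects >> 16);
	pack[PACK_COUNT_OFFSET + 2] = (unsigned char)(nr_objects >> 8);
	pack[PACK_COUNT_OFFSET + 3] = (unsigned char)nr_objects;
}

/* Hypothetical helper: read the count back, independent of host endianness. */
static uint32_t get_pack_object_count(const unsigned char *pack, size_t len)
{
	assert(len >= PACK_COUNT_OFFSET + 4);
	return ((uint32_t)pack[PACK_COUNT_OFFSET + 0] << 24) |
	       ((uint32_t)pack[PACK_COUNT_OFFSET + 1] << 16) |
	       ((uint32_t)pack[PACK_COUNT_OFFSET + 2] << 8) |
	        (uint32_t)pack[PACK_COUNT_OFFSET + 3];
}
```

Writing the bytes one at a time avoids depending on htonl() and on
the alignment of the buffer.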

-- 
Shawn.


* Re: fact-import: failed to apply delta
  2009-02-10 17:22       ` Shawn O. Pearce
@ 2009-02-10 17:47         ` Daniel Barkalow
  2009-02-10 19:12           ` Shawn O. Pearce
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Barkalow @ 2009-02-10 17:47 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Johannes Schindelin, git

On Tue, 10 Feb 2009, Shawn O. Pearce wrote:

> Daniel Barkalow <barkalow@iabervon.org> wrote:
> > 
> > Is there some way to see if the pack it was writing is actually corrupt 
> > (beyond not having the hash set)?
> 
> Hmm.  If you have the pack fragment, it's going to take some editing
> to get it through the existing validation tools.

Actually, I went for the other end; I made close_pack_windows() not mind 
the open windows (hey, it's dying anyway in my case, nobody's going to 
write more), and the results passed verification and "git fsck --full" 
with just a few dangling blobs and a dangling commit. So it seems to me 
that it has to be wrong information in memory.

	-Daniel
*This .sig left intentionally blank*


* Re: fact-import: failed to apply delta
  2009-02-10 17:47         ` Daniel Barkalow
@ 2009-02-10 19:12           ` Shawn O. Pearce
  2009-02-10 20:03             ` Daniel Barkalow
  0 siblings, 1 reply; 24+ messages in thread
From: Shawn O. Pearce @ 2009-02-10 19:12 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Johannes Schindelin, git

Daniel Barkalow <barkalow@iabervon.org> wrote:
> 
> Actually, I went for the other end; I made close_pack_windows() not mind 
> the open windows (hey, it's dying anyway in my case, nobody's going to 
> write more), and the results passed verification and "git fsck --full" 
> with just a few dangling blobs and a dangling commit. So it seems to me 
> that it has to be wrong information in memory.

Like the wrong offset within the pack for the object start?

Can you compare the offsets you are getting during
unpack_delta_entry() against what verify-pack -v
shows for the same file?  They should agree, unless
we're somehow wrong in memory within fast-import.

But then, the output pack-*.idx file created when
fast-import closed the pack would be wrong too.

-- 
Shawn.


* Re: fact-import: failed to apply delta
  2009-02-10 19:12           ` Shawn O. Pearce
@ 2009-02-10 20:03             ` Daniel Barkalow
  2009-02-10 20:12               ` Shawn O. Pearce
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Barkalow @ 2009-02-10 20:03 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Johannes Schindelin, git

On Tue, 10 Feb 2009, Shawn O. Pearce wrote:

> Daniel Barkalow <barkalow@iabervon.org> wrote:
> > 
> > Actually, I went for the other end; I made close_pack_windows() not mind 
> > the open windows (hey, it's dying anyway in my case, nobody's going to 
> > write more), and the results passed verification and "git fsck --full" 
> > with just a few dangling blobs and a dangling commit. So it seems to me 
> > that it has to be wrong information in memory.
> 
> Like the wrong offset within the pack for the object start?
> 
> Can you compare the offsets you are getting during
> unpack_delta_entry() against what verify-pack -v
> shows for the same file?  They should agree, unless
> we're somehow wrong in memory within fast-import.

Is there some easy way to tell what object it was having problems with 
when it failed to unpack? I've got a whole lot of objects.

On the other hand, there's something interesting:

The expected size of the base is 1882, while the actual size is 151. The 
base offset it found was 12.

I'm using "checkpoint" a lot, so I've got 24 packs. Two of them have tree 
objects of size 1882 at offset 12; a different one has a tree object of 
size 151 at offset 12. The one with the object of size 151 was the one 
that was still open at the end. There's no tree of size 1882 in this pack, 
nor in any other pack that has a tree of size 151.

So maybe it's right about the offsets and all, but it's confused about 
which pack something was in? Maybe it cached something when the pack 
containing the object it wants was open, and it ended up thinking it was 
in the pack that's now open rather than the pack that was open and is now 
closed?

I don't suppose there would be an easy way to figure out the object it was 
trying to unpack by applying the delta?

> But then, the output pack-*.idx file created when
> fast-import closed the pack would be wrong too.

I think the wrong info it has is about the contents of a pack that had 
been closed previously. I think all of the info about objects in the open 
pack is correct.

	-Daniel
*This .sig left intentionally blank*


* Re: fact-import: failed to apply delta
  2009-02-10 20:03             ` Daniel Barkalow
@ 2009-02-10 20:12               ` Shawn O. Pearce
  2009-02-10 21:19                 ` Daniel Barkalow
  0 siblings, 1 reply; 24+ messages in thread
From: Shawn O. Pearce @ 2009-02-10 20:12 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Johannes Schindelin, git

Daniel Barkalow <barkalow@iabervon.org> wrote:
> 
> Is there some easy way to tell what object it was having problems with 
> when it failed to unpack? I've got a whole lot of objects.

Can you use gdb to find it?  If so, walk up the stack into
fast-import.c's load_tree() function and look at sha1 here,
and also, *myoe.
 
> The expected size of the base is 1882, while the actual size is 151. The 
> base offset it found was 12.
> 
> I'm using "checkpoint" a lot, so I've got 24 packs. Two of them have tree 
> objects of size 1882 at offset 12; a different one has a tree object of 
> size 151 at offset 12. The one with the object of size 151 was the one 
> that was still open at the end. There's no tree of size 1882 in this pack, 
> nor in any other pack that has a tree of size 151.
> 
> So maybe it's right about the offsets and all, but it's confused about 
> which pack something was in? Maybe it cached something when the pack 
> containing the object it wants was open, and it ended up thinking it was 
> in the pack that's now open rather than the pack that was open and is now 
> closed?

fast-import keeps all of its object data in a single table of
"struct object_entry", the table is keyed by SHA-1.  Each entry
has a pack_id, which tells it which pack this object is in, and
the offset of the object within that pack.

Sounds like maybe it's confusing the pack pointer in the all_packs
array (see gfi_unpack_entry).
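
The shape of that lookup can be sketched as follows.  This is a toy
illustration only: the toy_* names are hypothetical, the linear scan
stands in for whatever table fast-import really uses, and packs are
represented by plain name strings.  It shows the two steps described
above, finding an entry by SHA-1 and then resolving its pack_id
through an array of packs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for an object table entry (hypothetical layout). */
struct toy_object_entry {
	unsigned char sha1[20];	/* key */
	unsigned int pack_id;	/* which pack holds the object */
	uint64_t offset;	/* byte offset of the object in that pack */
};

/* Find the entry for sha1, or NULL if it is not in the table. */
static const struct toy_object_entry *
toy_find_object(const struct toy_object_entry *table, size_t nr,
		const unsigned char *sha1)
{
	size_t i;
	for (i = 0; i < nr; i++)
		if (!memcmp(table[i].sha1, sha1, 20))
			return &table[i];
	return NULL;
}

/* Resolve an entry's pack_id through an array of pack names. */
static const char *toy_pack_for(const char *const *all_packs, size_t nr_packs,
				const struct toy_object_entry *e)
{
	return e->pack_id < nr_packs ? all_packs[e->pack_id] : NULL;
}
```

If the pack_id to pack mapping went wrong, the offset would still be
valid but would be applied to the wrong pack, which matches the
symptom of a right offset with a wrong base size.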
 
> I don't suppose there would be an easy way to figure out the object it was 
> trying to unpack by applying the delta?

Not really.  You'd have to write code for this.  Or, since the pack
closes and you can index it, use "verify-pack -v" to find the object
starting at the offset you know it's having trouble with, that should
tell you the object's SHA-1.

-- 
Shawn.


* Re: fact-import: failed to apply delta
  2009-02-10 20:12               ` Shawn O. Pearce
@ 2009-02-10 21:19                 ` Daniel Barkalow
  2009-02-10 21:25                   ` Shawn O. Pearce
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Barkalow @ 2009-02-10 21:19 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Johannes Schindelin, git

On Tue, 10 Feb 2009, Shawn O. Pearce wrote:

> Daniel Barkalow <barkalow@iabervon.org> wrote:
> > 
> > Is there some easy way to tell what object it was having problems with 
> > when it failed to unpack? I've got a whole lot of objects.
> 
> Can you use gdb to find it?  If so, walk up the stack into
> fast-import.c's load_tree() function and look at sha1 here,
> and also, *myoe.

Okay, sha1 is 961a2199..., which is correct as a delta against the tree of 
size 151 at offset 12 of its own pack; the bogus thing seems to be the 
base_size (and presumably base).

> > The expected size of the base is 1882, while the actual size is 151. The 
> > base offset it found was 12.
> > 
> > I'm using "checkpoint" a lot, so I've got 24 packs. Two of them have tree 
> > objects of size 1882 at offset 12; a different one has a tree object of 
> > size 151 at offset 12. The one with the object of size 151 was the one 
> > that was still open at the end. There's no tree of size 1882 in this pack, 
> > nor in any other pack that has a tree of size 151.
> > 
> > So maybe it's right about the offsets and all, but it's confused about 
> > which pack something was in? Maybe it cached something when the pack 
> > containing the object it wants was open, and it ended up thinking it was 
> > in the pack that's now open rather than the pack that was open and is now 
> > closed?
> 
> fast-import keeps all of its object data in a single table of
> "struct object_entry", the table is keyed by SHA-1.  Each entry
> has a pack_id, which tells it which pack this object is in, and
> the offset of the object within that pack.
> 
> Sounds like maybe it's confusing the pack pointer in the all_packs
> array (see gfi_unpack_entry).

I think maybe there's aliasing in the delta base cache? If it recycled a 
struct packed_git, the cache would come up with a cached tree at offset 12 
of the packed_git at that address, but the pack used by that struct has 
changed.

I don't see any reason that the situation couldn't arise where you start a
pack, look up an object in it while it's still open but the object isn't 
in the window, cache the delta base, end that packfile, eventually start a 
packfile that gets allocated in the space that was freed, produce a new 
delta against the object at exactly the same offset of the new pack (with 
the same address as the old pack), and go on happily until you try looking 
up this last delta and pull the wrong base out of the cache.

I don't see any code to flush the delta cache ever, but it's hard to get a 
new packed_git allocated at the address of a freed one, except by doing a 
lot of checkpoints in fast-import...
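
The suspected failure mode can be reproduced in miniature.  The
sketch below is a toy model, not git's code: every toy_* name is
hypothetical, the cache stores only a size instead of object data,
and a single reused static struct stands in for malloc() handing
back a freed address.  The sizes 1882 and 151 and the offset 12 are
the ones observed above.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_SIZE 16

struct toy_pack { int id; };

struct toy_cache_entry {
	const struct toy_pack *p;	/* key part 1: address of the pack struct */
	long offset;			/* key part 2: offset within that pack */
	unsigned long size;		/* cached payload: size of the base object */
	int valid;
};

static struct toy_cache_entry toy_cache[TOY_CACHE_SIZE];

/* Pick a slot from the pack address and the offset. */
static unsigned long toy_hash(const struct toy_pack *p, long offset)
{
	unsigned long h = (unsigned long)(uintptr_t)p + (unsigned long)offset;
	h += (h >> 8) + (h >> 16);
	return h % TOY_CACHE_SIZE;
}

static void toy_cache_add(const struct toy_pack *p, long offset,
			  unsigned long size)
{
	struct toy_cache_entry *ent = &toy_cache[toy_hash(p, offset)];
	ent->p = p;
	ent->offset = offset;
	ent->size = size;
	ent->valid = 1;
}

/* Return the cached size, or 0 on a miss. */
static unsigned long toy_cache_get(const struct toy_pack *p, long offset)
{
	const struct toy_cache_entry *ent = &toy_cache[toy_hash(p, offset)];
	if (ent->valid && ent->p == p && ent->offset == offset)
		return ent->size;
	return 0;
}

static void toy_cache_clear(void)
{
	size_t i;
	for (i = 0; i < TOY_CACHE_SIZE; i++)
		toy_cache[i].valid = 0;
}

/* What does the cache report for offset 12 of the second pack? */
static unsigned long toy_demo(int clear_between_packs)
{
	static struct toy_pack slot;	/* same address for both packs */

	toy_cache_clear();
	slot.id = 1;			/* first pack: base of size 1882 at offset 12 */
	toy_cache_add(&slot, 12, 1882);
	if (clear_between_packs)
		toy_cache_clear();
	slot.id = 2;			/* second pack reuses the address; its real base is 151 bytes */
	return toy_cache_get(&slot, 12);
}
```

Here toy_demo(0) hands back the stale 1882 entry for the second pack,
while toy_demo(1) reports a miss, so the caller would have to read
the real base from the pack.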

	-Daniel
*This .sig left intentionally blank*


* Re: fact-import: failed to apply delta
  2009-02-10 21:19                 ` Daniel Barkalow
@ 2009-02-10 21:25                   ` Shawn O. Pearce
  2009-02-10 21:32                     ` Daniel Barkalow
  0 siblings, 1 reply; 24+ messages in thread
From: Shawn O. Pearce @ 2009-02-10 21:25 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Johannes Schindelin, git

Daniel Barkalow <barkalow@iabervon.org> wrote:
> 
> I think maybe there's aliasing in the delta base cache? If it recycled a 
> struct packed_git, the cache would come up with a cached tree at offset 12 
> of the packed_git at that address, but the pack used by that struct has 
> changed.

Yup, that must be it.
 
> I don't see any reason that the situation couldn't arise where you start a
> pack, look up an object in it while it's still open but the object isn't 
> in the window, cache the delta base, end that packfile, eventually start a 
> packfile that gets allocated in the space that was freed, produce a new 
> delta against the object at exactly the same offset of the new pack (with 
> the same address as the old pack), and go on happily until you try looking 
> up this last delta and pull the wrong base out of the cache.
> 
> I don't see any code to flush the delta cache ever, but it's hard to get a 
> new packed_git allocated at the address of a freed one, except by doing a 
> lot of checkpoints in fast-import...

*ouch*.  I think you found it.

We should dump the cached_objects table in sha1_file.c during
a checkpoint in fast-import.

-- 
Shawn.


* Re: fact-import: failed to apply delta
  2009-02-10 21:25                   ` Shawn O. Pearce
@ 2009-02-10 21:32                     ` Daniel Barkalow
  2009-02-10 21:36                       ` Shawn O. Pearce
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Barkalow @ 2009-02-10 21:32 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Johannes Schindelin, git

On Tue, 10 Feb 2009, Shawn O. Pearce wrote:

> Daniel Barkalow <barkalow@iabervon.org> wrote:
> > 
> > I think maybe there's aliasing in the delta base cache? If it recycled a 
> > struct packed_git, the cache would come up with a cached tree at offset 12 
> > of the packed_git at that address, but the pack used by that struct has 
> > changed.
> 
> Yup, that must be it.
>  
> > I don't see any reason that the situation couldn't arise where you start a
> > pack, look up an object in it while it's still open but the object isn't 
> > in the window, cache the delta base, end that packfile, eventually start a 
> > packfile that gets allocated in the space that was freed, produce a new 
> > delta against the object at exactly the same offset of the new pack (with 
> > the same address as the old pack), and go on happily until you try looking 
> > up this last delta and pull the wrong base out of the cache.
> > 
> > I don't see any code to flush the delta cache ever, but it's hard to get a 
> > new packed_git allocated at the address of a freed one, except by doing a 
> > lot of checkpoints in fast-import...
> 
> *ouch*.  I think you found it.
> 
> We should dump the cached_objects table in sha1_file.c during
> a checkpoint in fast-import.

No, that one's keyed by sha1, and doesn't get collisions; it's the 
delta_base_cache that's the issue; it's keyed by struct packed_git * and 
offset.

	-Daniel
*This .sig left intentionally blank*


* Re: fact-import: failed to apply delta
  2009-02-10 21:32                     ` Daniel Barkalow
@ 2009-02-10 21:36                       ` Shawn O. Pearce
  2009-02-10 21:51                         ` Daniel Barkalow
  2009-02-10 22:30                         ` Junio C Hamano
  0 siblings, 2 replies; 24+ messages in thread
From: Shawn O. Pearce @ 2009-02-10 21:36 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Johannes Schindelin, git

Daniel Barkalow <barkalow@iabervon.org> wrote:
> On Tue, 10 Feb 2009, Shawn O. Pearce wrote:
> > 
> > We should dump the cached_objects table in sha1_file.c during
> > a checkpoint in fast-import.
> 
> No, that one's keyed by sha1, and doesn't get collisions; it's the 
> delta_base_cache that's the issue; it's keyed by struct packed_git * and 
> offset.

Uh, yea, I realized that after I sent the message.  Does this patch
fix it for you?

--8<--
Clear the delta base cache during fast-import checkpoint

Otherwise we may reuse the same memory address for a totally
different "struct packed_git", and a previously cached object from
the prior occupant might be returned when trying to unpack an object
from the new pack.

Found-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 cache.h       |    1 +
 fast-import.c |    1 +
 sha1_file.c   |    7 +++++++
 3 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/cache.h b/cache.h
index 8dcc53c..7f1a6e8 100644
--- a/cache.h
+++ b/cache.h
@@ -830,6 +830,7 @@ extern unsigned char* use_pack(struct packed_git *, struct pack_window **, off_t
 extern void close_pack_windows(struct packed_git *);
 extern void unuse_pack(struct pack_window **);
 extern void free_pack_by_name(const char *);
+extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *, int, int);
 extern const unsigned char *nth_packed_object_sha1(struct packed_git *, uint32_t);
 extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t);
diff --git a/fast-import.c b/fast-import.c
index 1935206..03b13e0 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -945,6 +945,7 @@ static void end_packfile(void)
 {
 	struct packed_git *old_p = pack_data, *new_p;
 
+	clear_delta_base_cache();
 	if (object_count) {
 		char *idx_name;
 		int i;
diff --git a/sha1_file.c b/sha1_file.c
index 8868b80..d2dbc96 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1663,6 +1663,13 @@ static inline void release_delta_base_cache(struct delta_base_cache_entry *ent)
 	}
 }
 
+void clear_delta_base_cache(void)
+{
+	unsigned long p;
+	for (p = 0; p < MAX_DELTA_CACHE; p++)
+		release_delta_base_cache(&delta_base_cache[p])
+}
+
 static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 	void *base, unsigned long base_size, enum object_type type)
 {
-- 
1.6.2.rc0.186.g417c


* Re: fact-import: failed to apply delta
  2009-02-10 21:36                       ` Shawn O. Pearce
@ 2009-02-10 21:51                         ` Daniel Barkalow
  2009-02-10 22:30                         ` Junio C Hamano
  1 sibling, 0 replies; 24+ messages in thread
From: Daniel Barkalow @ 2009-02-10 21:51 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Johannes Schindelin, git

On Tue, 10 Feb 2009, Shawn O. Pearce wrote:

> Daniel Barkalow <barkalow@iabervon.org> wrote:
> > On Tue, 10 Feb 2009, Shawn O. Pearce wrote:
> > > 
> > > We should dump the cached_objects table in sha1_file.c during
> > > a checkpoint in fast-import.
> > 
> > No, that one's keyed by sha1, and doesn't get collisions; it's the 
> > delta_base_cache that's the issue; it's keyed by struct packed_git * and 
> > offset.
> 
> Uh, yea, I realized that after I sent the message.  Does this patch
> fix it for you?

Aside from the trivial typo, yes. (Although I can't be 100% sure it didn't 
just happen to change things leading to needing a different test case; I 
can say for sure that it got past the previous code's MTBF, which is a 
good sign.)

> --8<--
> Clear the delta base cache during fast-import checkpoint
> 
> Otherwise we may reuse the same memory address for a totally
> different "struct packed_git", and a previously cached object from
> the prior occupant might be returned when trying to unpack an object
> from the new pack.
> 
> Found-by: Daniel Barkalow <barkalow@iabervon.org>
> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
> ---
>  cache.h       |    1 +
>  fast-import.c |    1 +
>  sha1_file.c   |    7 +++++++
>  3 files changed, 9 insertions(+), 0 deletions(-)
> 
> diff --git a/cache.h b/cache.h
> index 8dcc53c..7f1a6e8 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -830,6 +830,7 @@ extern unsigned char* use_pack(struct packed_git *, struct pack_window **, off_t
>  extern void close_pack_windows(struct packed_git *);
>  extern void unuse_pack(struct pack_window **);
>  extern void free_pack_by_name(const char *);
> +extern void clear_delta_base_cache(void);
>  extern struct packed_git *add_packed_git(const char *, int, int);
>  extern const unsigned char *nth_packed_object_sha1(struct packed_git *, uint32_t);
>  extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t);
> diff --git a/fast-import.c b/fast-import.c
> index 1935206..03b13e0 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -945,6 +945,7 @@ static void end_packfile(void)
>  {
>  	struct packed_git *old_p = pack_data, *new_p;
>  
> +	clear_delta_base_cache();
>  	if (object_count) {
>  		char *idx_name;
>  		int i;
> diff --git a/sha1_file.c b/sha1_file.c
> index 8868b80..d2dbc96 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1663,6 +1663,13 @@ static inline void release_delta_base_cache(struct delta_base_cache_entry *ent)
>  	}
>  }
>  
> +void clear_delta_base_cache(void)
> +{
> +	unsigned long p;
> +	for (p = 0; p < MAX_DELTA_CACHE; p++)
> +		release_delta_base_cache(&delta_base_cache[p])
> +}
> +
>  static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
>  	void *base, unsigned long base_size, enum object_type type)
>  {
> -- 
> 1.6.2.rc0.186.g417c
> 


* Re: fact-import: failed to apply delta
  2009-02-10 21:36                       ` Shawn O. Pearce
  2009-02-10 21:51                         ` Daniel Barkalow
@ 2009-02-10 22:30                         ` Junio C Hamano
  2009-02-10 22:47                           ` Junio C Hamano
  1 sibling, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2009-02-10 22:30 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Daniel Barkalow, Johannes Schindelin, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Daniel Barkalow <barkalow@iabervon.org> wrote:
>> On Tue, 10 Feb 2009, Shawn O. Pearce wrote:
>> > 
>> > We should dump the cached_objects table in sha1_file.c during
>> > a checkpoint in fast-import.
>> 
>> No, that one's keyed by sha1, and doesn't get collisions; it's the 
>> delta_base_cache that's the issue; it's keyed by struct packed_git * and 
>> offset.
>
> Uh, yea, I realized that after I sent the message.  Does this patch
> fix it for you?
>
> --8<--
> Clear the delta base cache during fast-import checkpoint
>
> Otherwise we may reuse the same memory address for a totally
> different "struct packed_git", and a previously cached object from
> the prior occupant might be returned when trying to unpack an object
> from the new pack.

Can this be made more automatic?

For example if you do this every time a new pack is installed to
sha1_file(), like in add_packed_git() perhaps, wouldn't that be much less
error prone?


* Re: fact-import: failed to apply delta
  2009-02-10 22:30                         ` Junio C Hamano
@ 2009-02-10 22:47                           ` Junio C Hamano
  2009-02-10 23:09                             ` Shawn O. Pearce
  2009-02-11 18:09                             ` Daniel Barkalow
  0 siblings, 2 replies; 24+ messages in thread
From: Junio C Hamano @ 2009-02-10 22:47 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Daniel Barkalow, Johannes Schindelin, git

Junio C Hamano <gitster@pobox.com> writes:

> "Shawn O. Pearce" <spearce@spearce.org> writes:
>
>> Daniel Barkalow <barkalow@iabervon.org> wrote:
>>> On Tue, 10 Feb 2009, Shawn O. Pearce wrote:
>>> > 
>>> > We should dump the cached_objects table in sha1_file.c during
>>> > a checkpoint in fast-import.
>>> 
>>> No, that one's keyed by sha1, and doesn't get collisions; it's the 
>>> delta_base_cache that's the issue; it's keyed by struct packed_git * and 
>>> offset.
>>
>> Uh, yea, I realized that after I sent the message.  Does this patch
>> fix it for you?
>>
>> --8<--
>> Clear the delta base cache during fast-import checkpoint
>>
>> Otherwise we may reuse the same memory address for a totally
>> different "struct packed_git", and a previously cached object from
>> the prior occupant might be returned when trying to unpack an object
>> from the new pack.
>
> Can this be made more automatic?
>
> For example if you do this every time a new pack is installed to
> sha1_file(), like in add_packed_git() perhaps, wouldn't that be much less
> error prone?

On second thought, I think fast-import is the only program that plays
funny games of feeding a packed_git that is *not* part of the real list of
packed_git installed in the system to unpack_entry(), so probably your
patch is a better idea.

We really need a better comment in the codepath from gfi_unpack_entry() to
unpack_entry() that there is a very unusual thing going on.

By the way, strictly speaking, you need to release the delta_base_cache
entries that are based on pack_data and nothing else, no?

* Re: fact-import: failed to apply delta
  2009-02-10 22:47                           ` Junio C Hamano
@ 2009-02-10 23:09                             ` Shawn O. Pearce
  2009-02-10 23:15                               ` Junio C Hamano
  2009-02-11 18:09                             ` Daniel Barkalow
  1 sibling, 1 reply; 24+ messages in thread
From: Shawn O. Pearce @ 2009-02-10 23:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Daniel Barkalow, Johannes Schindelin, git

Junio C Hamano <gitster@pobox.com> wrote:
> 
> On second thought, I think fast-import is the only program that plays
> funny games of feeding a packed_git that is *not* part of the real list of
> packed_git installed in the system to unpack_entry(), so probably your
> patch is a better idea.

Right, that was my thought.
 
> We really need a better comment in the codepath from gfi_unpack_entry() to
> unpack_entry() that there is a very unusual thing going on.

That whole code is hairy.  It already has more comments than code.
What more can I really say here other than maybe this?

diff --git a/fast-import.c b/fast-import.c
index 03b13e0..7bfb563 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1204,6 +1204,12 @@ static void *gfi_unpack_entry(
 		 */
 		p->pack_size = pack_size + 20;
 	}
+	/* DANGER, WILL ROBINSON DANGER !!!!
+	 *
+	 * unpack_entry() wasn't meant to be called the way we are
+	 * about to call it right here.  Be very careful, any sort
+	 * of assumption is probably wrong.
+	 */
 	return unpack_entry(p, oe->offset, &type, sizep);
 }
 
 
> By the way, strictly speaking, you need to release the delta_base_cache
> entries that are based on pack_data and nothing else, no?

Right.

But the hiccup of a checkpoint in terms of overall performance is
such a huge amount (due to needing to re-read the entire pack to
compute its final checksum) that the loss of the delta_base_cache
is pretty much a drop in the bucket here.

I can go back and add in a struct packed_git* and filter to only
those entries in the cache, but it doesn't seem worth it to me.
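
For comparison, here is a rough sketch of what that per-pack filtering
could look like. Everything in it is hypothetical (flush_pack(),
CACHE_SLOTS and the toy structs are not git's code); it only shows that
the filter amounts to one pointer comparison per cache slot, with a NULL
argument standing in for "drop everything".

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 8	/* hypothetical size for this sketch */

/* Stand-in for struct packed_git; only its address is used as a key. */
struct pack {
	const char *name;
};

struct cache_entry {
	const struct pack *p;	/* pack the base object came from */
	long offset;		/* offset of the base within that pack */
	const char *data;	/* cached base object; NULL if slot is empty */
};

/*
 * Drop the entries that belong to `gone`, or every entry when `gone`
 * is NULL. Returns the number of entries dropped.
 */
static size_t flush_pack(struct cache_entry *cache, size_t n,
			 const struct pack *gone)
{
	size_t i, dropped = 0;

	assert(cache != NULL);
	for (i = 0; i < n; i++) {
		if (cache[i].data && (!gone || cache[i].p == gone)) {
			cache[i].p = NULL;
			cache[i].data = NULL;
			dropped++;
		}
	}
	return dropped;
}

static void set_entry(struct cache_entry *e, const struct pack *p,
		      long offset, const char *data)
{
	e->p = p;
	e->offset = offset;
	e->data = data;
}

/* Fill a small cache from two packs, flush, and count what survives. */
static size_t live_after_flush(int selective)
{
	static const struct pack pack_data = { "pack being written" };
	static const struct pack other = { "installed pack" };
	struct cache_entry cache[CACHE_SLOTS] = { { NULL, 0, NULL } };
	size_t i, live = 0;

	set_entry(&cache[0], &pack_data, 12, "base a");
	set_entry(&cache[1], &pack_data, 40, "base b");
	set_entry(&cache[2], &other, 12, "base c");

	flush_pack(cache, CACHE_SLOTS, selective ? &pack_data : NULL);

	for (i = 0; i < CACHE_SLOTS; i++)
		if (cache[i].data)
			live++;
	return live;
}
```

In this toy setup, live_after_flush(1) keeps the single entry from the
other pack, while live_after_flush(0) keeps nothing.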

-- 
Shawn.

* Re: fact-import: failed to apply delta
  2009-02-10 23:09                             ` Shawn O. Pearce
@ 2009-02-10 23:15                               ` Junio C Hamano
  2009-02-10 23:16                                 ` Shawn O. Pearce
  0 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2009-02-10 23:15 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Daniel Barkalow, Johannes Schindelin, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> What more can I really say here other than maybe this?
>
> diff --git a/fast-import.c b/fast-import.c
> index 03b13e0..7bfb563 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -1204,6 +1204,12 @@ static void *gfi_unpack_entry(
>  		 */
>  		p->pack_size = pack_size + 20;
>  	}
> +	/* DANGER, WILL ROBINSON DANGER !!!!
> +	 *
> +	 * unpack_entry() wasn't meant to be called the way we are
> +	 * about to call it right here.  Be very careful, any sort
> +	 * of assumption is probably wrong.
> +	 */
>  	return unpack_entry(p, oe->offset, &type, sizep);
>  }

Yuck ;-).

>> By the way, strictly speaking, you need to release the delta_base_cache
>> entries that are based on pack_data and nothing else, no?
>
> Right.
>
> But the hiccup of a checkpoint in terms of overall performance is
> such a huge amount (due to needing to re-read the entire pack to
> compute its final checksum) that the loss of the delta_base_cache
> is pretty much a drop in the bucket here.
>
> I can go back and add in a struct packed_git* and filter to only
> those entries in the cache, but it doesn't seem worth it to me.

Nah, that was not a suggestion but a question.

The patch can and should go to maint, right?

* Re: fact-import: failed to apply delta
  2009-02-10 23:15                               ` Junio C Hamano
@ 2009-02-10 23:16                                 ` Shawn O. Pearce
  2009-02-10 23:32                                   ` Junio C Hamano
  0 siblings, 1 reply; 24+ messages in thread
From: Shawn O. Pearce @ 2009-02-10 23:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Daniel Barkalow, Johannes Schindelin, git

Junio C Hamano <gitster@pobox.com> wrote:
> 
> The patch can and should go to maint, right?

Yea, maint.  Don't forget the ';' I forgot in sha1_file.c.

Clearly, I failed to compile-test it before sending.  :-)

-- 
Shawn.

* Re: fact-import: failed to apply delta
  2009-02-10 23:16                                 ` Shawn O. Pearce
@ 2009-02-10 23:32                                   ` Junio C Hamano
  0 siblings, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2009-02-10 23:32 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Daniel Barkalow, Johannes Schindelin, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Junio C Hamano <gitster@pobox.com> wrote:
>> 
>> The patch can and should go to maint, right?
>
> Yea, maint.  Don't forget the ';' I forgot in sha1_file.c.
>
> Clearly, I failed to compile-test it before sending.  :-)

That's Ok, I always compile test before merging the result of "git am" to
any other place ;-).

* Re: fact-import: failed to apply delta
  2009-02-10 22:47                           ` Junio C Hamano
  2009-02-10 23:09                             ` Shawn O. Pearce
@ 2009-02-11 18:09                             ` Daniel Barkalow
  2009-02-11 18:15                               ` Shawn O. Pearce
  1 sibling, 1 reply; 24+ messages in thread
From: Daniel Barkalow @ 2009-02-11 18:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, Johannes Schindelin, git

On Tue, 10 Feb 2009, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > "Shawn O. Pearce" <spearce@spearce.org> writes:
> >
> >> Daniel Barkalow <barkalow@iabervon.org> wrote:
> >>> On Tue, 10 Feb 2009, Shawn O. Pearce wrote:
> >>> > 
> >>> > We should dump the cached_objects table in sha1_file.c during
> >>> > a checkpoint in fast-import.
> >>> 
> >>> No, that one's keyed by sha1, and doesn't get collisions; it's the 
> >>> delta_base_cache that's the issue; it's keyed by struct packed_git * and 
> >>> offset.
> >>
> >> Uh, yea, I realized that after I sent the message.  Does this patch
> >> fix it for you?
> >>
> >> --8<--
> >> Clear the delta base cache during fast-import checkpoint
> >>
> >> Otherwise we may reuse the same memory address for a totally
> >> different "struct packed_git", and a previously cached object from
> >> the prior occupant might be returned when trying to unpack an object
> >> from the new pack.
> >
> > Can this be made more automatic?
> >
> > For example if you do this every time a new pack is installed to
> > sha1_file(), like in add_packed_git() perhaps, wouldn't that be much less
> > error prone?
> 
> On second thought, I think fast-import is the only program that plays
> funny games of feeding a packed_git that is *not* part of the real list of
> packed_git installed in the system to unpack_entry(), so probably your
> patch is a better idea.

That's not the problem; the problem is calling free on a struct packed_git 
that has been given to unpack_entry(), because it raises the possibility 
of having the memory allocated to a different pack in the future and 
ending up with actively wrong entries in the delta cache, since it keys 
off of the pointer.
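
A minimal sketch of that hazard, using a toy cache rather than git's
actual delta_base_cache. All names here (struct pack, cache_put(),
cache_get(), lookup_after_reuse(), CACHE_SLOTS) are made up for
illustration. Reusing one static struct for two packs stands in for
free() followed by an allocation that happens to return the same
address, so the behavior is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 256	/* hypothetical size for this sketch */

/* Stand-in for struct packed_git; only its address matters here. */
struct pack {
	const char *name;
};

struct cache_entry {
	const struct pack *p;	/* key, part 1: the pack's address */
	long offset;		/* key, part 2: offset within that pack */
	const char *data;	/* cached base object; NULL if slot is empty */
};

static struct cache_entry cache[CACHE_SLOTS];

static size_t slot_for(const struct pack *p, long offset)
{
	return (size_t)(((uintptr_t)p + (uintptr_t)offset) % CACHE_SLOTS);
}

static void cache_put(const struct pack *p, long offset, const char *data)
{
	struct cache_entry *e = &cache[slot_for(p, offset)];

	assert(p != NULL && data != NULL);
	e->p = p;
	e->offset = offset;
	e->data = data;
}

static const char *cache_get(const struct pack *p, long offset)
{
	const struct cache_entry *e = &cache[slot_for(p, offset)];

	/* The pack is recognized by address alone, never by content. */
	if (e->data && e->p == p && e->offset == offset)
		return e->data;
	return NULL;
}

static void cache_clear(void)
{
	size_t i;

	for (i = 0; i < CACHE_SLOTS; i++) {
		cache[i].p = NULL;
		cache[i].data = NULL;
	}
}

/*
 * Cache an object for an "old" pack, let a "new" pack take over the
 * same address, then look up the same offset in the new pack.
 */
static const char *lookup_after_reuse(int clear_on_free)
{
	static struct pack storage;

	cache_clear();
	storage.name = "old pack";
	cache_put(&storage, 12, "base object of the old pack");

	if (clear_on_free)
		cache_clear();		/* flush before the address is reused */
	storage.name = "new pack";	/* same address, different pack */

	return cache_get(&storage, 12);
}
```

With clear_on_free set to 0 the lookup for the new pack returns the old
pack's object, which is the kind of wrong base that would make a delta
fail to apply. With it set to 1 the lookup misses, as it should.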

I think free_pack_by_name() also needs to drop the entries that are from 
the freed pack, to avoid having repack able to get the same problem, 
although I wouldn't be surprised if repack happened to never allocate a 
new pack after freeing an old pack with stale delta cache entries, or 
never used the delta cache after that, simply because it does one thing 
and then exits.

	-Daniel
*This .sig left intentionally blank*

* Re: fact-import: failed to apply delta
  2009-02-11 18:09                             ` Daniel Barkalow
@ 2009-02-11 18:15                               ` Shawn O. Pearce
  2009-02-11 18:30                                 ` Junio C Hamano
  2009-02-11 18:33                                 ` Daniel Barkalow
  0 siblings, 2 replies; 24+ messages in thread
From: Shawn O. Pearce @ 2009-02-11 18:15 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, Johannes Schindelin, git

Daniel Barkalow <barkalow@iabervon.org> wrote:
> 
> I think free_pack_by_name() also needs to drop the entries that are from 
> the freed pack, to avoid having repack able to get the same problem, 
> although I wouldn't be surprised if repack happened to never allocate a 
> new pack after freeing an old pack with stale delta cache entries, or 
> never used the delta cache after that, simply because it does one thing 
> and then exits.

Oy.  I missed that we added this function.  Patch follows.

--8<--
Clear the delta base cache if a pack is rebuilt

There is some risk that re-opening a regenerated pack file with
different offsets could leave stale entries within the delta base
cache that could be matched up against other objects using the same
"struct packed_git*" and pack offset.

Throwing away the entire delta base cache in this case is safer,
as we don't have to worry about a recycled "struct packed_git*"
matching to the wrong base object, resulting in delta apply
errors while unpacking an object.

Suggested-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 sha1_file.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 7459a9c..5b6e0f6 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -689,6 +689,7 @@ void free_pack_by_name(const char *pack_name)
 	while (*pp) {
 		p = *pp;
 		if (strcmp(pack_name, p->pack_name) == 0) {
+			clear_delta_base_cache();
 			close_pack_windows(p);
 			if (p->pack_fd != -1)
 				close(p->pack_fd);
-- 
1.6.2.rc0.186.g417c

* Re: fact-import: failed to apply delta
  2009-02-11 18:15                               ` Shawn O. Pearce
@ 2009-02-11 18:30                                 ` Junio C Hamano
  2009-02-11 18:33                                 ` Daniel Barkalow
  1 sibling, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2009-02-11 18:30 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Daniel Barkalow, Johannes Schindelin, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Daniel Barkalow <barkalow@iabervon.org> wrote:
>> 
>> I think free_pack_by_name() also needs to drop the entries that are from 
>> the freed pack, to avoid having repack able to get the same problem, 
>> although I wouldn't be surprised if repack happened to never allocate a 
>> new pack after freeing an old pack with stale delta cache entries, or 
>> never used the delta cache after that, simply because it does one thing 
>> and then exits.
>
> Oy.  I missed that we added this function.  Patch follows.

Thanks, both.

* Re: fact-import: failed to apply delta
  2009-02-11 18:15                               ` Shawn O. Pearce
  2009-02-11 18:30                                 ` Junio C Hamano
@ 2009-02-11 18:33                                 ` Daniel Barkalow
  1 sibling, 0 replies; 24+ messages in thread
From: Daniel Barkalow @ 2009-02-11 18:33 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, Johannes Schindelin, git

On Wed, 11 Feb 2009, Shawn O. Pearce wrote:

> Daniel Barkalow <barkalow@iabervon.org> wrote:
> > 
> > I think free_pack_by_name() also needs to drop the entries that are from 
> > the freed pack, to avoid having repack able to get the same problem, 
> > although I wouldn't be surprised if repack happened to never allocate a 
> > new pack after freeing an old pack with stale delta cache entries, or 
> > never used the delta cache after that, simply because it does one thing 
> > and then exits.
> 
> Oy.  I missed that we added this function.  Patch follows.

I think it would be clearer to do something like the patch below (instead 
of the original patch). Authors are more likely to know when to call a 
function named for the situation it handles than one named for what it 
actually does, and any future optimizations that need to be flushed under 
the same conditions are more likely to get included.

--8<--
Provide a function to free a struct packed_git that may have been used

When we look up entries in a pack, we sometimes cache the results. If a 
struct packed_git is freed afterwards (and its memory could be allocated 
as a different struct packed_git later), we need to clear out anything 
that may mis-recognize the pack.

In particular, we flush the delta cache.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>

diff --git a/cache.h b/cache.h
index 8d965b8..ecc55cf 100644
--- a/cache.h
+++ b/cache.h
@@ -822,6 +822,7 @@ extern unsigned char* use_pack(struct packed_git *, struct pack_window **, off_t
 extern void close_pack_windows(struct packed_git *);
 extern void unuse_pack(struct pack_window **);
 extern void free_pack_by_name(const char *);
+extern void free_used_pack(struct packed_git *);
 extern struct packed_git *add_packed_git(const char *, int, int);
 extern const unsigned char *nth_packed_object_sha1(struct packed_git *, uint32_t);
 extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t);
diff --git a/fast-import.c b/fast-import.c
index f0e08ac..8ec9a4e 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -987,7 +987,7 @@ static void end_packfile(void)
 		close(old_p->pack_fd);
 		unlink(old_p->pack_name);
 	}
-	free(old_p);
+	free_used_pack(old_p);
 
 	/* We can't carry a delta across packfiles. */
 	strbuf_release(&last_blob.data);
diff --git a/sha1_file.c b/sha1_file.c
index fd4980d..5a45f51 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -696,7 +696,7 @@ void free_pack_by_name(const char *pack_name)
 				munmap((void *)p->index_data, p->index_size);
 			free(p->bad_object_sha1);
 			*pp = p->next;
-			free(p);
+			free_used_pack(p);
 			return;
 		}
 		pp = &p->next;
@@ -1663,6 +1663,14 @@ static inline void release_delta_base_cache(struct delta_base_cache_entry *ent)
 	}
 }
 
+void free_used_pack(struct packed_git *pack)
+{
+	unsigned long p;
+	for (p = 0; p < MAX_DELTA_CACHE; p++)
+		release_delta_base_cache(&delta_base_cache[p]);
+	free(pack);
+}
+
 static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 	void *base, unsigned long base_size, enum object_type type)
 {
