git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: Johannes Sixt <j.sixt@viscovery.net>,
	Charles Bailey <charles@hashpling.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Be more careful about updating refs
Date: Thu, 17 Jan 2008 21:30:06 -0500	[thread overview]
Message-ID: <20080118023005.GS24004@spearce.org> (raw)
In-Reply-To: <7vbq7jkixg.fsf@gitster.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> wrote:
> I ran strace and found that fast-import retains three windows to
> the same data that was opened while the pack was still being
> built (i.e. the filename is still tmp_pack_XXXXXX) when it dies:
> 
>     {next = 0x6f6d20, base = 0x6f77a0 "PACK", offset = 0, len = 907,
>      last_used = 9, inuse_cnt = 0}
>     {next = 0x728630, base = 0x6f7160 "PACK", offset = 0, len = 500, 
>       last_used = 5, inuse_cnt = 0}
>     {next = 0x0, base = 0x6f6d50 "PACK", offset = 0, len = 261, 
>       last_used = 1, inuse_cnt = 0}

Ouch.  Hmm, well, maybe it shouldn't keep three windows open to
the same part of the file, but with different lengths.  :)

But that's another issue.
 
> The following patch seems to fix the issue for me, but this is
> primarily meant for discussion,

I agree the patch should fix this particular issue and the
performance difference it will cause is minor enough to not be
worth discussing further.  But I have one minor comment (please
see it below).

> as I do not quite understand why
> the same issue does not manifest itself when NO_MMAP is not
> used.

When NO_MMAP is off (we are actually using the OS's true mmap)
the data is updated into the window when the file is written,
as the window is backed by the OS's filesystem cache.

If the data access is outside of the window (past the offset of
the longest window, here 907) we would just open a new window to
the relevant region of the file.  But when it is inside the window,
the window being backed by the filesystem cache saved us.  In the
case of NO_MMAP this doesn't work, so we get a failure later during
object access.

In particular if you look at gfi_unpack_entry() (which is where
we cause windows to be opened on the file being created) we tell
sha1_file.c it has 20 bytes available in the window *beyond* the
actual end of the file.  This is due to an assumption within the
windowing code of sha1_file.c that we always have the packfile
footer at the end of the file, so all windowing code assumes it
has at least 20 bytes from the start of a window that it can access
without needing to perform additional bounds checking.

So what's happening here is we are adding objects into the file,
some of which may have their data appearing in the trailing 20 bytes
of some prior window.  If that happens the cached window looks like
it can be used to access the data, but it really cannot be in the
NO_MMAP case as those trailing 20 bytes of the window are all \0's.

The astute reader may wonder how gfi_unpack_entry() works when
NO_MMAP is being used.  It doesn't for that last 20 bytes of
any window.  Which has me thinking that Junio's patch alone isn't
enough to make fast-import work correctly under NO_MMAP.

> diff --git a/fast-import.c b/fast-import.c
> index 3609c24..bd0ddb1 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -926,7 +926,13 @@ static void end_packfile(void)
>  		new_p = add_packed_git(idx_name, strlen(idx_name), 1);
>  		if (!new_p)
>  			die("core git rejected index %s", idx_name);
> -		new_p->windows = old_p->windows;
> +		while (old_p->windows) {
> +			struct pack_window *w = old_p->windows;
> +			munmap(w->base, w->len);
> +			old_p->windows = w->next;
> +			free(w);
> +		}
> +		new_p->windows = NULL;

This assignment of NULL should not be necessary.  add_packed_git()
already does this for us on new_p so we do not need to repeat it
here in fast-import.

>  		all_packs[pack_id] = new_p;
>  		install_packed_git(new_p);
>  

-- 
Shawn.

  parent reply	other threads:[~2008-01-18  2:30 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-15 23:50 Be more careful about updating refs Linus Torvalds
2008-01-16  0:02 ` Linus Torvalds
2008-01-16 19:52   ` Junio C Hamano
2008-01-17  9:15     ` Charles Bailey
2008-01-17 10:52       ` Johannes Sixt
2008-01-17 11:01         ` Charles Bailey
2008-01-17 12:41           ` Johannes Sixt
2008-01-17 12:58             ` Johannes Schindelin
2008-01-17 13:07               ` Charles Bailey
2008-01-18  1:43             ` Junio C Hamano
2008-01-18  2:01               ` Junio C Hamano
2008-01-18  2:13                 ` Shawn O. Pearce
2008-01-18  2:25                   ` Junio C Hamano
2008-01-18  2:33                     ` Shawn O. Pearce
2008-01-18  2:58                       ` Shawn O. Pearce
2008-01-18  3:18                         ` Shawn O. Pearce
2008-01-18  3:22                           ` Shawn O. Pearce
     [not found]                             ` <20080118035700.GA3458@spearce.org>
2008-01-18  4:27                               ` [PATCH] Fix random fast-import errors when compiled with NO_MMAP Linus Torvalds
2008-01-18  8:42                                 ` Charles Bailey
2008-01-18 17:08                                   ` Linus Torvalds
2008-01-19  3:25                                     ` Junio C Hamano
2008-01-19  3:55                                       ` Linus Torvalds
2008-01-21  3:57                                       ` Shawn O. Pearce
2008-01-18  6:10                               ` Junio C Hamano
2008-01-21  4:10                                 ` Shawn O. Pearce
2008-01-18  7:53                               ` Johannes Sixt
2008-01-18  9:26                               ` Charles Bailey
2008-01-18  9:36                                 ` Junio C Hamano
2008-01-18  9:45                                   ` Charles Bailey
2008-01-18 10:57                                     ` Junio C Hamano
2008-01-18  2:30               ` Shawn O. Pearce [this message]
2008-01-17 10:56       ` Be more careful about updating refs Charles Bailey
2008-01-16  0:29 ` Junio C Hamano
2008-01-16  0:42   ` Linus Torvalds
2008-01-16  1:11 ` Junio C Hamano
2008-01-23 22:53 ` Sam Vilain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080118023005.GS24004@spearce.org \
    --to=spearce@spearce.org \
    --cc=charles@hashpling.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j.sixt@viscovery.net \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).