git.vger.kernel.org archive mirror
From: Mike Hommey <mh@glandium.org>
To: git@vger.kernel.org
Cc: Jonathan Nieder <jrnieder@gmail.com>, Jeff King <peff@peff.net>
Subject: Re: fast-import's gfi_unpack_entry causes too many munmap/mmap cycles
Date: Sun, 17 Apr 2016 09:54:43 +0900
Message-ID: <20160417005443.GA15847@glandium.org>
In-Reply-To: <20160416110403.GA19197@glandium.org>

On Sat, Apr 16, 2016 at 08:04:03PM +0900, Mike Hommey wrote:
> So I think I got myself a workaround...
> 
> > A --- B
> >  \
> >   \-- C
> > 
> > I have:
> > - diff between null-tree and A
> > - diff between A and B
> > - diff between B and C
> 
> I should be able to do:
> 
> - start the commit command for A
> - before finishing it, `ls ""`
> - then apply the diff for B and `ls ""`
> - then apply the diff for C and `ls ""`
> - then `deleteall`
> - then `M 040000 sha1_from_first_ls ` and finally finish A
> - create the commit for B with `from
>   0000000000000000000000000000000000000000\nmerge :mark` and `M 040000
>   sha1_from_second_ls`
> - likewise for C
> 
> ... and avoid gfi_unpack_entry.

And it works, in that it avoids gfi_unpack_entry, but it has its own
problem: somehow the store_tree() that happens for each of those
`ls ""` commands stores *all* trees, even the ones that haven't
changed. In terms of a minimal fast-import script:

With:
  commit refs/FOO
  committer <foo@foo> 0 +0
  data 0

  M 644 inline a/a
  data 1
  a

  commit refs/FOO
  committer <foo@foo> 0 +0
  data 0

  M 644 inline b/b
  data 1
  b

store_tree is called for:
- b39954843ff6e09ec3aa2b942938c30c6bd1629e
- 2c3b59f77afa6fea6c1a380eeb0cb1eb292515b5
- 51e58bf6ce558dd384bbf9d493f9a376f3bcb089
- a97dda9f3a819113b3b239b9a62edece27136080

With:
  commit refs/FOO
  committer <foo@foo> 0 +0
  data 0

  M 644 inline a/a
  data 1
  a
  ls ""
  M 644 inline b/b
  data 1
  b

store_tree is called for:
- b39954843ff6e09ec3aa2b942938c30c6bd1629e
- 2c3b59f77afa6fea6c1a380eeb0cb1eb292515b5
- b39954843ff6e09ec3aa2b942938c30c6bd1629e
- 51e58bf6ce558dd384bbf9d493f9a376f3bcb089
- a97dda9f3a819113b3b239b9a62edece27136080
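
The list above can be checked for repeats mechanically. A minimal sketch in Python; the ids are copied from the list, and the helper name `repeated` is only illustrative:

```python
from collections import Counter

# store_tree ids logged for the `ls ""` variant, copied from the list above.
stored = [
    "b39954843ff6e09ec3aa2b942938c30c6bd1629e",
    "2c3b59f77afa6fea6c1a380eeb0cb1eb292515b5",
    "b39954843ff6e09ec3aa2b942938c30c6bd1629e",
    "51e58bf6ce558dd384bbf9d493f9a376f3bcb089",
    "a97dda9f3a819113b3b239b9a62edece27136080",
]


def repeated(ids):
    """Return {id: count} for every id that appears more than once."""
    return {i: n for i, n in Counter(ids).items() if n > 1}


print(repeated(stored))
```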

Note how b39954843ff6e09ec3aa2b942938c30c6bd1629e is being stored twice
(it's the tree for a/).
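
To see which id belongs to which tree, the ids can be recomputed outside of git. This is a sketch, assuming the default SHA-1 object format and that `M 644` ends up as mode `100644` in the tree entry; the function names are illustrative, not git's. The printed ids can be compared against the lists above.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 of a git object: '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_entry(mode: str, name: str, hex_id: str) -> bytes:
    """One tree entry: '<mode> <name>' + NUL + 20-byte binary id."""
    return f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)


# Blobs written by `data 1` with contents "a" and "b" (no newline).
blob_a = git_object_id("blob", b"a")
blob_b = git_object_id("blob", b"b")

# Subtrees a/ and b/, each holding a single file.
tree_a = git_object_id("tree", tree_entry("100644", "a", blob_a))
tree_b = git_object_id("tree", tree_entry("100644", "b", blob_b))

# Root tree with only a/, then root tree with a/ and b/.
# Directory entries use mode 40000; entries are already in sorted order.
root_a = git_object_id("tree", tree_entry("40000", "a", tree_a))
root_ab = git_object_id(
    "tree",
    tree_entry("40000", "a", tree_a) + tree_entry("40000", "b", tree_b),
)

for label, oid in [("a/", tree_a), ("root (a/)", root_a),
                   ("b/", tree_b), ("root (a/, b/)", root_ab)]:
    print(f"{label:14} {oid}")
```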

So in the scenario I'm testing, which has many more trees, I'm trading
29k gfi_unpack_entry calls and 230k store_tree calls for 1.96M
store_tree calls. On even larger trees, I'm not sure that wouldn't make
things even worse than they already are with gfi_unpack_entry.
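
To put those numbers side by side, a quick back-of-the-envelope calculation. The inputs are the rounded figures quoted above (29k, 230k, 1.96M), so the results are approximate:

```python
# Rounded call counts from the scenario described above.
unpack_calls_before = 29_000      # gfi_unpack_entry calls without the workaround
store_calls_before = 230_000      # store_tree calls without the workaround
store_calls_after = 1_960_000     # store_tree calls with the `ls ""` workaround

extra_store_calls = store_calls_after - store_calls_before
growth = store_calls_after / store_calls_before
extra_per_avoided_unpack = extra_store_calls / unpack_calls_before

print(f"extra store_tree calls: {extra_store_calls}")
print(f"store_tree growth factor: {growth:.1f}x")
print(f"extra store_tree calls per avoided gfi_unpack_entry: "
      f"{extra_per_avoided_unpack:.1f}")
```

Whether that trade is a win depends on the relative cost of one gfi_unpack_entry call versus one store_tree call, which these counts alone don't say.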

Mike
