From: "Peter C." <th3flyboy@gmail.com>
To: git@vger.kernel.org
Subject: [GSOC 2012] Some questions regarding a possible project to improve big file support
Date: Sun, 25 Mar 2012 16:48:09 -0400
Message-ID: <4F6F8489.20108@gmail.com>

Hello, I'm considering working on Git for GSOC 2012, specifically on
improving big file support. Before I start, I'd like to ask a few
questions: some about the low-level details of how Git handles diffs
between files, and one or two about implementation.

My first question is about the low-level details of how Git diffs
files. In the diff process, does Git simply parse the file and look for
differences, or does it first use something like hashing to tell
whether the file has been modified at all, and only run the diff if the
hash is different? As an extension to this question: does Git's
internal database set any kind of flag to mark a file as binary?

My idea for the implementation is to check the hash first. If the hash
is the same, skip the file. If the hash is different, check the MIME
type, possibly using libmagic, and if it matches a known binary format,
just commit the new version rather than running a whole diff and
loading the whole file in the process.
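
To make the idea concrete, here is a rough sketch of that decision flow
in Python. It is only an illustration under stated assumptions, not how
Git itself is implemented: the function names and the returned action
strings are hypothetical, and a NUL-byte check on the first 8000 bytes
stands in for the libmagic MIME lookup so that the example needs only
the standard library. The one part taken from Git's actual object
format is the blob id, which is the SHA-1 of the header "blob <size>"
followed by a NUL byte and the file content.

```python
import hashlib

# Assumption: how much of the file to inspect when guessing "binary".
FIRST_FEW_BYTES = 8000


def blob_id(data: bytes) -> str:
    """Return the SHA-1 id Git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def looks_binary(data: bytes) -> bool:
    """Heuristic stand-in for a libmagic check: a NUL byte near the start."""
    return b"\x00" in data[:FIRST_FEW_BYTES]


def plan_action(old_id: str, new_data: bytes) -> str:
    """Decide what to do with a file, following the steps described above."""
    if blob_id(new_data) == old_id:
        return "skip"                # same hash: file is unchanged
    if looks_binary(new_data):
        return "store-without-diff"  # changed binary: just record new version
    return "diff"                    # changed text: run the normal diff


if __name__ == "__main__":
    old = blob_id(b"hello\n")
    print(plan_action(old, b"hello\n"))         # unchanged content
    print(plan_action(old, b"hello world\n"))   # changed text
    print(plan_action(old, b"\x89PNG\x00\x01")) # changed, contains NUL
```

Note that this sketch still reads the whole file into memory to hash
it. For really big files the hash would have to be computed in chunks
(for example with repeated hashlib update() calls), which requires
knowing the file size up front for the header.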

What I'm worried about is whether anything involved in this would break
existing Git functionality or backward compatibility. I'd also greatly
appreciate any feedback on my ideas.

Thanks,
Peter
