From: Jarrad Hope <me@jarradhope.com>
To: git@vger.kernel.org
Subject: Tackling Git Limitations with Singular Large Line-separated Plaintext files
Date: Fri, 27 Jun 2014 15:45:16 +0700
Message-ID: <CAJoVafc1LMxmvCiWci3N+AuAZBsABR3Wb3c6c3stw93OJZ7Scw@mail.gmail.com>
Hello,
As a software developer I've used git for years and have found it the
perfect solution for source control.
Lately I have found myself using git in a unique use case: modifying
DNA/RNA sequences and storing them in git. These sequences are
essentially software/source code for cells/life. For bacteria and
viruses the repos are very small (<10 MB) and compress nicely.
However, at the extreme end of the spectrum a human genome can come in
at 50 GB, or say ~1 GB per file/chromosome.
Now, this is not the binary problem and it is not the same as storing
media inside git. I have reviewed the solutions that exist for the
binary problem, such as git-annex, git-media and bup, but they don't
provide the feature set of git, and the data I'm storing is more like
plaintext source code with relatively small edits per commit.
I have googled and asked in #git, but the discussion mostly revolved
around these tools.
The only project that holds interest is a 2009 project, git-bigfiles.
However, it is a bit dated and the author is not interested in reviving
it, unfortunately referring me to git-annex instead.
With that background,
I wanted to discuss the problems with git and how I can contribute to
the core project to best solve them.
From my understanding, the largest problem revolves around git's delta
discovery method, which holds 2 files in memory at once. Is there a
reason this could not be adapted to page/chunk the data in a sliding
window fashion?
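To make the chunking idea concrete, here is a rough Python sketch. It is
not how git's delta code works; it swaps in content-defined chunking (the
general approach used by tools like bup) purely as an illustration. The
function names, the CRC32 boundary rule and the avg_lines parameter are all
assumptions made up for this example. Each file is streamed line by line, so
only the running hash of the current chunk and a set of chunk digests are
held in memory, never the two whole files.

```python
import hashlib
import zlib
from typing import Iterable, Iterator, List, Tuple


def chunk_digests(lines: Iterable[bytes],
                  avg_lines: int = 64) -> Iterator[Tuple[int, str]]:
    """Yield (first_line_index, sha1_hex) for each content-defined chunk.

    Hypothetical boundary rule: a chunk ends after any line whose CRC32 is
    divisible by avg_lines. Boundaries depend only on line content, so they
    realign shortly after an inserted or deleted line.
    """
    hasher = hashlib.sha1()
    start = 0   # index of the first line in the current chunk
    count = 0   # number of lines hashed into the current chunk
    for index, line in enumerate(lines):
        hasher.update(line)
        count += 1
        if zlib.crc32(line) % avg_lines == 0:
            yield start, hasher.hexdigest()
            hasher = hashlib.sha1()
            start = index + 1
            count = 0
    if count:
        # Trailing lines that never hit a boundary form the last chunk.
        yield start, hasher.hexdigest()


def changed_chunks(old_lines: Iterable[bytes],
                   new_lines: Iterable[bytes],
                   avg_lines: int = 64) -> List[int]:
    """Return start indices of chunks in new_lines that old_lines lacks."""
    # Memory use is one digest per chunk of the old file, not the file itself.
    seen = {digest for _, digest in chunk_digests(old_lines, avg_lines)}
    return [start
            for start, digest in chunk_digests(new_lines, avg_lines)
            if digest not in seen]
```

Binary-mode file objects iterate line by line, so two chromosome files
could be passed in directly, e.g. changed_chunks(open(old, "rb"),
open(new, "rb")), where old and new are placeholder paths. Only the chunks
reported as changed would then need a real delta computed against them.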
Are there any other issues I need to know about? Is anyone else
working on making git more capable of handling large source files that
I can collaborate with?
Thanks for your time,
Jarrad