* Tackling Git Limitations with Singular Large Line-seperated Plaintext files
From: Jarrad Hope @ 2014-06-27  8:45 UTC (permalink / raw)
  To: git

Hello,

As a software developer I've used git for years and have found it the
perfect solution for source control.

Lately I have found myself using git for an unusual use case: modifying
DNA/RNA sequences, which are essentially the source code for cells and
life, and storing them in git. For bacteria and viruses the repos are
very small (<10 MB) and compress nicely.

However, at the extreme end of the spectrum a human genome can come in
at 50 GB, or say ~1 GB per file/chromosome.

Now, this is not the binary problem and it is not the same as storing
media inside git. I have reviewed the solutions that exist for the
binary problem, such as git-annex, git-media & bup, but they don't
provide the feature set of git, and the data I'm storing is more like
plaintext source code with relatively small edits per commit.

I have googled and asked in #git, but the discussion there mostly
revolved around these tools.

The only project that holds interest is a 2009 project, git-bigfiles.
However, it is a bit dated and, unfortunately, the author is not
interested in reviving it, referring me to git-annex instead.

With that background, I wanted to discuss the problems with git and how
I can contribute to the core project to best solve them.

From my understanding, the largest problem revolves around git's delta
discovery method, which holds 2 files in memory at once. Is there a
reason this could not be adapted to page/chunk the data in a
sliding-window fashion?
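
To make the chunking idea concrete, here is a minimal sketch in Python.
It is not git's delta code and makes no claim about how git would
implement this. It only illustrates that two line-separated files can be
compared while streaming each one a line at a time. The function names
and the avg_lines parameter are hypothetical, and using CRC32 to pick
chunk boundaries is an assumption made for brevity.

```python
import hashlib
import io
import zlib


def chunk_digests(stream, avg_lines=4):
    """Yield one SHA-1 digest per content-defined chunk of lines.

    A chunk ends after any line whose CRC32 is divisible by avg_lines
    (an illustrative rule), so boundaries depend on content rather than
    position. Only the current line is held in memory at any time.
    """
    hasher = hashlib.sha1()
    pending = False
    for line in stream:
        hasher.update(line)
        pending = True
        if zlib.crc32(line) % avg_lines == 0:
            yield hasher.hexdigest()
            hasher = hashlib.sha1()
            pending = False
    if pending:
        # Flush the trailing chunk that did not end on a boundary line.
        yield hasher.hexdigest()


def changed_chunks(old_stream, new_stream, avg_lines=4):
    """Return digests of chunks in new_stream that are absent from old_stream.

    Memory use is one digest per old chunk plus one line, not two whole files.
    """
    old = set(chunk_digests(old_stream, avg_lines))
    return [d for d in chunk_digests(new_stream, avg_lines) if d not in old]


if __name__ == "__main__":
    old = io.BytesIO(b"ACGT\nTTGA\nCCAT\nGGTA\n")
    new = io.BytesIO(b"ACGT\nTTGA\nCCAA\nGGTA\n")
    print(len(changed_chunks(old, new)), "chunk(s) differ")
```

Because boundaries follow content, inserting or deleting a line only
disturbs the chunks next to the edit, and only those chunks would need a
full delta search.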

Are there any other issues I need to know about? Is anyone else
working on making git more capable of handling large source files that
I can collaborate with?

Thanks for your time,
Jarrad

Thread overview: 10+ messages
2014-06-27  8:45 Tackling Git Limitations with Singular Large Line-seperated Plaintext files Jarrad Hope
2014-06-27 15:45 ` Shawn Pearce
2014-06-27 17:48   ` Junio C Hamano
2014-06-27 19:38     ` Linus Torvalds
2014-06-27 19:47       ` Linus Torvalds
2014-06-27 19:55       ` Jason Pyeron
2014-06-27 20:13         ` Linus Torvalds
2014-06-28  6:51           ` Jarrad Hope
2014-06-30 12:56       ` Jakub Narębski
2014-08-10 21:45         ` Øyvind A. Holm
