* Git as electronic lab notebook
From: Thomas Johnson @ 2009-12-19 12:23 UTC
To: git

Hello group,

I've been using git on a few different projects over the last couple of months,
and as a former svn user I really like it. Recently, I've been using it as an
'electronic lab notebook' for an empirical project. My workflow looks like this:
1. Start with the stable code base on head
2. Create and change to branch 'Experiment123'
3. Make some changes
4. Run the program, which generates a giant (10MB-4G) output text file,
Experiment123.log. Update my LabNotebook.txt file.
5. Were the new changes helpful?
5.yes: Bzip Experiment123.log, and commit it on the branch. Merge the
Experiment123 branch to head and goto 1.
5.no: Bzip Experiment123.log, and commit it on the branch. Merge LabNotebook.txt
and Experiment123.log back to head. Switch back to head and goto 1.

The thing is, Experiment123.log is going to be very similar to Experiment122.log
and Experiment124.log except for a few details. My understanding is that git is
great at compressing groups of files like this, is that correct? Should I not be
bzipping them myself? On the other hand, I don't want HEAD to contain hundreds
of gigs of uncompressed files that bzip down to only a few hundred megs.

Any thoughts on the workflow itself would also be very welcome.
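The workflow above maps onto a handful of git commands. Below is a minimal Python sketch that drives the `git` command line. It assumes a few things that the post does not specify: the stable branch is named `master`, the experiment is run by a command passed in as `program`, git is on the PATH, and Python is 3.9 or newer. The names `git`, `start_experiment` and `finish_experiment` are hypothetical helpers rather than part of any tool, and Python's `bz2` module stands in for the `bzip2` step.

```python
import bz2
import shutil
import subprocess
from pathlib import Path


def git(*args: str) -> str:
    """Run a git command in the current repository and return its stdout."""
    result = subprocess.run(["git", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout


def start_experiment(name: str) -> None:
    """Steps 1-2: start from the stable code and create the experiment branch."""
    git("checkout", "-q", "master")  # assumption: stable branch is "master"
    git("checkout", "-q", "-b", name)


def finish_experiment(name: str, program: list[str], helped: bool) -> None:
    """Steps 4-5; call this after making the code changes (step 3).

    Modified tracked files are committed via "commit -a"; brand-new source
    files would have to be added with "git add" first.
    """
    log = Path(f"{name}.log")
    compressed = Path(f"{name}.log.bz2")

    # Step 4: run the program, capture its output, update the notebook.
    with log.open("wb") as out:
        subprocess.run(program, check=True, stdout=out)
    with open("LabNotebook.txt", "a") as notebook:
        notebook.write(f"{name}: results in {compressed}\n")

    # Step 5: bzip the log (streamed, so a 4 GB log is never held in memory)
    # and commit it, together with the modified code, on the branch.
    with log.open("rb") as src, bz2.open(compressed, "wb") as dst:
        shutil.copyfileobj(src, dst)
    log.unlink()
    git("add", str(compressed), "LabNotebook.txt")
    git("commit", "-q", "-a", "-m", f"{name}: results")

    git("checkout", "-q", "master")
    if helped:
        # 5.yes: merge both the code changes and the results.
        git("merge", "-q", name)
    else:
        # 5.no: copy only the notebook and the log onto master, not the code.
        git("checkout", name, "--", "LabNotebook.txt", str(compressed))
        git("commit", "-q", "-m", f"{name}: results only")
```

A typical iteration is `start_experiment("Experiment123")`, then editing the code, then `finish_experiment("Experiment123", ["./run"], helped=True)`. Here `./run` is a placeholder for the real program. In the 5.no case, `git checkout <branch> -- <paths>` is one way to bring just those two files back to `master` without merging the code.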
* Re: Git as electronic lab notebook
From: Ciprian Dorin, Craciun @ 2009-12-19 13:38 UTC
To: Thomas Johnson; +Cc: git

On Sat, Dec 19, 2009 at 2:23 PM, Thomas Johnson
<thomas.j.johnson@gmail.com> wrote:
> Hello group,
>
> I've been using git on a few different projects over the last couple of months,
> and as a former svn user I really like it. Recently, I've been using it as an
> 'electronic lab notebook' for an empirical project. My workflow looks like this:
> 1. Start with the stable code base on head
> 2. Create and change to branch 'Experiment123'
> 3. Make some changes
> 4. Run the program, which generates a giant (10MB-4G) output text file,
> Experiment123.log. Update my LabNotebook.txt file.
> 5. Were the new changes helpful?
> 5.yes: Bzip Experiment123.log, and commit it on the branch. Merge the
> Experiment123 branch to head and goto 1.
> 5.no: Bzip Experiment123.log, and commit it on the branch. Merge LabNotebook.txt
> and Experiment123.log back to head. Switch back to head and goto 1.
>
> The thing is, Experiment123.log is going to be very similar to Experiment122.log
> and Experiment124.log except for a few details. My understanding is that git is
> great at compressing groups of files like this, is that correct? Should I not be
> bzipping them myself? On the other hand, I don't want HEAD to contain hundreds
> of gigs of uncompressed files that bzip down to only a few hundred megs.
>
> Any thoughts on the workflow itself would also be very welcome.

I have used a similar workflow myself for parametric studies on some
genetic algorithms, and below are my observations related to your question:

* saving the entire log file (either zipped or not) in the repository
has some drawbacks when cloning the repository; (in my setup I ran the
tests in parallel on a different machine, and used Git to synchronize
between the development machine and the test machine;) the problem was
that whenever I wanted to "clean" the test machine and start over I had
to clone the repository, which also held all the unneeded log files;

* (actually I've used two Git repositories -- one for the actual source
code, where I make the commits by hand, and another one which I use for
the synchronization;)

* even if you prefer having the logs, it's best to let Git handle
the compression; because even if only some small parts change from the
original txt file, I would guess that the BZip-ped file looks quite
different;

* maybe it would be better that, instead of holding the experiment log,
you just keep a summary of it (only the important stuff); and even if
you do need the entire log, you could always recreate it by running the
code again; (this was the road I took in the end, by keeping a small
SQLite database of each experiment;)

* (and of course there is also another little trick I've used: just put
the log files in a `log` directory which is "git-ignored"; that way you
can switch between branches, but Git won't touch the `log` directory
unless you force it by issuing `git clean -f -d -x`;)

Hope I've been useful,
Ciprian.
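Ciprian's alternative is to store a small summary of each experiment in SQLite instead of the full log. Below is a minimal Python sketch of that idea. The log format (measurement lines of the form `fitness=<float>`), the table layout, and the names `Summary`, `summarize_log` and `record` are all hypothetical. The post does not say what his database actually contained.

```python
import sqlite3
from pathlib import Path
from typing import NamedTuple


class Summary(NamedTuple):
    samples: int
    mean: float
    best: float


def summarize_log(path: Path) -> Summary:
    """Read the log once, line by line, keeping only a few summary numbers.

    Hypothetical log format: measurements appear as "fitness=<float>" lines;
    every other line is ignored.
    """
    count, total, best = 0, 0.0, float("-inf")
    with path.open() as log:
        for line in log:  # lazy iteration: memory use does not grow with the log
            key, sep, value = line.strip().partition("=")
            if sep and key == "fitness":
                x = float(value)
                count += 1
                total += x
                best = max(best, x)
    if count == 0:
        return Summary(0, 0.0, 0.0)
    return Summary(count, total / count, best)


def record(db: sqlite3.Connection, experiment: str, summary: Summary) -> None:
    """Store (or overwrite) one row per experiment in a small SQLite table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS experiments "
        "(name TEXT PRIMARY KEY, samples INTEGER, mean REAL, best REAL)"
    )
    db.execute(
        "INSERT OR REPLACE INTO experiments VALUES (?, ?, ?, ?)",
        (experiment, *summary),
    )
    db.commit()
```

For example, `record(sqlite3.connect("experiments.db"), "Experiment123", summarize_log(Path("Experiment123.log")))` adds one row. The database file name is again only illustrative. The database stays small enough to commit. The multi-gigabyte log can then live in a git-ignored directory or be regenerated by rerunning the code, as Ciprian suggests.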
* Re: Git as electronic lab notebook
From: Johan 't Hart @ 2009-12-20 0:15 UTC
To: Ciprian Dorin, Craciun; +Cc: Thomas Johnson, git

Ciprian Dorin, Craciun wrote:
> On Sat, Dec 19, 2009 at 2:23 PM, Thomas Johnson
> <thomas.j.johnson@gmail.com> wrote:
>> 4. Run the program, which generates a giant (10MB-4G) output text file,
>> Experiment123.log. Update my LabNotebook.txt file.
> * even if you prefer having the logs, it's best to let Git handle
> the compression; because even if only some small parts change from the
> original txt file, I would guess that the BZip-ped file looks quite
> different;
>

Is git able to handle 4Gig files? I've heard git loads every file completely
in memory before handling it...
* Re: Git as electronic lab notebook
From: Nicolas Pitre @ 2009-12-20 3:15 UTC
To: Johan 't Hart; +Cc: Ciprian Dorin, Craciun, Thomas Johnson, git

On Sun, 20 Dec 2009, Johan 't Hart wrote:

> Is git able to handle 4Gig files? I've heard git loads every file completely
> in memory before handling it...

Right. So with current Git you will be able to deal with 4GB files only
if you have a 64-bit machine and more than 4GB of RAM.


Nicolas
* Re: Git as electronic lab notebook
From: Bill Lear @ 2009-12-20 4:43 UTC
To: Nicolas Pitre
Cc: Johan 't Hart, Ciprian Dorin, Craciun, Thomas Johnson, git

On Saturday, December 19, 2009 at 22:15:00 (-0500) Nicolas Pitre writes:
>On Sun, 20 Dec 2009, Johan 't Hart wrote:
>
>> Is git able to handle 4Gig files? I've heard git loads every file completely
>> in memory before handling it...
>
>Right. So with current Git you will be able to deal with 4GB files only
>if you have a 64-bit machine and more than 4GB of RAM.

??

% uname -a
Linux pppp 2.6.31.6-166.fc12.i686 #1 SMP Wed Dec 9 11:14:59 EST 2009 i686 i686 i386 GNU/Linux
% cat /proc/meminfo | grep MemTotal
MemTotal: 3095296 kB
% mkdir gogle
% cd gogle
% git init
% dd if=/dev/zero of=zerofile.tst bs=1k count=4700000
% git add *
% git commit -a -m new
[master (root-commit) 35a25be] new
 1 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 zerofile.tst
% git --version
git version 1.6.5.7

Seems ok to me...

Though, I find this interesting:

% git log -p
commit 35a25be3fff2f8bbd6ec22c94b9a5c0d66053d21
Author: Bill Lear <rael@zopyra.com>
Date:   Sat Dec 19 22:38:48 2009 -0600

    new

diff --git a/zerofile.tst b/zerofile.tst
new file mode 100644
index 0000000..e5bd39d
Binary files /dev/null and b/zerofile.tst differ


Bill
* Re: Git as electronic lab notebook
From: Nicolas Pitre @ 2009-12-20 4:55 UTC
To: Bill Lear; +Cc: Johan 't Hart, Ciprian Dorin, Craciun, Thomas Johnson, git

On Sat, 19 Dec 2009, Bill Lear wrote:

> On Saturday, December 19, 2009 at 22:15:00 (-0500) Nicolas Pitre writes:
> >On Sun, 20 Dec 2009, Johan 't Hart wrote:
> >
> >> Is git able to handle 4Gig files? I've heard git loads every file completely
> >> in memory before handling it...
> >
> >Right. So with current Git you will be able to deal with 4GB files only
> >if you have a 64-bit machine and more than 4GB of RAM.
>
> ??
>
> % uname -a
> Linux pppp 2.6.31.6-166.fc12.i686 #1 SMP Wed Dec 9 11:14:59 EST 2009 i686 i686 i386 GNU/Linux
> % cat /proc/meminfo | grep MemTotal
> MemTotal: 3095296 kB
> % mkdir gogle
> % cd gogle
> % git init
> % dd if=/dev/zero of=zerofile.tst bs=1k count=4700000
> % git add *
> % git commit -a -m new
> [master (root-commit) 35a25be] new
>  1 files changed, 0 insertions(+), 0 deletions(-)
>  create mode 100644 zerofile.tst
> % git --version
> git version 1.6.5.7
>
> Seems ok to me...

That's the easy part. Diffing such files and delta compressing them, or
even checking them out (especially when they are delta compressed), just
won't work if you don't have the RAM. Fixing this limitation would add
significant complexity to the code, which no one has felt was worth it.

I had some thoughts about supporting the addition of really huge files
to a Git repository, where only add/commit/checkout/fetch/push would work,
with no delta compression. That hasn't materialized yet, though.


Nicolas
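Nicolas's point that adding a file is "the easy part" can be made concrete. Git names a blob by the SHA-1 of the header `blob <size>` plus a NUL byte, followed by the file contents. A loose object is that same byte sequence compressed with zlib. Both steps fit in a single sequential pass, so they can run in fixed memory. Diffing and delta compression instead compare two versions against each other, and per Nicolas, Git at the time needed the contents in RAM to do that. Below is a minimal Python sketch of the streaming part only. It does not describe how Git 1.6.5 implements `git add`, and `hash_and_deflate_blob` and `CHUNK` are hypothetical names.

```python
import hashlib
import os
import zlib

CHUNK = 1 << 20  # hypothetical read size: 1 MiB per iteration


def hash_and_deflate_blob(path: str, out_path: str) -> str:
    """Return the Git blob name of `path` and write its zlib-compressed object.

    The object is the header "blob <size>" plus a NUL byte, followed by the
    file contents. Hashing and compressing happen chunk by chunk, so memory
    use is bounded by CHUNK no matter how large the file is.
    """
    size = os.path.getsize(path)
    header = b"blob %d\0" % size
    sha = hashlib.sha1(header)
    deflater = zlib.compressobj()
    with open(path, "rb") as src, open(out_path, "wb") as dst:
        dst.write(deflater.compress(header))
        while chunk := src.read(CHUNK):
            sha.update(chunk)
            dst.write(deflater.compress(chunk))
        dst.write(deflater.flush())
    return sha.hexdigest()
```

`git hash-object <file>` prints the same object name for a file, which gives an easy cross-check against this sketch.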
end of thread

Thread overview: 6+ messages
2009-12-19 12:23 Git as electronic lab notebook Thomas Johnson
2009-12-19 13:38 ` Ciprian Dorin, Craciun
2009-12-20  0:15   ` Johan 't Hart
2009-12-20  3:15     ` Nicolas Pitre
2009-12-20  4:43       ` Bill Lear
2009-12-20  4:55         ` Nicolas Pitre