From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Shawn O. Pearce" Subject: Re: Appropriateness of git for digital video production versioning Date: Thu, 29 Jan 2009 07:45:36 -0800 Message-ID: <20090129154536.GC26880@spearce.org> References: <8c4a72800901290736p4952e53byddca243f300dd8af@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: git@vger.kernel.org To: Charles Earl X-From: git-owner@vger.kernel.org Thu Jan 29 16:47:22 2009 Return-path: Envelope-to: gcvg-git-2@gmane.org Received: from vger.kernel.org ([209.132.176.167]) by lo.gmane.org with esmtp (Exim 4.50) id 1LSZ6f-0000zf-Ee for gcvg-git-2@gmane.org; Thu, 29 Jan 2009 16:47:05 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753117AbZA2Ppi (ORCPT ); Thu, 29 Jan 2009 10:45:38 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752118AbZA2Pph (ORCPT ); Thu, 29 Jan 2009 10:45:37 -0500 Received: from george.spearce.org ([209.20.77.23]:33800 "EHLO george.spearce.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752117AbZA2Pph (ORCPT ); Thu, 29 Jan 2009 10:45:37 -0500 Received: by george.spearce.org (Postfix, from userid 1001) id 80A0438210; Thu, 29 Jan 2009 15:45:36 +0000 (UTC) Content-Disposition: inline In-Reply-To: <8c4a72800901290736p4952e53byddca243f300dd8af@mail.gmail.com> User-Agent: Mutt/1.5.17+20080114 (2008-01-14) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Charles Earl wrote: > Are there past instances of git having been adapted to support version > control of digital media production workflow? You are going to run into scaling problems. Git works under the assumption that it can malloc() at least 2 complete copies of a file at once, in the same process. Last time I mucked around with digital media production, the volume of data in a video file was *huge*. Its workable on modern systems with terabyte disk arrays and so forth, but modern systems still can't afford the 100 GB of RAM necessary to allow Git to malloc() up two blocks of a single 40 GB video file. Also, since clients pretty much grab the entire repository when they clone it for working access, its going to suck down the entire media archive, *all* versions. That could be well into the hundreds of TB range and may never complete. > The bulk of content stored is binary data -- there have been posts on > this about integration of various binary diff implemetations with git. You mention later about using S3 or BlockStore to hold the binary content. Maybe the large binary data should be stored in S3, and the Git repository just holds the metadata and scripts, including scripts to perform downloads/uploads through S3. > The versioning of metadata, scripts, project structure seems to argue > for applicability of system such as git -- these fit the paradigm of > traditional scm. Yea, that's more typical of what Git was designed and built to store. > Example content is from media production suites such as Adobe After > Effects/Premier: video, compositions, etc. > I'd also like the object storage to be in S3/Amazon BlockStore or > similar remote stores. My suggestion? Use Git for your metadata and scripts. Include a few scripts that can download the large media files from S3 when they are needed, and upload new versions when they are modified. If you want to store versions over time of the files, sha1sum the media file and use that as the key name in the S3 bucket, and store the output of sha1sum into a file within Git. E.g. a ".media" text file just listing out sha1sum and path names: ceba7222551c722836564535697947e8a9b3e7ce big_file.mpg 75c8f5ecb97ec67c1ec949b16c72e6ba1361a528 other_file.mpg and use a simple script to edit/read that file, accessing S3 as necessary for the operations. -- Shawn.