git.vger.kernel.org archive mirror
* .gitignore for large files?
@ 2011-07-23 20:00 Philip Oakley
  2011-07-24  2:59 ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 6+ messages in thread
From: Philip Oakley @ 2011-07-23 20:00 UTC (permalink / raw)
  To: Git List

Has there been any discussion in the past on a method for ignoring large 
files via the .gitignore process?

It does appear to be a moderately common problem for folk to accidentally 
commit a large file which bloats their repository and they want rid of it, 
which causes history re-writes and such palaver.

Perhaps a simple '>' and '<' option (the latter to cover null or minimal 
files?) with a --warn postfix may be possible. Just an initial thought.
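
To make the idea concrete, here is a minimal sketch of how such a size rule might be parsed and applied. Everything in it is hypothetical: git has no '>' / '<' rule syntax, and the K/M/G suffixes and the names `parse_size_rule` and `matches` are assumptions made purely for illustration.

```python
import re

# Hypothetical unit suffixes; git's ignore syntax defines nothing like this.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# Assumed rule shape: ">10M", "<1", optionally followed by " --warn".
_RULE = re.compile(r"^([<>])\s*(\d+)([KMG]?)(\s+--warn)?$")


def parse_size_rule(line):
    """Return (op, threshold_bytes, warn_only), or None for ordinary patterns."""
    m = _RULE.match(line.strip())
    if m is None:
        return None
    op, number, unit, warn = m.groups()
    return op, int(number) * _UNITS[unit], warn is not None


def matches(rule, size):
    """True if a file of `size` bytes falls under the rule."""
    op, threshold, _warn_only = rule
    return size > threshold if op == ">" else size < threshold
```

With this shape, a rule of "<1" would catch null (zero-byte) files, and ">10M --warn" would flag anything over 10 MiB without excluding it.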

Where would the 'right place' be for me to look in the git code, if this 
turned out to be beneficial?

Philip Oakley
(UK: Scotland) 


* Re: .gitignore for large files?
  2011-07-23 20:00 .gitignore for large files? Philip Oakley
@ 2011-07-24  2:59 ` Nguyen Thai Ngoc Duy
  2011-07-25  6:53   ` Philip Oakley
  0 siblings, 1 reply; 6+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-07-24  2:59 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Git List

On Sun, Jul 24, 2011 at 3:00 AM, Philip Oakley <philipoakley@iee.org> wrote:
> Has there been any discussion in the past on a method for ignoring large
> files via the .gitignore process?
>
> It does appear to be a moderately common problem for folk to accidentally
> commit a large file which bloats their repository and they want rid of it,
> which causes history re-writes and such palaver.

Once they are in, they cannot be ignored. Perhaps server-side commit
hooks are a better place?
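
A rough sketch of what such a server-side check could look like as a pre-receive hook. The 10 MiB limit and all function names are assumptions for illustration; only the git commands themselves (`git rev-list --objects`, `git cat-file --batch-check`) are real, and the all-zero id assumes a SHA-1 repository.

```python
import subprocess

ZERO = "0" * 40           # all-zero id git uses for "no object" (SHA-1 repos)
LIMIT = 10 * 1024 * 1024  # hypothetical 10 MiB cap, not a git default


def oversized_blobs(batch_check_lines, limit=LIMIT):
    """Return (sha, size) for every blob larger than `limit`.

    Each line has the default `git cat-file --batch-check` shape:
    "<sha> <type> <size>".
    """
    found = []
    for line in batch_check_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # e.g. "<name> missing"
        sha, kind, size = parts
        if kind == "blob" and int(size) > limit:
            found.append((sha, int(size)))
    return found


def new_objects(old, new):
    """List the object ids introduced by one ref update."""
    if old == ZERO:  # newly created ref: compare against all existing refs
        args = ["git", "rev-list", "--objects", new, "--not", "--all"]
    else:
        args = ["git", "rev-list", "--objects", f"{old}..{new}"]
    out = subprocess.run(args, capture_output=True, text=True, check=True).stdout
    return [line.split()[0] for line in out.splitlines() if line]


def check_push(stdin_lines, limit=LIMIT):
    """Return 1 (reject) if any pushed blob exceeds `limit`, else 0."""
    status = 0
    for line in stdin_lines:
        old, new, ref = line.split()
        if new == ZERO:
            continue  # ref deletion, nothing new to inspect
        sizes = subprocess.run(
            ["git", "cat-file", "--batch-check"],
            input="\n".join(new_objects(old, new)) + "\n",
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()
        for sha, size in oversized_blobs(sizes, limit):
            print(f"rejected {ref}: blob {sha} is {size} bytes")
            status = 1
    return status
```

Installed as hooks/pre-receive, the script would end with something like `sys.exit(check_push(sys.stdin))`; a non-zero exit makes the server refuse the whole push.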

> Perhaps a simple '>' and '<' option (the latter to cover null or minimal
> files?) with a --warn postfix may be possible. Just an initial thought.

Or you can make use of .gitattributes, more flexible syntax.

> Where would the 'right place' be for me to look at the git code if it was
> beneficial.

In dir.c, add_exclude() does the parsing, excluded_from_list() handles the logic.
-- 
Duy


* Re: .gitignore for large files?
  2011-07-24  2:59 ` Nguyen Thai Ngoc Duy
@ 2011-07-25  6:53   ` Philip Oakley
  2011-07-25 13:17     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 6+ messages in thread
From: Philip Oakley @ 2011-07-25  6:53 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Git List

Duy,

The .gitattributes suggestion is a good call. I'm thinking there would be a 
global flag (i.e. whether a maximum file size is in force at all for this 
repo), and then a set of limits, initially one for 'text' and one for 
'binary' (and possibly a third for anything with an external diff), probably 
using an attribute format similar to the one used for setting diff attributes 
and their filters, to set a per-file-type limit [if required].
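
As a sketch of how a per-file-type limit might be read back: `git check-attr` is a real command whose output lines have the form "<path>: <attribute>: <value>", but the `maxsize` attribute below is purely hypothetical (git defines no such attribute), as are the function names and the K/M/G suffixes.

```python
# Hypothetical .gitattributes entries this sketch has in mind:
#   *.txt  maxsize=1M
#   *.bin  maxsize=20M
# The output of `git check-attr maxsize -- <path>` would then be parsed here.

_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_check_attr(line):
    """Split one `git check-attr` line into (path, attribute, value).

    Splitting from the right keeps paths that contain ": " intact.
    """
    path, attr, value = line.rsplit(": ", 2)
    return path, attr, value


def limit_for(value):
    """Translate an attribute value into bytes; None means no limit."""
    if value in ("unspecified", "unset", "set"):
        return None  # attribute absent or used without a value
    unit = value[-1].upper()
    if unit in _UNITS:
        return int(value[:-1]) * _UNITS[unit]
    return int(value)
```

A file would then be refused (or warned about) when its size exceeds `limit_for(value)` for its path.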

The relevant function names give me a place to start....

Philip
----- Original Message ----- 
From: "Nguyen Thai Ngoc Duy" <pclouds@gmail.com>
To: "Philip Oakley" <philipoakley@iee.org>
Cc: "Git List" <git@vger.kernel.org>
Sent: Sunday, July 24, 2011 3:59 AM
Subject: Re: .gitignore for large files?


> On Sun, Jul 24, 2011 at 3:00 AM, Philip Oakley <philipoakley@iee.org> 
> wrote:
>> Has there been any discussion in the past on a method for ignoring large
>> files via the .gitignore process?
>>
>> It does appear to be a moderately common problem for folk to accidentally
>> commit a large file which bloats their repository and they want rid of 
>> it,
>> which causes history re-writes and such palaver.
>
> Once they are in, they cannot be ignored. Perhaps commit hooks at
> server side is a better place?
>
>> Perhaps a simple '>' and '<' option (the latter to cover null or minimal
>> files?) with a --warn postfix may be possible. Just an initial thought.
>
> Or you can make use of .gitattributes, more flexible syntax.
>
>> Where would the 'right place' be for me to look at the git code if it was
>> beneficial.
>
> In dir.c, add_exclude() does the parsing, excluded_from_list() handles the 
> logic.
> -- 
> Duy


* Re: .gitignore for large files?
  2011-07-25  6:53   ` Philip Oakley
@ 2011-07-25 13:17     ` Nguyen Thai Ngoc Duy
  2011-07-25 18:59       ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-07-25 13:17 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Git List

On Mon, Jul 25, 2011 at 1:53 PM, Philip Oakley <philipoakley@iee.org> wrote:
> Duy,
>
> The .gitattributes is a good call. I'm thinking that it would be a global
> flag (i.e. is a max file limit in place at all on this repo), and then a set
> of limits, initially one for 'text' and one for 'binary' (and possibly a
> third for anything with an ext diff), and probably use a similar attribute
> format as used for setting diff attributes and its filters to set a per file
> type limit [if required]

While .gitattributes looks like a better place, it does not have an
"exclude" attribute equivalent to .gitignore. If I remember
correctly, the way .gitignore and .gitattributes are implemented makes
it very hard to turn .gitignore into part of the .gitattributes
implementation (gitattr checks .gitattributes of the current dir first,
then upward to parents, while .gitignore follows the opposite
direction).

But the other way might be simpler: introduce a new attribute, then
make use of the attribute in exclude code like autogenerated
.gitignore. I don't know. I have not thought through this idea.

But of course you can stay away from those by checking new attributes
in builtin/add.c and refusing to execute if someone wants to add a
file larger than a limit. You can avoid git-add internal business by
checking entries in to-be-written index, at label "finish" in
cmd_add(). This is the simplest way (and getting very close to using
hooks).
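
For comparison, the hook flavour of that last option could be sketched as a pre-commit script that inspects the staged entries. This is not the cmd_add() change itself; the 5 MiB limit and the function names are assumptions, while `git diff --cached` and `git cat-file -s :<path>` are existing commands.

```python
import subprocess

LIMIT = 5 * 1024 * 1024  # hypothetical 5 MiB cap, not a git default


def split_z(output):
    """Split NUL-terminated `-z` output into a list of paths."""
    return [p for p in output.split("\0") if p]


def too_large(sizes, limit=LIMIT):
    """Given {path: size_in_bytes}, return the sorted paths over `limit`."""
    return sorted(p for p, s in sizes.items() if s > limit)


def staged_sizes():
    """Map each added or modified staged path to its blob size in the index."""
    names = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=AM", "-z"],
        capture_output=True, text=True, check=True,
    ).stdout
    sizes = {}
    for path in split_z(names):
        res = subprocess.run(
            ["git", "cat-file", "-s", f":{path}"],  # ":path" names the index entry
            capture_output=True, text=True,
        )
        if res.returncode == 0:  # skip entries with no blob, e.g. submodules
            sizes[path] = int(res.stdout)
    return sizes


def check_index(limit=LIMIT):
    """Return 1 (abort the commit) if any staged file exceeds `limit`."""
    bad = too_large(staged_sizes(), limit)
    for path in bad:
        print(f"refusing to commit {path}: larger than {limit} bytes")
    return 1 if bad else 0
```

Saved as .git/hooks/pre-commit with a final `raise SystemExit(check_index())`, this refuses the commit rather than the add, so the oversized blob still lands in the local object store until it is pruned.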

Anyway, good luck. Come back with some patches ;-)
-- 
Duy


* Re: .gitignore for large files?
  2011-07-25 13:17     ` Nguyen Thai Ngoc Duy
@ 2011-07-25 18:59       ` Junio C Hamano
  2011-07-25 21:36         ` Philip Oakley
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2011-07-25 18:59 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Philip Oakley, Git List

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> While .gitattributes looks like a better place, it does not have
> "exclude" attribute equivalence to .gitignore. If I remember
> correctly, the way .gitignore and .gitattributes are implemented makes
> it very hard to turn .gitignore into part of .gitattributes
> implementation (gitattr checks .gitattributes of current dir first,
> then upward to parents, while .gitignore follows the opposite
> direction).

While I do not think it is necessarily a good idea to invent yet another
way to exclude and add it to the attributes mechanism (unless we will be
dropping the support for gitignore, which is not the case), I do not know
why you think the direction of the scan matters.

A more important difference is that the attribute mechanism covers only the
actual paths, not intermediate directories, unlike gitignore.


* Re: .gitignore for large files?
  2011-07-25 18:59       ` Junio C Hamano
@ 2011-07-25 21:36         ` Philip Oakley
  0 siblings, 0 replies; 6+ messages in thread
From: Philip Oakley @ 2011-07-25 21:36 UTC (permalink / raw)
  To: Junio C Hamano, Nguyen Thai Ngoc Duy; +Cc: Git List

Subject: How to ignore large files?
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>
>> While .gitattributes looks like a better place, it does not have
>> "exclude" attribute equivalence to .gitignore. If I remember
>> correctly, the way .gitignore and .gitattributes are implemented makes
>> it very hard to turn .gitignore into part of .gitattributes
>> implementation (gitattr checks .gitattributes of current dir first,
>> then upward to parents, while .gitignore follows the opposite
>> direction).
>
> While I do not think it is necessarily a good idea to invent yet another
> way to exclude and add it to the attributes mechanism (unless we will be
> dropping the support for gitignore, which is not the case), I do not know
> why you think the direction of the scan matters.
>
> A more important difference is that the attribute mechanism covers the
> actual paths, not intermediate directories, unlike gitignore does.

The choice of any actual implementation would depend on both feasibility and 
usefulness. Duy was pointing out (to me?) the different approaches used for 
respecting the gitattributes and (multiple) gitignore files. My first 
thought had been to use the .gitignore file but I then realised the 
potential problems of adding more metacharacters ( > and < ).

I've also just seen that 1.7.6 introduced core.bigFileThreshold for 
memory management reasons, whereas this suggestion is to help folk avoid 
unintentionally committing very large files without being warned.

Does the concept (of warning/ignoring when files are 'large') have any 
merit?

Philip 


end of thread, other threads:[~2011-07-25 21:36 UTC | newest]

Thread overview: 6+ messages
2011-07-23 20:00 .gitignore for large files? Philip Oakley
2011-07-24  2:59 ` Nguyen Thai Ngoc Duy
2011-07-25  6:53   ` Philip Oakley
2011-07-25 13:17     ` Nguyen Thai Ngoc Duy
2011-07-25 18:59       ` Junio C Hamano
2011-07-25 21:36         ` Philip Oakley
