* [RFC/PATCH] fsck: do not canonicalize modes in trees we are checking
@ 2014-09-23 15:47 Jeff King
2014-09-23 16:23 ` Edward Thomson
0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2014-09-23 15:47 UTC (permalink / raw)
To: git; +Cc: Edward Thomson, Kirill Smelkov
Doing so means that we do not actually get to see bogus
modes; they are converted into one of our known-good modes
by decode_tree_entry. We want to see the raw data so that we
can complain about it.
Signed-off-by: Jeff King <peff@peff.net>
---
As far as I can tell, fsck's mode-checking has been totally broken
basically forever. Which makes me a little nervous to fix it. :)
linux.git does have some bogus modes, but they are 100664, which is
specifically ignored here unless "fsck --strict" is in effect.
The implementation feels a bit hacky. The global is ugly, but it avoids
having to pass options all through the call chain of init_tree_entry,
update_tree_entry, etc.
Another option would be to have decode_tree_entry leave the raw mode in
the tree_desc, and only canonicalize it during tree_entry_extract (and
teach fsck to look at the raw data, not _extract). That would mean
reverting 7146e66 (tree-walk: finally switch over tree descriptors to
contain a pre-parsed entry, 2014-02-06). That commit says there's no
real benefit at the time, but that future code might benefit. I don't
know if that future code ever materialized (Kirill cc'd).
I thought at first that commit might have been responsible for this
breakage, but I don't think so. The canonicalization has happened in
tree_entry_extract since at least 2006, and we have always called that
function in fsck.
fsck.c | 2 ++
t/t1450-fsck.sh | 21 +++++++++++++++++++++
tree-walk.c | 4 +++-
tree-walk.h | 3 +++
4 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/fsck.c b/fsck.c
index 56156ff..ef79396 100644
--- a/fsck.c
+++ b/fsck.c
@@ -153,6 +153,7 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
unsigned o_mode;
const char *o_name;
+ decode_tree_raw = 1;
init_tree_desc(&desc, item->buffer, item->size);
o_mode = 0;
@@ -213,6 +214,7 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
o_name = name;
}
+ decode_tree_raw = 0;
retval = 0;
if (has_null_sha1)
retval += error_func(&item->object, FSCK_WARN, "contains entries pointing to null sha1");
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index b52397a..ba40c6f 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -176,6 +176,27 @@ test_expect_success 'malformatted tree object' '
grep "error in tree .*contains duplicate file entries" out
'
+# Basically an 8-bit clean version of sed 's/$1/$2/g'.
+munge_tree_bytes () {
+ perl -e '
+ binmode(STDIN); binmode(STDOUT);
+ $_ = do { local $/; <STDIN> };
+ s/$ARGV[0]/$ARGV[1]/g;
+ print
+ ' "$@"
+}
+
+test_expect_success 'bogus mode in tree' '
+ test_when_finished "remove_object \$T" &&
+ T=$(
+ git cat-file tree HEAD >good &&
+ munge_tree_bytes 100644 123456 <good >bad &&
+ git hash-object -w -t tree bad
+ ) &&
+ git fsck 2>out &&
+ grep "warning.*contains bad file modes" out
+'
+
test_expect_success 'tag pointing to nonexistent' '
cat >invalid-tag <<-\EOF &&
object ffffffffffffffffffffffffffffffffffffffff
diff --git a/tree-walk.c b/tree-walk.c
index 5dd9a71..baca766 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -5,6 +5,8 @@
#include "tree.h"
#include "pathspec.h"
+int decode_tree_raw;
+
static const char *get_mode(const char *str, unsigned int *modep)
{
unsigned char c;
@@ -37,7 +39,7 @@ static void decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned
/* Initialize the descriptor entry */
desc->entry.path = path;
- desc->entry.mode = canon_mode(mode);
+ desc->entry.mode = decode_tree_raw ? mode : canon_mode(mode);
desc->entry.sha1 = (const unsigned char *)(path + len);
}
diff --git a/tree-walk.h b/tree-walk.h
index ae7fb3a..9f78e85 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -25,6 +25,9 @@ static inline int tree_entry_len(const struct name_entry *ne)
return (const char *)ne->sha1 - ne->path - 1;
}
+/* If set, do not canonicalize modes in tree entries. */
+extern int decode_tree_raw;
+
void update_tree_entry(struct tree_desc *);
void init_tree_desc(struct tree_desc *desc, const void *buf, unsigned long size);
--
2.1.1.496.g1ba8081
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [RFC/PATCH] fsck: do not canonicalize modes in trees we are checking
2014-09-23 15:47 [RFC/PATCH] fsck: do not canonicalize modes in trees we are checking Jeff King
@ 2014-09-23 16:23 ` Edward Thomson
2014-09-23 16:30 ` Jeff King
0 siblings, 1 reply; 8+ messages in thread
From: Edward Thomson @ 2014-09-23 16:23 UTC (permalink / raw)
To: Jeff King; +Cc: git, Kirill Smelkov
On Tue, Sep 23, 2014 at 11:47:51AM -0400, Jeff King wrote:
> As far as I can tell, fsck's mode-checking has been totally broken
> basically forever. Which makes me a little nervous to fix it. :)
> linux.git does have some bogus modes, but they are 100664, which is
> specifically ignored here unless "fsck --strict" is in effect.
I'm in favor of checking the mode in fsck, at least when --strict.
But I would suggest we be lax (by default) about other likely-to-exist
but strictly invalid modes to prevent peoples previously workable
repositories from being now broken.
I have, for example, encountered 100775 in the wild, and would argue that
like 100644, it should probably not fail unless we are in --strict mode.
Thanks-
-ed
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC/PATCH] fsck: do not canonicalize modes in trees we are checking
2014-09-23 16:23 ` Edward Thomson
@ 2014-09-23 16:30 ` Jeff King
2014-10-12 22:37 ` Ben Aveling
0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2014-09-23 16:30 UTC (permalink / raw)
To: Edward Thomson; +Cc: git
[-cc Kirill, as his address seem out-of-date]
On Tue, Sep 23, 2014 at 04:23:43PM +0000, Edward Thomson wrote:
> On Tue, Sep 23, 2014 at 11:47:51AM -0400, Jeff King wrote:
> > As far as I can tell, fsck's mode-checking has been totally broken
> > basically forever. Which makes me a little nervous to fix it. :)
> > linux.git does have some bogus modes, but they are 100664, which is
> > specifically ignored here unless "fsck --strict" is in effect.
>
> I'm in favor of checking the mode in fsck, at least when --strict.
> But I would suggest we be lax (by default) about other likely-to-exist
> but strictly invalid modes to prevent peoples previously workable
> repositories from being now broken.
>
> I have, for example, encountered 100775 in the wild, and would argue that
> like 100644, it should probably not fail unless we are in --strict mode.
Yeah, I'd agree with that. The big question is: what breakage have we
seen in the wild? :)
I think treating 100775 the same as 100664 makes sense (want to do a
patch?). Do we know of any others? I guess we can collect them as time
goes on and reports come in. That's not the nicest thing for people with
such repos, but then again, their repos _are_ broken (and it's only
really a showstopper if they are trying to push to somebody with
receive.fsckObjects turned on).
-Peff
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC/PATCH] fsck: do not canonicalize modes in trees we are checking
2014-09-23 16:30 ` Jeff King
@ 2014-10-12 22:37 ` Ben Aveling
2014-10-14 8:21 ` Jeff King
0 siblings, 1 reply; 8+ messages in thread
From: Ben Aveling @ 2014-10-12 22:37 UTC (permalink / raw)
To: Jeff King, Edward Thomson; +Cc: git
Hi,
A question about fsck - is there a reason it doesn't have an option to
delete bad objects?
Regards, Ben
On 24/09/2014 02:30, Jeff King wrote:
> [-cc Kirill, as his address seem out-of-date]
>
> On Tue, Sep 23, 2014 at 04:23:43PM +0000, Edward Thomson wrote:
>
>> On Tue, Sep 23, 2014 at 11:47:51AM -0400, Jeff King wrote:
>>> As far as I can tell, fsck's mode-checking has been totally broken
>>> basically forever. Which makes me a little nervous to fix it. :)
>>> linux.git does have some bogus modes, but they are 100664, which is
>>> specifically ignored here unless "fsck --strict" is in effect.
>> I'm in favor of checking the mode in fsck, at least when --strict.
>> But I would suggest we be lax (by default) about other likely-to-exist
>> but strictly invalid modes to prevent peoples previously workable
>> repositories from being now broken.
>>
>> I have, for example, encountered 100775 in the wild, and would argue that
>> like 100644, it should probably not fail unless we are in --strict mode.
> Yeah, I'd agree with that. The big question is: what breakage have we
> seen in the wild? :)
>
> I think treating 100775 the same as 100664 makes sense (want to do a
> patch?). Do we know of any others? I guess we can collect them as time
> goes on and reports come in. That's not the nicest thing for people with
> such repos, but then again, their repos _are_ broken (and it's only
> really a showstopper if they are trying to push to somebody with
> receive.fsckObjects turned on).
>
> -Peff
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC/PATCH] fsck: do not canonicalize modes in trees we are checking
2014-10-12 22:37 ` Ben Aveling
@ 2014-10-14 8:21 ` Jeff King
[not found] ` <543F074B.2050907@optusnet.com.au>
0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2014-10-14 8:21 UTC (permalink / raw)
To: Ben Aveling; +Cc: Edward Thomson, git
On Mon, Oct 13, 2014 at 09:37:27AM +1100, Ben Aveling wrote:
> A question about fsck - is there a reason it doesn't have an option to
> delete bad objects?
If the objects are reachable, then deleting them would create other big
problems (i.e., we would be breaking the object graph!).
If they are not, then it is probably safest for them to go away via the
normal means (repack/prune via "git gc"). Deleting via fsck would mean
replicating the reachability and deletion code found elsewhere.
-Peff
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-10-20 9:22 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-23 15:47 [RFC/PATCH] fsck: do not canonicalize modes in trees we are checking Jeff King
2014-09-23 16:23 ` Edward Thomson
2014-09-23 16:30 ` Jeff King
2014-10-12 22:37 ` Ben Aveling
2014-10-14 8:21 ` Jeff King
[not found] ` <543F074B.2050907@optusnet.com.au>
2014-10-16 0:20 ` Jeff King
2014-10-19 12:40 ` Ben Aveling
2014-10-20 9:21 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).