From: Thomas Gummerer <t.gummerer@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Jeff King" <peff@peff.net>,
"Torsten Bögershausen" <tboegi@web.de>,
"brian m. carlson" <sandals@crustytoothpaste.net>,
"Lars Schneider" <larsxschneider@gmail.com>,
"Mike Hommey" <mh@glandium.org>,
git@vger.kernel.org
Subject: Re: [PATCH] travis-ci: run previously failed tests first, then slowest to fastest
Date: Mon, 25 Jan 2016 23:41:40 +0100 [thread overview]
Message-ID: <20160125224140.GN7100@hank> (raw)
In-Reply-To: <xmqqk2mxa7ug.fsf@gitster.mtv.corp.google.com>
On 01/25, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
>
> > On 01/24, Junio C Hamano wrote:
> >> To put it differently, if a blob stored at path has CRLF line
> >> endings and .gitattributes is changed after the fact to say that it
> >> must have LF line endings, we should treat it as a broken transitory
> >> state.
> >
> > Right, I wasn't considering this as a broken state, because t0025 uses
> > just this to transition between the states.
> >
> >> There may have to be a way to "fix" an already "wrong" blob
> >> in the index that is milder than "rm --cached && add .", but I do
> >> not think write_entry(), which is shared by all the normal codepaths
> >> that writes out to the working tree, is the best place to do so, if
> >> doing so forces the contents of the paths to be always re-checked,
> >> just in case the user is in such a broken transitory state.
> >
> > Maybe I'm misunderstanding something, but the contents of the paths
> > are only re-checked if we are in such a broken transition state, and
>
> What I do not understand here is how the added check ensures that
> "only if in such a broken transition state". would_convert_to_git()
> does not take the contents but is called only with the pathname to
> key into the attributes, so in a typical cross platform project
> where all the source files are "text" and the repository can set
> core.eol to adjust the end of line convention for its working tree,
> the added check has no way to differentiate the paths that are
> recorded with CRLF line endings in the object database by mistake,
> i.e. the ones in the broken transitory state, and all the other
> paths that are following the "text" and storing their blobs with LF
> line endings. The added check would trigger "is it racy" check to
> re-reads the contents that we have written out from the working tree
> for the paths with "wrong" blobs, but how would it avoid doing so
> for the paths whose blobs are already stored correctly?
>
> The new code affects not just "reset --hard", but everybody who
> writes out from the object database to the working tree and records
> that these paths are checked out in the index. How does the new
> code avoid destroying the performance for all paths?
I misunderstood the way would_convert_to_git() works, I should have
actually read the code, instead of just relying on my test, which was
even wrong. Sorry about the confusion, my patch does indeed hurt
the performance.
> > the file stored in git has crlf line endings, and thus would be
> > normalized. In this case we currently re-check the contents of that
> > file anyway, at least when the file and the index have the same mtime,
> > and we actually show the correct output.
>
> The right way to at that "correct output", I think, is that it
> happens to be shown that way by accident. It is questionable that
> it is correct to report that such a path is modified. Immediately
> after you check out a path to the working tree out of the index, via
> "reset --hard" and "checkout $path", by definition it ought to be
> the same between the working tree file and the index.
>
> Unless it is in this transititory broken state, that is.
>
> The "by accident" happens only because racy-git avoidance is being
> implemented in one particular way. Is about protecting people from
> making changes to the working tree files immediately after their
> previous contents are registered to the index (and the index files
> written to the disk), and immediately after we write things out of
> the index and by definition the result and the indexed blob ought to
> match there is no reason to re-read and recheck unless the working
> tree files are further edited.
>
> The current way the racy-git avoidance works by re-reading and
> re-checking the contents when the timestamps match is merely one
> possible implementation, and that is the only thing that produces
> your "correct" output most of the time in this test. We could have
> waited after writing the index time for a second before giving the
> control back to the user instead, which would have also allowed us
> to implement the racy-git avoidance, and in that alternate world,
> all these transitory broken paths would have been correctly reported
> as unmodified.
>
> IOW, I would think the test in question is insisting a single
> outcome for an operation whose result is undefined, and it is
> harmful to twist the system by pessimizing the common cases just
> to cater to this transititory broken state.
>
> I am not saying that we shouldn't have support for users to fix
> their repository and get out of this transititory broken state. A
> recent work by Torsten Bögershausen to have ls-files report the end
> of line convention used in the blob in the index and the settings
> that affect conversion for each path (among other things) is a step
> in the right direction. With a support like that, those who noticed
> that they by mistake added CRLF files to the index as-is when they
> wanted their project to be cross platform can recover from it by
> setting necessary attributes (i.e. mark them as "text") and then
> find paths that are broken out of "ls-files --eol" output to see
> which ones are not using lf end-of-line in the index.
>
> I do not think there is a canned command to help dealing with these
> broken paths right now. You would have to check them out of the
> index (you would get a CRLF file in the working tree in the example
> we are discussing), fix the line endings (you would run dos2unix on
> it in this example, as you would want "text=true" attribute) and
> "git add" them to recover manually, but I can imagine that Torsten's
> work can be extended to do all of these, without molesting the
> working tree files, with minimum work by the end user. That is:
>
> * Reuse Torsten's "ls-files --eol" code to find paths that record
> the blob in the index that does not follow the eol convention
> specified for the path;
>
> * For each of these index entries, run convert_to_working_tree() on
> the indexed contents, and then on the result of it, run
> convert_to_git(). The result is the blob that the index ought to
> have had, if it were to be consistent with the attribute
> settings. So add that to the index.
>
> * Write the index out.
>
> * Tell the user to commit or commit it automatically with a canned
> log message "fix broken encoding" or something.
Thanks for the thorough explanation, and the patch in the next email,
I'll have a look at that tomorrow.
--
Thomas
next prev parent reply other threads:[~2016-01-25 22:41 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-19 9:24 [PATCH] travis-ci: run previously failed tests first, then slowest to fastest larsxschneider
2016-01-19 19:12 ` Jeff King
2016-01-19 23:00 ` Junio C Hamano
2016-01-20 0:26 ` Mike Hommey
2016-01-20 1:46 ` Junio C Hamano
2016-01-20 1:56 ` Jeff King
2016-01-20 9:22 ` Lars Schneider
2016-01-22 2:33 ` brian m. carlson
2016-01-22 5:52 ` Jeff King
2016-01-22 6:07 ` Jeff King
2016-01-24 14:34 ` Thomas Gummerer
2016-01-24 20:03 ` Junio C Hamano
2016-01-24 22:05 ` Junio C Hamano
2016-01-25 14:42 ` Thomas Gummerer
2016-01-25 17:26 ` Junio C Hamano
2016-01-25 21:52 ` Junio C Hamano
2016-01-27 15:16 ` Clemens Buchacher
2016-01-27 19:05 ` Junio C Hamano
2016-01-27 20:49 ` Junio C Hamano
2016-01-28 7:10 ` Clemens Buchacher
2016-01-28 21:32 ` Junio C Hamano
2016-01-30 8:13 ` Clemens Buchacher
2016-02-01 18:17 ` Junio C Hamano
2016-02-01 19:33 ` Clemens Buchacher
2016-02-02 23:14 ` Junio C Hamano
2016-02-03 8:31 ` Junio C Hamano
2016-02-01 20:26 ` Torsten Bögershausen
2016-01-28 6:20 ` eol round trip Was: [PATCH] travis-ci: run previously failed Torsten Bögershausen
2016-01-25 22:41 ` Thomas Gummerer [this message]
2016-01-20 1:53 ` [PATCH] travis-ci: run previously failed tests first, then slowest to fastest Jeff King
2016-01-20 9:10 ` Lars Schneider
2016-01-19 20:00 ` Junio C Hamano
2016-01-19 22:53 ` Junio C Hamano
2016-01-19 23:06 ` Jeff King
2016-01-19 23:26 ` Junio C Hamano
2016-01-19 23:29 ` Jeff King
2016-01-19 23:30 ` Junio C Hamano
2016-01-19 23:27 ` Jeff King
2016-01-20 7:55 ` Johannes Schindelin
2016-01-20 9:04 ` Lars Schneider
2016-01-20 20:35 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160125224140.GN7100@hank \
--to=t.gummerer@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=larsxschneider@gmail.com \
--cc=mh@glandium.org \
--cc=peff@peff.net \
--cc=sandals@crustytoothpaste.net \
--cc=tboegi@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).