All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gummerer <t.gummerer@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Jeff King" <peff@peff.net>,
	"Torsten Bögershausen" <tboegi@web.de>,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	"Lars Schneider" <larsxschneider@gmail.com>,
	"Mike Hommey" <mh@glandium.org>,
	git@vger.kernel.org
Subject: Re: [PATCH] travis-ci: run previously failed tests first, then slowest to fastest
Date: Mon, 25 Jan 2016 23:41:40 +0100	[thread overview]
Message-ID: <20160125224140.GN7100@hank> (raw)
In-Reply-To: <xmqqk2mxa7ug.fsf@gitster.mtv.corp.google.com>

On 01/25, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
>
> > On 01/24, Junio C Hamano wrote:
> >> To put it differently, if a blob stored at path has CRLF line
> >> endings and .gitattributes is changed after the fact to say that it
> >> must have LF line endings, we should treat it as a broken transitory
> >> state.
> >
> > Right, I wasn't considering this as a broken state, because t0025 uses
> > just this to transition between the states.
> >
> >> There may have to be a way to "fix" an already "wrong" blob
> >> in the index that is milder than "rm --cached && add .", but I do
> >> not think write_entry(), which is shared by all the normal codepaths
> >> that writes out to the working tree, is the best place to do so, if
> >> doing so forces the contents of the paths to be always re-checked,
> >> just in case the user is in such a broken transitory state.
> >
> > Maybe I'm misunderstanding something, but the contents of the paths
> > are only re-checked if we are in such a broken transition state, and
>
> What I do not understand here is how the added check ensures that
> "only if in such a broken transition state".  would_convert_to_git()
> does not take the contents but is called only with the pathname to
> key into the attributes, so in a typical cross platform project
> where all the source files are "text" and the repository can set
> core.eol to adjust the end of line convention for its working tree,
> the added check has no way to differentiate the paths that are
> recorded with CRLF line endings in the object database by mistake,
> i.e. the ones in the broken transitory state, and all the other
> paths that are following the "text" and storing their blobs with LF
> line endings.  The added check would trigger "is it racy" check to
> re-reads the contents that we have written out from the working tree
> for the paths with "wrong" blobs, but how would it avoid doing so
> for the paths whose blobs are already stored correctly?
>
> The new code affects not just "reset --hard", but everybody who
> writes out from the object database to the working tree and records
> that these paths are checked out in the index.  How does the new
> code avoid destroying the performance for all paths?

I misunderstood the way would_convert_to_git() works, I should have
actually read the code, instead of just relying on my test, which was
even wrong.  Sorry about the confusion, my patch does indeed hurt
the performance.

> > the file stored in git has crlf line endings, and thus would be
> > normalized.  In this case we currently re-check the contents of that
> > file anyway, at least when the file and the index have the same mtime,
> > and we actually show the correct output.
>
> The right way to at that "correct output", I think, is that it
> happens to be shown that way by accident.  It is questionable that
> it is correct to report that such a path is modified.  Immediately
> after you check out a path to the working tree out of the index, via
> "reset --hard" and "checkout $path", by definition it ought to be
> the same between the working tree file and the index.
>
> Unless it is in this transititory broken state, that is.
>
> The "by accident" happens only because racy-git avoidance is being
> implemented in one particular way.  Is about protecting people from
> making changes to the working tree files immediately after their
> previous contents are registered to the index (and the index files
> written to the disk), and immediately after we write things out of
> the index and by definition the result and the indexed blob ought to
> match there is no reason to re-read and recheck unless the working
> tree files are further edited.
>
> The current way the racy-git avoidance works by re-reading and
> re-checking the contents when the timestamps match is merely one
> possible implementation, and that is the only thing that produces
> your "correct" output most of the time in this test.  We could have
> waited after writing the index time for a second before giving the
> control back to the user instead, which would have also allowed us
> to implement the racy-git avoidance, and in that alternate world,
> all these transitory broken paths would have been correctly reported
> as unmodified.
>
> IOW, I would think the test in question is insisting a single
> outcome for an operation whose result is undefined, and it is
> harmful to twist the system by pessimizing the common cases just
> to cater to this transititory broken state.
>
> I am not saying that we shouldn't have support for users to fix
> their repository and get out of this transititory broken state.  A
> recent work by Torsten Bögershausen to have ls-files report the end
> of line convention used in the blob in the index and the settings
> that affect conversion for each path (among other things) is a step
> in the right direction.  With a support like that, those who noticed
> that they by mistake added CRLF files to the index as-is when they
> wanted their project to be cross platform can recover from it by
> setting necessary attributes (i.e. mark them as "text") and then
> find paths that are broken out of "ls-files --eol" output to see
> which ones are not using lf end-of-line in the index.
>
> I do not think there is a canned command to help dealing with these
> broken paths right now.  You would have to check them out of the
> index (you would get a CRLF file in the working tree in the example
> we are discussing), fix the line endings (you would run dos2unix on
> it in this example, as you would want "text=true" attribute) and
> "git add" them to recover manually, but I can imagine that Torsten's
> work can be extended to do all of these, without molesting the
> working tree files, with minimum work by the end user.  That is:
>
>  * Reuse Torsten's "ls-files --eol" code to find paths that record
>    the blob in the index that does not follow the eol convention
>    specified for the path;
>
>  * For each of these index entries, run convert_to_working_tree() on
>    the indexed contents, and then on the result of it, run
>    convert_to_git().  The result is the blob that the index ought to
>    have had, if it were to be consistent with the attribute
>    settings.  So add that to the index.
>
>  * Write the index out.
>
>  * Tell the user to commit or commit it automatically with a canned
>    log message "fix broken encoding" or something.

Thanks for the thorough explanation, and the patch in the next email,
I'll have a look at that tomorrow.

--
Thomas

  parent reply	other threads:[~2016-01-25 22:41 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-19  9:24 [PATCH] travis-ci: run previously failed tests first, then slowest to fastest larsxschneider
2016-01-19 19:12 ` Jeff King
2016-01-19 23:00   ` Junio C Hamano
2016-01-20  0:26     ` Mike Hommey
2016-01-20  1:46       ` Junio C Hamano
2016-01-20  1:56         ` Jeff King
2016-01-20  9:22         ` Lars Schneider
2016-01-22  2:33           ` brian m. carlson
2016-01-22  5:52             ` Jeff King
2016-01-22  6:07               ` Jeff King
2016-01-24 14:34                 ` Thomas Gummerer
2016-01-24 20:03                   ` Junio C Hamano
2016-01-24 22:05                     ` Junio C Hamano
2016-01-25 14:42                       ` Thomas Gummerer
2016-01-25 17:26                         ` Junio C Hamano
2016-01-25 21:52                           ` Junio C Hamano
2016-01-27 15:16                             ` Clemens Buchacher
2016-01-27 19:05                               ` Junio C Hamano
2016-01-27 20:49                                 ` Junio C Hamano
2016-01-28  7:10                                   ` Clemens Buchacher
2016-01-28 21:32                                     ` Junio C Hamano
2016-01-30  8:13                                       ` Clemens Buchacher
2016-02-01 18:17                                         ` Junio C Hamano
2016-02-01 19:33                                           ` Clemens Buchacher
2016-02-02 23:14                                             ` Junio C Hamano
2016-02-03  8:31                                               ` Junio C Hamano
2016-02-01 20:26                                           ` Torsten Bögershausen
2016-01-28  6:20                                 ` eol round trip Was: [PATCH] travis-ci: run previously failed Torsten Bögershausen
2016-01-25 22:41                           ` Thomas Gummerer [this message]
2016-01-20  1:53       ` [PATCH] travis-ci: run previously failed tests first, then slowest to fastest Jeff King
2016-01-20  9:10       ` Lars Schneider
2016-01-19 20:00 ` Junio C Hamano
2016-01-19 22:53   ` Junio C Hamano
2016-01-19 23:06     ` Jeff King
2016-01-19 23:26       ` Junio C Hamano
2016-01-19 23:29         ` Jeff King
2016-01-19 23:30         ` Junio C Hamano
2016-01-19 23:27       ` Jeff King
2016-01-20  7:55   ` Johannes Schindelin
2016-01-20  9:04   ` Lars Schneider
2016-01-20 20:35     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160125224140.GN7100@hank \
    --to=t.gummerer@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=larsxschneider@gmail.com \
    --cc=mh@glandium.org \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.