git.vger.kernel.org archive mirror
* [PATCH v3 0/4] Additional FAQ entries
@ 2024-07-04  0:38 brian m. carlson
  2024-07-04  0:38 ` [PATCH v3 1/4] gitfaq: add documentation on proxies brian m. carlson
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: brian m. carlson @ 2024-07-04  0:38 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Eric Sunshine,
	Derrick Stolee, Jeff King

This series introduces some additional Git FAQ entries on various
topics.  They are all things I've seen in my professional life or on
Stack Overflow, so I've written documentation.

There were some suggestions in the past that the text "modify, tamper
with, or buffer" might be somewhat redundant, but I've chosen to keep
the text as it is to avoid arguments like, "Well, buffering the entire
request or response isn't really modifying it, so Git should just work
in that situation," when we already know that doesn't work.

Changes from v2 (partial):
* Add documentation on proxies to the configuration documentation as
  well.
* Mention some security problems that are known to occur with TLS MITM
  proxies.  This mirrors the similar Git LFS documentation.
* Provide a documentation example about how to use proxies with SSH.
* Recommend running a `git fsck` after syncing with rsync.

Changes from v1:
* Drop the monorepo patch for now; I want to revise it further.
* Reorder the working tree patch to place more warnings up front.
* Mention core.gitproxy and socat.
* Rephrase text in the EOL entry to read correctly and be easier to
  understand.
* Improve the commit message for the working tree FAQ entry to make it
  clearer that users wish to transfer uncommitted changes.

brian m. carlson (4):
  gitfaq: add documentation on proxies
  gitfaq: give advice on using eol attribute in gitattributes
  gitfaq: add entry about syncing working trees
  doc: mention that proxies must be completely transparent

 Documentation/config/http.txt |   5 ++
 Documentation/gitfaq.txt      | 105 ++++++++++++++++++++++++++++++++--
 2 files changed, 104 insertions(+), 6 deletions(-)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 1/4] gitfaq: add documentation on proxies
  2024-07-04  0:38 [PATCH v3 0/4] Additional FAQ entries brian m. carlson
@ 2024-07-04  0:38 ` brian m. carlson
  2024-07-04  0:38 ` [PATCH v3 2/4] gitfaq: give advice on using eol attribute in gitattributes brian m. carlson
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: brian m. carlson @ 2024-07-04  0:38 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Eric Sunshine,
	Derrick Stolee, Jeff King

Many corporate environments and local systems have proxies in use.  Note
the situations in which proxies can be used and how to configure them.
At the same time, note what standards a proxy must follow to work with
Git.  Explicitly call out certain classes that are known to routinely
have problems reported in various places online, including in the Git for
Windows issue tracker and on Stack Overflow, and recommend against the
use of such software, noting that they are associated with myriad
security problems (including, for example, breaking sandboxing and image
integrity[0], and, for TLS middleboxes, the use of insecure protocols
and ciphers and lack of certificate verification[1]). Don't mention the
specific nature of these security problems in the FAQ entry because they
are extremely numerous and varied and we wish to keep the FAQ entry
relatively brief.

[0] https://issues.chromium.org/issues/40285192
[1] https://faculty.cc.gatech.edu/~mbailey/publications/ndss17_interception.pdf

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/gitfaq.txt | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
index 8c1f2d5675..e4125b1178 100644
--- a/Documentation/gitfaq.txt
+++ b/Documentation/gitfaq.txt
@@ -241,6 +241,42 @@ How do I know if I want to do a fetch or a pull?::
 	ignore the upstream changes.  A pull consists of a fetch followed
 	immediately by either a merge or rebase.  See linkgit:git-pull[1].
 
+[[proxy]]
+Can I use a proxy with Git?::
+	Yes, Git supports the use of proxies.  Git honors the standard `http_proxy`,
+	`https_proxy`, and `no_proxy` environment variables commonly used on Unix, and
+	it also can be configured with `http.proxy` and similar options for HTTPS (see
+	linkgit:git-config[1]).  The `http.proxy` and related options can be
+	customized on a per-URL pattern basis.  In addition, Git can in theory
+	function normally with transparent proxies that exist on the network.
++
+For SSH, Git can support a proxy using OpenSSH's `ProxyCommand`. Commonly used
+tools include `netcat` and `socat`.  However, they must be configured not to
+exit when seeing EOF on standard input, which usually means that `netcat` will
+require `-q` and `socat` will require a timeout with something like `-t 10`.
+This is required because the way the Git SSH server knows that no more requests
+will be made is an EOF on standard input, but when that happens, the server may
+not have yet processed the final request, so dropping the connection at that
+point would interrupt that request.
++
+An example configuration entry in `~/.ssh/config` with an HTTP proxy might look
+like this:
++
+----
+Host git.example.org
+    User git
+    ProxyCommand socat -t 10 - PROXY:proxy.example.org:%h:%p,proxyport=8080
+----
++
+Note that in all cases, for Git to work properly, the proxy must be completely
+transparent.  The proxy cannot modify, tamper with, or buffer the connection in
+any way, or Git will almost certainly fail to work.  Note that many proxies,
+including many TLS middleboxes, Windows antivirus and firewall programs other
+than Windows Defender and Windows Firewall, and filtering proxies fail to meet
+this standard, and as a result end up breaking Git.  Because of the many
+reports of problems and their poor security history, we recommend against the
+use of these classes of software and devices.
+
 Merging and Rebasing
 --------------------
 


* [PATCH v3 2/4] gitfaq: give advice on using eol attribute in gitattributes
  2024-07-04  0:38 [PATCH v3 0/4] Additional FAQ entries brian m. carlson
  2024-07-04  0:38 ` [PATCH v3 1/4] gitfaq: add documentation on proxies brian m. carlson
@ 2024-07-04  0:38 ` brian m. carlson
  2024-07-04  5:22   ` Junio C Hamano
  2024-07-04  0:38 ` [PATCH v3 3/4] gitfaq: add entry about syncing working trees brian m. carlson
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: brian m. carlson @ 2024-07-04  0:38 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Eric Sunshine,
	Derrick Stolee, Jeff King

In the FAQ, we tell people how to use the text attribute, but we fail to
explain what to do with the eol attribute.  As we ourselves have
noticed, most shell implementations do not care for carriage returns,
and as such, people will practically always want them to use LF endings.
Similar things can be said for batch files on Windows, except with CRLF
endings.

Since these are common things to have in a repository, let's help users
make a good decision by recommending that they use the gitattributes
file to correctly check out the endings.

In addition, let's correct the cross-reference to this question, which
originally referred to "the following entry", even though a new entry
has been inserted in between.  The cross-reference notation should
prevent this from occurring and provide a link in formats, such as HTML,
which support that.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/gitfaq.txt | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
index e4125b1178..cdc5f5f4f8 100644
--- a/Documentation/gitfaq.txt
+++ b/Documentation/gitfaq.txt
@@ -393,8 +393,9 @@ I'm on Windows and git diff shows my files as having a `^M` at the end.::
 +
 You can store the files in the repository with Unix line endings and convert
 them automatically to your platform's line endings.  To do that, set the
-configuration option `core.eol` to `native` and see the following entry for
-information about how to configure files as text or binary.
+configuration option `core.eol` to `native` and see
+<<recommended-storage-settings,the question on recommended storage settings>>
+for information about how to configure files as text or binary.
 +
 You can also control this behavior with the `core.whitespace` setting if you
 don't wish to remove the carriage returns from your line endings.
@@ -456,14 +457,26 @@ references, URLs, and hashes stored in the repository.
 +
 We also recommend setting a linkgit:gitattributes[5] file to explicitly mark
 which files are text and which are binary.  If you want Git to guess, you can
-set the attribute `text=auto`.  For example, the following might be appropriate
-in some projects:
+set the attribute `text=auto`.
++
+With text files, Git will generally ensure that LF endings are used in the
+repository, and will honor `core.autocrlf` and `core.eol` to decide what options
+to use when checking files out.  You can also override this by specifying a
+particular line ending such as `eol=lf` or `eol=crlf` if those files must always
+have that ending in the working tree (e.g., for functionality reasons).
++
+For example, generally shell files must have LF endings and batch files must
+have CRLF endings, so the following might be appropriate in some projects:
 +
 ----
 # By default, guess.
 *	text=auto
 # Mark all C files as text.
 *.c	text
+# Ensure all shell files have LF endings and all batch files have CRLF
+# endings in the working tree and both have LF in the repo.
+*.sh text eol=lf
+*.bat text eol=crlf
 # Mark all JPEG files as binary.
 *.jpg	binary
 ----
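
The effect of the `eol` attribute described in this patch can be sketched
with a toy model in Python.  This is not Git's implementation; the function
names `to_repo` and `to_worktree` are hypothetical and exist only for
illustration.  Content is stored with LF endings and converted to the
requested ending when it is written to the working tree.

```python
def to_repo(data: bytes) -> bytes:
    """Normalize CRLF to LF, modelling what happens to text files on add."""
    return data.replace(b"\r\n", b"\n")


def to_worktree(data: bytes, eol: str) -> bytes:
    """Convert LF-normalized content to the requested working-tree ending.

    `eol` mirrors the attribute values `lf` and `crlf` from the example above.
    """
    if eol == "lf":
        return data
    if eol == "crlf":
        return data.replace(b"\n", b"\r\n")
    raise ValueError(f"unsupported eol value: {eol!r}")


# A batch file written on Windows, stored with LF, checked out with CRLF.
batch = b"@echo off\r\necho hello\r\n"
stored = to_repo(batch)
restored = to_worktree(stored, "crlf")
```

In this model `stored` contains no carriage returns and `restored` is
byte-for-byte identical to `batch`, which is the round trip that
`*.bat text eol=crlf` is meant to give.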


* [PATCH v3 3/4] gitfaq: add entry about syncing working trees
  2024-07-04  0:38 [PATCH v3 0/4] Additional FAQ entries brian m. carlson
  2024-07-04  0:38 ` [PATCH v3 1/4] gitfaq: add documentation on proxies brian m. carlson
  2024-07-04  0:38 ` [PATCH v3 2/4] gitfaq: give advice on using eol attribute in gitattributes brian m. carlson
@ 2024-07-04  0:38 ` brian m. carlson
  2024-07-04  5:21   ` Junio C Hamano
  2024-07-04  0:38 ` [PATCH v3 4/4] doc: mention that proxies must be completely transparent brian m. carlson
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: brian m. carlson @ 2024-07-04  0:38 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Eric Sunshine,
	Derrick Stolee, Jeff King

Users very commonly want to sync their working tree with uncommitted
changes across machines, often to carry across in-progress work or
stashes.  Despite this not being a recommended approach, users want to
do it and are not dissuaded by suggestions not to, so let's recommend a
sensible technique.

The technique that many users are using is their preferred cloud syncing
service, which is a bad idea.  Users have reported problems where they
end up with duplicate files that won't go away (with names like "file.c
2"), broken references, oddly named references that have date stamps
appended to them, missing objects, and general corruption and data loss.
That's because almost all of these tools sync file by file, which is a
great technique if your project is a single word processing document or
spreadsheet, but is utterly abysmal for Git repositories because they
don't necessarily snapshot the entire repository correctly.  They also
tend to sync the files immediately instead of when the repository is
quiescent, so writing multiple files, as occurs during a commit or a gc,
can confuse the tools and lead to corruption.

We know that the old standby, rsync, is up to the task, provided that
the repository is quiescent, so let's suggest that and dissuade people
from using cloud syncing tools.  Let's tell people about common things
they should be aware of before doing this and that this is still
potentially risky.  Additionally, let's tell people that Git's security
model does not permit sharing working trees across users in case they
planned to do that.  While we'd still prefer users didn't try to do
this, hopefully this will lead them in a safer direction.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/gitfaq.txt | 48 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
index cdc5f5f4f8..0e7f1c680d 100644
--- a/Documentation/gitfaq.txt
+++ b/Documentation/gitfaq.txt
@@ -83,8 +83,8 @@ Windows would be the configuration `"C:\Program Files\Vim\gvim.exe" --nofork`,
 which quotes the filename with spaces and specifies the `--nofork` option to
 avoid backgrounding the process.
 
-Credentials
------------
+Credentials and Transfers
+-------------------------
 
 [[http-credentials]]
 How do I specify my credentials when pushing over HTTP?::
@@ -185,6 +185,50 @@ Then, you can adjust your push URL to use `git@example_author` or
 `git@example_committer` instead of `git@example.org` (e.g., `git remote set-url
 git@example_author:org1/project1.git`).
 
+[[sync-working-tree]]
+How do I sync a working tree across systems?::
+	First, decide whether you want to do this at all.  Git works best when you
+	push or pull your work using the typical `git push` and `git fetch` commands
+	and isn't designed to share a working tree across systems.  This is
+	potentially risky and in some cases can cause repository corruption or data
+	loss.
++
+Usually, doing so will cause `git status` to need to re-read every file in the
+working tree.  Additionally, Git's security model does not permit sharing a
+working tree across untrusted users, so it is only safe to sync a working tree
+if it will only be used by a single user across all machines.
++
+It is important not to use a cloud syncing service to sync any portion of a Git
+repository, since this can cause corruption, such as missing objects, changed
+or added files, broken refs, and a wide variety of other corruption.  These
+services tend to sync file by file on a continuous basis and don't understand
+the structure of a Git repository.  This is especially bad if they sync the
+repository in the middle of it being updated, since that is very likely to
+cause incomplete or partial updates and therefore data loss.
++
+Therefore, it's better to push your work to either the other system or a central
+server using the normal push and pull mechanism.  However, this doesn't always
+preserve important data, like stashes, so some people prefer to share a working
+tree across systems.
++
+If you do this, the recommended approach is to use `rsync -a --delete-after`
+(ideally with an encrypted connection such as with `ssh`) on the root of the
+repository.  You should ensure several things when you do this:
++
+* If you have additional worktrees or a separate Git directory, they must be
+  synced at the same time as the main working tree and repository.
+* You are comfortable with the destination directory being an exact copy of the
+  source directory, _deleting any data that is already there_.
+* The repository (including all worktrees and the Git directory) is in a
+  quiescent state for the duration of the transfer (that is, no operations of
+  any sort are taking place on it, including background operations like `git
+  gc` and operations invoked by your editor).
++
+Be aware that even with these recommendations, syncing in this way has some risk
+since it bypasses Git's normal integrity checking for repositories, so having
+backups is advised.  You may also wish to do a `git fsck` to verify the
+integrity of your data on the destination system after syncing.
+
 Common Issues
 -------------
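
The quiescence requirement in this entry could be checked, very roughly,
before invoking rsync.  The Python sketch below is a heuristic under stated
assumptions: it treats any `*.lock` file or a `gc.pid` file under the Git
directory as a sign that something is running.  The absence of such files
does not prove that the repository is quiescent (an editor may be about to
run a command), and the helper names `find_lock_files` and `rsync_command`
as well as the example paths are hypothetical.

```python
import os


def find_lock_files(git_dir):
    """Return paths under git_dir that suggest an operation is in progress."""
    found = []
    for root, _dirs, files in os.walk(git_dir):
        for name in files:
            # Assumption: lockfiles end in ".lock" and a running gc leaves
            # "gc.pid"; this is a heuristic, not a complete check.
            if name.endswith(".lock") or name == "gc.pid":
                found.append(os.path.join(root, name))
    return sorted(found)


def rsync_command(src, dest):
    """Build the rsync invocation recommended in the FAQ text.

    The trailing slash on the source makes rsync copy the directory's
    contents rather than the directory itself.
    """
    return ["rsync", "-a", "--delete-after", src.rstrip("/") + "/", dest]
```

A caller would refuse to sync when `find_lock_files` returns anything, and
otherwise pass the list from `rsync_command` to `subprocess.run`.  Running
`git fsck` on the destination afterwards, as the entry suggests, remains
advisable.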
 


* [PATCH v3 4/4] doc: mention that proxies must be completely transparent
  2024-07-04  0:38 [PATCH v3 0/4] Additional FAQ entries brian m. carlson
                   ` (2 preceding siblings ...)
  2024-07-04  0:38 ` [PATCH v3 3/4] gitfaq: add entry about syncing working trees brian m. carlson
@ 2024-07-04  0:38 ` brian m. carlson
  2024-07-04  1:25 ` [PATCH v3 0/4] Additional FAQ entries Junio C Hamano
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: brian m. carlson @ 2024-07-04  0:38 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Eric Sunshine,
	Derrick Stolee, Jeff King

We already document in the FAQ that proxies must be completely
transparent and not modify the request or response in any way, but add
similar documentation to the http.proxy entry.  We know that while the
FAQ is very useful, users are sometimes less likely to read it in favor of
the documentation specific to an option or command, so adding it in both
places will help users be adequately informed.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/config/http.txt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/config/http.txt b/Documentation/config/http.txt
index 2d4e0c9b86..a9c7480f6a 100644
--- a/Documentation/config/http.txt
+++ b/Documentation/config/http.txt
@@ -7,6 +7,11 @@ http.proxy::
 	linkgit:gitcredentials[7] for more information. The syntax thus is
 	'[protocol://][user[:password]@]proxyhost[:port]'. This can be overridden
 	on a per-remote basis; see remote.<name>.proxy
++
+Any proxy, however configured, must be completely transparent and must not
+modify, transform, or buffer the request or response in any way.  Proxies which
+are not completely transparent are known to cause various forms of breakage
+with Git.
 
 http.proxyAuthMethod::
 	Set the method with which to authenticate against the HTTP proxy. This


* Re: [PATCH v3 0/4] Additional FAQ entries
  2024-07-04  0:38 [PATCH v3 0/4] Additional FAQ entries brian m. carlson
                   ` (3 preceding siblings ...)
  2024-07-04  0:38 ` [PATCH v3 4/4] doc: mention that proxies must be completely transparent brian m. carlson
@ 2024-07-04  1:25 ` Junio C Hamano
  2024-07-04  5:22 ` Junio C Hamano
  2024-07-09 23:37 ` [PATCH v4 " brian m. carlson
  6 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2024-07-04  1:25 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Johannes Schindelin, Eric Sunshine, Derrick Stolee,
	Jeff King

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> This series introduces some additional Git FAQ entries on various
> topics.  They are all things I've seen in my professional life or on
> Stack Overflow, so I've written documentation.

Just to help other readers:

 v1
 https://lore.kernel.org/git/20211020010624.675562-1-sandals@crustytoothpaste.net/

 v2
 https://lore.kernel.org/git/20211107225525.431138-1-sandals@crustytoothpaste.net/

are where the previous discussions are found.

> There were some suggestions in the past that the text "modify, tamper
> with, or buffer" might be somewhat redundant, but I've chosen to keep
> the text as it is to avoid arguments like, "Well, buffering the entire
> request or response isn't really modifying it, so Git should just work
> in that situation," when we already know that doesn't work.
>
> Changes from v2 (partial):
> * Add documentation on proxies to the configuration documentation as
>   well.
> * Mention some security problems that are known to occur with TLS MITM
>   proxies.  This mirrors the similar Git LFS documentation.
> * Provide a documentation example about how to use proxies with SSH.
> * Recommend running a `git fsck` after syncing with rsync.
>
> Changes from v1:
> * Drop the monorepo patch for now; I want to revise it further.
> * Reorder the working tree patch to place more warnings up front.
> * Mention core.gitproxy and socat.
> * Rephrase text in the EOL entry to read correctly and be easier to
>   understand.
> * Improve the commit message for the working tree FAQ entry to make it
>   clearer that users wish to transfer uncommitted changes.
>
> brian m. carlson (4):
>   gitfaq: add documentation on proxies
>   gitfaq: give advice on using eol attribute in gitattributes
>   gitfaq: add entry about syncing working trees
>   doc: mention that proxies must be completely transparent
>
>  Documentation/config/http.txt |   5 ++
>  Documentation/gitfaq.txt      | 105 ++++++++++++++++++++++++++++++++--
>  2 files changed, 104 insertions(+), 6 deletions(-)


* Re: [PATCH v3 3/4] gitfaq: add entry about syncing working trees
  2024-07-04  0:38 ` [PATCH v3 3/4] gitfaq: add entry about syncing working trees brian m. carlson
@ 2024-07-04  5:21   ` Junio C Hamano
  2024-07-04 21:08     ` brian m. carlson
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2024-07-04  5:21 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Johannes Schindelin, Eric Sunshine, Derrick Stolee,
	Jeff King

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> -Credentials
> ------------
> +Credentials and Transfers
> +-------------------------

I can see (and appreciate) that you struggled to find a good section
to piggyback on, instead of giving this topic its own section.  But
do these two make a good mix?  They seem to be totally different
topics.

> +It is important not to use a cloud syncing service to sync any portion of a Git
> +repository, since this can cause corruption, such as missing objects, changed
> +or added files, broken refs, and a wide variety of other corruption.  These
> +services tend to sync file by file on a continuous basis and don't understand
> +the structure of a Git repository.  This is especially bad if they sync the
> +repository in the middle of it being updated, since that is very likely to
> +cause incomplete or partial updates and therefore data loss.

A naïve reader may say "but isn't it the point of these cloud
syncing services that they will eventually catch up???" and we may
want to have a good story why it does not work.

    You create many objects in one repository in loose form, cloud
    syncing service kicks in to transfer them to the second
    repository, and then in the original repository an auto-gc kicks
    in so some of the loose objects fail to propagate.  The packfile
    that is the result of auto-gc will eventually propagate to the
    second repository, but before it completes, the second
    repository would be in an inconsistent state, and especially if
    the ref updates are propagated before objects, then the second
    repository will be in a corrupt state.  It would be a disaster
    if another auto-gc kicked in there.

is one scenario I came up with.
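
The ordering problem in this scenario can be sketched with a toy model in
Python.  It is only an illustration: a repository is reduced to a set of
object IDs and a map of refs, the IDs `a1` and `b2` are made up, and the
helper `is_consistent` is hypothetical.

```python
def is_consistent(repo):
    """A repository is usable only if every ref points at a present object."""
    return all(oid in repo["objects"] for oid in repo["refs"].values())


source = {"objects": {"a1", "b2"}, "refs": {"refs/heads/main": "b2"}}
dest = {"objects": {"a1"}, "refs": {"refs/heads/main": "a1"}}

# A file-by-file syncer is free to copy the ref before the new object.
dest["refs"]["refs/heads/main"] = source["refs"]["refs/heads/main"]
mid_sync_ok = is_consistent(dest)    # False: ref names the missing "b2"

# Only once the object arrives does the destination become usable again.
dest["objects"].add("b2")
after_sync_ok = is_consistent(dest)  # True
```

Any Git operation that runs on the destination while `mid_sync_ok` is false
sees a ref pointing at an object that is not there.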




* Re: [PATCH v3 2/4] gitfaq: give advice on using eol attribute in gitattributes
  2024-07-04  0:38 ` [PATCH v3 2/4] gitfaq: give advice on using eol attribute in gitattributes brian m. carlson
@ 2024-07-04  5:22   ` Junio C Hamano
  2024-07-04 21:10     ` brian m. carlson
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2024-07-04  5:22 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Johannes Schindelin, Eric Sunshine, Derrick Stolee,
	Jeff King

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> In the FAQ, we tell people how to use the text attribute, but we fail to
> explain what to do with the eol attribute.  As we ourselves have
> noticed, most shell implementations do not care for carriage returns,
> and as such, people will practically always want them to use LF endings.
> Similar things can be said for batch files on Windows, except with CRLF
> endings.

Sounds good.

> Since these are common things to have in a repository, let's help users
> make a good decision by recommending that they use the gitattributes
> file to correctly check out the endings.
>
> In addition, let's correct the cross-reference to this question, which
> originally referred to "the following entry", even though a new entry
> has been inserted in between.  The cross-reference notation should
> prevent this from occurring and provide a link in formats, such as HTML,
> which support that.

Thanks for being forward-looking and extra careful.

> +With text files, Git will generally ensure that LF endings are used in the
> +repository, and will honor `core.autocrlf` and `core.eol` to decide what options
> +to use when checking files out.  You can also override this by specifying a
> +particular line ending such as `eol=lf` or `eol=crlf` if those files must always

"this" being ... Not what gets stored in the object database but
what is done to the working tree.

What is being "overridden" is that the earlier two mentioned here
are configuration variables that apply to _all_ text files in
general, and the attribute mechanism is a way to give settings that
are more tailored for each path.  I think the reason I found the
above a bit hard to understand when I read it for the first time was
because it didn't "click" that this paragraph was about configuration
giving the general default and attributes overriding it.  Perhaps...

    ... are used in the repository.  The `core.autocrlf` and
    `core.eol` configuration variables specify what line-ending
    convention is followed when any text file is checked out.  You
    can also use the `eol` attribute (e.g., "eol=crlf") to override
    which files get what line-ending treatment.

or something?

> +have that ending in the working tree (e.g., for functionality reasons).

I'd strike "(e.g., for functionality reasons)" out, as the next
paragraph makes it sufficiently clear.

> +For example, generally shell files must have LF endings and batch files must
> +have CRLF endings, so the following might be appropriate in some projects:
>  +
>  ----
>  # By default, guess.
>  *	text=auto
>  # Mark all C files as text.
>  *.c	text
> +# Ensure all shell files have LF endings and all batch files have CRLF
> +# endings in the working tree and both have LF in the repo.
> +*.sh text eol=lf
> +*.bat text eol=crlf
>  # Mark all JPEG files as binary.
>  *.jpg	binary
>  ----


* Re: [PATCH v3 0/4] Additional FAQ entries
  2024-07-04  0:38 [PATCH v3 0/4] Additional FAQ entries brian m. carlson
                   ` (4 preceding siblings ...)
  2024-07-04  1:25 ` [PATCH v3 0/4] Additional FAQ entries Junio C Hamano
@ 2024-07-04  5:22 ` Junio C Hamano
  2024-07-04 21:23   ` brian m. carlson
  2024-07-09 23:37 ` [PATCH v4 " brian m. carlson
  6 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2024-07-04  5:22 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Johannes Schindelin, Eric Sunshine, Derrick Stolee,
	Jeff King

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> This series introduces some additional Git FAQ entries on various
> topics.  They are all things I've seen in my professional life or on
> Stack Overflow, so I've written documentation.
>
> There were some suggestions in the past that the text "modify, tamper
> with, or buffer" might be somewhat redundant, but I've chosen to keep
> the text as it is to avoid arguments like, "Well, buffering the entire
> request or response isn't really modifying it, so Git should just work
> in that situation," when we already know that doesn't work.

Buffering the entire thing will break because ...?  Deadlock?  Or is
there anything more subtle going on?

Are we affected by any frame boundary (do we even notice?) that
happens at layer lower than our own pkt-line layer at all (i.e. we
sent two chunks and we fail to work on them correctly if the network
collapses them into one chunk, without changing a single byte, just
changing the number of read() system calls that reads them?)?
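
For readers unfamiliar with the framing being asked about, here is a
minimal Python sketch of pkt-line: each packet starts with four hex digits
giving its total length including those four bytes, and `0000` is a flush
packet.  The `PktReader` class is hypothetical and is not Git's code; it
only shows that a length-prefixed format can be parsed the same way however
the bytes are grouped across reads.  It does not settle whether a buffering
proxy breaks Git in practice.

```python
def pkt_line(payload: bytes) -> bytes:
    """Frame a payload: 4 hex digits of total length, then the payload."""
    return b"%04x" % (len(payload) + 4) + payload


FLUSH = b"0000"


class PktReader:
    """Incremental parser; feed() accepts bytes in arbitrary chunks."""

    def __init__(self):
        self._buf = b""

    def feed(self, chunk: bytes):
        """Return the payloads completed by this chunk (None means flush)."""
        self._buf += chunk
        out = []
        while len(self._buf) >= 4:
            size = int(self._buf[:4], 16)
            if size == 0:
                out.append(None)
                self._buf = self._buf[4:]
                continue
            if size < 4:
                # Lengths 1 to 3 are not data packets; not handled here.
                raise ValueError("special packet not supported in this sketch")
            if len(self._buf) < size:
                break  # wait for the rest of this packet
            out.append(self._buf[4:size])
            self._buf = self._buf[size:]
        return out
```

Feeding a stream in one call or one byte at a time yields the same sequence
of payloads, since the parser never relies on where a read happened to end.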


* Re: [PATCH v3 3/4] gitfaq: add entry about syncing working trees
  2024-07-04  5:21   ` Junio C Hamano
@ 2024-07-04 21:08     ` brian m. carlson
  2024-07-06  5:50       ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: brian m. carlson @ 2024-07-04 21:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes Schindelin, Eric Sunshine, Derrick Stolee,
	Jeff King


On 2024-07-04 at 05:21:55, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > -Credentials
> > ------------
> > +Credentials and Transfers
> > +-------------------------
> 
> I can see (and appreciate) that you struggled to find a good section
> to piggyback on, instead of giving this topic its own section.  But
> do these two make a good mix?  They seem to be totally different
> topics.

I can try again.

> > +It is important not to use a cloud syncing service to sync any portion of a Git
> > +repository, since this can cause corruption, such as missing objects, changed
> > +or added files, broken refs, and a wide variety of other corruption.  These
> > +services tend to sync file by file on a continuous basis and don't understand
> > +the structure of a Git repository.  This is especially bad if they sync the
> > +repository in the middle of it being updated, since that is very likely to
> > +cause incomplete or partial updates and therefore data loss.
> 
> A naïve reader may say "but isn't it the point of these cloud
> syncing services that they will eventually catch up???" and we may
> want to have a good story why it does not work.
> 
>     You create many objects in one repository in loose form, cloud
>     syncing service kicks in to transfer them to the second
>     repository, and then in the original repository an auto-gc kicks
>     in so some of the loose objects fail to propagate.  The packfile
>     that is the result of auto-gc will eventually propagate to the
>     second repository, but before it completes, the second
>     repository would be in an inconsistent state, and especially if
>     the ref updates are propagated before objects, then the second
>     repository will be in a corrupt state.  It would be a disaster
>     if another auto-gc kicked in there.
> 
> is one scenario I came up with.

The most common situation we see is that refs tend to be renamed to
things like "refs/heads/main 2", which is obviously not a valid refname
and doesn't work, or the ref gets rolled back to an older version.
Working trees also get stuck into weird states where files keep coming
back or getting deleted, or the index gets two differently named copies,
neither of which is "index".

It is _less_ likely that objects are renamed, but it could be that the
tool thinks they've been legitimately deleted if the loose objects get
packed and then they do get deleted elsewhere without another source of
those objects existing.  I'm not sure how object loss happens in the
real world with these services, but there have been users reporting it
on Stack Overflow, so I'm confident it does occur.

If we have users who ask about this, I'm happy to answer them on the
list.  I don't want to explain the various and sundry scenarios in the
FAQ entry in order to keep it short, but I can find several examples of
problems if need be.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA



* Re: [PATCH v3 2/4] gitfaq: give advice on using eol attribute in gitattributes
  2024-07-04  5:22   ` Junio C Hamano
@ 2024-07-04 21:10     ` brian m. carlson
  0 siblings, 0 replies; 22+ messages in thread
From: brian m. carlson @ 2024-07-04 21:10 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes Schindelin, Eric Sunshine, Derrick Stolee,
	Jeff King


On 2024-07-04 at 05:22:13, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> > +With text files, Git will generally ensure that LF endings are used in the
> > +repository, and will honor `core.autocrlf` and `core.eol` to decide what options
> > +to use when checking files out.  You can also override this by specifying a
> > +particular line ending such as `eol=lf` or `eol=crlf` if those files must always
> 
> "this" being ... Not what gets stored in the object database but
> what is done to the working tree.
> 
> What is being "overridden" is that the earlier two mentioned here
> are configuration variables that apply to _all_ text files in
> general, and the attribute mechanism is a way to give settings that
> are more tailored for each path.  I think the reason I found the
> above a bit hard to understand when I read it for the first time was
> because it didn't "click" that this paragraph was about configuration
> giving the general default and attributes overriding it.  Perhaps...
> 
>     ... are used in the repository.  The `core.autocrlf` and
>     `core.eol` configuration variables specify what line-ending
>     convention is followed when any text file is checked out.  You
>     can also use the `eol` attribute (e.g., "eol=crlf") to override
>     which files get what line-ending treatment.
> 
> or something?

Sure, that sounds like a nice improvement.

> > +have that ending in the working tree (e.g., for functionality reasons).
> 
> I'd strike "(e.g., for functionality reasons)" out, as the next
> paragraph makes it sufficiently clear.

Sure, I can do that.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA



* Re: [PATCH v3 0/4] Additional FAQ entries
  2024-07-04  5:22 ` Junio C Hamano
@ 2024-07-04 21:23   ` brian m. carlson
  2024-07-06  5:59     ` Junio C Hamano
  2024-07-06  6:47     ` Jeff King
  0 siblings, 2 replies; 22+ messages in thread
From: brian m. carlson @ 2024-07-04 21:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes Schindelin, Eric Sunshine, Derrick Stolee,
	Jeff King


On 2024-07-04 at 05:22:27, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > This series introduces some additional Git FAQ entries on various
> > topics.  They are all things I've seen in my professional life or on
> > Stack Overflow, so I've written documentation.
> >
> > There were some suggestions in the past that the text "modify, tamper
> > with, or buffer" might be somewhat redundant, but I've chosen to keep
> > the text as it is to avoid arguments like, "Well, buffering the entire
> > request or response isn't really modifying it, so Git should just work
> > in that situation," when we already know that doesn't work.
> 
> Buffering the entire thing will break because ...?  Deadlock?  Or is
> there anything more subtle going on?

When we use the smart HTTP protocol, the server sends keep-alive and
status messages as one of the data streams, which is important because
(a) the user is usually impatient and wants to know what's going on and
(b) it may take a long time to pack the data, especially for large
repositories, and sending no data may result in the connection being
dropped or the client being served a 500 by an intermediate layer.  We
know this does happen and I've seen reports of it.

We've also seen some cases where proxies refuse to accept
Transfer-Encoding: chunked (let's party like it's 1999) and send a 411
back since there's no Content-Length header.  That's presumably because
they want to scan the contents for "bad" data all in one chunk, but Git
has to stream the contents unless the data fits in the buffer size.
(This is the one case where http.postBuffer actually makes a
difference.)  I very much doubt that the appliance actually wants to get
a 2 GiB payload to scan, since it probably doesn't have tons of memory
in the first place, but that is what it's asking for.

> Are we affected by any frame boundary (do we even notice?) that
> happens at layer lower than our own pkt-line layer at all (i.e. we
> sent two chunks and we fail to work on them correctly if the network
> collapses them into one chunk, without changing a single byte, just
> changing the number of read() system calls that reads them?)?

No, that's not a problem.  We read four bytes for the pkt-line header,
and then we read the entire body based on that length until we get all
of it.  This is also the way OpenSSL works for TLS records and is known
to work well.  If the underlying TCP connection provides a partial or
incomplete packet (which can happen due to MTU), we'll just block until
the rest comes in, which is fine.
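
As a rough illustration of the read loop described above, here is a
minimal sketch in Python.  It is not Git's actual (C) implementation;
the names `ShortReads`, `read_exact`, and `read_pkt_line` are
hypothetical, and an in-memory stream stands in for the network.  The
point is only that the reader gets the same payloads however the bytes
are split across read() calls.

```python
import io


class ShortReads:
    """Hypothetical wrapper that hands back at most `limit` bytes per read(),
    imitating a transport that splits the data at arbitrary boundaries."""

    def __init__(self, data, limit):
        self._stream = io.BytesIO(data)
        self._limit = limit

    def read(self, n):
        return self._stream.read(min(n, self._limit))


def read_exact(stream, n):
    """Keep calling read() until exactly n bytes have arrived."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            # An in-memory stream returns b"" at EOF; a blocking socket
            # would simply wait here until more data comes in.
            raise EOFError("stream ended in the middle of a pkt-line")
        buf += chunk
    return buf


def read_pkt_line(stream):
    """Return one payload, or None for a special packet such as flush (0000)."""
    # The four-byte header is a hex length that counts the header itself.
    length = int(read_exact(stream, 4).decode("ascii"), 16)
    if length < 4:
        return None
    return read_exact(stream, length - 4)


# Two data packets followed by a flush packet.
data = b"0009hello000bfoobar\n0000"
whole = io.BytesIO(data)       # everything available at once
dribble = ShortReads(data, 1)  # one byte per read() call
for _ in range(3):
    assert read_pkt_line(whole) == read_pkt_line(dribble)
```

Because the length comes from the header and not from how much a single
read() happens to return, splitting or coalescing segments in transit
does not change what the reader sees.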
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA



* Re: [PATCH v3 3/4] gitfaq: add entry about syncing working trees
  2024-07-04 21:08     ` brian m. carlson
@ 2024-07-06  5:50       ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2024-07-06  5:50 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Johannes Schindelin, Eric Sunshine, Derrick Stolee,
	Jeff King

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> The most common situation we see is that refs tend to be renamed to
> things like "refs/heads/main 2", which is obviously not a valid refname
> and doesn't work, or the ref gets rolled back to an older version.
> Working trees also get stuck into weird states where files keep coming
> back or getting deleted, or the index gets two differently named copies,
> neither of which is "index".
>
> It is _less_ likely that objects are renamed, but it could be that the
> tool thinks they've been legitimately deleted if the loose objects get
> packed and then they do get deleted elsewhere without another source of
> those objects existing.

Yeah, any time two repositories that are "cloud synched" are
accessed simultaneously, all h*ll can easily break loose.  You may
move your 'master' branch to a commit while the other one may move
their 'master' branch to a different commit.  You may end up having
"master" that points at one of these commits but one of you may have
already lost the only reference to the commit you wanted to have at
the tip of your 'master' branch.  One of you may even trigger auto-gc
to spread the damage.

> If we have users who ask about this, I'm happy to answer them on the
> list.  I don't want to explain the various and sundry scenarios in the
> FAQ entry in order to keep it short, but I can find several examples of
> problems if need be.

OK, that approach would work as long as you are still involved in
the project, but having even one concrete example would help in the
longer term to (1) reduce the bus factor and (2) save you the time you
would otherwise spend responding to every such question.

Thanks.


* Re: [PATCH v3 0/4] Additional FAQ entries
  2024-07-04 21:23   ` brian m. carlson
@ 2024-07-06  5:59     ` Junio C Hamano
  2024-07-08  0:52       ` brian m. carlson
  2024-07-06  6:47     ` Jeff King
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2024-07-06  5:59 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Johannes Schindelin, Eric Sunshine, Derrick Stolee,
	Jeff King

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>> Buffering the entire thing will break because ...?  Deadlock?  Or is
>> there anything more subtle going on?
>
> When we use the smart HTTP protocol, the server sends keep-alive and
> status messages as one of the data streams, which is important because
> (a) the user is usually impatient and wants to know what's going on and
> (b) it may take a long time to pack the data, especially for large
> repositories, and sending no data may result in the connection being
> dropped or the client being served a 500 by an intermediate layer.  We
> know this does happen and I've seen reports of it.

And this is an example of "a proxy that buffers the data, without
modifying or tampering with, would still break transport"?

> We've also seen some cases where proxies refuse to accept
> Transfer-Encoding: chunked (let's party like it's 1999) and send a 411
> back since there's no Content-Length header.

This is "a proxy that wanted to buffer the data but failed to do so"
and ended up modifying the data that the Gits sitting at both ends of
the connection can observe, so it is a slightly different issue.  It
clearly falls into the "modify or tamper with" category.

I forgot to say this clearly when I wrote the message you are
responding to, but I am trying to see if we can clarify the "or
buffer" part in "modify, tamper with, or buffer", as offhand I did
not think of a reason why a proxy would break the Git communication
if it receives a segment that was 2MB originally from upload-pack,
and forwards the contents of the segment in two 1MB segments without
tampering or modifying the payload bytes at all to fetch-pack.

Thanks.


* Re: [PATCH v3 0/4] Additional FAQ entries
  2024-07-04 21:23   ` brian m. carlson
  2024-07-06  5:59     ` Junio C Hamano
@ 2024-07-06  6:47     ` Jeff King
  2024-07-06 17:18       ` Junio C Hamano
  1 sibling, 1 reply; 22+ messages in thread
From: Jeff King @ 2024-07-06  6:47 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Junio C Hamano, git, Johannes Schindelin, Eric Sunshine,
	Derrick Stolee

On Thu, Jul 04, 2024 at 09:23:28PM +0000, brian m. carlson wrote:

> > Buffering the entire thing will break because ...?  Deadlock?  Or is
> > there anything more subtle going on?
> 
> When we use the smart HTTP protocol, the server sends keep-alive and
> status messages as one of the data streams, which is important because
> (a) the user is usually impatient and wants to know what's going on and
> (b) it may take a long time to pack the data, especially for large
> repositories, and sending no data may result in the connection being
> dropped or the client being served a 500 by an intermediate layer.  We
> know this does happen and I've seen reports of it.

Additionally, I think for non-HTTP transports (think proxying ssh
through socat or similar), buffering the v0 protocol is likely a total
disaster. The fetch protocol assumes both sides spewing at each other in
real time.

HTTP, even v0, follows a request/response model, so we're safer there. I
do think some amount of buffering is often going to be OK in practice.
You'd get delayed keep-alives and progress reports, which may range from
"annoying" to "something in the middle decided to time out". So I'm OK
with just telling people "make sure your proxies aren't buffering" as a
general rule, rather than trying to get into the nitty gritty of what is
going to break and how.

-Peff


* Re: [PATCH v3 0/4] Additional FAQ entries
  2024-07-06  6:47     ` Jeff King
@ 2024-07-06 17:18       ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2024-07-06 17:18 UTC (permalink / raw)
  To: Jeff King
  Cc: brian m. carlson, git, Johannes Schindelin, Eric Sunshine,
	Derrick Stolee

Jeff King <peff@peff.net> writes:

> On Thu, Jul 04, 2024 at 09:23:28PM +0000, brian m. carlson wrote:
>
>> > Buffering the entire thing will break because ...?  Deadlock?  Or is
>> > there anything more subtle going on?
>> 
>> When we use the smart HTTP protocol, the server sends keep-alive and
>> status messages as one of the data streams, which is important because
>> (a) the user is usually impatient and wants to know what's going on and
>> (b) it may take a long time to pack the data, especially for large
>> repositories, and sending no data may result in the connection being
>> dropped or the client being served a 500 by an intermediate layer.  We
>> know this does happen and I've seen reports of it.
>
> Additionally, I think for non-HTTP transports (think proxying ssh
> through socat or similar), buffering the v0 protocol is likely a total
> disaster. The fetch protocol assumes both sides spewing at each other in
> real time.

Yeah, beyond one "window" that a series of "have"s are allowed to be
in flight, no further "have"s are sent before seeing an "ack/nack"
response, so if you buffer too much, they can deadlock fairly easily.

> ... So I'm OK
> with just telling people "make sure your proxies aren't buffering" as a
> general rule, rather than trying to get into the nitty gritty of what is
> going to break and how.

Sounds fair.  Thanks.


* Re: [PATCH v3 0/4] Additional FAQ entries
  2024-07-06  5:59     ` Junio C Hamano
@ 2024-07-08  0:52       ` brian m. carlson
  0 siblings, 0 replies; 22+ messages in thread
From: brian m. carlson @ 2024-07-08  0:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes Schindelin, Eric Sunshine, Derrick Stolee,
	Jeff King


On 2024-07-06 at 05:59:57, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> >> Buffering the entire thing will break because ...?  Deadlock?  Or is
> >> there anything more subtle going on?
> >
> > When we use the smart HTTP protocol, the server sends keep-alive and
> > status messages as one of the data streams, which is important because
> > (a) the user is usually impatient and wants to know what's going on and
> > (b) it may take a long time to pack the data, especially for large
> > repositories, and sending no data may result in the connection being
> > dropped or the client being served a 500 by an intermediate layer.  We
> > know this does happen and I've seen reports of it.
> 
> And this is an example of "a proxy that buffers the data, without
> modifying or tampering with, would still break transport"?

Yes.  The connection usually ends up dropped from the view of the
client, which is hard to debug (because it also looks like a network
problem, except often without any output from the remote side).
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA



* [PATCH v4 0/4] Additional FAQ entries
  2024-07-04  0:38 [PATCH v3 0/4] Additional FAQ entries brian m. carlson
                   ` (5 preceding siblings ...)
  2024-07-04  5:22 ` Junio C Hamano
@ 2024-07-09 23:37 ` brian m. carlson
  2024-07-09 23:37   ` [PATCH v4 1/4] gitfaq: add documentation on proxies brian m. carlson
                     ` (3 more replies)
  6 siblings, 4 replies; 22+ messages in thread
From: brian m. carlson @ 2024-07-09 23:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Eric Sunshine,
	Derrick Stolee, Jeff King

This series introduces some additional Git FAQ entries on various
topics.  They are all things I've seen in my professional life or on
Stack Overflow, so I've written documentation.

Changes from v3:
* Improve text for eol options in .gitattributes.
* Split working tree syncing into its own category.
* Add some more explanation about what might go wrong in such a case.
* Rephrase some text to avoid repetition.

Changes from v2 (partial):
* Add documentation on proxies to the configuration documentation as
  well.
* Mention some security problems that are known to occur with TLS MITM
  proxies.  This mirrors the similar Git LFS documentation.
* Provide a documentation example about how to use proxies with SSH.
* Recommend running a `git fsck` after syncing with rsync.

Changes from v1:
* Drop the monorepo patch for now; I want to revise it further.
* Reorder the working tree patch to place more warnings up front.
* Mention core.gitproxy and socat.
* Rephrase text in the EOL entry to read correctly and be easier to
  understand.
* Improve the commit message for the working tree FAQ entry to make it
  clearer that users wish to transfer uncommitted changes.

brian m. carlson (4):
  gitfaq: add documentation on proxies
  gitfaq: give advice on using eol attribute in gitattributes
  gitfaq: add entry about syncing working trees
  doc: mention that proxies must be completely transparent

 Documentation/config/http.txt |   5 ++
 Documentation/gitfaq.txt      | 109 ++++++++++++++++++++++++++++++++--
 2 files changed, 110 insertions(+), 4 deletions(-)



* [PATCH v4 1/4] gitfaq: add documentation on proxies
  2024-07-09 23:37 ` [PATCH v4 " brian m. carlson
@ 2024-07-09 23:37   ` brian m. carlson
  2024-07-09 23:37   ` [PATCH v4 2/4] gitfaq: give advice on using eol attribute in gitattributes brian m. carlson
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: brian m. carlson @ 2024-07-09 23:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Eric Sunshine,
	Derrick Stolee, Jeff King

Many corporate environments and local systems have proxies in use.  Note
the situations in which proxies can be used and how to configure them.
At the same time, note what standards a proxy must follow to work with
Git.  Explicitly call out certain classes that are known to routinely
have problems reported in various places online, including in the Git for
Windows issue tracker and on Stack Overflow, and recommend against the
use of such software, noting that they are associated with myriad
security problems (including, for example, breaking sandboxing and image
integrity[0], and, for TLS middleboxes, the use of insecure protocols
and ciphers and lack of certificate verification[1]). Don't mention the
specific nature of these security problems in the FAQ entry because they
are extremely numerous and varied and we wish to keep the FAQ entry
relatively brief.

[0] https://issues.chromium.org/issues/40285192
[1] https://faculty.cc.gatech.edu/~mbailey/publications/ndss17_interception.pdf

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/gitfaq.txt | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
index 8c1f2d5675..e4125b1178 100644
--- a/Documentation/gitfaq.txt
+++ b/Documentation/gitfaq.txt
@@ -241,6 +241,42 @@ How do I know if I want to do a fetch or a pull?::
 	ignore the upstream changes.  A pull consists of a fetch followed
 	immediately by either a merge or rebase.  See linkgit:git-pull[1].
 
+[[proxy]]
+Can I use a proxy with Git?::
+	Yes, Git supports the use of proxies.  Git honors the standard `http_proxy`,
+	`https_proxy`, and `no_proxy` environment variables commonly used on Unix, and
+	it also can be configured with `http.proxy` and similar options for HTTPS (see
+	linkgit:git-config[1]).  The `http.proxy` and related options can be
+	customized on a per-URL pattern basis.  In addition, Git can in theory
+	function normally with transparent proxies that exist on the network.
++
+For SSH, Git can support a proxy using OpenSSH's `ProxyCommand`. Commonly used
+tools include `netcat` and `socat`.  However, they must be configured not to
+exit when seeing EOF on standard input, which usually means that `netcat` will
+require `-q` and `socat` will require a timeout with something like `-t 10`.
+This is required because the way the Git SSH server knows that no more requests
+will be made is an EOF on standard input, but when that happens, the server may
+not have yet processed the final request, so dropping the connection at that
+point would interrupt that request.
++
+An example configuration entry in `~/.ssh/config` with an HTTP proxy might look
+like this:
++
+----
+Host git.example.org
+    User git
+    ProxyCommand socat -t 10 - PROXY:proxy.example.org:%h:%p,proxyport=8080
+----
++
+Note that in all cases, for Git to work properly, the proxy must be completely
+transparent.  The proxy cannot modify, tamper with, or buffer the connection in
+any way, or Git will almost certainly fail to work.  Note that many proxies,
+including many TLS middleboxes, Windows antivirus and firewall programs other
+than Windows Defender and Windows Firewall, and filtering proxies fail to meet
+this standard, and as a result end up breaking Git.  Because of the many
+reports of problems and their poor security history, we recommend against the
+use of these classes of software and devices.
+
 Merging and Rebasing
 --------------------
 


* [PATCH v4 2/4] gitfaq: give advice on using eol attribute in gitattributes
  2024-07-09 23:37 ` [PATCH v4 " brian m. carlson
  2024-07-09 23:37   ` [PATCH v4 1/4] gitfaq: add documentation on proxies brian m. carlson
@ 2024-07-09 23:37   ` brian m. carlson
  2024-07-09 23:37   ` [PATCH v4 3/4] gitfaq: add entry about syncing working trees brian m. carlson
  2024-07-09 23:37   ` [PATCH v4 4/4] doc: mention that proxies must be completely transparent brian m. carlson
  3 siblings, 0 replies; 22+ messages in thread
From: brian m. carlson @ 2024-07-09 23:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Eric Sunshine,
	Derrick Stolee, Jeff King

In the FAQ, we tell people how to use the text attribute, but we fail to
explain what to do with the eol attribute.  As we ourselves have
noticed, most shell implementations do not care for carriage returns,
and as such, people will practically always want them to use LF endings.
Similar things can be said for batch files on Windows, except with CRLF
endings.

Since these are common things to have in a repository, let's help users
make a good decision by recommending that they use the gitattributes
file to correctly check out the endings.

In addition, let's correct the cross-reference to this question, which
originally referred to "the following entry", even though a new entry
has been inserted in between.  The cross-reference notation should
prevent this from occurring and provide a link in formats, such as HTML,
which support that.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/gitfaq.txt | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
index e4125b1178..058ef32a97 100644
--- a/Documentation/gitfaq.txt
+++ b/Documentation/gitfaq.txt
@@ -393,8 +393,9 @@ I'm on Windows and git diff shows my files as having a `^M` at the end.::
 +
 You can store the files in the repository with Unix line endings and convert
 them automatically to your platform's line endings.  To do that, set the
-configuration option `core.eol` to `native` and see the following entry for
-information about how to configure files as text or binary.
+configuration option `core.eol` to `native` and see
+<<recommended-storage-settings,the question on recommended storage settings>>
+for information about how to configure files as text or binary.
 +
 You can also control this behavior with the `core.whitespace` setting if you
 don't wish to remove the carriage returns from your line endings.
@@ -456,14 +457,26 @@ references, URLs, and hashes stored in the repository.
 +
 We also recommend setting a linkgit:gitattributes[5] file to explicitly mark
 which files are text and which are binary.  If you want Git to guess, you can
-set the attribute `text=auto`.  For example, the following might be appropriate
-in some projects:
+set the attribute `text=auto`.
++
+With text files, Git will generally ensure that LF endings are used in the
+repository.  The `core.autocrlf` and `core.eol` configuration variables specify
+what line-ending convention is followed when any text file is checked out.  You
+can also use the `eol` attribute (e.g., `eol=crlf`) to override which files get
+what line-ending treatment.
++
+For example, generally shell files must have LF endings and batch files must
+have CRLF endings, so the following might be appropriate in some projects:
 +
 ----
 # By default, guess.
 *	text=auto
 # Mark all C files as text.
 *.c	text
+# Ensure all shell files have LF endings and all batch files have CRLF
+# endings in the working tree and both have LF in the repo.
+*.sh text eol=lf
+*.bat text eol=crlf
 # Mark all JPEG files as binary.
 *.jpg	binary
 ----


* [PATCH v4 3/4] gitfaq: add entry about syncing working trees
  2024-07-09 23:37 ` [PATCH v4 " brian m. carlson
  2024-07-09 23:37   ` [PATCH v4 1/4] gitfaq: add documentation on proxies brian m. carlson
  2024-07-09 23:37   ` [PATCH v4 2/4] gitfaq: give advice on using eol attribute in gitattributes brian m. carlson
@ 2024-07-09 23:37   ` brian m. carlson
  2024-07-09 23:37   ` [PATCH v4 4/4] doc: mention that proxies must be completely transparent brian m. carlson
  3 siblings, 0 replies; 22+ messages in thread
From: brian m. carlson @ 2024-07-09 23:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Eric Sunshine,
	Derrick Stolee, Jeff King

Users very commonly want to sync their working tree with uncommitted
changes across machines, often to carry across in-progress work or
stashes.  Despite this not being a recommended approach, users want to
do it and are not dissuaded by suggestions not to, so let's recommend a
sensible technique.

The technique that many users are using is their preferred cloud syncing
service, which is a bad idea.  Users have reported problems where they
end up with duplicate files that won't go away (with names like "file.c
2"), broken references, oddly named references that have date stamps
appended to them, missing objects, and general corruption and data loss.
That's because almost all of these tools sync file by file, which is a
great technique if your project is a single word processing document or
spreadsheet, but is utterly abysmal for Git repositories because they
don't necessarily snapshot the entire repository correctly.  They also
tend to sync the files immediately instead of when the repository is
quiescent, so writing multiple files, as occurs during a commit or a gc,
can confuse the tools and lead to corruption.

We know that the old standby, rsync, is up to the task, provided that
the repository is quiescent, so let's suggest that and dissuade people
from using cloud syncing tools.  Let's tell people about common things
they should be aware of before doing this and that this is still
potentially risky.  Additionally, let's tell people that Git's security
model does not permit sharing working trees across users in case they
planned to do that.  While we'd still prefer users didn't try to do
this, hopefully this will lead them in a safer direction.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/gitfaq.txt | 52 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
index 058ef32a97..f2917d142c 100644
--- a/Documentation/gitfaq.txt
+++ b/Documentation/gitfaq.txt
@@ -185,6 +185,58 @@ Then, you can adjust your push URL to use `git@example_author` or
 `git@example_committer` instead of `git@example.org` (e.g., `git remote set-url
 git@example_author:org1/project1.git`).
 
+Transfers
+---------
+
+[[sync-working-tree]]
+How do I sync a working tree across systems?::
+	First, decide whether you want to do this at all.  Git works best when you
+	push or pull your work using the typical `git push` and `git fetch` commands
+	and isn't designed to share a working tree across systems.  This is
+	potentially risky and in some cases can cause repository corruption or data
+	loss.
++
+Usually, doing so will cause `git status` to need to re-read every file in the
+working tree.  Additionally, Git's security model does not permit sharing a
+working tree across untrusted users, so it is only safe to sync a working tree
+if it will only be used by a single user across all machines.
++
+It is important not to use a cloud syncing service to sync any portion of a Git
+repository, since this can cause corruption, such as missing objects, changed
+or added files, broken refs, and a wide variety of other problems.  These
+services tend to sync file by file on a continuous basis and don't understand
+the structure of a Git repository.  This is especially bad if they sync the
+repository in the middle of it being updated, since that is very likely to
+cause incomplete or partial updates and therefore data loss.
++
+An example of the kind of corruption that can occur is conflicts over the state
+of refs, such that both sides end up with different commits on a branch that
+the other doesn't have.  This can result in important objects becoming
+unreferenced and possibly pruned by `git gc`, causing data loss.
++
+Therefore, it's better to push your work to either the other system or a central
+server using the normal push and pull mechanism.  However, this doesn't always
+preserve important data, like stashes, so some people prefer to share a working
+tree across systems.
++
+If you do this, the recommended approach is to use `rsync -a --delete-after`
+(ideally with an encrypted connection such as with `ssh`) on the root of
+the repository.  You should ensure several things when you do this:
++
+* If you have additional worktrees or a separate Git directory, they must be
+  synced at the same time as the main working tree and repository.
+* You are comfortable with the destination directory being an exact copy of the
+  source directory, _deleting any data that is already there_.
+* The repository (including all worktrees and the Git directory) is in a
+  quiescent state for the duration of the transfer (that is, no operations of
+  any sort are taking place on it, including background operations like `git
+  gc` and operations invoked by your editor).
++
+Be aware that even with these recommendations, syncing in this way has some risk
+since it bypasses Git's normal integrity checking for repositories, so having
+backups is advised.  You may also wish to do a `git fsck` to verify the
+integrity of your data on the destination system after syncing.
+
 Common Issues
 -------------
 


* [PATCH v4 4/4] doc: mention that proxies must be completely transparent
  2024-07-09 23:37 ` [PATCH v4 " brian m. carlson
                     ` (2 preceding siblings ...)
  2024-07-09 23:37   ` [PATCH v4 3/4] gitfaq: add entry about syncing working trees brian m. carlson
@ 2024-07-09 23:37   ` brian m. carlson
  3 siblings, 0 replies; 22+ messages in thread
From: brian m. carlson @ 2024-07-09 23:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Eric Sunshine,
	Derrick Stolee, Jeff King

We already document in the FAQ that proxies must be completely
transparent and not modify the request or response in any way, but add
similar documentation to the http.proxy entry.  We know that while the
FAQ is very useful, users sometimes are less likely to read it in favor of
the documentation specific to an option or command, so adding it in both
places will help users be adequately informed.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/config/http.txt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/config/http.txt b/Documentation/config/http.txt
index 2d4e0c9b86..a9c7480f6a 100644
--- a/Documentation/config/http.txt
+++ b/Documentation/config/http.txt
@@ -7,6 +7,11 @@ http.proxy::
 	linkgit:gitcredentials[7] for more information. The syntax thus is
 	'[protocol://][user[:password]@]proxyhost[:port]'. This can be overridden
 	on a per-remote basis; see remote.<name>.proxy
++
+Any proxy, however configured, must be completely transparent and must not
+modify, transform, or buffer the request or response in any way.  Proxies which
+are not completely transparent are known to cause various forms of breakage
+with Git.
 
 http.proxyAuthMethod::
 	Set the method with which to authenticate against the HTTP proxy. This

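
For readers unfamiliar with the two settings named in the hunk above, the following sketch shows how `http.proxy` and its per-remote override `remote.<name>.proxy` might be set. It is not part of the patch. The proxy host names and ports are placeholders, the remote name `origin` is an assumption, and the commands run against a scratch repository so that no existing configuration is touched.

```shell
#!/bin/sh
# Sketch only: host names, ports, and the remote name are placeholders.
repo=$(mktemp -d)
git init -q "$repo"

# Repository-wide proxy, using the documented syntax
# '[protocol://][user[:password]@]proxyhost[:port]'.
git -C "$repo" config http.proxy http://proxy.example.com:8080

# Per-remote override for the remote named "origin".
git -C "$repo" config remote.origin.proxy http://other-proxy.example.com:3128

# Print the values that were stored.
git -C "$repo" config --get http.proxy
git -C "$repo" config --get remote.origin.proxy
```

Whichever setting applies, the requirement from the patch still holds: the proxy must pass requests and responses through unmodified and unbuffered.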

Thread overview: 22+ messages
2024-07-04  0:38 [PATCH v3 0/4] Additional FAQ entries brian m. carlson
2024-07-04  0:38 ` [PATCH v3 1/4] gitfaq: add documentation on proxies brian m. carlson
2024-07-04  0:38 ` [PATCH v3 2/4] gitfaq: give advice on using eol attribute in gitattributes brian m. carlson
2024-07-04  5:22   ` Junio C Hamano
2024-07-04 21:10     ` brian m. carlson
2024-07-04  0:38 ` [PATCH v3 3/4] gitfaq: add entry about syncing working trees brian m. carlson
2024-07-04  5:21   ` Junio C Hamano
2024-07-04 21:08     ` brian m. carlson
2024-07-06  5:50       ` Junio C Hamano
2024-07-04  0:38 ` [PATCH v3 4/4] doc: mention that proxies must be completely transparent brian m. carlson
2024-07-04  1:25 ` [PATCH v3 0/4] Additional FAQ entries Junio C Hamano
2024-07-04  5:22 ` Junio C Hamano
2024-07-04 21:23   ` brian m. carlson
2024-07-06  5:59     ` Junio C Hamano
2024-07-08  0:52       ` brian m. carlson
2024-07-06  6:47     ` Jeff King
2024-07-06 17:18       ` Junio C Hamano
2024-07-09 23:37 ` [PATCH v4 " brian m. carlson
2024-07-09 23:37   ` [PATCH v4 1/4] gitfaq: add documentation on proxies brian m. carlson
2024-07-09 23:37   ` [PATCH v4 2/4] gitfaq: give advice on using eol attribute in gitattributes brian m. carlson
2024-07-09 23:37   ` [PATCH v4 3/4] gitfaq: add entry about syncing working trees brian m. carlson
2024-07-09 23:37   ` [PATCH v4 4/4] doc: mention that proxies must be completely transparent brian m. carlson