From: Elijah Newren <newren@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org
Subject: Re: [PATCH 00/25] Documentation fixes
Date: Mon, 9 Oct 2023 09:46:21 -0700
Message-ID: <CABPp-BGsg8CnX1EXpeTwtQzEBPA3ZoTmLKGa1d7TqAg4aAB3EA@mail.gmail.com>
In-Reply-To: <ZSNbALj63zjzOURN@nand.local>
On Sun, Oct 8, 2023 at 6:44 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Sun, Oct 08, 2023 at 06:45:02AM +0000, Elijah Newren via GitGitGadget wrote:
> > It turns out that AI is pretty good at making small fixes to documentation;
> > certainly not perfect, but it provides quite good signal. Unfortunately,
> > there is a lot to sift through. Some points about my strategy:
>
> Quite interesting ;-).
>
> I'm curious to learn a little bit more about your
> strategy beyond what you wrote:
>
> - What tool did you use? ChatGPT? Something home-grown?
A mixture of gpt-4 and gpt-4-32k (I would have just used gpt-4, but
trying to give it a full file blows the token limit on several of
Git's documentation files).
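As a rough illustration of that trade-off, a script could fall back to the larger-context model only when a file is too big for the smaller one. This is a hypothetical sketch, not what was actually run. It assumes the commonly cited context windows of 8,192 tokens for gpt-4 and 32,768 for gpt-4-32k, and a crude four-characters-per-token estimate. The helpers `estimate_tokens` and `choose_model` are made up for illustration; a real tokenizer would give exact counts.

```python
from typing import Optional

# Commonly cited context windows (in tokens); smallest model first.
CONTEXT_WINDOWS = [("gpt-4", 8192), ("gpt-4-32k", 32768)]

def estimate_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token (an assumption)."""
    return len(text) // 4 + 1

def choose_model(prompt: str, expected_reply_tokens: int = 0) -> Optional[str]:
    """Pick the smallest model whose window fits prompt plus reply, else None."""
    needed = estimate_tokens(prompt) + expected_reply_tokens
    for name, window in CONTEXT_WINDOWS:
        if needed <= window:
            return name
    return None  # too big even for gpt-4-32k; the file would need splitting

print(choose_model("x" * 4000))    # gpt-4
print(choose_model("x" * 60000))   # gpt-4-32k
```

If the model is asked to echo the whole file back, `expected_reply_tokens` should be roughly the size of the file itself, because the reply shares the same window.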
Also, the requests were sent to an internally hosted instance, which
seemed to require passing the api-version=2023-03-15-preview
parameter. I don't really know what that parameter means, but I
suspect it might have selected a roughly six-month-old version of
gpt-4?
> - (Assuming this was generated by some sort of LLM): what did you
> prompt it with?
Note that I sent exactly one file per prompt, and the prompt was as follows:
"""
For the asciidoc file below, are there any typos, grammatical errors,
or wording problems? If so, please highlight them along with proposed
corrections:
--------------------
${FILE_CONTENTS}
"""
If I had to do it over, I'd be much more explicit about the output
format. Probably something like, "Please respond by outputting the full
file, with any corrections included. If there are no corrections,
simply output the original file as-is." That would let me simply diff
the output against the original and look at the changes.
Also, I would probably specify that "The asciidoc file starts three
lines below, just after the line of dashes", in the hope that this
would stop it from sometimes presuming that the dashes were part of
the file.
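A sketch of that revised workflow follows. The prompt asks for the whole file back, with a blank line and the dash line placed so the file really does start three lines below the instruction. The reply is then compared with the original using Python's standard `difflib`. The prompt wording paraphrases the suggestions above and is untested; `revised_prompt` and `review_diff` are hypothetical helpers.

```python
import difflib

def revised_prompt(file_contents: str) -> str:
    # Untested wording based on the ideas above: ask for the whole file
    # back, and state exactly where the file begins.
    return (
        "For the asciidoc file below, please fix any typos, grammatical\n"
        "errors, or wording problems. Please respond by outputting the full\n"
        "file, with any corrections included. If there are no corrections,\n"
        "simply output the original file as-is.\n"
        "The asciidoc file starts three lines below, just after the line of dashes.\n"
        "\n"
        "--------------------\n"
        + file_contents
    )

def review_diff(original: str, reply: str, name: str = "file") -> str:
    """Return a unified diff of the model's reply against the original."""
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            reply.splitlines(keepends=True),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )

print(review_diff("Teh quick fox.\nSecond line.\n",
                  "The quick fox.\nSecond line.\n", "example.txt"))
```

An empty diff means the model returned the file unchanged, so only files with a non-empty diff need a human look.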
> - What was the output format: the edited text in its entirety, or a
> patch that can be applied on top?
My wording was unfortunately vague, so sometimes I got human prose
telling me what change to make, sometimes I got a bulleted list in the
form "${old_text} -> ${new_text}", but most of the time it printed the
file (or a subset thereof) with corrections. I also had all the output
concatenated into one large file, which made it "fun" to work through
all the changes. Even when diffing files, I manually applied any
changes I saw to the actual file (which risked introducing new typos
and missing some of the corrections, but did ensure I reviewed
everything).
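For replies in the "${old_text} -> ${new_text}" form, the suggestions can at least be extracted mechanically and checked against the file before a human reviews them. This is a rough sketch under two assumptions: each suggestion sits on its own line, and it is optionally bulleted and quoted. The regex and the `parse_suggestions` / `apply_suggestions` helpers are hypothetical. They will miss replies in other shapes and can misfire on ordinary prose that happens to contain "->".

```python
import re

# One suggestion per line, e.g.  - "teh file" -> "the file"
# Bullets and surrounding quotes are optional (an assumption about replies).
SUGGESTION = re.compile(
    r"""^\s*(?:[-*]\s*)?["']?(?P<old>.+?)["']?\s*->\s*["']?(?P<new>.+?)["']?\s*$"""
)

def parse_suggestions(reply: str) -> list[tuple[str, str]]:
    """Collect (old, new) pairs from lines that look like 'old -> new'."""
    pairs = []
    for line in reply.splitlines():
        m = SUGGESTION.match(line)
        if m:
            pairs.append((m.group("old"), m.group("new")))
    return pairs

def apply_suggestions(text: str, pairs: list[tuple[str, str]]):
    """Apply each replacement once; return the new text and any misses."""
    missing = []
    for old, new in pairs:
        if old in text:
            text = text.replace(old, new, 1)
        else:
            missing.append((old, new))  # old text not found: review by hand
    return text, missing
```

Suggestions that do not match the file verbatim are returned rather than silently dropped, so they still get a manual look.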
Also, not only did I get different output formats, but many times the
file was cut off partway through. I sometimes assumed that just meant
there were no changes outside that region, but there were times when
it reported only one change, surrounded by hundreds of lines of
context, before cutting off. That left me with the feeling that it
might have processed or responded to only part of the file.
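If replies are requested as full files, one cheap guard against these cut-offs is a two-part check. First, confirm the reply still contains the original's last non-blank line near its end. Second, confirm the reply is not much shorter than the original. This is only a heuristic sketch: `looks_truncated`, the five-line window, and the 90% threshold are arbitrary choices, not something used for this series.

```python
def looks_truncated(original: str, reply: str, min_ratio: float = 0.9) -> bool:
    """Heuristic check for a full-file reply that was cut off.

    Flags the reply if the original's last non-blank line is missing from
    the reply's last few non-blank lines, or if the reply has noticeably
    fewer non-blank lines than the original. A legitimate correction to
    the final line would be a false positive.
    """
    orig_lines = [line.strip() for line in original.splitlines() if line.strip()]
    reply_lines = [line.strip() for line in reply.splitlines() if line.strip()]
    if not orig_lines:
        return False
    if orig_lines[-1] not in reply_lines[-5:]:
        return True
    return len(reply_lines) < min_ratio * len(orig_lines)
```

Flagged files would be candidates for re-sending, possibly in smaller pieces.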
There were also several times when the changes it suggested were
no-ops, which made me wonder whether it had simply failed. I looked at
them really closely (sometimes even piping the output through xxd,
which is how I once noticed a change from tab-after-period to
space-after-period), but when it responded with human prose saying
something like "Change the sentence that reads '${old_version}' ->
'${old_version}'", it made me wonder if something had just gone
haywire with the LLM and whether I should retry.
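Telling a true no-op apart from an invisible change like the tab-to-space one could also be automated instead of reaching for xxd. The sketch below does this with two hypothetical helpers, `classify_change` and `show_invisible`. It makes one simplifying assumption: any difference that leaves the same words in the same order counts as "whitespace-only".

```python
import difflib

def classify_change(old: str, new: str) -> str:
    """Return 'no-op', 'whitespace-only', or 'text' for a suggested edit."""
    if old == new:
        return "no-op"
    if old.split() == new.split():
        # Same words in the same order; only spaces, tabs or newlines differ.
        return "whitespace-only"
    return "text"

def show_invisible(old: str, new: str) -> list[str]:
    """List the differing spans using repr(), so a tab shows up as '\\t'."""
    out = []
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            out.append(f"{tag}: {old[i1:i2]!r} -> {new[j1:j2]!r}")
    return out

print(classify_change("Done.\tNext", "Done. Next"))  # whitespace-only
print(show_invisible("Done.\tNext", "Done. Next"))
```

A "no-op" result is the case where retrying might be worthwhile, while "whitespace-only" changes can be reviewed in bulk.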
However, even though the above issues make me think there are more
documentation issues to be found with an LLM, I didn't re-check any
files unless I got an error with no output (e.g. the file exceeded the
token limit, or I had hit the API's rate limits). I didn't bother,
because even with those caveats, the firehose of changes it provided
was far more than enough to deal with.
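The re-check policy described above could look roughly like the following sketch: a file is retried only when a request produced an error and no output. `send` stands in for whatever function talks to the model endpoint. It and `RequestFailed` are hypothetical, and the exponential backoff between retries is an added assumption aimed at rate limits.

```python
import time
from typing import Callable, Dict, Optional

class RequestFailed(Exception):
    """Hypothetical error: the request returned no output at all
    (for example, too many tokens or a rate limit)."""

def review_all(
    files: Dict[str, str],
    send: Callable[[str], str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> Dict[str, Optional[str]]:
    """Send each file once, retrying only when no output came back.

    A reply that did arrive is accepted as-is, even if it looks partial;
    files that never produced output map to None.
    """
    results: Dict[str, Optional[str]] = {}
    for name, contents in files.items():
        results[name] = None
        for attempt in range(max_attempts):
            try:
                results[name] = send(contents)
                break
            except RequestFailed:
                if attempt + 1 < max_attempts:
                    # Exponential backoff: 1s, 2s, 4s, ... (an assumption).
                    time.sleep(base_delay * 2 ** attempt)
    return results
```

An oversized file will keep failing no matter how often it is retried. It ends up as None, which matches the "error with no output" bucket that still needs attention.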