From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff Hostetler <git@jeffhostetler.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Jeff King <peff@peff.net>,
Jeff Hostetler <jeffhost@microsoft.com>,
Elijah Newren <newren@gmail.com>,
Jonathan Tan <jonathantanmy@google.com>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [RFC PATCH 19/21] usage API: use C99 macros for {usage,usagef,die,error,warning,die}*()
Date: Wed, 29 Dec 2021 00:42:48 +0100 [thread overview]
Message-ID: <211229.868rw4i95a.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <9952005b-9174-7578-7718-e9576b27b4ce@jeffhostetler.com>
On Tue, Dec 28 2021, Jeff Hostetler wrote:
> On 12/27/21 6:01 PM, Ævar Arnfjörð Bjarmason wrote:
>> On Mon, Dec 27 2021, Jeff Hostetler wrote:
>>
>>> On 11/15/21 5:18 PM, Ævar Arnfjörð Bjarmason wrote:
>>> [...]
>>>> It might be a good change to remove the "fmt" key from the "error"
>>>> events as a follow-up change. As these few examples from running the
>>>> test suite show it's sometimes redundant (same as the "msg"), rather
>>>> useless (just a "%s"), or something we could now mostly aggregate by
>>>> file/line instead of the normalized printf format:
>>>> 1 file":"builtin/gc.c","line":1391,"msg":"'bogus' is not a
>>>> valid task","fmt":"'%s' is not a valid task"}
>>>> 1 file":"builtin/for-each-ref.c","line":89,"msg":"format: %(then) atom used more than once","fmt":"%s"}
>>>> 1 file":"builtin/fast-import.c","line":411,"msg":"Garbage after mark: N :202 :302x","fmt":"Garbage after mark: %s"}
>>>> "Mostly" here assumes that it would be OK if the aggregation changed
>>>> between git versions, which may be what all users of trace2 want. The
>>>> change that introduced the "fmt" code was ee4512ed481 (trace2: create
>>>> new combined trace facility, 2019-02-22), and the documentation change
>>>> was e544221d97a (trace2: Documentation/technical/api-trace2.txt,
>>>> 2019-02-22).
>>>> Both are rather vague on what problem "fmt" solved exactly, aside
>>>> from
>>>> the obvious one of it being impossible to do meaningful aggregations
>>>> due to the "file" and "line" being the same everywhere, which isn't
>>>> the case now.
>>>> In any case, let's leave "fmt" be for now, the above summary was
>>>> given
>>>> in case it's interesting to remove it in the future, e.g. to save
>>>> space in trace2 payloads.
>>>
>>> I added the "fmt" field so that we could do aggregations
>>> of error messages across multiple users without regard
>>> to what branch or filename or percentage or whatever was
>>> formatted into the actual "msg" written to stderr.
>>>
>>> The actual file:line wasn't useful (primarily because it
>>> was probably something in usage.c), but even if we fix that
>>> it might not be useful if we have users running 10 different
>>> versions of Git (because some people don't upgrade immediately).
>>>
>>> So I'd rather not kill it right now.
>> Thanks. I'm not trying to kill it, but just poking at what it was
>> for
>> exactly.
>> Depending on the answer to that perhaps we didn't need it anymore,
>> but
>> the explanation you provide (mostly) makes sense.
>> The "mostly" being because I'm assuming that you only need to deal
>> with
>> LC_ALL=C users?
>> I.e. the documented promise that you can group things by "fmt"
>> doesn't
>> hold if you're processing even streams from users who are using a
>> translated git, because we'll get the translated format string, not the
>> original.
>
> I just did a query on the data we've collected over the last
> few weeks and there are only English error messages in the
> database, so yes LC_ALL=C seems to be the norm.
Ah, that explains it.
>> For that we'd need to change the API from/to to:
>> - error(_("some format %s"), ...)
>> + error(N_("some format %s"), ...)
>
> So no, I don't think it is worth the complexity to change
> this. Besides, wouldn't you need to more machinery under
> the hood -- to emit the untranslated string to trace2 and
> the translated string to stderr? (As in, move the translation
> down a layer??)
Yes, hence N_(), it marks the string for translation, but doesn't
translate it. So the underlying function would call _() for what we emit
to stderr, but not for the "fmt" field.
> My "fmt" field is not worth that effort.
Probably not to you because you're deploying git into a monolingual
environment, but for anyone who is they'd need to maintain manual
groupings of messages by scraping our po/*.po files.
> And besides, my goal was only to get the "top 10 or 20 errors"
> across a large set of users. I guess I'm thinking of it as a
> sample rather than an exhaustive list, so it is OK if we don't
> capture the translated strings.
I suppose with enough users it wouldn't matter either way, but you'd get
quite a bit of fragmentation. You'd also have errors that don't differ
between translations (e.g. those "%s" cases) amplified in count, as they
won't fragment due to i18n.
> Something else to consider is the GDPR. The "fmt" string is
> generic (e.g. "path '%s' exists on disk, but not in the index")
> but doesn't leak an PII or otherwise sensitive data. Whereas
> the corresponding "msg" field does include the pathname in this
> example. So if someone is post-processing the data and aggregating,
> they may want to relay only the "fmt" field and not the "msg" field
> to their database.
Indeed.
> (Granted, there are lots of PII and GDPR problematic fields in
> the data stream that a post-processor would need to be aware of,
> but all of that is outside of the scope of the Trace2 logging.
> I only mention it here because the "fmt" field may be useful for
> reasons not previously discussed.)
I do think it would be a good thing to work on, i.e. to make the logging
less verbose, which as a side-effect would make it GDPR compliant.
next prev parent reply other threads:[~2021-12-28 23:48 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-15 22:18 [RFC PATCH 00/21] C99: show meaningful <file>:<line> in trace2 via macros Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 01/21] git-compat-util.h: clarify GCC v.s. C99-specific in comment Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 02/21] C99 support: hard-depend on C99 variadic macros Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 03/21] usage.c: add a die_message() routine Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 04/21] usage.c API users: use die_message() where appropriate Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 05/21] usage.c + gc: add and use a die_message_errno() Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 06/21] config API: don't use vreportf(), make it static in usage.c Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 07/21] common-main.c: call exit(), don't return Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 08/21] usage.c: add a non-fatal bug() function to go with BUG() Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 09/21] parse-options.[ch] API: use bug() to improve error output Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 10/21] receive-pack: use bug() and BUG_if_bug() Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 11/21] cache-tree.c: " Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 12/21] pack-objects: use BUG(...) not die("BUG: ...") Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 13/21] strbuf.h: " Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 14/21] usage API: create a new usage.h, move API docs there Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 15/21] usage.[ch] API users: use report_fn, not hardcoded prototype Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 16/21] usage.[ch] API: rename "warn" vars functions to "warning" Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 17/21] usage.c: move usage routines around Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 18/21] usage.c: move rename variables in " Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 19/21] usage API: use C99 macros for {usage,usagef,die,error,warning,die}*() Ævar Arnfjörð Bjarmason
2021-12-27 19:32 ` Jeff Hostetler
2021-12-27 23:01 ` Ævar Arnfjörð Bjarmason
2021-12-28 16:32 ` Jeff Hostetler
2021-12-28 18:51 ` Elijah Newren
2021-12-28 23:48 ` Ævar Arnfjörð Bjarmason
2021-12-29 2:15 ` Elijah Newren
2021-12-28 23:42 ` Ævar Arnfjörð Bjarmason [this message]
2021-12-29 16:13 ` Jeff Hostetler
2021-11-15 22:18 ` [RFC PATCH 20/21] usage API: make the "{usage,fatal,error,warning,BUG}: " translatable Ævar Arnfjörð Bjarmason
2021-11-15 22:18 ` [RFC PATCH 21/21] usage API: add "core.usageAddSource" config to add <file>:<line> Ævar Arnfjörð Bjarmason
2021-11-16 18:43 ` [RFC PATCH 00/21] C99: show meaningful <file>:<line> in trace2 via macros Taylor Blau
2021-11-16 18:58 ` Ævar Arnfjörð Bjarmason
2021-11-16 19:36 ` Taylor Blau
2021-11-16 20:16 ` Ævar Arnfjörð Bjarmason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=211229.868rw4i95a.gmgdl@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=Johannes.Schindelin@gmx.de \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jeffhost@microsoft.com \
--cc=jonathantanmy@google.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).