From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4AA51C433F5 for ; Mon, 4 Oct 2021 21:36:56 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 98A296120D for ; Mon, 4 Oct 2021 21:36:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 98A296120D Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=nongnu.org Received: from localhost ([::1]:50066 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mXVdm-00039v-Rn for qemu-devel@archiver.kernel.org; Mon, 04 Oct 2021 17:36:54 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:57956) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mXVUK-0006Ts-Kw for qemu-devel@nongnu.org; Mon, 04 Oct 2021 17:27:09 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:41613) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mXVUH-0005fy-2q for qemu-devel@nongnu.org; Mon, 04 Oct 2021 17:27:07 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1633382824; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=k+XjGJLPnJEF8pYT2tlN26i72z3lhfyhTo/51gpl94M=; b=AlXDv0hlQK+zql6kDSGz5WRWCXxPMDN30VUmjfN76NY5KFsQX2Vl1/1enQlWSkEr2jE5mE h4azX/OX0Gu8UY31fM8qcjIcgoFP1Ls65Ma1kNaVcPVQoACDxuvfPZFXPEH8kgGG4+3lwp 85ztT2rfFOUhp9N1H71zDjSO1AmRd/k= Received: from mail-vs1-f71.google.com (mail-vs1-f71.google.com [209.85.217.71]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-362-iHnmdq10NWqhCHqdK9GFeg-1; Mon, 04 Oct 2021 17:27:02 -0400 X-MC-Unique: iHnmdq10NWqhCHqdK9GFeg-1 Received: by mail-vs1-f71.google.com with SMTP id e13-20020a67b64d000000b002d834de4dd7so1848142vsm.21 for ; Mon, 04 Oct 2021 14:27:02 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=k+XjGJLPnJEF8pYT2tlN26i72z3lhfyhTo/51gpl94M=; b=sihDtKsqEanSGS65mkvvJsSOLf4WKi423XaDLiYbDbwvcReyDAZzcLCJZrJo+R/Qer /6pvEaMO444yq1x2hOE3jdiu7Uj+l1hLGPpdLzqVNSbZ36bmHAolmPpH5ed/d+bt394l GdLdoRR8DRMdNQ29qaJ270Q01cpZxgFHhmU1E4XeYPlGyBwoyxFNJkCpD59z7amYDpGP 61i8g+vKCB/YX4EKKOslP5bILkFIE/yxGviwiS7YUo9C6Nlre5b2mbtq/2H+hllkkUJ5 WD8xyGLlpZAWxeOnypfFW0wBKWm45lj6wMtnIRshauM7lG3017H5gmQswOwY2iBlGCEc vTqQ== X-Gm-Message-State: AOAM531bYG1UQ30AZWFuLaCn2yeCQ/Am/tTFhuqLhazffTlLjfcSuWEt 7yutoMh1B1kTlbq+J7U87eo7s35BH4IqGJhYTbf4wU75kqKf7UMQCpCcAYdXxJHZH57W0HleTT1 Fnn58loAv2cH23GVDa7JSwPw9RRZ9GXg= X-Received: by 2002:ab0:3c89:: with SMTP id a9mr9890590uax.32.1633382821742; Mon, 04 Oct 2021 14:27:01 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwHBYEPliaHn9GaN8oz7cozRGuUF1L0w3wxnbXaKEIw5mWRe1CSwPuDdnzgCZm6MSfxJxRuves+DbnhB9JykHc= X-Received: by 2002:ab0:3c89:: with SMTP id a9mr9890560uax.32.1633382821484; Mon, 04 Oct 2021 14:27:01 -0700 (PDT) MIME-Version: 1.0 References: <20210917054047.2042843-1-jsnow@redhat.com> <20210917054047.2042843-13-jsnow@redhat.com> <372c86ac-0061-2d9a-b366-72260d91bd75@redhat.com> In-Reply-To: From: John Snow Date: Mon, 4 Oct 2021 17:26:50 -0400 Message-ID: Subject: Re: [PATCH 12/15] iotests: Disable AQMP logging under non-debug modes To: Hanna Reitz Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=jsnow@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: multipart/alternative; boundary="000000000000db70ee05cd8d9269" Received-SPF: pass client-ip=170.10.133.124; envelope-from=jsnow@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -28 X-Spam_score: -2.9 X-Spam_bar: -- X-Spam_report: (-2.9 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.066, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kevin Wolf , Vladimir Sementsov-Ogievskiy , Eduardo Habkost , qemu-block@nongnu.org, qemu-devel , Cleber Rosa Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" --000000000000db70ee05cd8d9269 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, Oct 4, 2021 at 2:32 PM John Snow wrote: > > > On Mon, Oct 4, 2021 at 6:12 AM Hanna Reitz wrote: > >> On 18.09.21 04:14, John Snow wrote: >> > >> > >> > On Fri, Sep 17, 2021 at 8:58 PM John Snow > > > wrote: >> > >> > >> > >> > On Fri, Sep 17, 2021 at 10:30 AM Hanna Reitz > > > wrote: >> > >> > On 17.09.21 07:40, John Snow wrote: >> > > Disable the aqmp logger, which likes to (at the moment) >> > print out >> > > intermediate warnings and errors that cause session >> > termination; disable >> > > them so they don't interfere with the job output. >> > > >> > > Leave any "CRITICAL" warnings enabled though, those are ones >> > that we >> > > should never see, no matter what. >> > >> > I mean, looks OK to me, but from what I understand (i.e. >> little), >> > qmp_client doesn=E2=80=99t log CRITICAL messages, at least I c= an=E2=80=99t see >> > any. Only >> > ERRORs. >> > >> > >> > There's *one* critical message in protocol.py, used for a >> > circumstance that I *think* should be impossible. I do not think I >> > currently use any WARNING level statements. >> > >> > I guess I=E2=80=99m missing some CRITICAL messages in external >> > functions called >> > from qmp_client.py, but shouldn=E2=80=99t we still keep ERRORs= ? >> > >> > >> > ...Mayyyyyybe? >> > >> > The errors logged by AQMP are *almost always* raised as Exceptions >> > somewhere else, eventually. Sometimes when we encounter them in >> > one context, we need to save them and then re-raise them in a >> > different execution context. There's one good exception to this: >> > My pal, EOFError. >> > >> > If the reader context encounters EOF, it raises EOFError and this >> > causes a disconnect to be scheduled asynchronously. *Any* >> > Exception that causes a disconnect to be scheduled asynchronously >> > is dutifully logged as an ERROR. At this point in the code, we >> > don't really know if the user of the library considers this an >> > "error" yet or not. I've waffled a lot on how exactly to treat >> > this circumstance. ...Hm, I guess that's really the only case >> > where I have an error that really ought to be suppressed. I >> > suppose what I will do here is: if the exception happens to be an >> > EOFError I will drop the severity of the log message down to INFO. >> > I don't know why it takes being challenged on this stuff to start >> > thinking clearly about it, but here we are. Thank you for your >> > feedback :~) >> > >> > --js >> > >> > >> > Oh, CI testing reminds me of why I am a liar here. >> > >> > the mirror-top-perms test intentionally expects not to be able to >> > connect, but we're treated to these two additional lines of output: >> > >> > +ERROR:qemu.aqmp.qmp_client.qemub-2536319:Negotiation failed: EOFError >> > +ERROR:qemu.aqmp.qmp_client.qemub-2536319:Failed to establish session: >> > EOFError >> > >> > Uh. I guess a temporary suppression in mirror-top-perms, then ...? >> >> Sounds right to me, if that=E2=80=99s simple enough. >> >> (By the way, I understand it right that you want to lower the severity >> of EOFErrors to INFO only on disconnect, right? Which is why they=E2=80= =99re >> still logged as ERRORs here, because they aren=E2=80=99t occurring on >> disconnects?) >> >> > More or less, yeah. > > When an EOFError causes the reader coroutine to halt (because it can't > read the next message), I decided (in v2) to drop that one particular > logging message down to "INFO", because it might -- or might not be -- an > expected occurrence from the point of view of whoever is managing the QMP > connection. Maybe it was expected (The test used qemu-guest-agent or > something else to make the guest shutdown, taking QEMU down with it witho= ut > the knowledge of the QMP library layer) or maybe it was unexpected (the Q= MP > remote really just disappeared from us on a whim). There's no way to know= , > so it probably isn't right to consider it an error. > > In the connection case, I left it as an ERROR because the caller asked us > to connect to an endpoint and we were unable to, which feels unambiguous. > It will be ultimately reported via Exceptions as qemu.aqmp.ConnectError, > with additional information available in fields of that exception object. > Even though the exception is reported to the caller, I decided to log the > occurrence anyway, because I felt like it should be the job of the librar= y > to make a good log and not the caller's responsibility to catch the > exception and then log it themselves. > > That does leave us with this atypical case though: the caller is > intentionally causing problems :) > > (Well, atypical for a user of the library who is using it to make a tool > they expect to work. Quite a typical case for tests in the abstract, thou= gh > we only seem to have precisely one usage of this pattern in iotests so fa= r.) > > >> > In most other cases I rather imagine we do want to see this kind of >> > output to help give more information about how things have failed -- >> > they ARE errors. We just happen to not care about them right here. >> >> Well, in fact we do expect them here, so we could even log them, but >> first I don=E2=80=99t know whether they=E2=80=99re stable enough for tha= t, and second >> this would break the =E2=80=9Cchoose the AQMP or legacy QMP back-end via= an >> environment variable=E2=80=9D thing. >> >> > Yeah, that's the other half of it, I came to realize. Just logging the > expected failure feels the most idiomatic, but it requires a new logging > filter (I have to filter out the PID from the logger name to make it work > consistently) and breaks the ability to switch. > > So I suppose "for now" just disabling the logger output for a critical > section and leaving a comment that states that once we're feeling confide= nt > about the new library (maybe after the next release when everything has > really gone through the testing wash cycle) we can remove the suppression > and just log the errors the normal way. > > >> > The only thing I don't exactly like about this is that it requires >> > some knowledge by the caller to know to disable it. It makes writing >> > negative tests a bit more annoying because the library leans so >> > heavily on yelling loudly when it encounters problems. >> >> Yeah. I=E2=80=99m afraid I don=E2=80=99t have a good idea on how to con= vey test writers >> how to suppress these errors, even if it were a simple one-liner (like >> `self.vm_b.set_log_threshold(logging.CRITICAL)`)... >> >> We could start an =E2=80=9Ciotests tips and tricks=E2=80=9D document, bu= t do we want to? >> >> Hanna >> >> > Hm, not just yet. More chances for un-checked stuff to bitrot. There's > only this one caller that poses a problem here, so it's probably not goin= g > to be too difficult to just update tests to expect the error logs. > > V2 soon. I already respun it, but it's been a week, so I will need to > scrub it down with a good proof-reading pass. Thanks for your patience, a= nd > I hope you enjoyed your PTO :~) > > Oh, uh, I got confused and I actually already sent V2 for this series. Apologies for the confusion. --js --000000000000db70ee05cd8d9269 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Mon, Oct 4, 2021 at 2:32 PM John S= now <jsnow@redhat.com> wrote:=


On Mon, Oct 4, 2021 at 6:12 AM Hanna Reitz <hreitz@redhat.com>= wrote:
On 18.09= .21 04:14, John Snow wrote:
>
>
> On Fri, Sep 17, 2021 at 8:58 PM John Snow <jsnow@redhat.com
> <mailto:jsnow= @redhat.com>> wrote:
>
>
>
>=C2=A0 =C2=A0 =C2=A0On Fri, Sep 17, 2021 at 10:30 AM Hanna Reitz <hreitz@redhat.com<= br> >=C2=A0 =C2=A0 =C2=A0<mailto:hreitz@redhat.com>> wrote:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0On 17.09.21 07:40, John Snow wrote: >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> Disable the aqmp logger, which l= ikes to (at the moment)
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0print out
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> intermediate warnings and errors= that cause session
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0termination; disable
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> them so they don't interfere= with the job output.
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> Leave any "CRITICAL" w= arnings enabled though, those are ones
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0that we
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0> should never see, no matter what= .
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0I mean, looks OK to me, but from what= I understand (i.e. little),
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0qmp_client doesn=E2=80=99t log CRITIC= AL messages, at least I can=E2=80=99t see
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0any. Only
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0ERRORs.
>
>
>=C2=A0 =C2=A0 =C2=A0There's *one* critical message in protocol.py, = used for a
>=C2=A0 =C2=A0 =C2=A0circumstance that I *think* should be impossible. I= do not think I
>=C2=A0 =C2=A0 =C2=A0currently use any WARNING level statements.
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0I guess I=E2=80=99m missing some CRIT= ICAL messages in external
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0functions called
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0from qmp_client.py, but shouldn=E2=80= =99t we still keep ERRORs?
>
>
>=C2=A0 =C2=A0 =C2=A0...Mayyyyyybe?
>
>=C2=A0 =C2=A0 =C2=A0The errors logged by AQMP are *almost always* raise= d as Exceptions
>=C2=A0 =C2=A0 =C2=A0somewhere else, eventually. Sometimes when we encou= nter them in
>=C2=A0 =C2=A0 =C2=A0one context, we need to save them and then re-raise= them in a
>=C2=A0 =C2=A0 =C2=A0different execution context. There's one good e= xception to this:
>=C2=A0 =C2=A0 =C2=A0My pal, EOFError.
>
>=C2=A0 =C2=A0 =C2=A0If the reader context encounters EOF, it raises EOF= Error and this
>=C2=A0 =C2=A0 =C2=A0causes a disconnect to be scheduled asynchronously.= *Any*
>=C2=A0 =C2=A0 =C2=A0Exception that causes a disconnect to be scheduled = asynchronously
>=C2=A0 =C2=A0 =C2=A0is dutifully logged as an ERROR. At this point in t= he code, we
>=C2=A0 =C2=A0 =C2=A0don't really know if the user of the library co= nsiders this an
>=C2=A0 =C2=A0 =C2=A0"error" yet or not. I've waffled a lo= t on how exactly to treat
>=C2=A0 =C2=A0 =C2=A0this circumstance. ...Hm, I guess that's really= the only case
>=C2=A0 =C2=A0 =C2=A0where I have an error that really ought to be suppr= essed. I
>=C2=A0 =C2=A0 =C2=A0suppose what I will do here is: if the exception ha= ppens to be an
>=C2=A0 =C2=A0 =C2=A0EOFError I will drop the severity of the log messag= e down to INFO.
>=C2=A0 =C2=A0 =C2=A0I don't know why it takes being challenged on t= his stuff to start
>=C2=A0 =C2=A0 =C2=A0thinking clearly about it, but here we are. Thank y= ou for your
>=C2=A0 =C2=A0 =C2=A0feedback :~)
>
>=C2=A0 =C2=A0 =C2=A0--js
>
>
> Oh, CI testing reminds me of why I am a liar here.
>
> the mirror-top-perms test intentionally expects not to be able to
> connect, but we're treated to these two additional lines of output= :
>
> +ERROR:qemu.aqmp.qmp_client.qemub-2536319:Negotiation failed: EOFError=
> +ERROR:qemu.aqmp.qmp_client.qemub-2536319:Failed to establish session:=
> EOFError
>
> Uh. I guess a temporary suppression in mirror-top-perms, then ...?

Sounds right to me, if that=E2=80=99s simple enough.

(By the way, I understand it right that you want to lower the severity
of EOFErrors to INFO only on disconnect, right?=C2=A0 Which is why they=E2= =80=99re
still logged as ERRORs here, because they aren=E2=80=99t occurring on disco= nnects?)


More or less, yeah.

When an EOFError causes the reader coroutine = to halt (because it can't read the next message), I decided (in v2) to = drop that one particular logging message down to "INFO", because = it might -- or might not be -- an expected occurrence from the point of vie= w of whoever is managing the QMP connection. Maybe it was expected (The tes= t used qemu-guest-agent or something else to make the guest shutdown, takin= g QEMU down with it without the knowledge of the QMP library layer) or mayb= e it was unexpected (the QMP remote really just disappeared from us on a wh= im). There's no way to know, so it probably isn't right to consider= it an error.

In the connection case, I left it as an ERROR because the caller as= ked us to connect to an endpoint and we were unable to, which feels unambig= uous. It will be ultimately reported via Exceptions as qemu.aqmp.ConnectErr= or, with additional information available in fields of that exception objec= t. Even though the exception is reported to the caller, I decided to log th= e occurrence anyway, because I felt like it should be the job of the librar= y to make a good log and not the caller's responsibility to catch the e= xception and then log it themselves.

That does leave us with this atypical case t= hough: the caller is intentionally causing problems :)

(Well, atypical for a user= of the library who is using it to make a tool they expect to work. Quite a= typical case for tests in the abstract, though we only seem to have precis= ely one usage of this pattern in iotests so far.)
=C2=A0
> In most other cases I rather imagine we do want to see this kind of > output to help give more information about how things have failed -- <= br> > they ARE errors. We just happen to not care about them right here.

Well, in fact we do expect them here, so we could even log them, but
first I don=E2=80=99t know whether they=E2=80=99re stable enough for that, = and second
this would break the =E2=80=9Cchoose the AQMP or legacy QMP back-end via an=
environment variable=E2=80=9D thing.


Yeah, that's the other half of it,= I came to realize. Just logging the expected failure feels the most idioma= tic, but it requires a new logging filter (I have to filter out the PID fro= m the logger name to make it work consistently) and breaks the ability to s= witch.

So I suppose "for now" just disab= ling the logger output for a critical section and leaving a comment that st= ates that once we're feeling confident about the new library (maybe aft= er the next release when everything has really gone through the testing was= h cycle) we can remove the suppression and just log the errors the normal w= ay.
=C2=A0
> The only thing I don't exactly like about this is that it requires=
> some knowledge by the caller to know to disable it. It makes writing <= br> > negative tests a bit more annoying because the library leans so
> heavily on yelling loudly when it encounters problems.

Yeah.=C2=A0 I=E2=80=99m afraid I don=E2=80=99t have a good idea on how to c= onvey test writers
how to suppress these errors, even if it were a simple one-liner (like
`self.vm_b.set_log_threshold(logging.CRITICAL)`)...

We could start an =E2=80=9Ciotests tips and tricks=E2=80=9D document, but d= o we want to?

Hanna


Hm, not just yet. More chances for un-checked stuff to bitrot. = There's only this one caller that poses a problem here, so it's pro= bably not going to be too difficult to just update tests to expect the erro= r logs.

V2 soon. I already respun it, but it's been a week, so I will nee= d to scrub it down with a good proof-reading pass. Thanks for your patience= , and I hope you enjoyed your PTO :~)


=
Oh, uh, I got confused and I actually already sent V2 for this s= eries. Apologies for the confusion.

--js
=
--000000000000db70ee05cd8d9269--