From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:59364)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1bwnmn-00050P-Bg
	for qemu-devel@nongnu.org; Wed, 19 Oct 2016 06:07:51 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1bwnmi-0005F8-Bf
	for qemu-devel@nongnu.org; Wed, 19 Oct 2016 06:07:49 -0400
Received: from mx1.redhat.com ([209.132.183.28]:33386)
	by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.71) (envelope-from <kwolf@redhat.com>) id 1bwnmi-0005EL-3r
	for qemu-devel@nongnu.org; Wed, 19 Oct 2016 06:07:44 -0400
Date: Wed, 19 Oct 2016 12:07:40 +0200
From: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20161019100740.GE5336@noname.redhat.com>
References: <1476657307-26165-1-git-send-email-mail@kevin-wolf.de>
	<E34ADA36-74A5-439D-9459-B5F753292CF9@daynix.com>
	<20161018142705.GK4706@noname.str.redhat.com>
	<22FF2CCB-272D-4202-A7F0-D29E848C366A@daynix.com>
	<20161019072510.GB5336@noname.redhat.com>
	<4635F072-CAD3-4FD1-978A-8E0136878CBD@daynix.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <4635F072-CAD3-4FD1-978A-8E0136878CBD@daynix.com>
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH] e1000e: Don't zero out buffer address in
 rx descriptor
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Dmitry Fleytman <dmitry@daynix.com>
Cc: Kevin Wolf <mail@kevin-wolf.de>, Qemu Developers <qemu-devel@nongnu.org>, jasowang@redhat.com

Am 19.10.2016 um 09:57 hat Dmitry Fleytman geschrieben:
>=20
>     On 19 Oct 2016, at 10:25 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>=20
>     Am 19.10.2016 um 08:48 hat Dmitry Fleytman geschrieben:
>=20
>            Another related thing that I noticed while debugging this an=
d
>         turning on
>            tracing is that the interrupt throttling timers kept firing =
even if
>            there was no activity at all. Something might be wrong, ther=
e, too.
>=20
>            Next thing I wondered why throttling was enabled at all beca=
use the
>         spec
>            says the default is 0 (turned off). So one thing that I'm pr=
etty
>         sure is
>            just a misunderstanding is the following defintion:
>=20
>            #define E1000E_MIN_XITR     (500) /* No more then 7813 inter=
rupts
>         per
>                                                second according to spec
>         10.2.4.2 */
>=20
>=20
>            As I understand it, the spec is just giving an example there=
 and
>         lower
>            values are valid as well. At the very least, 0 should be acc=
epted as
>         a
>            special case because it means "disabled" and it's specified =
to be
>         the
>            default.
>=20
>=20
>         Right, this according to the spec this value should be 0 by def=
ault and
>         throttling should be disabled.
>=20
>         Current device implementation does not allow specification of
>         throttling interval less than 500 and treats interval 0 as thro=
ttling
>         enabled with interval 500.
>=20
>         This is done by intention because according to the spec (10.2.4=
.2)
>         device cannot produce more than 7813 interrupts per second even=
 when
>         throttling is disabled. Therefore, even in case of interrupt st=
orm
>         (continuous interrupt re-injection by device), number of interr=
upts
>         produced by device is limited and CPU (driver) has enough time =
to do
>         its job and handle problematic interrupt state.
>=20
>=20
>     I think you're misinterpreting the spec here. This is the paragraph
>     we're talking about, right?
>=20
>        For example, if the interval is programmed to 500 (decimal), the
>        82574 guarantees the CPU is not interrupted by it for 128 =C2=B5=
s from
>        the last interrupt. The maximum observable interrupt rate from t=
he
>        82574 should never exceed 7813 interrupts/sec.
>=20
>     It says "for example", so this is just demonstrating how you can
>     calculate the effects of a specific throttling setting. It says tha=
t
>     _if_ you set ITR to 500, you get an interrupt at most every
>     500 * 256 ns =3D 128 =C2=B5s. And 1 / 128 =C2=B5s =3D 7821.5 Hz, so=
 this is the
>     effective maximum frequency that _this specific_ ITR setting allows=
.
>=20
>     I also don't think it would make any sense for hardware to be unabl=
e to
>     trigger interrupts more often than that. Triggering an interrupt is=
 not
>     a complex operation that involves a lot of calculation or anything.
>=20
>=20
> Hi Kevin,
>=20
> Yes, I assume that sentence
>=20
> =E2=80=9CThe maximum observable interrupt rate from the
> 82574 should never exceed 7813 interrupts/sec."
>=20
> is not a related to a specific case, but describes a generic limitation=
,
> however it might be I=E2=80=99m misreading the spec indeed.

For me everything hints at this being only an example: Not only do the
numbers match the example made in the previous sentence (which is
explicitly called an example) and look weird as a real limit, but it's
also in the same paragraph as the explicit example and the spec is
generally good at starting a new paragraph when talking about a new
aspect.

I don't care enough to actually make you change anything, but I wanted
you to be aware that the interpretation of the spec as coded into our
emulation isn't clear at all (in fact, I would think it's clear that
it's _not_ meant this way) and that real hardware probably doesn't do
the same thing as we do.

What we're doing may still have merit, as a workaround for a guest
driver bug.

>         Opposed to this, virtual device is able to raise interrupts wit=
h rate
>         limited by CPU speed only therefore driver has no chance to fix
>         interrupt storm condition.
>=20
>         Windows e1000e drivers rely on upper limit for number of interr=
upts
>         per second in some cases and absence of this limit leads to inf=
inite
>         interrupt storms.
>=20
>         To summarise, while usage of throttling mechanisms is a little =
bit
>         different from what specification says, effective emulated devi=
ce
>         behavior is totally compliant to the real device.
>=20
>=20
>     So Windows doesn't configure ITR (i.e. it is 0) even though it can'=
t
>     handle unlimited interrupts? That would be a driver bug then, and
>     perhaps an important enough one to keep a workaround in our code. B=
ut
>     then let's be explicit that this is a workaround for a Windows bug =
and
>     not mandated by the spec.
>=20
>     I'm not sure in what setup you produced this error, but possibly a
>     reason why this doesn't happen with real hardware isn't the NIC its=
elf
>     but the backend: Communication with the host can obviously be faste=
r
>     than talking to a physical network (so if you were doing the latter=
, the
>     rate in the VM wouldn't be limited by the CPU, but by the physical
>     network).
>=20
>=20
> This issue is reproduced on device disable and not related=20
> to intensive device/backend communication. One RX packet with
> right timing is enough to trigged the problem.
>=20
> The same issue was fixed in e1000 device some time ago as well.

Commit 9596ef7c was good in flagging it as a guest driver bug. Only a
later series brought in the questionable spec interpretation.

Kevin