Re: [Qemu-devel] [PATCH 1/2] Ignore RX tail kicks when RX disabled.

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Alexander Duyck <alexander.h.duyck@intel.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Dmitry Fleytman <dmitry@daynix.com>,
	Yan Vugenfirer <yan@daynix.com>,
	Chris Webb <chris.webb@elastichosts.com>,
	qemu-devel@nongnu.org,
	Richard Davies <richard.davies@elastichosts.com>
Subject: Re: [Qemu-devel] [PATCH 1/2] Ignore RX tail kicks when RX disabled.
Date: Thu, 18 Oct 2012 09:06:55 -0700	[thread overview]
Message-ID: <5080291F.5020000@intel.com> (raw)
In-Reply-To: <CAJSP0QUmLupy9kbFuCzSSmGwHafeH-35Y5V9RD0=krZs8bucDw@mail.gmail.com>

On 10/18/2012 07:31 AM, Stefan Hajnoczi wrote:
> On Thu, Oct 18, 2012 at 10:34 AM, Dmitry Fleytman <dmitry@daynix.com> wrote:
>> The real purpose of check_rxov it a bit confusing indeed, mainly
>> because of unclear name (rename?),
>> however it works as following:
>>
>> There are 2 possible when RDT == RDH for RX ring:
>>     1. Device used all the buffers from ring, no empty buffers available
>>     2. Driver fully refilled the ring and all buffers are empty and ready to use

The 2nd case is not true.  We should only have RDT == RDH when the ring
is empty.  If RDT == RDH and the ring is full then we have a bug in the
driver.  The driver should only ever allow RDT to be one less than head,
or ring size - 1 if head is 0.

>> check_rxov is used to distinguish these 2 cases:
>>     1. It must be 1 initially (init, reset, etc.)
>>     2. It must be set to one when device uses buffer
>>     3. It must be set to 0 when driver adds buffer to the ring
>> check_rxov == 1 - ring is empty
>> check_rxov == 0 - ring is full
>>
>> Indeed, RX init sequence doesn't look logical, however this is the way
>> all Intel driver behave from e1000 and up to ixgbe.
>> Also see some explanation here:
>> http://permalink.gmane.org/gmane.linux.kernel/1375917
>>
>> If we drop check_rxov and always treat RDH == RDT as empty ring we'll
>> probably get correct behavior for current Linux driver's code (needs
>> testing of course),
>> however we have no idea how Windows drivers work.

The windows driver should work the same way.  If RDH == RDT the hardware
will treat that as a empty ring and will hang.  If there is a driver
that is setting RDH == RDT to indicate the ring is full please let us
know as that is likely a buggy driver.

> Thanks, for the great explanation, Dmitry.
>
> Alexander: I CCed you because I hope you might be able to explain what
> the 82540EM card does when a driver sets RDT to the value of RDH.  The
> QEMU NIC emulation code treats this as a full ring (i.e. the
> descriptors are valid and will be filled in by the hardware).  Does
> the real hardware act like this or will it treat this condition as
> ring empty (i.e. if the driver sets RDT to the value of RDH then the
> hardware stops receive because there are no descriptors available)?
>
> I can't find a statement in the Intel datasheet about what happens
> when the driver sets RDT = RDH.  The QEMU check_rxov variable is
> trying to distinguish between ring empty (RDH has moved to RDT) and
> ring full (driver has set RDH = RDT because the full descriptor ring
> is available).

If RDT == RDH then we should stop receiving traffic.  As far as I know
all of our e1000 hardware treat RDT == RDH as an empty ring state.  All
of the drivers should have code in place to stop it.  For example the
E1000_DESC_UNUSED macro should be returning ring size - 1 in the case of
RDT == RDH which will result in the head being 0 and the tail being ring
size - 2.

> Dmitry: At this point we'd need to test what happens on real hardware
> when RDH = RDT in order to be able to remove check_rxov.  As you
> mentioned, with the Linux e1000 driver we don't see ring full RDH =
> RDT:
>
>         /* call E1000_DESC_UNUSED which always leaves
>          * at least 1 descriptor unused to make sure
>          * next_to_use != next_to_clean */
>         for (i = 0; i < adapter->num_rx_queues; i++) {
>                 struct e1000_rx_ring *ring = &adapter->rx_ring[i];
>                 adapter->alloc_rx_buf(adapter, ring,
>                                       E1000_DESC_UNUSED(ring));
>         }
>
> Here some sample output from a QEMU printf, notice how RDH is never
> the same as RDT once rx begins:
>
> set_rdt rdh=0 rdt_old=0 rdt_new=0
> set_rdt rdh=0 rdt_old=0 rdt_new=254
> set_rdt rdh=1 rdt_old=254 rdt_new=255
> set_rdt rdh=2 rdt_old=255 rdt_new=0
> set_rdt rdh=3 rdt_old=0 rdt_new=1
> set_rdt rdh=4 rdt_old=1 rdt_new=2
> set_rdt rdh=5 rdt_old=2 rdt_new=3
> set_rdt rdh=6 rdt_old=3 rdt_new=4
> set_rdt rdh=7 rdt_old=4 rdt_new=5
> set_rdt rdh=9 rdt_old=5 rdt_new=7
> set_rdt rdh=10 rdt_old=7 rdt_new=8
> set_rdt rdh=11 rdt_old=8 rdt_new=9
> set_rdt rdh=12 rdt_old=9 rdt_new=10
> set_rdt rdh=13 rdt_old=10 rdt_new=11
> set_rdt rdh=14 rdt_old=11 rdt_new=12
>
> The iPXE 'intel' driver (supports e1000 cards) also does not set RDH =
> RDT for full rx ring, instead it only uses 4 out of 8 descriptors at a
> time.
>
> The reason I'm digging into the need for check_rxov is because it's a
> dangerous piece of code to have.  If check_rxov logic is ever out of
> sync we risk memory corruption.  I'd really like to drop it
> completely.
>
> Stefan

There should be no need for check_rxov.  As far as I know none of our
drivers will ever set RDT == RDH if there are descriptors available on
the ring.

Thanks,

Alex

next prev parent reply	other threads:[~2012-10-18 16:07 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-17 18:31 [Qemu-devel] [PATCH 0/2] E1000 RX/Live migration bugs fixed Dmitry Fleytman
2012-10-17 18:31 ` [Qemu-devel] [PATCH 1/2] Ignore RX tail kicks when RX disabled Dmitry Fleytman
2012-10-18  7:31   ` Stefan Hajnoczi
2012-10-18  8:08     ` Dmitry Fleytman
2012-10-18  8:09   ` Stefan Hajnoczi
2012-10-18  8:34     ` Dmitry Fleytman
2012-10-18 14:31       ` Stefan Hajnoczi
2012-10-18 16:06         ` Alexander Duyck [this message]
2012-10-18 16:12           ` Dmitry Fleytman
2012-10-17 18:31 ` [Qemu-devel] [PATCH 2/2] Add check_rxov into VMState Dmitry Fleytman
2012-10-18  7:24   ` Stefan Hajnoczi
2012-10-18  8:06     ` Dmitry Fleytman
2012-10-18 14:56   ` Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5080291F.5020000@intel.com \
    --to=alexander.h.duyck@intel.com \
    --cc=chris.webb@elastichosts.com \
    --cc=dmitry@daynix.com \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.davies@elastichosts.com \
    --cc=stefanha@gmail.com \
    --cc=yan@daynix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).