Re: [Qemu-devel] [PATCH 1/2] Ignore RX tail kicks when RX disabled.

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Stefan Hajnoczi <stefanha@gmail.com>
To: Dmitry Fleytman <dmitry@daynix.com>
Cc: Yan Vugenfirer <yan@daynix.com>,
	Alexander Duyck <alexander.h.duyck@intel.com>,
	Chris Webb <chris.webb@elastichosts.com>,
	qemu-devel@nongnu.org,
	Richard Davies <richard.davies@elastichosts.com>
Subject: Re: [Qemu-devel] [PATCH 1/2] Ignore RX tail kicks when RX disabled.
Date: Thu, 18 Oct 2012 16:31:47 +0200	[thread overview]
Message-ID: <CAJSP0QUmLupy9kbFuCzSSmGwHafeH-35Y5V9RD0=krZs8bucDw@mail.gmail.com> (raw)
In-Reply-To: <CAGHCxheoZtSGcHkfJcoNT62QPC1Um=O2sOXhUDqtzc==Nrowyw@mail.gmail.com>

On Thu, Oct 18, 2012 at 10:34 AM, Dmitry Fleytman <dmitry@daynix.com> wrote:
> The real purpose of check_rxov it a bit confusing indeed, mainly
> because of unclear name (rename?),
> however it works as following:
>
> There are 2 possible when RDT == RDH for RX ring:
>     1. Device used all the buffers from ring, no empty buffers available
>     2. Driver fully refilled the ring and all buffers are empty and ready to use
>
> check_rxov is used to distinguish these 2 cases:
>     1. It must be 1 initially (init, reset, etc.)
>     2. It must be set to one when device uses buffer
>     3. It must be set to 0 when driver adds buffer to the ring
> check_rxov == 1 - ring is empty
> check_rxov == 0 - ring is full
>
> Indeed, RX init sequence doesn't look logical, however this is the way
> all Intel driver behave from e1000 and up to ixgbe.
> Also see some explanation here:
> http://permalink.gmane.org/gmane.linux.kernel/1375917
>
> If we drop check_rxov and always treat RDH == RDT as empty ring we'll
> probably get correct behavior for current Linux driver's code (needs
> testing of course),
> however we have no idea how Windows drivers work.

Thanks, for the great explanation, Dmitry.

Alexander: I CCed you because I hope you might be able to explain what
the 82540EM card does when a driver sets RDT to the value of RDH.  The
QEMU NIC emulation code treats this as a full ring (i.e. the
descriptors are valid and will be filled in by the hardware).  Does
the real hardware act like this or will it treat this condition as
ring empty (i.e. if the driver sets RDT to the value of RDH then the
hardware stops receive because there are no descriptors available)?

I can't find a statement in the Intel datasheet about what happens
when the driver sets RDT = RDH.  The QEMU check_rxov variable is
trying to distinguish between ring empty (RDH has moved to RDT) and
ring full (driver has set RDH = RDT because the full descriptor ring
is available).

Dmitry: At this point we'd need to test what happens on real hardware
when RDH = RDT in order to be able to remove check_rxov.  As you
mentioned, with the Linux e1000 driver we don't see ring full RDH =
RDT:

        /* call E1000_DESC_UNUSED which always leaves
         * at least 1 descriptor unused to make sure
         * next_to_use != next_to_clean */
        for (i = 0; i < adapter->num_rx_queues; i++) {
                struct e1000_rx_ring *ring = &adapter->rx_ring[i];
                adapter->alloc_rx_buf(adapter, ring,
                                      E1000_DESC_UNUSED(ring));
        }

Here some sample output from a QEMU printf, notice how RDH is never
the same as RDT once rx begins:

set_rdt rdh=0 rdt_old=0 rdt_new=0
set_rdt rdh=0 rdt_old=0 rdt_new=254
set_rdt rdh=1 rdt_old=254 rdt_new=255
set_rdt rdh=2 rdt_old=255 rdt_new=0
set_rdt rdh=3 rdt_old=0 rdt_new=1
set_rdt rdh=4 rdt_old=1 rdt_new=2
set_rdt rdh=5 rdt_old=2 rdt_new=3
set_rdt rdh=6 rdt_old=3 rdt_new=4
set_rdt rdh=7 rdt_old=4 rdt_new=5
set_rdt rdh=9 rdt_old=5 rdt_new=7
set_rdt rdh=10 rdt_old=7 rdt_new=8
set_rdt rdh=11 rdt_old=8 rdt_new=9
set_rdt rdh=12 rdt_old=9 rdt_new=10
set_rdt rdh=13 rdt_old=10 rdt_new=11
set_rdt rdh=14 rdt_old=11 rdt_new=12

The iPXE 'intel' driver (supports e1000 cards) also does not set RDH =
RDT for full rx ring, instead it only uses 4 out of 8 descriptors at a
time.

The reason I'm digging into the need for check_rxov is because it's a
dangerous piece of code to have.  If check_rxov logic is ever out of
sync we risk memory corruption.  I'd really like to drop it
completely.

Stefan

next prev parent reply	other threads:[~2012-10-18 14:32 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-17 18:31 [Qemu-devel] [PATCH 0/2] E1000 RX/Live migration bugs fixed Dmitry Fleytman
2012-10-17 18:31 ` [Qemu-devel] [PATCH 1/2] Ignore RX tail kicks when RX disabled Dmitry Fleytman
2012-10-18  7:31   ` Stefan Hajnoczi
2012-10-18  8:08     ` Dmitry Fleytman
2012-10-18  8:09   ` Stefan Hajnoczi
2012-10-18  8:34     ` Dmitry Fleytman
2012-10-18 14:31       ` Stefan Hajnoczi [this message]
2012-10-18 16:06         ` Alexander Duyck
2012-10-18 16:12           ` Dmitry Fleytman
2012-10-17 18:31 ` [Qemu-devel] [PATCH 2/2] Add check_rxov into VMState Dmitry Fleytman
2012-10-18  7:24   ` Stefan Hajnoczi
2012-10-18  8:06     ` Dmitry Fleytman
2012-10-18 14:56   ` Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJSP0QUmLupy9kbFuCzSSmGwHafeH-35Y5V9RD0=krZs8bucDw@mail.gmail.com' \
    --to=stefanha@gmail.com \
    --cc=alexander.h.duyck@intel.com \
    --cc=chris.webb@elastichosts.com \
    --cc=dmitry@daynix.com \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.davies@elastichosts.com \
    --cc=yan@daynix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).