From: Alexander Duyck <alexander.h.duyck@intel.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Dmitry Fleytman <dmitry@daynix.com>,
Yan Vugenfirer <yan@daynix.com>,
Chris Webb <chris.webb@elastichosts.com>,
qemu-devel@nongnu.org,
Richard Davies <richard.davies@elastichosts.com>
Subject: Re: [Qemu-devel] [PATCH 1/2] Ignore RX tail kicks when RX disabled.
Date: Thu, 18 Oct 2012 09:06:55 -0700 [thread overview]
Message-ID: <5080291F.5020000@intel.com> (raw)
In-Reply-To: <CAJSP0QUmLupy9kbFuCzSSmGwHafeH-35Y5V9RD0=krZs8bucDw@mail.gmail.com>
On 10/18/2012 07:31 AM, Stefan Hajnoczi wrote:
> On Thu, Oct 18, 2012 at 10:34 AM, Dmitry Fleytman <dmitry@daynix.com> wrote:
>> The real purpose of check_rxov it a bit confusing indeed, mainly
>> because of unclear name (rename?),
>> however it works as following:
>>
>> There are 2 possible when RDT == RDH for RX ring:
>> 1. Device used all the buffers from ring, no empty buffers available
>> 2. Driver fully refilled the ring and all buffers are empty and ready to use
The 2nd case is not true. We should only have RDT == RDH when the ring
is empty. If RDT == RDH and the ring is full then we have a bug in the
driver. The driver should only ever allow RDT to be one less than head,
or ring size - 1 if head is 0.
>> check_rxov is used to distinguish these 2 cases:
>> 1. It must be 1 initially (init, reset, etc.)
>> 2. It must be set to one when device uses buffer
>> 3. It must be set to 0 when driver adds buffer to the ring
>> check_rxov == 1 - ring is empty
>> check_rxov == 0 - ring is full
>>
>> Indeed, RX init sequence doesn't look logical, however this is the way
>> all Intel driver behave from e1000 and up to ixgbe.
>> Also see some explanation here:
>> http://permalink.gmane.org/gmane.linux.kernel/1375917
>>
>> If we drop check_rxov and always treat RDH == RDT as empty ring we'll
>> probably get correct behavior for current Linux driver's code (needs
>> testing of course),
>> however we have no idea how Windows drivers work.
The windows driver should work the same way. If RDH == RDT the hardware
will treat that as a empty ring and will hang. If there is a driver
that is setting RDH == RDT to indicate the ring is full please let us
know as that is likely a buggy driver.
> Thanks, for the great explanation, Dmitry.
>
> Alexander: I CCed you because I hope you might be able to explain what
> the 82540EM card does when a driver sets RDT to the value of RDH. The
> QEMU NIC emulation code treats this as a full ring (i.e. the
> descriptors are valid and will be filled in by the hardware). Does
> the real hardware act like this or will it treat this condition as
> ring empty (i.e. if the driver sets RDT to the value of RDH then the
> hardware stops receive because there are no descriptors available)?
>
> I can't find a statement in the Intel datasheet about what happens
> when the driver sets RDT = RDH. The QEMU check_rxov variable is
> trying to distinguish between ring empty (RDH has moved to RDT) and
> ring full (driver has set RDH = RDT because the full descriptor ring
> is available).
If RDT == RDH then we should stop receiving traffic. As far as I know
all of our e1000 hardware treat RDT == RDH as an empty ring state. All
of the drivers should have code in place to stop it. For example the
E1000_DESC_UNUSED macro should be returning ring size - 1 in the case of
RDT == RDH which will result in the head being 0 and the tail being ring
size - 2.
> Dmitry: At this point we'd need to test what happens on real hardware
> when RDH = RDT in order to be able to remove check_rxov. As you
> mentioned, with the Linux e1000 driver we don't see ring full RDH =
> RDT:
>
> /* call E1000_DESC_UNUSED which always leaves
> * at least 1 descriptor unused to make sure
> * next_to_use != next_to_clean */
> for (i = 0; i < adapter->num_rx_queues; i++) {
> struct e1000_rx_ring *ring = &adapter->rx_ring[i];
> adapter->alloc_rx_buf(adapter, ring,
> E1000_DESC_UNUSED(ring));
> }
>
> Here some sample output from a QEMU printf, notice how RDH is never
> the same as RDT once rx begins:
>
> set_rdt rdh=0 rdt_old=0 rdt_new=0
> set_rdt rdh=0 rdt_old=0 rdt_new=254
> set_rdt rdh=1 rdt_old=254 rdt_new=255
> set_rdt rdh=2 rdt_old=255 rdt_new=0
> set_rdt rdh=3 rdt_old=0 rdt_new=1
> set_rdt rdh=4 rdt_old=1 rdt_new=2
> set_rdt rdh=5 rdt_old=2 rdt_new=3
> set_rdt rdh=6 rdt_old=3 rdt_new=4
> set_rdt rdh=7 rdt_old=4 rdt_new=5
> set_rdt rdh=9 rdt_old=5 rdt_new=7
> set_rdt rdh=10 rdt_old=7 rdt_new=8
> set_rdt rdh=11 rdt_old=8 rdt_new=9
> set_rdt rdh=12 rdt_old=9 rdt_new=10
> set_rdt rdh=13 rdt_old=10 rdt_new=11
> set_rdt rdh=14 rdt_old=11 rdt_new=12
>
> The iPXE 'intel' driver (supports e1000 cards) also does not set RDH =
> RDT for full rx ring, instead it only uses 4 out of 8 descriptors at a
> time.
>
> The reason I'm digging into the need for check_rxov is because it's a
> dangerous piece of code to have. If check_rxov logic is ever out of
> sync we risk memory corruption. I'd really like to drop it
> completely.
>
> Stefan
There should be no need for check_rxov. As far as I know none of our
drivers will ever set RDT == RDH if there are descriptors available on
the ring.
Thanks,
Alex
next prev parent reply other threads:[~2012-10-18 16:07 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-10-17 18:31 [Qemu-devel] [PATCH 0/2] E1000 RX/Live migration bugs fixed Dmitry Fleytman
2012-10-17 18:31 ` [Qemu-devel] [PATCH 1/2] Ignore RX tail kicks when RX disabled Dmitry Fleytman
2012-10-18 7:31 ` Stefan Hajnoczi
2012-10-18 8:08 ` Dmitry Fleytman
2012-10-18 8:09 ` Stefan Hajnoczi
2012-10-18 8:34 ` Dmitry Fleytman
2012-10-18 14:31 ` Stefan Hajnoczi
2012-10-18 16:06 ` Alexander Duyck [this message]
2012-10-18 16:12 ` Dmitry Fleytman
2012-10-17 18:31 ` [Qemu-devel] [PATCH 2/2] Add check_rxov into VMState Dmitry Fleytman
2012-10-18 7:24 ` Stefan Hajnoczi
2012-10-18 8:06 ` Dmitry Fleytman
2012-10-18 14:56 ` Avi Kivity
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5080291F.5020000@intel.com \
--to=alexander.h.duyck@intel.com \
--cc=chris.webb@elastichosts.com \
--cc=dmitry@daynix.com \
--cc=qemu-devel@nongnu.org \
--cc=richard.davies@elastichosts.com \
--cc=stefanha@gmail.com \
--cc=yan@daynix.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.