From: Ding Hui <dinghui@sangfor.com.cn>
To: Stefan Hajnoczi <stefanha@gmail.com>, ASM <asm@asm.pp.ru>
Cc: qemu-devel@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	dmitry.fleytman@gmail.com
Subject: Re: PCI memory sync question (kvm,dpdk,e1000,packet stalled)
Date: Mon, 23 May 2022 20:47:03 +0800	[thread overview]
Message-ID: <3dbd93db-e098-7a07-6000-4e055ffe8c94@sangfor.com.cn> (raw)
In-Reply-To: <20200102110504.GG121208@stefanha-x1.localdomain>

Hi ASM

I think I am hitting almost exactly the same issue as ASM (QEMU e1000 with
a tap backend, guest running DPDK), in our own environment rather than
with your GitHub project.

It seems that the descriptor (not only the status field; we also traced
buffer_addr and csum) suddenly changed back to its old value after the
guest DPDK driver set status to 0.

So I searched for the issue, and found the bug report at
https://bugs.launchpad.net/qemu/+bug/1853123 as well as this mailing list
thread.

Can you tell me what your workaround is, and did you find out the root cause?

Thanks a lot

On 2020/1/2 19:05, Stefan Hajnoczi wrote:
> On Mon, Dec 30, 2019 at 01:10:15PM +0300, ASM wrote:
>>> It could be a bug in QEMU's e1000 emulation - maybe it's not doing
>>> things in the correct order and causes a race condition with the DPDK
>>> polling driver - or it could be a bug in the DPDK e1000 driver regarding
>>> the order in which the descriptor ring and RX Head/Tail MMIO registers
>>> are updated.
>>
>>
>> What I have understood so far:
>> * The DPDK and kernel drivers handle the ring in a similar way: they do
>> not analyze Head, but check STATUS instead.
>> This is a bit strange but completely correct driver behavior. If the
>> driver writes a value to memory, it expects
>> that value to stay written. The problem is definitely not in the DPDK
>> driver nor in the kernel driver.
>> * This problem appears on KVM, but does not appear on TCG.
>> * Most likely it is a bug in the QEMU e1000 emulation. The e1000 emulation
>> reads and writes some of the same
>> memory at the same time as the DPDK driver does.
>>
>>
>> As I understand it, KVM explicitly prohibits this kind of concurrent
>> access to shared memory. It is obvious that we need to
>> protect (e.g. with RCU) all the STATUS registers of all buffers. There
>> can be a lot of buffers and they can be
>> scattered throughout memory.
> 
> I don't agree with this reasoning because these descriptor rings are
> designed to support simultaneous access by the driver and the device (if
> done with care!).  What QEMU and the driver are attempting is typical.
> 
> The important thing for safe shared memory access is that both the
> driver and the device know who is allowed to write to a memory location
> at a point in time.  As long as both the driver and device don't write
> to the same location it is possible to safely use the data structure.
> 
> The typical pattern that a driver or device uses to publish information
> is:
> 
>    1. Fill in all fields but don't set the STATUS bit yet.
>    2. Memory barrier or other memory ordering directive to ensure that
>       fields have been written.  This operation might be implicit.
>    3. Set the STATUS bit in a separate operation.
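> 
>    For example, roughly (an illustrative sketch only, not the real e1000
>    emulation code; the struct is modeled on e1000_rx_desc and the
>    ordering uses GCC __atomic builtins):
> 
>    #include <stdint.h>
> 
>    struct rx_desc {              /* simplified RX descriptor layout */
>        uint64_t buffer_addr;
>        uint16_t length;
>        uint16_t csum;
>        uint8_t  status;          /* DD (descriptor done) bit lives here */
>        uint8_t  errors;
>        uint16_t special;
>    };
> 
>    /* device/producer side: publish one completed descriptor */
>    static void publish(volatile struct rx_desc *d, uint16_t len,
>                        uint16_t csum)
>    {
>        d->length = len;          /* 1. fill in all fields except status */
>        d->csum   = csum;
>        /* 2.+3. release store: the field writes above cannot be reordered
>         * after this store, so the driver never sees DD with stale data */
>        __atomic_store_n(&d->status, (uint8_t)(d->status | 0x01 /* DD */),
>                         __ATOMIC_RELEASE);
>    }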
> 
> On the read side there should be:
> 
>    1. Load the STATUS field and check the bit.  If it's clear then try
>       again.
>    2. Memory barrier or other memory ordering directive to ensure that
>       fields have been written.  This operation might be implicit.
>    3. Load all the fields.
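> 
>    And the matching consumer side of the same sketch (a polling driver):
> 
>    /* driver/consumer side: poll one descriptor */
>    static int poll_one(volatile struct rx_desc *d, struct rx_desc *out)
>    {
>        /* 1.+2. acquire load of STATUS: the field loads below cannot be
>         * reordered before it, so they see values written before DD */
>        uint8_t status = __atomic_load_n(&d->status, __ATOMIC_ACQUIRE);
>        if (!(status & 0x01 /* DD */))
>            return 0;             /* not ready yet, poll again later */
>        /* 3. now it is safe to load the remaining fields */
>        out->buffer_addr = d->buffer_addr;
>        out->length      = d->length;
>        out->csum        = d->csum;
>        return 1;
>    }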
> 
> Can you explain why the STATUS fields need to be protected?  My
> understanding is that the STATUS field is owned by the device at this
> point in time.  The device writes to the STATUS field and the driver
> polls it - this is safe.
> 
> I think the issue is bugs in the code.  The DPDK code looks questionable
> (https://git.dpdk.org/dpdk/tree/drivers/net/e1000/em_rxtx.c#n715):
> 
>    volatile struct e1000_rx_desc *rxdp;
>    ...
>    rxdp = &rx_ring[rx_id];
>    status = rxdp->status;
>    if (! (status & E1000_RXD_STAT_DD))
>        break;
>    rxd = *rxdp;
> 
> Although there is a conditional branch on rxdp->status, the CPU may load
> *rxdp before loading rxdp->status.  These pointers are volatile but
> that's not enough to prevent the CPU from reordering memory accesses
> (the compiler emits regular load instructions without memory ordering
> directives).
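> 
>    One way to make the ordering explicit (just a sketch, not a tested
>    DPDK patch) is a read barrier such as rte_smp_rmb(), or an acquire
>    load of status, between the STATUS check and the full descriptor copy:
> 
>    status = rxdp->status;
>    if (!(status & E1000_RXD_STAT_DD))
>        break;
>    rte_smp_rmb();   /* order the full copy below after the STATUS load */
>    rxd = *rxdp;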
> 
> This is probably not the bug that you're seeing, but it suggests there
> are more problems.
> 
> Stefan
> 


-- 
Thanks,
- Ding Hui

