linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Haotian Wang <haotian.wang@sifive.com>
Cc: kishon@ti.com, lorenzo.pieralisi@arm.com, bhelgaas@google.com,
	jasowang@redhat.com, linux-pci@vger.kernel.org,
	haotian.wang@duke.edu
Subject: Re: [PATCH] pci: endpoint: functions: Add a virtnet EP function
Date: Thu, 5 Sep 2019 03:07:15 -0400	[thread overview]
Message-ID: <20190905025823-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20190903203938.899-1-haotian.wang@sifive.com>

On Tue, Sep 03, 2019 at 04:39:38PM -0400, Haotian Wang wrote:
> Hi Michael,
> 
> Thank you for your feedback!
> 
> On Tue, Sep 3, 2019 at 2:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Aug 23, 2019 at 02:31:45PM -0700, Haotian Wang wrote:
> > > This endpoint function enables the PCI endpoint to establish a virtual
> > > ethernet link with the PCI host. The main features are:
> > > 
> > > - Zero modification of PCI host kernel. The only requirement for the
> > >   PCI host is to enable virtio, virtio_pci, virtio_pci_legacy and
> > >   virito_net.
> > 
> > Do we need to support legacy? Why not just the modern interface?
> > Even if yes, limiting device
> > to only legacy support is not a good idea.
> 
> I absolutely agree with you on modern interfaces being better. The issue
> here is that I did not support legacy because of compatibility reasons
> but because I was forced to choose legacy.
> 
> In the summer, I asked the hardware team whether I had read-write access
> to the capabilities registers from the endpoint but did not receive a
> response back then.
> 
> Now I can write the code using modern virtio but I cannot easily verify
> the epf will actually function on the hardware.
> 
> Reading and writing of capabilities list registers requires patches to
> the pci endpoint framework and the designware endpoint controller as
> well. I will probably work on that after I resolve these other issues.
> 
> > > +	if (!atomic_xchg(pending, 0))
> > > +		usleep_range(check_queues_usec_min,
> > > +			     check_queues_usec_max);
> > 
> > What's the usleep hackery doing? Set it too low and you
> > waste cycles. Set it too high and your latency suffers.
> > It would be nicer to just use a completion or something like this.
> 
> If the pending bit is set, the kthread will go directly into another
> round. The usleep is here because, in case the pending bit is not set,
> the kthread waits a certain while and then checks for buffers anyway as
> a sort of "fallback" check.
> 
> Problem with completion is that there is no condition to complete on. I
> can change the usleep_range() to schedule() if that is a more sensible
> thing to do.
> 
> If you mean wait until the pending bit is set, I can do that. Back when
> I wrote this module, the reason for not doing that was the endpoint
> might fail to catch notification from the host.
> 
> If you are interested, here is a more detailed expanation.


So the below basically means the communication is racy.
Yes using timers help recover from that, but in
a way that is very expensive, and will also lead
to latency spikes.

> > > +static int pci_epf_virtio_catch_notif(void *data)
> > > +{
> > > +	u16 changed;
> > > +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> > > +	void __iomem *avail_idx;
> > > +	u16 event;
> > > +#endif
> > > +
> > > +	register const __virtio16 default_notify = epf_cpu_to_virtio16(2);
> > > +
> > > +	struct pci_epf_virtio *const epf_virtio = data;
> > > +	atomic_t *const pending = epf_virtio->pending;
> > > +
> > > +	while (!kthread_should_stop()) {
> > > +		changed = epf_virtio16_to_cpu(epf_virtio->legacy_cfg->q_notify);
> > > +		if (changed != 2) {
> > > +			epf_virtio->legacy_cfg->q_notify = default_notify;
> > > +			/* The pci host has made changes to virtqueues */
> > > +			if (changed)
> > > +				atomic_cmpxchg(pending, 0, 1);
> > > +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> > > +			avail_idx = IO_MEMBER_PTR(epf_virtio->avail[changed],
> > > +						  struct vring_avail,
> > > +						  idx);
> > > +			event = epf_ioread16(avail_idx) + event_suppression;
> > > +			write_avail_event(epf_virtio->used[changed], event);
> > > +#endif
> > > +		}
> > > +		usleep_range(notif_poll_usec_min,
> > > +			     notif_poll_usec_max);
> > > +	}
> > > +	return 0;
> > > +}
> 
> The pending bit is set if the notification polling thread sees a value
> in legacy_cfg->q_notify that is not 2, because the PCI host virtio_pci
> will write either 0 when its rx queue consumes something or 1 if its tx
> queue has offered a new buffer. My endpoing function will then set that
> value back to 2. In this process there are numerous things that can go
> wrong.
> 
> The host may write multiple 0 or 1's and the endpoint can only
> detect one of them in an notif_poll usleep interval.

Right. Notifications weren't designed to be implemented on top of RW
memory like this: the assumption was all notifications are buffered.
So if you implement modern instead, different queues can use
different addresses.

> 
> The host may write
> some non-2 value as the endpoint code just finishes detecting the last
> non-2 value and reverting that value back to 2, effectively nullifying
> the new non-2 value.
> 
> The host may decide to write a non-2 value
> immediately after the endpoint revert that value back to 2 but before
> the endpoint code finishes the current loop of execution, effectively
> making the value not reverted back to 2.
> 
> All these and other problems are made worse by the fact that the PCI
> host Linux usually runs on much faster cores than the one on PCI
> endpoint. This is why relying completely on pending bits is not always
> safe. Hence the "fallback" check using usleep hackery exists.
> Nevertheless I welcome any suggestion, because I do not like this
> treatment myself either.

As long as you have a small number of queues, you can poll both
of them. And to resolve racing with host, re-check
rings after you write 2 into the selector
(btw you also need a bunch of memory barriers, atomics don't
imply them automatically).


> > > +	net_cfg->max_virtqueue_pairs = (__force __u16)epf_cpu_to_virtio16(1);
> > 
> > You don't need this without VIRTIO_NET_F_MQ.
> 
> Noted.
> 
> Best,
> Haotian

  reply	other threads:[~2019-09-05  7:07 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-23 21:31 [PATCH] pci: endpoint: functions: Add a virtnet EP function Haotian Wang
2019-08-26 10:51 ` Kishon Vijay Abraham I
2019-08-26 21:59   ` Haotian Wang
2019-08-27  8:12     ` Kishon Vijay Abraham I
2019-08-27 18:01       ` Haotian Wang
2019-08-30  6:11 ` Jason Wang
2019-08-30 23:06   ` Haotian Wang
2019-09-02  3:50     ` Jason Wang
2019-09-02 20:05       ` Haotian Wang
2019-09-03 10:42         ` Jason Wang
2019-09-04  0:55           ` Haotian Wang
2019-09-04 21:58           ` Haotian Wang
2019-09-05  2:56             ` Jason Wang
2019-09-05  3:28               ` Haotian Wang
2019-11-25 12:49               ` Kishon Vijay Abraham I
2019-11-26  9:58                 ` Jason Wang
2019-11-26 12:35                   ` Kishon Vijay Abraham I
2019-11-26 21:55                     ` Alan Mikhak
2019-11-26 22:01                       ` Alan Mikhak
2019-11-27  3:04                         ` Jason Wang
2019-09-03  6:25 ` Michael S. Tsirkin
2019-09-03 20:39   ` Haotian Wang
2019-09-05  7:07     ` Michael S. Tsirkin [this message]
2019-09-05 16:15       ` Haotian Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190905025823-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=haotian.wang@duke.edu \
    --cc=haotian.wang@sifive.com \
    --cc=jasowang@redhat.com \
    --cc=kishon@ti.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lorenzo.pieralisi@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).