All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Haotian Wang <haotian.wang@sifive.com>
Cc: kishon@ti.com, lorenzo.pieralisi@arm.com, bhelgaas@google.com,
	jasowang@redhat.com, linux-pci@vger.kernel.org,
	haotian.wang@duke.edu
Subject: Re: [PATCH] pci: endpoint: functions: Add a virtnet EP function
Date: Thu, 5 Sep 2019 03:07:15 -0400	[thread overview]
Message-ID: <20190905025823-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20190903203938.899-1-haotian.wang@sifive.com>

On Tue, Sep 03, 2019 at 04:39:38PM -0400, Haotian Wang wrote:
> Hi Michael,
> 
> Thank you for your feedback!
> 
> On Tue, Sep 3, 2019 at 2:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Aug 23, 2019 at 02:31:45PM -0700, Haotian Wang wrote:
> > > This endpoint function enables the PCI endpoint to establish a virtual
> > > ethernet link with the PCI host. The main features are:
> > > 
> > > - Zero modification of PCI host kernel. The only requirement for the
> > >   PCI host is to enable virtio, virtio_pci, virtio_pci_legacy and
> > >   virito_net.
> > 
> > Do we need to support legacy? Why not just the modern interface?
> > Even if yes, limiting device
> > to only legacy support is not a good idea.
> 
> I absolutely agree with you on modern interfaces being better. The issue
> here is that I did not support legacy because of compatibility reasons
> but because I was forced to choose legacy.
> 
> In the summer, I asked the hardware team whether I had read-write access
> to the capabilities registers from the endpoint but did not receive a
> response back then.
> 
> Now I can write the code using modern virtio but I cannot easily verify
> the epf will actually function on the hardware.
> 
> Reading and writing of capabilities list registers requires patches to
> the pci endpoint framework and the designware endpoint controller as
> well. I will probably work on that after I resolve these other issues.
> 
> > > +	if (!atomic_xchg(pending, 0))
> > > +		usleep_range(check_queues_usec_min,
> > > +			     check_queues_usec_max);
> > 
> > What's the usleep hackery doing? Set it too low and you
> > waste cycles. Set it too high and your latency suffers.
> > It would be nicer to just use a completion or something like this.
> 
> If the pending bit is set, the kthread will go directly into another
> round. The usleep is here because, in case the pending bit is not set,
> the kthread waits a certain while and then checks for buffers anyway as
> a sort of "fallback" check.
> 
> Problem with completion is that there is no condition to complete on. I
> can change the usleep_range() to schedule() if that is a more sensible
> thing to do.
> 
> If you mean wait until the pending bit is set, I can do that. Back when
> I wrote this module, the reason for not doing that was the endpoint
> might fail to catch notification from the host.
> 
> If you are interested, here is a more detailed expanation.


So the below basically means the communication is racy.
Yes using timers help recover from that, but in
a way that is very expensive, and will also lead
to latency spikes.

> > > +static int pci_epf_virtio_catch_notif(void *data)
> > > +{
> > > +	u16 changed;
> > > +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> > > +	void __iomem *avail_idx;
> > > +	u16 event;
> > > +#endif
> > > +
> > > +	register const __virtio16 default_notify = epf_cpu_to_virtio16(2);
> > > +
> > > +	struct pci_epf_virtio *const epf_virtio = data;
> > > +	atomic_t *const pending = epf_virtio->pending;
> > > +
> > > +	while (!kthread_should_stop()) {
> > > +		changed = epf_virtio16_to_cpu(epf_virtio->legacy_cfg->q_notify);
> > > +		if (changed != 2) {
> > > +			epf_virtio->legacy_cfg->q_notify = default_notify;
> > > +			/* The pci host has made changes to virtqueues */
> > > +			if (changed)
> > > +				atomic_cmpxchg(pending, 0, 1);
> > > +#ifdef CONFIG_PCI_EPF_VIRTIO_SUPPRESS_NOTIFICATION
> > > +			avail_idx = IO_MEMBER_PTR(epf_virtio->avail[changed],
> > > +						  struct vring_avail,
> > > +						  idx);
> > > +			event = epf_ioread16(avail_idx) + event_suppression;
> > > +			write_avail_event(epf_virtio->used[changed], event);
> > > +#endif
> > > +		}
> > > +		usleep_range(notif_poll_usec_min,
> > > +			     notif_poll_usec_max);
> > > +	}
> > > +	return 0;
> > > +}
> 
> The pending bit is set if the notification polling thread sees a value
> in legacy_cfg->q_notify that is not 2, because the PCI host virtio_pci
> will write either 0 when its rx queue consumes something or 1 if its tx
> queue has offered a new buffer. My endpoing function will then set that
> value back to 2. In this process there are numerous things that can go
> wrong.
> 
> The host may write multiple 0 or 1's and the endpoint can only
> detect one of them in an notif_poll usleep interval.

Right. Notifications weren't designed to be implemented on top of RW
memory like this: the assumption was all notifications are buffered.
So if you implement modern instead, different queues can use
different addresses.

> 
> The host may write
> some non-2 value as the endpoint code just finishes detecting the last
> non-2 value and reverting that value back to 2, effectively nullifying
> the new non-2 value.
> 
> The host may decide to write a non-2 value
> immediately after the endpoint revert that value back to 2 but before
> the endpoint code finishes the current loop of execution, effectively
> making the value not reverted back to 2.
> 
> All these and other problems are made worse by the fact that the PCI
> host Linux usually runs on much faster cores than the one on PCI
> endpoint. This is why relying completely on pending bits is not always
> safe. Hence the "fallback" check using usleep hackery exists.
> Nevertheless I welcome any suggestion, because I do not like this
> treatment myself either.

As long as you have a small number of queues, you can poll both
of them. And to resolve racing with host, re-check
rings after you write 2 into the selector
(btw you also need a bunch of memory barriers, atomics don't
imply them automatically).


> > > +	net_cfg->max_virtqueue_pairs = (__force __u16)epf_cpu_to_virtio16(1);
> > 
> > You don't need this without VIRTIO_NET_F_MQ.
> 
> Noted.
> 
> Best,
> Haotian

  reply	other threads:[~2019-09-05  7:07 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-23 21:31 [PATCH] pci: endpoint: functions: Add a virtnet EP function Haotian Wang
2019-08-26 10:51 ` Kishon Vijay Abraham I
2019-08-26 21:59   ` Haotian Wang
2019-08-27  8:12     ` Kishon Vijay Abraham I
2019-08-27 18:01       ` Haotian Wang
2019-08-30  6:11 ` Jason Wang
2019-08-30 23:06   ` Haotian Wang
2019-09-02  3:50     ` Jason Wang
2019-09-02 20:05       ` Haotian Wang
2019-09-03 10:42         ` Jason Wang
2019-09-04  0:55           ` Haotian Wang
2019-09-04 21:58           ` Haotian Wang
2019-09-05  2:56             ` Jason Wang
2019-09-05  3:28               ` Haotian Wang
2019-11-25 12:49               ` Kishon Vijay Abraham I
2019-11-26  9:58                 ` Jason Wang
2019-11-26 12:35                   ` Kishon Vijay Abraham I
2019-11-26 21:55                     ` Alan Mikhak
2019-11-26 22:01                       ` Alan Mikhak
2019-11-27  3:04                         ` Jason Wang
2019-09-03  6:25 ` Michael S. Tsirkin
2019-09-03 20:39   ` Haotian Wang
2019-09-05  7:07     ` Michael S. Tsirkin [this message]
2019-09-05 16:15       ` Haotian Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190905025823-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=haotian.wang@duke.edu \
    --cc=haotian.wang@sifive.com \
    --cc=jasowang@redhat.com \
    --cc=kishon@ti.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lorenzo.pieralisi@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.