From: Wei Liu <wei.liu2@citrix.com>
To: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>,
Ian Campbell <Ian.Campbell@citrix.com>,
David Vrabel <david.vrabel@citrix.com>, <netdev@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH net-next 2/2] xen-netback: Turn off the carrier if the guest is not able to receive
Date: Tue, 5 Aug 2014 13:45:52 +0100 [thread overview]
Message-ID: <20140805124552.GE11230@zion.uk.xensource.com> (raw)
In-Reply-To: <1407165658-20146-3-git-send-email-zoltan.kiss@citrix.com>
On Mon, Aug 04, 2014 at 04:20:58PM +0100, Zoltan Kiss wrote:
[...]
> struct xenvif {
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index fbdadb3..48a55cd 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -78,8 +78,12 @@ int xenvif_poll(struct napi_struct *napi, int budget)
> /* This vif is rogue, we pretend we've there is nothing to do
> * for this vif to deschedule it from NAPI. But this interface
> * will be turned off in thread context later.
> + * Also, if a guest doesn't post enough slots to receive data on one of
> + * its queues, the carrier goes down and NAPI is descheduled here so
> + * the guest can't send more packets until it's ready to receive.
> */
> - if (unlikely(queue->vif->disabled)) {
> + if (unlikely(queue->vif->disabled ||
> + !netif_carrier_ok(queue->vif->dev))) {
> napi_complete(napi);
> return 0;
> }
> @@ -97,7 +101,16 @@ int xenvif_poll(struct napi_struct *napi, int budget)
> static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
> {
> struct xenvif_queue *queue = dev_id;
> + struct netdev_queue *net_queue =
> + netdev_get_tx_queue(queue->vif->dev, queue->id);
>
> + /* QUEUE_STATUS_RX_PURGE_EVENT is only set if either QDisc was off OR
> + * the carrier went down and this queue was previously blocked
> + */
Could you change "blocked" to "stalled" so that the comment matches the
code closely?
> + if (unlikely(netif_tx_queue_stopped(net_queue) ||
> + (!netif_carrier_ok(queue->vif->dev) &&
> + test_bit(QUEUE_STATUS_RX_STALLED, &queue->status))))
> + set_bit(QUEUE_STATUS_RX_PURGE_EVENT, &queue->status);
> xenvif_kick_thread(queue);
>
> return IRQ_HANDLED;
> @@ -125,16 +138,14 @@ void xenvif_wake_queue(struct xenvif_queue *queue)
> netif_tx_wake_queue(netdev_get_tx_queue(dev, id));
> }
>
> -/* Callback to wake the queue and drain it on timeout */
> -static void xenvif_wake_queue_callback(unsigned long data)
> +/* Callback to wake the queue's thread and turn the carrier off on timeout */
> +static void xenvif_rx_stalled(unsigned long data)
> {
> struct xenvif_queue *queue = (struct xenvif_queue *)data;
>
> if (xenvif_queue_stopped(queue)) {
> - netdev_err(queue->vif->dev, "draining TX queue\n");
> - queue->rx_queue_purge = true;
> + set_bit(QUEUE_STATUS_RX_PURGE_EVENT, &queue->status);
> xenvif_kick_thread(queue);
> - xenvif_wake_queue(queue);
> }
> }
>
[...]
> static inline int tx_work_todo(struct xenvif_queue *queue)
> @@ -1935,6 +1934,75 @@ static void xenvif_start_queue(struct xenvif_queue *queue)
> xenvif_wake_queue(queue);
> }
>
> +/* Only called from the queue's thread, it handles the situation when the guest
> + * doesn't post enough requests on the receiving ring.
> + * First xenvif_start_xmit disables QDisc and start a timer, and then either the
> + * timer fires, or the guest send an interrupt after posting new request. If it
> + * is the timer, the carrier is turned off here.
> + * */
Please remove that extra "*".
> +static void xenvif_rx_purge_event(struct xenvif_queue *queue)
> +{
> + /* Either the last unsuccesful skb or at least 1 slot should fit */
> + int needed = queue->rx_last_skb_slots ?
> + queue->rx_last_skb_slots : 1;
> +
> + /* It is assumed that if the guest post new slots after this, the RX
> + * interrupt will set the QUEUE_STATUS_RX_PURGE_EVENT bit and wake up
> + * the thread again
> + */
Basically in this state machine you have a tuple (RX_STALLED bit,
PURGE_EVENT bit, carrier state). This whole state transaction is very
scary, any chance you can draw a graph like the xenbus state machine in
xenbus.c?
I fear that after three month noone can easily understand this code
unless he / she spends half a day reading the code. And without defining
what state is legal it's very hard to tell what behavior is expected and
what is not.
> + set_bit(QUEUE_STATUS_RX_STALLED, &queue->status);
> + if (!xenvif_rx_ring_slots_available(queue, needed)) {
> + rtnl_lock();
> + if (netif_carrier_ok(queue->vif->dev)) {
> + /* Timer fired and there are still no slots. Turn off
> + * everything except the interrupts
> + */
> + netif_carrier_off(queue->vif->dev);
> + skb_queue_purge(&queue->rx_queue);
> + queue->rx_last_skb_slots = 0;
> + if (net_ratelimit())
> + netdev_err(queue->vif->dev, "Carrier off due to lack of guest response on queue %d\n", queue->id);
Line too long.
> + } else {
> + /* Probably an another queue already turned the carrier
> + * off, make sure nothing is stucked in the internal
> + * queue of this queue
> + */
> + skb_queue_purge(&queue->rx_queue);
> + queue->rx_last_skb_slots = 0;
> + }
> + rtnl_unlock();
> + } else if (!netif_carrier_ok(queue->vif->dev)) {
> + unsigned int num_queues = queue->vif->num_queues;
> + unsigned int i;
> + /* The carrier was down, but an interrupt kicked
> + * the thread again after new requests were
> + * posted
> + */
> + clear_bit(QUEUE_STATUS_RX_STALLED,
> + &queue->status);
> + rtnl_lock();
> + netif_carrier_on(queue->vif->dev);
> + netif_tx_wake_all_queues(queue->vif->dev);
> + rtnl_unlock();
> +
> + for (i = 0; i < num_queues; i++) {
> + struct xenvif_queue *temp = &queue->vif->queues[i];
> +
> + xenvif_napi_schedule_or_enable_events(temp);
> + }
> + if (net_ratelimit())
> + netdev_err(queue->vif->dev, "Carrier on again\n");
> + } else {
> + /* Queuing were stopped, but the guest posted
> + * new requests and sent an interrupt
> + */
> + clear_bit(QUEUE_STATUS_RX_STALLED,
> + &queue->status);
> + del_timer_sync(&queue->rx_stalled);
> + xenvif_start_queue(queue);
> + }
> +}
> +
> int xenvif_kthread_guest_rx(void *data)
> {
> struct xenvif_queue *queue = data;
> @@ -1944,8 +2012,12 @@ int xenvif_kthread_guest_rx(void *data)
> wait_event_interruptible(queue->wq,
> rx_work_todo(queue) ||
> queue->vif->disabled ||
> + test_bit(QUEUE_STATUS_RX_PURGE_EVENT, &queue->status) ||
Line too long.
> kthread_should_stop());
>
> + if (kthread_should_stop())
> + break;
> +
> /* This frontend is found to be rogue, disable it in
> * kthread context. Currently this is only set when
> * netback finds out frontend sends malformed packet,
> @@ -1955,24 +2027,21 @@ int xenvif_kthread_guest_rx(void *data)
> */
> if (unlikely(queue->vif->disabled && queue->id == 0))
> xenvif_carrier_off(queue->vif);
I think you also need to check vif->disabled flag in your following code
so that you don't accidently re-enable a rogue vif in a queue whose id
!= 0.
Further more "disabled" can be transformed to a bit in vif->status.
You can incorporate such change in your previous patch or a separate
prerequisite patch.
> -
> - if (kthread_should_stop())
> - break;
> -
> - if (queue->rx_queue_purge) {
> + else if (unlikely(test_and_clear_bit(QUEUE_STATUS_RX_PURGE_EVENT,
> + &queue->status))) {
> + xenvif_rx_purge_event(queue);
> + } else if (!netif_carrier_ok(queue->vif->dev)) {
> + /* Another queue stalled and turned the carrier off, so
> + * purge the internal queue of queues which were not
> + * blocked
> + */
"blocked" -> "stalled"?
In theory even one queue stalls all other queues can still make
progress, isn't it?
Wei.
next prev parent reply other threads:[~2014-08-05 12:45 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-08-04 15:20 [PATCH net-next 0/2] xen-netback: Changes around carrier handling Zoltan Kiss
2014-08-04 15:20 ` [PATCH net-next 1/2] xen-netback: Using a new state bit instead of carrier Zoltan Kiss
2014-08-05 12:45 ` Wei Liu
2014-08-05 12:45 ` Wei Liu
2014-08-06 18:25 ` Zoltan Kiss
2014-08-06 18:25 ` Zoltan Kiss
2014-08-04 15:20 ` Zoltan Kiss
2014-08-04 15:20 ` [PATCH net-next 2/2] xen-netback: Turn off the carrier if the guest is not able to receive Zoltan Kiss
2014-08-04 15:20 ` Zoltan Kiss
2014-08-05 12:45 ` Wei Liu [this message]
2014-08-06 19:18 ` Zoltan Kiss
2014-08-06 19:18 ` Zoltan Kiss
2014-08-05 12:45 ` Wei Liu
2014-08-04 15:34 ` [PATCH net-next 0/2] xen-netback: Changes around carrier handling Zoltan Kiss
2014-08-04 15:34 ` Zoltan Kiss
2014-08-05 23:07 ` David Miller
2014-08-06 0:00 ` Wei Liu
2014-08-06 1:50 ` David Miller
2014-08-06 8:54 ` Wei Liu
2014-08-06 8:54 ` Wei Liu
2014-08-06 1:50 ` David Miller
2014-08-06 0:00 ` Wei Liu
2014-08-06 19:20 ` Zoltan Kiss
2014-08-06 19:20 ` Zoltan Kiss
2014-08-06 21:01 ` David Miller
2014-08-06 21:01 ` David Miller
2014-08-07 15:51 ` Zoltan Kiss
2014-08-08 5:28 ` David Miller
2014-08-08 5:28 ` David Miller
2014-08-07 15:51 ` Zoltan Kiss
2014-08-07 16:49 ` Zoltan Kiss
2014-08-07 16:49 ` Zoltan Kiss
2014-08-08 5:29 ` David Miller
2014-08-08 5:29 ` David Miller
2014-08-05 23:07 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140805124552.GE11230@zion.uk.xensource.com \
--to=wei.liu2@citrix.com \
--cc=Ian.Campbell@citrix.com \
--cc=david.vrabel@citrix.com \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=xen-devel@lists.xenproject.org \
--cc=zoltan.kiss@citrix.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.