Re: [PATCH net-next 3/3] sunvnet: Schedule maybe_tx_wakeup as a tasklet from ldc_rx path

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
To: David Miller <davem@davemloft.net>
Cc: raghuram.kothakota@oracle.com, netdev@vger.kernel.org
Subject: Re: [PATCH net-next 3/3] sunvnet: Schedule maybe_tx_wakeup as a tasklet from ldc_rx path
Date: Wed, 13 Aug 2014 07:20:10 -0400	[thread overview]
Message-ID: <20140813112010.GC16865@oracle.com> (raw)
In-Reply-To: <20140812.212513.96441522917129709.davem@davemloft.net>

On (08/12/14 21:25), David Miller wrote:
> 
> However, don't get ahead of yourself.  All I'm saying in the context
> of reviewing this patch #3 is that you should leave vnet_tx_timeout()
> alone and just put what you were putting into vnet_tx_timeout() into
> maybe_tx_wakeup().

And just leave vnet_tx_timeout() as before (i.e, "XXX implement me"?)?

Sure, I can generate that (as a stand-alone unified-diff patch?) 
it would be the minimum fix needed to avoid the deadlock on 
netif_tx_lock() itself.

> This doesn't match my understanding.   My understanding is that when
> an LDC connection resets, the peer on the other side should automatically
> try to re-bind or whatever is necessary to re-establish the LDC link.
> 
> It's supposed to be fault tolerant in as many situations as possible
> without the need for explicit adminstrator intervention.

Ideally, yes. But that's not what I was observing when I tested this.

> But besides that, an LDC reset is a big hammer.  So I would say that it
> should be deferred to the last possible point.

Correct, and that's why it feels ok (to me) to do this when the
DRING_STOPPED fails (so that we also fail subsequent ldc_write's
from vnet_start_xmit()) but not in any other case.

Note that, as I pointed out to Raghuram already, doing ldc_disconnect
does *not* send any reset events. The code may have other missing
functionality. 

> So initially we do the backoff spinning loop synchronously.  If that
> doesn't succeed, we schedule a workqueue that can poll the LDC channel
> some more trying to do the write, in a sleepable context (so a loop
> with delays and preemption points) until a much larger timeout.  Only
> at this second timeout do we declare things to be serious enough for
> an LDC channel reset.
> 
> > So for case 1 we could do something very similar- I haven't tried this yet,
> > but here's what I'm thinking: vnet_start_xmit() should not do a
> > netif_stop_queue.  Instead it should do a (new function) "vio_stop_queue()"
> > which sets some (new) VIO_DR_FLOW_CONTROLLED  state in the vio_driver_state, 
> > (or perhaps some flag bit in the ldc_channel?) and also marks a 
> > (new) boolean is_flow_controlled state variable in the struct vnet 
> > as TRUE.
> > 
> > The modified maybe_tx_wakeup would check is_flow_controlled on the vnet,
> > and call a new vnet_wake_queue() to reset VIO_DR_FLOW_CONTROLLED
> > if appropriate.
> > 
> > vnet_start_xmit should drop packets as long as VIO_DR_FLOW_CONTROLLED
> > is set.
> 
> Again, as I stated yesterday, you really cannot do this.  Head of line
> blocking when one peer gets wedged is absolutely necessary.

In the light of the subsequent discussion about head-of-line blocking
(and it's applicability per ldc_channel, instead of per-net_device),
do we still think the above has issues, or is otherwise not worth
doing?

Further in the thread, David Miller wrote:

> So the question is how to manage this on the driver side, and the most
> natural way I see do this would be to use multiple TX netdev queues
> and a custom netdev_ops->ndo_select_queue() method which selects the
> queue based upon the peer that would be selected.

This might work well for the current sunvnet model (and even
there, it has limitations- if all the traffic is going out 
via the switchport to a gateway, you are again back to the 
head-of-line blocking model).

But how generally useful is this model? For point-to-multipoint links
like ethernet, I dont think you actually track one channel per
peer.  

--Sowmini

next prev parent reply	other threads:[~2014-08-13 11:20 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-12 14:35 [PATCH net-next 3/3] sunvnet: Schedule maybe_tx_wakeup as a tasklet from ldc_rx path Sowmini Varadhan
2014-08-12 22:13 ` David Miller
2014-08-13  1:58   ` Sowmini Varadhan
2014-08-13  4:25     ` David Miller
2014-08-13 11:20       ` Sowmini Varadhan [this message]
2014-08-13 23:29         ` David Miller
2014-08-13  4:26     ` Raghuram Kothakota
2014-08-13  4:31       ` David Miller
2014-08-13  5:14         ` Raghuram Kothakota
2014-08-13  5:29           ` David Miller
2014-08-13  5:44             ` Raghuram Kothakota
2014-08-13  6:11               ` David Miller
2014-08-13  6:37                 ` Raghuram Kothakota
2014-08-13 10:51       ` Sowmini Varadhan
2014-08-13 21:59         ` Raghuram Kothakota

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140813112010.GC16865@oracle.com \
    --to=sowmini.varadhan@oracle.com \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=raghuram.kothakota@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.