From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
To: David Miller <davem@davemloft.net>
Cc: raghuram.kothakota@oracle.com, netdev@vger.kernel.org
Subject: Re: [PATCH net-next 3/3] sunvnet: Schedule maybe_tx_wakeup as a tasklet from ldc_rx path
Date: Wed, 13 Aug 2014 07:20:10 -0400
Message-ID: <20140813112010.GC16865@oracle.com>
In-Reply-To: <20140812.212513.96441522917129709.davem@davemloft.net>

On (08/12/14 21:25), David Miller wrote:
> 
> However, don't get ahead of yourself.  All I'm saying in the context
> of reviewing this patch #3 is that you should leave vnet_tx_timeout()
> alone and just put what you were putting into vnet_tx_timeout() into
> maybe_tx_wakeup().

And just leave vnet_tx_timeout() as before (i.e., "XXX implement me")?

Sure, I can generate that (as a stand-alone unified-diff patch?);
it would be the minimum fix needed to avoid the deadlock on
netif_tx_lock() itself.
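
For concreteness, a rough sketch of that minimal fix (not the actual
patch; field and symbol names are assumptions based on the current
driver): the ldc_rx path only schedules a tasklet, and the tasklet
takes netif_tx_lock() in its own softirq context before waking the
queue.

#include <linux/interrupt.h>
#include <linux/netdevice.h>

/* Tasklet callback: runs outside the ldc_rx path, so taking
 * netif_tx_lock() here cannot deadlock against vnet_event().
 */
static void maybe_tx_wakeup(unsigned long param)
{
	struct vnet *vp = (struct vnet *)param;	/* assumes vp->dev exists */
	struct net_device *dev = vp->dev;

	netif_tx_lock(dev);
	if (netif_queue_stopped(dev))
		netif_wake_queue(dev);
	netif_tx_unlock(dev);
}

/* and from the ldc_rx/vnet_event path, instead of waking the queue
 * directly:
 *
 *	tasklet_schedule(&vp->vnet_tx_wakeup);	(assumed tasklet field)
 */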

> This doesn't match my understanding.   My understanding is that when
> an LDC connection resets, the peer on the other side should automatically
> try to re-bind or whatever is necessary to re-establish the LDC link.
> 
> It's supposed to be fault tolerant in as many situations as possible
> without the need for explicit administrator intervention.

Ideally, yes. But that's not what I was observing when I tested this.

> But besides that, an LDC reset is a big hammer.  So I would say that it
> should be deferred to the last possible point.

Correct, and that's why it feels OK (to me) to do this when sending
the DRING_STOPPED ack fails (so that we also fail subsequent
ldc_write()s from vnet_start_xmit()), but not in any other case.
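
Roughly, the failure path I have in mind looks like the following
(vnet_force_ldc_reset() is a hypothetical helper, not existing code;
the ack call is the existing vnet_send_ack()):

	/* where the driver sends the final DRING_STOPPED ack today */
	err = vnet_send_ack(port, dr, start, end, VIO_DRING_STOPPED);
	if (err <= 0)
		/* the ack could not be delivered even after retrying:
		 * reset the channel so that subsequent ldc_write()s
		 * from vnet_start_xmit() fail fast instead of wedging
		 * the tx queue
		 */
		vnet_force_ldc_reset(port);	/* hypothetical helper */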

Note that, as I pointed out to Raghuram already, calling
ldc_disconnect() does *not* send any reset events. The code may have
other missing functionality.

> So initially we do the backoff spinning loop synchronously.  If that
> doesn't succeed, we schedule a workqueue that can poll the LDC channel
> some more trying to do the write, in a sleepable context (so a loop
> with delays and preemption points) until a much larger timeout.  Only
> at this second timeout do we declare things to be serious enough for
> an LDC channel reset.
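
(To make sure I'm reading this right: a rough sketch of that two-stage
retry, with hypothetical names throughout, would be something like the
following. It assumes <linux/workqueue.h>, <linux/delay.h>, and a
work_struct embedded in vnet_port.)

static void vnet_deferred_ldc_write(struct work_struct *work)
{
	struct vnet_port *port = container_of(work, struct vnet_port,
					      deferred_xmit); /* assumed field */
	unsigned long deadline = jiffies + VNET_LDC_LONG_TIMEOUT; /* assumed */
	int err;

	do {
		err = vnet_retry_pending_write(port);	/* hypothetical */
		if (err != -EAGAIN)
			return;
		msleep(20);	/* sleepable context, preemption point */
	} while (time_before(jiffies, deadline));

	/* only now is the situation serious enough for a reset */
	vnet_force_ldc_reset(port);	/* hypothetical, as above */
}
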
> 
> > So for case 1 we could do something very similar- I haven't tried this yet,
> > but here's what I'm thinking: vnet_start_xmit() should not do a
> > netif_stop_queue.  Instead it should do a (new function) "vio_stop_queue()"
> > which sets some (new) VIO_DR_FLOW_CONTROLLED  state in the vio_driver_state, 
> > (or perhaps some flag bit in the ldc_channel?) and also marks a 
> > (new) boolean is_flow_controlled state variable in the struct vnet 
> > as TRUE.
> > 
> > The modified maybe_tx_wakeup would check is_flow_controlled on the vnet,
> > and call a new vnet_wake_queue() to reset VIO_DR_FLOW_CONTROLLED
> > if appropriate.
> > 
> > vnet_start_xmit should drop packets as long as VIO_DR_FLOW_CONTROLLED
> > is set.
> 
> Again, as I stated yesterday, you really cannot do this.  Head of line
> blocking when one peer gets wedged is absolutely necessary.

In light of the subsequent discussion about head-of-line blocking
(and its applicability per ldc_channel rather than per net_device),
do we still think the above has issues, or is otherwise not worth
doing?
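
To put the proposal quoted above in concrete terms (none of these
symbols exist today, and the flag could equally live in the
vio_driver_state or the ldc_channel, as noted):

#define VIO_DR_FLOW_CONTROLLED	0x01	/* proposed flag */

static void vio_stop_queue(struct vnet *vp, struct vnet_port *port)
{
	port->flow_flags |= VIO_DR_FLOW_CONTROLLED;	/* proposed field */
	vp->is_flow_controlled = true;			/* proposed field */
}

static void vnet_wake_queue(struct vnet *vp, struct vnet_port *port)
{
	port->flow_flags &= ~VIO_DR_FLOW_CONTROLLED;
	/* clear vp->is_flow_controlled once no port is flow-controlled,
	 * at which point maybe_tx_wakeup() can restart transmission
	 */
}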

Further in the thread, David Miller wrote:

> So the question is how to manage this on the driver side, and the most
> natural way I see do this would be to use multiple TX netdev queues
> and a custom netdev_ops->ndo_select_queue() method which selects the
> queue based upon the peer that would be selected.

This might work well for the current sunvnet model (and even there
it has limitations: if all the traffic is going out via the
switchport to a gateway, you are again back to the head-of-line
blocking model).

But how generally useful is this model? For point-to-multipoint
links like Ethernet, I don't think you actually track one channel per
peer.
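
For reference, the kind of thing I understand you're suggesting
(tx_port_find() and q_index are assumed names, and the
ndo_select_queue() prototype varies between kernel versions):

static u16 vnet_select_queue(struct net_device *dev, struct sk_buff *skb,
			     void *accel_priv,
			     select_queue_fallback_t fallback)
{
	struct vnet *vp = netdev_priv(dev);
	/* pick the port/LDC channel this skb would be transmitted on */
	struct vnet_port *port = tx_port_find(vp, skb);	/* assumed helper */

	return port ? port->q_index : 0;	/* assumed per-port queue */
}

static const struct net_device_ops vnet_ops = {
	/* ... existing ops ... */
	.ndo_select_queue	= vnet_select_queue,
};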

--Sowmini
