From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: davem@davemloft.net, raghuram.kothakota@oracle.com,
	netdev@vger.kernel.org
Subject: Re: [PATCH net-next 1/2] sunvnet: Process Rx data packets in a BH handler
Date: Wed, 1 Oct 2014 15:39:14 -0400	[thread overview]
Message-ID: <20141001193914.GL17706@oracle.com> (raw)
In-Reply-To: <1412190559.16704.59.camel@edumazet-glaptop2.roam.corp.google.com>

On (10/01/14 12:09), Eric Dumazet wrote:
> > -
> > +	/* BH context cannot call netif_receive_skb */
> > +	netif_rx_ni(skb);
> 
> Really ? What about the standard and less expensive netif_receive_skb ?

I can't use netif_receive_skb in this case:
the TCP retransmit timers run in softirq context. They can preempt here
and deadlock on the socket lock. E.g.,

tcp_write_timer+0xc/0xa0 <-- wants sk_lock
call_timer_fn+0x24/0x120
run_timer_softirq+0x214/0x2a0
__do_softirq+0xb8/0x200
do_softirq+0x8c/0xc0
local_bh_enable+0xac/0xc0
ip_finish_output+0x254/0x4a0
ip_output+0xc4/0xe0
ip_local_out+0x2c/0x40
ip_queue_xmit+0x140/0x3c0
tcp_transmit_skb+0x448/0x740
tcp_write_xmit+0x220/0x480
__tcp_push_pending_frames+0x38/0x100
tcp_rcv_established+0x214/0x780
tcp_v4_do_rcv+0x154/0x300
tcp_v4_rcv+0x6cc/0xa60   <-- takes sk_lock
  :
netif_receive_skb
 

Ideally I would have liked to use netif_receive_skb (it boosts perf),
but I had to back off for this reason.
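
(For illustration, the rx work handler in question is shaped roughly like the
sketch below; vnet_rx_work() and dequeue_pending_rx_skb() are made-up names,
not the actual patch code.)

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/workqueue.h>

/*
 * Hypothetical rx work handler, running in process context off a
 * workqueue.  netif_rx_ni() is used rather than netif_receive_skb():
 * the latter expects BHs to be disabled, and calling it from process
 * context lets a softirq timer such as tcp_write_timer() run (via
 * local_bh_enable) while sk_lock is held, producing the deadlock in
 * the trace above.
 */
static void vnet_rx_work(struct work_struct *work)
{
	struct sk_buff *skb;

	/* placeholder for however ldc_rx hands skbs to this worker */
	while ((skb = dequeue_pending_rx_skb()) != NULL)
		netif_rx_ni(skb);
}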

> > +
> > +	struct mutex            vnet_rx_mutex; /* serializes rx_workq */
> > +	struct work_struct      rx_work;
> > +	struct workqueue_struct *rx_workq;
> > +
> >  };
> 
> Could you describe in the changelog why all this is needed ?

So I gave a short summary in the cover letter; here are more details:

- processing packets in ldc_rx context risks live-lock.
- I experimented with a few things, including NAPI and a simple tasklet,
  to take care of the data packet handling. With both NAPI and the tasklet
  I'm able to use netif_receive_skb safely; however, mpstat shows that one
  CPU ends up doing all the processing, and scaling was inhibited.
- further, with NAPI the budget gets in the way.

Regarding your other comment:
  "You basically found a way to overcome NAPI standard limits (budget of 64)"
As I said in the cover letter, enforcing a budget on sunvnet actually ends up
hurting perf significantly, as we end up sending additional STOP/START messages.
To honor that budget, we'd have to keep a lot more state in vnet to remember
the position in the stream but *not* send a STOP/START, and instead resume
from where we left off at the next napi_schedule.

Doing all this would end up just re-inventing much of the code in process_backlog
anyway.
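
(For illustration, honoring the budget in a NAPI poll would look roughly like
the sketch below; vnet_poll(), vnet_walk_rx() and vnet_send_ack() are made-up
names, and the napi member on vnet_port is hypothetical.)

/*
 * Hypothetical NAPI poll: to honor the budget without sending extra
 * STOP/START messages, the driver has to remember its position in the
 * descriptor ring and resume from there on the next napi_schedule(),
 * which is essentially what process_backlog() already does for the
 * per-cpu backlog queue.
 */
static int vnet_poll(struct napi_struct *napi, int budget)
{
	struct vnet_port *port = container_of(napi, struct vnet_port, napi);
	int processed;

	/* walk rx descriptors starting from the saved ring index */
	processed = vnet_walk_rx(port, budget);

	if (processed < budget) {
		napi_complete(napi);
		vnet_send_ack(port);	/* STOP/START exchange with the peer */
	}
	/* else: keep the saved index; continue on the next poll */

	return processed;
}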

--Sowmini

 
