From mboxrd@z Thu Jan  1 00:00:00 1970
From: Sowmini Varadhan
Subject: Re: [PATCH net-next 1/2] sunvnet: Process Rx data packets in a BH handler
Date: Wed, 1 Oct 2014 15:39:14 -0400
Message-ID: <20141001193914.GL17706@oracle.com>
References: <20141001185615.GH17706@oracle.com> <1412190559.16704.59.camel@edumazet-glaptop2.roam.corp.google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: davem@davemloft.net, raghuram.kothakota@oracle.com, netdev@vger.kernel.org
To: Eric Dumazet
Return-path: 
Received: from aserp1040.oracle.com ([141.146.126.69]:34199 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750884AbaJATjU (ORCPT ); Wed, 1 Oct 2014 15:39:20 -0400
Content-Disposition: inline
In-Reply-To: <1412190559.16704.59.camel@edumazet-glaptop2.roam.corp.google.com>
Sender: netdev-owner@vger.kernel.org
List-ID: 

On (10/01/14 12:09), Eric Dumazet wrote:
> > -
> > +	/* BH context cannot call netif_receive_skb */
> > +	netif_rx_ni(skb);
>
> Really ? What about the standard and less expensive netif_receive_skb ?

I can't use netif_receive_skb in this case: the TCP retransmit timers
run in softirq context. They can pre-empt here and result in a deadlock
on socket locks, e.g.,

  tcp_write_timer+0xc/0xa0              <-- wants sk_lock
  call_timer_fn+0x24/0x120
  run_timer_softirq+0x214/0x2a0
  __do_softirq+0xb8/0x200
  do_softirq+0x8c/0xc0
  local_bh_enable+0xac/0xc0
  ip_finish_output+0x254/0x4a0
  ip_output+0xc4/0xe0
  ip_local_out+0x2c/0x40
  ip_queue_xmit+0x140/0x3c0
  tcp_transmit_skb+0x448/0x740
  tcp_write_xmit+0x220/0x480
  __tcp_push_pending_frames+0x38/0x100
  tcp_rcv_established+0x214/0x780
  tcp_v4_do_rcv+0x154/0x300
  tcp_v4_rcv+0x6cc/0xa60                <-- takes sk_lock
   :
  netif_receive_skb

Ideally I would have liked to use netif_receive_skb (it boosts perf),
but I had to back off for this reason.

> > +
> > +	struct mutex vnet_rx_mutex; /* serializes rx_workq */
> > +	struct work_struct rx_work;
> > +	struct workqueue_struct *rx_workq;
> > +
> >  };
>
> Could you describe in the changelog why all this is needed ?

So I gave a short summary in the cover letter, but in more detail:

- processing packets in ldc_rx context risks live-lock.

- I experimented with a few things, including NAPI, and just using a
  simple tasklet to take care of the data packet handling. With both
  NAPI and the tasklet, I'm able to use netif_receive_skb safely;
  however, mpstat shows that one CPU ends up doing all the processing,
  so scaling was inhibited.

- further, with NAPI the budget gets in the way.

(A rough sketch of the workqueue deferral pattern is at the end of
this mail.)

Regarding your other comment:

  "You basically found a way to overcome NAPI standard limits (budget
   of 64)"

As I said in the cover letter, coercing a budget on sunvnet ends up
actually hurting perf significantly, as we end up sending additional
stop/start messages. To achieve that budget, we'd have to keep a lot
more state in vnet to remember the position in the stream but *not*
send a STOP/START, and instead resume at the next napi_schedule from
where we left off. Doing all this would end up just re-inventing much
of the code in process_backlog anyway.

--Sowmini
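
P.S. In case it helps to make the above concrete, here is a rough
sketch of the deferral pattern being discussed. It is not the actual
patch; vnet_rx_ctx, vnet_next_rx_skb() and vnet_rx_event() are made-up
names, and the setup (mutex_init, INIT_WORK, allocating rx_workq at
probe time) is omitted. The point is only that the skbs are handed to
the stack via netif_rx_ni() from the work function, instead of being
pushed through netif_receive_skb() inline:

#include <linux/kernel.h>
#include <linux/mutex.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/workqueue.h>

/* Illustrative container; in the patch these fields live in the
 * driver's own structure (see the hunk quoted above).
 */
struct vnet_rx_ctx {
	struct mutex		vnet_rx_mutex;	/* serializes rx_workq */
	struct work_struct	rx_work;
	struct workqueue_struct	*rx_workq;
};

/* Stand-in for walking the descriptor ring and building one skb per
 * data descriptor; returns NULL once the ring is drained.
 */
static struct sk_buff *vnet_next_rx_skb(struct vnet_rx_ctx *ctx)
{
	return NULL;		/* placeholder for the sketch */
}

static void vnet_rx_work(struct work_struct *work)
{
	struct vnet_rx_ctx *ctx = container_of(work, struct vnet_rx_ctx,
					       rx_work);
	struct sk_buff *skb;

	mutex_lock(&ctx->vnet_rx_mutex);
	while ((skb = vnet_next_rx_skb(ctx)) != NULL)
		/* Queue to the backlog rather than processing the skb
		 * inline; calling netif_receive_skb() here is what
		 * risks the sk_lock deadlock shown in the trace above.
		 */
		netif_rx_ni(skb);
	mutex_unlock(&ctx->vnet_rx_mutex);
}

/* Called from the LDC rx event handler: defer the heavy lifting
 * instead of draining the descriptor ring in that context.
 */
static void vnet_rx_event(struct vnet_rx_ctx *ctx)
{
	queue_work(ctx->rx_workq, &ctx->rx_work);
}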