From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yasushi SHOJI Subject: Re: sh_eth.c::sh_eth_rx(): mdp->rx_skbuff[entry] can be NULL Date: Wed, 21 Oct 2015 16:26:39 +0900 Message-ID: <87eggosmvk.wl@dns1.atmark-techno.com> References: <87oafusy11.wl@dns1.atmark-techno.com> <5626A881.2080701@cogentembedded.com> Mime-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII Cc: netdev@vger.kernel.org To: sergei.shtylyov@cogentembedded.com Return-path: Received: from p654789.hkidff01.ap.so-net.ne.jp ([121.101.71.137]:53172 "EHLO gw.atmark-techno.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751860AbbJUH0q (ORCPT ); Wed, 21 Oct 2015 03:26:46 -0400 Received: from mail-pa0-f51.google.com (mail-pa0-f51.google.com [209.85.220.51]) by gw.atmark-techno.com (Postfix) with ESMTPS id CCA9820503 for ; Wed, 21 Oct 2015 16:26:43 +0900 (JST) Received: by padhk11 with SMTP id hk11so46893082pad.1 for ; Wed, 21 Oct 2015 00:26:42 -0700 (PDT) In-Reply-To: <5626A881.2080701@cogentembedded.com> Sender: netdev-owner@vger.kernel.org List-ID: Hi Sergei, Thank your for your reply. On Wed, 21 Oct 2015 05:48:01 +0900, Sergei Shtylyov wrote: > > On 10/19/2015 06:01 PM, Yasushi SHOJI wrote: > > > I'm not that familiar with this code base so I'm note including any > > patch yet. I appreciate if someone with insight in this code give a > > quick look and tell me that it's a real one or not. if this is a real > > case, I can take a deep look. > > If you got the oops, it's real. Thanks for the reporting. I guess I > should check the new ravb driver as well... > Do you want to try fixing the bug yourself? Sure. I can dive in to this. I appreciate if someone who has worked on sh_eth.c give me some design advises or tell me the initial design thoughts / what was the intention when allocation if failed. My idea right now is to simply invalidate the descriptor when netdev_alloc_skb() failed. When next packet arrived, in near future, the driver can try again to allocate the buffer and update the corresponding descriptor if succeeds. If memory is not yet available when the controller is trying to use the invalid descriptor, the controller will see it and DMA will stop. Is it acceptable path to go? Here is how I understand this driver: sh_eth.c uses napi to handle incoming packet. It calls __napi_schedule() in its interrupt handler. sh_eth_poll() is the pool method of the driver. sh_eth_poll calls sh_eth_rx() to handle incoming packet. And this is the function I need to fix. The driver utilizes array of sk_buffs for tx and rx. For rx, the driver has an array of pointers of sk_buffs, rx_skbuff[]. This rx_skbuff[] is filled with sk_buffs in sh_eth_ring_format() which is called when the driver is open()ed. The controller, the driver is targeted to, is GETHER. This controller is capable to DMA transfer the received packets to the system memory. The link between them is described in "Receive Descriptor". A receive descriptor corresponds to one sk_buff. The controller expects array of descriptors in the system memory and treat it as a ring, meaning that the controller process each descriptor one by one. Once the controller finished the last descriptor, it will go back to the first one. To achieve zero copy, the driver push the sk_buffs filled with received packet to the netdev core with netif_receive_skb() then netdev_alloc_skb() sk_buffs in the sh_eth_rx(), the poll method of the driver, and update the corresponding descriptor. If the allocation failed, it just leave the function, leaving old pointer in the descriptor as is. In some future, the controller will access the descriptor and writes to the old memory address. (I haven't checked the state of bits in the descriptor yet) BTW, if any one has a bit of time, I have questions regarding to the atomic allocation: Q1) is it OK to _always_ use napi_schedule()? Q2) is it OK to netdev_alloc_skb() in napi poll method? (sh_eth_rx) this, combined with Q1, leads to allocate sk_buff with GFP_ATOMIC all the time. Q3) if not, any alternative way to allocate sk_buff? ie. a) allocate sk_buff in pool out of interrupt context, and get sk_buff from it in softirq context. b) use dev_alloc_pages() instead. The slab allocator seem to fail if slabs are full but still order 0 pages are available. -- yashi