From: Eric Dumazet <eric.dumazet@gmail.com>
To: Luis Henriques <luis.henriques@canonical.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>,
    Neil Horman <nhorman@tuxdriver.com>,
    netdev@vger.kernel.org, Jay Cliburn <jcliburn@gmail.com>,
    "David S. Miller" <davem@davemloft.net>,
    stable@vger.kernel.org
Subject: Re: [net PATCH] atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring
Date: Sat, 27 Jul 2013 12:49:42 -0700
Message-ID: <1374954582.3669.12.camel@edumazet-glaptop>
In-Reply-To: <87k3kbdcmy.fsf@canonical.com>

On Sat, 2013-07-27 at 20:30 +0100, Luis Henriques wrote:
> Ben Hutchings <bhutchings@solarflare.com> writes:
>
> > On Sat, 2013-07-27 at 01:02 +0100, Ben Hutchings wrote:
> >> On Fri, 2013-07-26 at 12:47 -0400, Neil Horman wrote:
> >> > atl1c uses netdev_alloc_skb to refill its rx dma ring, but that call makes no
> >> > guarantees about the suitability of the memory for use in DMA. As a result
> >> > we've gotten reports of atl1c drivers occasionally hanging and needing to be
> >> > reset:
> >> > https://bugzilla.kernel.org/show_bug.cgi?id=54021
> >> >
> >> > Fix this by modifying the call to use the internal version __netdev_alloc_skb,
> >> > where you can set the gfp_mask explicitly to include GFP_DMA.
> >>
> >> This is a really bad idea. GFP_DMA means allocation from the ISA DMA
> >> region (< 16 MB). pci_map_single() takes care of allocating a bounce
> >> buffer if necessary.
> >>
> >> Ben.
> >>
> >> > Tested by two reporters in the above bug, who have the hardware to validate it.
> >> > Both report immediate cessation of the problem with this patch
> > [...]
> >
> > So perhaps the chip somehow fails to support a full 32-bit address
> > (which is the current DMA mask), though given that there are 64 address
> > bits in RX descriptors this seems unlikely. And the most likely result
> > of that would be memory corruption, not a stall.
> >
> > Alternately, perhaps more likely, there's something wrong with the
> > driver's error handling. If atl1_alloc_rx_buffer() fails then the RX
> > queue could run dry. Depending on how the hardware is designed, that
> > could result in a complete RX stall (no RX buffers available => no RX
> > completions => no attempt to allocate more RX buffers).
> >
> > Maybe your change makes it less likely for atl1_alloc_rx_buffer() to
> > fail. On a modern PC the (ISA) DMA zone is basically unused whereas
> > bounce buffers might be more contended. Did you try adding some logging
> > for failure of pci_map_single()?
> >
> > Ben.
>
> Just to add a little bit more context (and hopefully not noise), I
> started seeing this issue on 3.7. Bisection resulted on the following
> first bad commit:
>
> 69b08f6 net: use bigger pages in __netdev_alloc_frag
>
> Reverting this commit (and e5e6730 "skbuff: Move definition of
> NETDEV_FRAG_PAGE_MAX_SIZE") solved the problem.
>
> Note also that I'm seeing this issue on a 32 bits system (64 bits
> isn't supported). This initially made me think the problem could be
> related with this as 69b08f6 log explicitly refers to 32/64 bit
> archs. But I failed to find any obvious issue with the patch.
>
> Cheers,

Unfortunately, nothing makes sense here. It looks like a possible
hardware defect on some memory ranges, as only atl1c is impacted.

Restricting allocations to the DMA zone would probably avoid this bug
completely, but it adds obvious problems as the DMA zone is so small.

Say you have some TCP flows with applications not reading their receive
queues fast enough: the DMA zone will be depleted and the network will
just hang.

This driver never used GFP_DMA, so adding it now seems quite strange.
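
To make the depletion argument concrete, here is a minimal userspace sketch. It is a toy model, not kernel code: the struct, the function names, the 2048-byte buffer size and the 512-entry ring are all assumptions chosen for illustration. Only the 16 MB zone size comes from the thread. It also ignores socket receive buffer limits and every other user of the zone, so the real numbers would differ.

```c
/* Toy userspace model, NOT kernel code. All names and sizes below are
 * illustrative assumptions, except the 16 MB ISA DMA zone size. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ZONE_BYTES (16u * 1024u * 1024u) /* ISA DMA zone: below 16 MB */
#define BUF_BYTES  2048u                 /* assumed per-packet buffer size */
#define RING_SIZE  512u                  /* assumed RX ring length */

struct rx_model {
    size_t zone_free;   /* bytes still available in the small zone */
    size_t ring_posted; /* buffers currently posted to the NIC */
    size_t sock_queued; /* received buffers the application has not read */
};

static void rx_model_init(struct rx_model *m)
{
    assert(m != NULL);
    m->zone_free = ZONE_BYTES;
    m->ring_posted = 0;
    m->sock_queued = 0;
}

/* Refill the ring until it is full or the zone is exhausted. */
static void rx_refill(struct rx_model *m)
{
    while (m->ring_posted < RING_SIZE && m->zone_free >= BUF_BYTES) {
        m->zone_free -= BUF_BYTES;
        m->ring_posted++;
    }
}

/* One packet arrives. Returns false if no buffer is posted (packet lost). */
static bool rx_packet(struct rx_model *m)
{
    if (m->ring_posted == 0)
        return false;
    m->ring_posted--;
    m->sock_queued++; /* buffer stays allocated until the application reads */
    rx_refill(m);     /* try to replace the buffer that was just consumed */
    return true;
}

/* Count packets accepted before the ring runs dry when nobody reads. */
static size_t packets_until_stall(void)
{
    struct rx_model m;
    size_t accepted = 0;

    rx_model_init(&m);
    rx_refill(&m);
    while (rx_packet(&m))
        accepted++;
    return accepted;
}
```

Under these assumptions the zone holds 16 MB / 2048 B = 8192 buffers, so packets_until_stall() returns 8192: once that many packets sit unread, refill can no longer allocate, the ring drains, and reception stops until the application frees memory.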