Netdev List
* Re: [PATCH] net: asix: fix bad header length bug
From: Emil Goode @ 2014-02-06 15:02 UTC (permalink / raw)
  To: David Laight
  Cc: David S. Miller, Ming Lei, Mark Brown, Jeff Kirsher, Glen Turner,
	linux-usb@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D0F6B98CA@AcuExch.aculab.com>

Hello David,

Thanks for the review.

On Thu, Feb 06, 2014 at 01:37:12PM +0000, David Laight wrote:
> From: Emil Goode
> > The AX88772B occasionally send rx packets that cross urb boundaries
> > and the remaining partial packet is sent with no header.
> > When the buffer with a partial packet is of less number of octets
> > than the value of hard_header_len the buffer is discarded by the
> > usbnet module. This is causing dropped packages and error messages
> > in dmesg.
> > 
> > This can be reproduced by using ping with a packet size
> > between 1965-1976.
> 
> I think this can affect other USB ethernet drivers.
> Probably most of the ones that explicitly set rx_urb_len.
> 
> The ax88179_178a driver sets massive 20k receive urb.

The ax88179_178a has its own bind function, so I believe it's
not affected by this change.

> I've seen over 10k of data in a single urb, dunno if it
> can actually generate more than 20k - possibly if the usb3 link
> is loaded with other traffic.
> It would be much more efficient for it to use an aligned 4k urb
> and then merge the fragment into skbs.
> 
> Once you've set:
> +	dev->net->hard_header_len = 0; /* Partial packets have no header */
> try setting the mtu to a multiple of 1k.

I tried setting the mtu to 2000 and when using ping with a large enough
packet size to fill the urb to rx_urb_size, all packets are dropped.

> There is a very odd check in usbnet_change_mtu() that tries to stop the
> receive urb_length being a multiple of the usb packet size.

This is very odd indeed!

>
> This code looks as though it is hoping that the usb controller will discard
> any full length bulk messages after finding a short buffer.
> I suspect that might be just wishful thinking!
> 
> 	David
> 
> 
> 


* RE: [PATCH] bnx2x: Update to FW 7.8.19
From: Yuval Mintz @ 2014-02-06 15:26 UTC (permalink / raw)
  To: dwmw2@infradead.org, ben@decadent.org.uk
  Cc: netdev@vger.kernel.org, Dmitry Kravkov, Ariel Elior
In-Reply-To: <1390921061-22019-1-git-send-email-yuvalmin@broadcom.com>

> This new firmware fixes several bugs:
>  1. HW attention appears and traffic stops when iSCSI firmware tries to
>     retransmit iSCSI login command when the iSCSI login is carrying data
>     not aligned to 4-bytes.
>  2. FCoE traffic fails to run when running in switch-independent multi-function
>     mode and there's more than one interface supporting FCoE on a given port.
>  3. While two ports are running FCoE with at least one of them has a function
>     number (>1) on the same engine in a 4-port device a zeroed CQE is given,
>     causing FCoE traffic to stop.

Hi, any news about this one?

Thanks,
Yuval


* RE: [PATCH] net: asix: fix bad header length bug
From: David Laight @ 2014-02-06 15:28 UTC (permalink / raw)
  To: 'Igor Gnatenko', Emil Goode
  Cc: David S. Miller, Ming Lei, Mark Brown, Jeff Kirsher, Glen Turner,
	linux-usb@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <1391692768.2843.4.camel@X1Carbon.localdomain>

From: Igor Gnatenko
> On Thu, 2014-02-06 at 13:56 +0100, Emil Goode wrote:
> > The AX88772B occasionally send rx packets that cross urb boundaries
> > and the remaining partial packet is sent with no header.
> > When the buffer with a partial packet is of less number of octets
> > than the value of hard_header_len the buffer is discarded by the
> > usbnet module. This is causing dropped packages and error messages
> > in dmesg.
> >
> > This can be reproduced by using ping with a packet size
> > between 1965-1976.
> >
> > The bug has been reported here:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=29082
> >
> > Signed-off-by: Emil Goode <emilgoode@gmail.com>
> Reported-and-tested-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
> > ---
> >  drivers/net/usb/asix_devices.c |    1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
> > index 9765a7d..120bb29 100644
> > --- a/drivers/net/usb/asix_devices.c
> > +++ b/drivers/net/usb/asix_devices.c
> > @@ -455,6 +455,7 @@ static int ax88772_bind(struct usbnet *dev, struct usb_interface *intf)
> >  	dev->net->ethtool_ops = &ax88772_ethtool_ops;
> >  	dev->net->needed_headroom = 4; /* cf asix_tx_fixup() */
> >  	dev->net->needed_tailroom = 4; /* cf asix_tx_fixup() */
> > +	dev->net->hard_header_len = 0; /* Partial packets have no header */

That is the wrong place for the fix.

It should only be done when rx_urb_size is set to a multiple of the usb
packet size.
That is only done for some of the supported devices.

In fact, if the rx_urb_size is a multiple of the usb frame size (or 1k)
then maybe the usbnet code should assume that the driver is capable
of processing ethernet frames that cross usb packet boundaries and
not delete short packets at all - regardless of the hard_header_len.

	David
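
As a rough illustration of the policy suggested above (a sketch only, not code from this thread or from usbnet): the helper name rx_buf_should_drop() and its parameters are hypothetical. It keeps a short buffer whenever rx_urb_size is a whole multiple of the USB packet size, on the assumption that the driver can then reassemble frames that cross URB boundaries.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of usbnet: decide whether a received
 * buffer that is shorter than hard_header_len should be discarded.
 *
 * If rx_urb_size is a whole multiple of the USB packet size, assume the
 * driver can reassemble ethernet frames that cross URB boundaries, so a
 * short buffer may be the headerless tail of a frame and is kept.
 */
static bool rx_buf_should_drop(size_t buf_len, size_t hard_header_len,
			       size_t rx_urb_size, size_t usb_maxpacket)
{
	if (buf_len >= hard_header_len)
		return false;	/* long enough, never dropped here */

	if (usb_maxpacket != 0 && rx_urb_size % usb_maxpacket == 0)
		return false;	/* may be a continuation fragment */

	return true;		/* short and cannot be a fragment */
}
```

With a 512-byte USB packet size, an 8-byte buffer would be kept for rx_urb_size = 2048 and dropped for rx_urb_size = 2000.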



* RE: AX88179_178A USB3 ethernet adapter performance issue
From: David Laight @ 2014-02-06 15:39 UTC (permalink / raw)
  To: 'Daniel J Blueman', Freddy Xin, linux-usb@vger.kernel.org,
	'Sarah Sharp', Greg Kroah-Hartman
  Cc: Netdev
In-Reply-To: <CAMVG2ssoBwk56n-k5VB9yO2LP4wc+gTv4o_17r-WYakwE6h2uQ@mail.gmail.com>

From: Daniel J Blueman
> Hi Freddy et al,

I've copied this to linux-usb.

> I'm experiencing poor network performance using an ASIX AX88179_178A
> USB3 to ethernet adapter using any recent linux kernel (eg 3.11),
> using an Intel XHCI USB3 controller.

There are several problems with the xhci driver that show up when
trying to use the ax88179_178a driver.
Unfortunately some of the fixes have caused regressions on other
host controllers with disk transfers.
The patches may, or may not, be present in your kernel.
The 'scatter-gather' support that is used in order to enable TSO
is particularly good at exercising the buggy code paths.

> Running iperf tests between one host with a gigabit PCIe NIC, via a
> gigabit switch to the other host with various interfaces:
> 
> PCIe bcm957762: send 818Mb/s, recv 910Mb/s
> USB2 smsc75xx: send 341Mb/s, recv 330Mb/s
> USB3 ax88179_178a: send 347Mb/s, recv 18.7Mb/s
> 
> Are you able to reproduce the same 19Mb/s receive rate there?

It might be that you are only actually running at USB2 speeds
(check with lsusb -t).

I have seen line-rate GbE from my ax88179 card, but only with a
patched kernel.

	David

> 
> Many thanks,
>   Daniel
> --
> Daniel J Blueman
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH net-next v2] ipv6: enable anycast addresses as source addresses in ICMPv6 error messages
From: Nicolas Dichtel @ 2014-02-06 16:29 UTC (permalink / raw)
  To: François-Xavier Le Bail, netdev@vger.kernel.org
  Cc: David Stevens, Bill Fink, Hannes Frederic Sowa, David S. Miller,
	Alexey Kuznetsov, James Morris, Hideaki Yoshifuji,
	Patrick McHardy
In-Reply-To: <1391697041.6325.YahooMailNeo@web125505.mail.ne1.yahoo.com>

Le 06/02/2014 15:30, François-Xavier Le Bail a écrit :
>> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>
>> To: François-Xavier Le Bail <fx.lebail@yahoo.com>; "netdev@vger.kernel.org" <netdev@vger.kernel.org>
>> Cc: David Stevens <dlstevens@us.ibm.com>; Bill Fink <billfink@mindspring.com>; Hannes Frederic Sowa <hannes@stressinduktion.org>; David S. Miller <davem@davemloft.net>; Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>; James Morris <jmorris@namei.org>; Hideaki Yoshifuji <yoshfuji@linux-ipv6.org>; Patrick McHardy <kaber@trash.net>
>> Sent: Thursday, February 6, 2014 3:01 PM
>> Subject: Re: [PATCH net-next v2] ipv6: enable anycast addresses as source addresses in ICMPv6 error messages
>>
>> Le 06/02/2014 13:38, François-Xavier Le Bail a écrit :
>>>>   From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>>>
>>>
>>>>   Subject: Re: [PATCH net-next v2] ipv6: enable anycast addresses as
>> source addresses in ICMPv6 error messages
>>>>
>>>>   Le 19/01/2014 17:00, Francois-Xavier Le Bail a écrit :
>>>>
>>>>>     - Uses ipv6_anycast_destination() in icmp6_send().
>>>>>
>>>>>     Suggested-by: Bill Fink <billfink@mindspring.com>
>>>>>     Signed-off-by: Francois-Xavier Le Bail
>> <fx.lebail@yahoo.com>
>>>>   This patch causes an Oops on my target.
>>>
>>>   What is your target ?
>> x86 32bits
>>
>>>
>>>>   Here is the step to reproduce it:
>>>>   modprobe sit
>>>>   ip link add sit1 type sit remote 10.16.0.121 local 10.16.0.249
>>>>   ip l s sit1 up
>>>>   ip -6 a a dev sit1 2001:1234::123 remote 2001:1234::121
>>>>   ping6 2001:1234::121
>>>
>>>   I cannot reproduce this in my target (updated net-next x86_64) and
>>>   iproute2 from git.
>> I use linus tree (3.14-rc1+).
>>
>>>   Can you send me your config file ?
>> See attachment.
>>
>>
>>>
>>>>   The problem is that ipv6_anycast_destination() uses unconditionally
>>>>   skb_dst(skb), which is NULL in this case.
>>>>
>>>>   Not sure what is the best way to fix this, any suggestions?
>>>
>>>   I will try to reproduce first and see.
>> Note that the peer was not set up, hence the ping didn't work.
>> ipip6_err() calls ipip6_err_gen_icmpv6_unreach() which will drop the dst
>> before calling icmpv6_send().
>>
>>
>> Here is the backtrace:
>> [  387.786155] BUG: unable to handle kernel NULL pointer dereference at 00000096
>> [  387.787291] IP: [<c12f1568>] icmp6_send+0x79/0x596
>
> [...]
>
>> [  387.790055]  [<f85ce03b>] ? tunnel64_err+0x16/0x25 [tunnel4]
>
> Thanks for these informations.
>
> Can you test an alternative replacing:
>
> test on: ipv6_anycast_destination(skb)
> by
> test on: ipv6_chk_acast_addr_src(net, skb->dev, &hdr->daddr)
Ok, I will do it tomorrow.

Thank you,
Nicolas
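
The failure mode described above (dereferencing a destination pointer that has already been dropped) can be shown in miniature. This is a sketch under stated assumptions, not the fix discussed in the thread: struct packet, struct route, ROUTE_ANYCAST and packet_dst_is_anycast() are hypothetical stand-ins, not kernel types or functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not kernel structures. */
#define ROUTE_ANYCAST 0x1u

struct route {
	unsigned int flags;
};

struct packet {
	const struct route *dst;	/* may be NULL once the route is dropped */
};

/*
 * Report whether the packet's route is anycast.  A missing route is
 * treated as "not anycast" instead of being dereferenced.
 */
static bool packet_dst_is_anycast(const struct packet *p)
{
	return p->dst != NULL && (p->dst->flags & ROUTE_ANYCAST) != 0;
}
```

Whether a NULL guard or a lookup that does not depend on the route at all is the better approach is exactly the open question in this exchange.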


* [PATCH-net] drivers/net: fix build warning in ethernet/sfc/tx.c
From: Paul Gortmaker @ 2014-02-06 16:45 UTC (permalink / raw)
  To: netdev; +Cc: linux-net-drivers, Paul Gortmaker, Jon Cooper, Ben Hutchings

Commit ee45fd92c739db5b7950163d91dfe5f016af6d24 ("sfc: Use TX PIO
for sufficiently small packets") introduced the following warning:

drivers/net/ethernet/sfc/tx.c: In function 'efx_enqueue_skb':
drivers/net/ethernet/sfc/tx.c:432:1: warning: label 'finish_packet' defined but not used

Stick the label inside the same #ifdef that the code which calls
it uses.  Note that this is only seen for arch that do not set
ARCH_HAS_IOREMAP_WC, such as arm, mips, sparc, ..., as the others
enable the write combining code and hence use the label.

Cc: Jon Cooper <jcooper@solarflare.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 drivers/net/ethernet/sfc/tx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index c49d1fb16965..75d11fa4eb0a 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -429,7 +429,9 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 	}
 
 	/* Transfer ownership of the skb to the final buffer */
+#ifdef EFX_USE_PIO
 finish_packet:
+#endif
 	buffer->skb = skb;
 	buffer->flags = EFX_TX_BUF_SKB | dma_flags;
 
-- 
1.8.5.2
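
The pattern in this patch can be reproduced outside the kernel. The following is a minimal sketch, assuming a hypothetical USE_FAST_PATH macro in place of EFX_USE_PIO and a made-up enqueue() function: the label is compiled only when the goto that targets it is compiled, so no "defined but not used" warning appears in either configuration.

```c
/* Build with -DUSE_FAST_PATH to enable the branch that jumps to the label. */
static int enqueue(int len)
{
	int flags = 0;

#ifdef USE_FAST_PATH
	if (len <= 64) {
		flags |= 1;	/* pretend the small packet took the fast path */
		goto finish;
	}
#endif
	flags |= 2;		/* normal path */

	/* Guard the label with the same condition as its only user. */
#ifdef USE_FAST_PATH
finish:
#endif
	return flags;
}
```

Compiling this with -Wall should be quiet both with and without -DUSE_FAST_PATH; dropping the second #ifdef is what would bring the unused-label warning back in the build without the macro.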


* Re: [PATCH v2 0/5] net: phy: Ethernet PHY powerdown optimization
From: Ezequiel Garcia @ 2014-02-06 16:57 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: David Miller, f.fainelli, mugunthanvnm, netdev, linux-arm-kernel,
	linux-kernel, Andrew Lunn
In-Reply-To: <52F141AE.8010402@gmail.com>

Hi Sebastian,

On Tue, Feb 04, 2014 at 08:38:22PM +0100, Sebastian Hesselbarth wrote:
> On 12/17/2013 08:43 PM, David Miller wrote:
> > From: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> > Date: Fri, 13 Dec 2013 10:20:24 +0100
> >
> >> This is v2 of the ethernet PHY power optimization patches to reduce
> >> power consumption of network PHYs with link that are either unused or
> >> the corresponding netdev is down.
> >>
> >> Compared to the last version, this patch set drops a patch to disable
> >> unused PHYs after late initcall, as it is not compatible with a modular
> >> mdio bus [1]. I'll investigate different ways to have a modular mdio bus
> >> driver get notified when driver loading is done.
> >>
> >> Again, a branch with v2 applied to v3.13-rc2 can also be found at
> >> https://github.com/shesselba/linux-dove.git topic/ethphy-power-v2
> >>
> >> [1] http://www.spinics.net/lists/arm-kernel/msg293028.html
> >
> > Series applied, thanks.
> >
[..]
> 
> as expected the above patches create a Linux to bootloader dependency
> that surfaces dumb bootloaders not initializing PHYs correctly.
> 
> Andrew has a Kirkwood based board that does not power-up and restart
> auto-negotiation on the powered down PHY after a warm restart. While
> this specific bootloader allows a soft-workaround by issuing the
> required PHY writes before accessing the interface, others may not.
> 

I'm also having this same issue on Kirkwood USI Topkick, running
v3.14-rc1.
-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com


* Re: [PATCH-net] drivers/net: fix build warning in ethernet/sfc/tx.c
From: Shradha Shah @ 2014-02-06 16:59 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: netdev, linux-net-drivers, Jon Cooper, Ben Hutchings
In-Reply-To: <1391705112-27869-1-git-send-email-paul.gortmaker@windriver.com>

On 02/06/2014 04:45 PM, Paul Gortmaker wrote:
> Commit ee45fd92c739db5b7950163d91dfe5f016af6d24 ("sfc: Use TX PIO
> for sufficiently small packets") introduced the following warning:
> 
> drivers/net/ethernet/sfc/tx.c: In function 'efx_enqueue_skb':
> drivers/net/ethernet/sfc/tx.c:432:1: warning: label 'finish_packet' defined but not used
> 
> Stick the label inside the same #ifdef that the code which calls
> it uses.  Note that this is only seen for arch that do not set
> ARCH_HAS_IOREMAP_WC, such as arm, mips, sparc, ..., as the others
> enable the write combining code and hence use the label.
> 
> Cc: Jon Cooper <jcooper@solarflare.com>
> Cc: Ben Hutchings <bhutchings@solarflare.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Acked-by: Shradha Shah <sshah@solarflare.com>

> ---
>  drivers/net/ethernet/sfc/tx.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
> index c49d1fb16965..75d11fa4eb0a 100644
> --- a/drivers/net/ethernet/sfc/tx.c
> +++ b/drivers/net/ethernet/sfc/tx.c
> @@ -429,7 +429,9 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
>  	}
>  
>  	/* Transfer ownership of the skb to the final buffer */
> +#ifdef EFX_USE_PIO
>  finish_packet:
> +#endif
>  	buffer->skb = skb;
>  	buffer->flags = EFX_TX_BUF_SKB | dma_flags;
>  
> 


* Re: AX88179_178A USB3 ethernet adapter performance issue
From: Sarah Sharp @ 2014-02-06 17:12 UTC (permalink / raw)
  To: David Laight
  Cc: 'Daniel J Blueman', Freddy Xin,
	linux-usb@vger.kernel.org,
	Greg Kroah-Hartman, Netdev
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D0F6B9A21-VkEWCZq2GCInGFn1LkZF6NBPR1lH4CV8@public.gmane.org>

On Thu, Feb 06, 2014 at 03:39:02PM +0000, David Laight wrote:
> From: Daniel J Blueman
> > Hi Freddy et al,
> 
> I've copied this to linux-usb.
> 
> > I'm experiencing poor network performance using an ASIX AX88179_178A
> > USB3 to ethernet adapter using any recent linux kernel (eg 3.11),
> > using an Intel XHCI USB3 controller.
> 
> There a several problems with the xhci driver that show up when
> trying to use the ax88179_178a driver.
> Unfortunately some of the fixes have caused regressions on other
> host controllers with disk transfers.
> The patches may, or may not, be present in your kernel.
> The 'scatter-gather' support that is used in order to enable TSO
> is particularly good at exercising the buggy code paths.

Scatter gather was not added until 3.12.  That does not explain the
issues with a 3.11 kernel.  (Unless we're hitting the 64-KB boundary
corner case a lot in 3.11.)

> > Running iperf tests between one host with a gigabit PCIe NIC, via a
> > gigabit switch to the other host with various interfaces:
> > 
> > PCIe bcm957762: send 818Mb/s, recv 910Mb/s
> > USB2 smsc75xx: send 341Mb/s, recv 330Mb/s
> > USB3 ax88179_178a: send 347Mb/s, recv 18.7Mb/s
> > 
> > Are you able to reproduce the same 19Mb/s receive rate there?

Which kernel are these numbers for?

Any chance you can test this adapter on Windows?  I would be interested
to see whether the send and recv numbers are also asymmetrical there.

> It might be that you are only actually running at USB2 speeds
> (check with lsusb -t).
> 
> I have seem line rate Ge from my ax88179 card, but only with a
> patched kernel.

David, did you mean you have the same line rate as Daniel?

Sarah Sharp


* RE: AX88179_178A USB3 ethernet adapter performance issue
From: David Laight @ 2014-02-06 17:17 UTC (permalink / raw)
  To: 'Sarah Sharp'
  Cc: 'Daniel J Blueman', Freddy Xin, linux-usb@vger.kernel.org,
	Greg Kroah-Hartman, Netdev
In-Reply-To: <20140206171239.GB16792@xanatos>

From: Sarah Sharp
> > I have seem line rate Ge from my ax88179 card, but only with a
> > patched kernel.
> 
> David, did you mean you have the same line rate as Daniel?

No I meant I've seen it saturate a Ge link (with sufficiently large frames).

With very small frames the tx rate gets limited to a nice round number.
Possibly because of the usb message rate limit.
It is high enough that it really doesn't matter, the code could put multiple
frames in a single URB, but I'm not sure the NAPI processing loop makes that
easy.

The receive side will run at a higher packet rate.

	David
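
For scale, the frame rates that "saturating a Ge link" implies can be worked out from standard Ethernet framing. This is a back-of-the-envelope sketch, not a measurement of the adapter; line_rate_fps() is a hypothetical helper and the only assumption is the usual 20 bytes of per-frame wire overhead.

```c
#include <stdint.h>

/* Per-frame wire overhead: 8-byte preamble/SFD plus 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD 20u

/*
 * Frames per second at line rate.  frame_len is the ethernet frame
 * length in bytes including the 4-byte FCS (64 for a minimum-size frame).
 */
static uint64_t line_rate_fps(uint64_t link_bps, uint32_t frame_len)
{
	return link_bps / ((uint64_t)(frame_len + ETH_WIRE_OVERHEAD) * 8);
}
```

At 1 Gb/s this gives 1488095 frames per second for 64-byte frames and 81274 for 1518-byte frames, which is why a per-URB or per-message rate limit matters far more for small frames than for large ones.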


* [PATCH net] netpoll: fix netconsole IPv6 setup
From: Sabrina Dubroca @ 2014-02-06 17:34 UTC (permalink / raw)
  To: davem; +Cc: netdev, Sabrina Dubroca

Currently, to make netconsole start over IPv6, the source address
needs to be specified. Without a source address, netpoll_parse_options
assumes we're setting up over IPv4 and the destination IPv6 address is
rejected.

Check if the IP version has been forced by a source address before
checking for a version mismatch when parsing the destination address.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 net/core/netpoll.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index c03f3de..a664f78 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -948,6 +948,7 @@ int netpoll_parse_options(struct netpoll *np, char *opt)
 {
 	char *cur=opt, *delim;
 	int ipv6;
+	bool ipversion_set = false;
 
 	if (*cur != '@') {
 		if ((delim = strchr(cur, '@')) == NULL)
@@ -960,6 +961,7 @@ int netpoll_parse_options(struct netpoll *np, char *opt)
 	cur++;
 
 	if (*cur != '/') {
+		ipversion_set = true;
 		if ((delim = strchr(cur, '/')) == NULL)
 			goto parse_failed;
 		*delim = 0;
@@ -1002,7 +1004,7 @@ int netpoll_parse_options(struct netpoll *np, char *opt)
 	ipv6 = netpoll_parse_ip_addr(cur, &np->remote_ip);
 	if (ipv6 < 0)
 		goto parse_failed;
-	else if (np->ipv6 != (bool)ipv6)
+	else if (ipversion_set && np->ipv6 != (bool)ipv6)
 		goto parse_failed;
 	else
 		np->ipv6 = (bool)ipv6;
-- 
1.8.5.3
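
The rule the patch introduces can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the netpoll code: struct ep_cfg, addr_is_ipv6() and parse_endpoints() are hypothetical, and the address classifier is deliberately crude (any ':' means IPv6).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the parsed state; not struct netpoll. */
struct ep_cfg {
	bool ipv6;		/* address family in use */
};

/* Crude classifier for the sketch: any ':' means IPv6.  Returns 1 or 0. */
static int addr_is_ipv6(const char *addr)
{
	return strchr(addr, ':') != NULL;
}

/*
 * Parse "local" (may be empty) and "remote".  Only a non-empty local
 * address pins the IP version; otherwise the remote address decides.
 * Returns 0 on success, -1 on a version mismatch.
 */
static int parse_endpoints(struct ep_cfg *cfg, const char *local,
			   const char *remote)
{
	bool version_set = false;
	int v6;

	cfg->ipv6 = false;
	if (local[0] != '\0') {
		cfg->ipv6 = addr_is_ipv6(local);
		version_set = true;
	}

	v6 = addr_is_ipv6(remote);
	if (version_set && cfg->ipv6 != (bool)v6)
		return -1;

	cfg->ipv6 = (bool)v6;
	return 0;
}
```

Without the version_set flag, an empty local address leaves ipv6 at its IPv4 default and an IPv6 remote address is rejected as a mismatch, which is the behaviour the commit message describes.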


* [PATCH] net: use __GFP_NORETRY for high order allocations
From: Eric Dumazet @ 2014-02-06 18:42 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, David Rientjes, linux-kernel@vger.kernel.org

From: Eric Dumazet <edumazet@google.com>

sock_alloc_send_pskb() & sk_page_frag_refill()
have a loop trying high order allocations to prepare
skb with low number of fragments as this increases performance.

Problem is that under memory pressure/fragmentation, this can
trigger OOM while the intent was only to try the high order
allocations, then fallback to order-0 allocations.

We had various reports of unexpected regressions.

According to David, setting __GFP_NORETRY should be fine,
as the asynchronous compaction is still enabled, and this
will prevent OOM from kicking in, as in:

CFSClientEventm invoked oom-killer: gfp_mask=0x42d0, order=3, oom_adj=0,
oom_score_adj=0, oom_score_badness=2 (enabled),memcg_scoring=disabled
CFSClientEventm 

Call Trace:
 [<ffffffff8043766c>] dump_header+0xe1/0x23e
 [<ffffffff80437a02>] oom_kill_process+0x6a/0x323
 [<ffffffff80438443>] out_of_memory+0x4b3/0x50d
 [<ffffffff8043a4a6>] __alloc_pages_may_oom+0xa2/0xc7
 [<ffffffff80236f42>] __alloc_pages_nodemask+0x1002/0x17f0
 [<ffffffff8024bd23>] alloc_pages_current+0x103/0x2b0
 [<ffffffff8028567f>] sk_page_frag_refill+0x8f/0x160
 [<ffffffff80295fa0>] tcp_sendmsg+0x560/0xee0
 [<ffffffff802a5037>] inet_sendmsg+0x67/0x100
 [<ffffffff80283c9c>] __sock_sendmsg_nosec+0x6c/0x90
 [<ffffffff80283e85>] sock_sendmsg+0xc5/0xf0
 [<ffffffff802847b6>] __sys_sendmsg+0x136/0x430
 [<ffffffff80284ec8>] sys_sendmsg+0x88/0x110
 [<ffffffff80711472>] system_call_fastpath+0x16/0x1b
Out of Memory: Kill process 2856 (bash) score 9999 or sacrifice child

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: David Rientjes <rientjes@google.com>
---
 net/core/sock.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 0c127dcdf6a8..5b6a9431b017 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1775,7 +1775,9 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 			while (order) {
 				if (npages >= 1 << order) {
 					page = alloc_pages(sk->sk_allocation |
-							   __GFP_COMP | __GFP_NOWARN,
+							   __GFP_COMP |
+							   __GFP_NOWARN |
+							   __GFP_NORETRY,
 							   order);
 					if (page)
 						goto fill_page;
@@ -1845,7 +1847,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio)
 		gfp_t gfp = prio;
 
 		if (order)
-			gfp |= __GFP_COMP | __GFP_NOWARN;
+			gfp |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY;
 		pfrag->page = alloc_pages(gfp, order);
 		if (likely(pfrag->page)) {
 			pfrag->offset = 0;
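
The "try high orders opportunistically, then fall back to order-0" idea can be simulated in userspace. This is a toy model only: struct fake_allocator, fake_alloc() and alloc_best_order() are hypothetical, and the no_retry flag merely imitates the intent of __GFP_NORETRY (fail quietly instead of escalating), not the real page allocator's behaviour.

```c
#include <stdbool.h>

#define TOP_ORDER 3	/* highest order the sketch attempts */

/*
 * Stand-in for the page allocator, for illustration only.  A request
 * succeeds when its order does not exceed largest_free_order.  A failed
 * request without no_retry counts as an "OOM" escalation.
 */
struct fake_allocator {
	int largest_free_order;
	int oom_events;
};

static bool fake_alloc(struct fake_allocator *a, int order, bool no_retry)
{
	if (order <= a->largest_free_order)
		return true;
	if (!no_retry)
		a->oom_events++;	/* the allocator tried too hard */
	return false;
}

/* Try the highest order first and fall back; returns the order obtained. */
static int alloc_best_order(struct fake_allocator *a)
{
	int order;

	for (order = TOP_ORDER; order > 0; order--) {
		if (fake_alloc(a, order, true))
			return order;
	}
	fake_alloc(a, 0, false);	/* order-0 uses the normal policy */
	return 0;
}
```

In this model, failed high-order attempts never increment oom_events; only the final order-0 request is allowed to escalate, which matches the stated intent of the patch.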


* Re: 3.14-mw regression: rtl8169 WARNING: DMA-API: exceeded 7 overlapping mappings of pfn 55ebe
From: Dan Williams @ 2014-02-06 19:12 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: Konrad Rzeszutek Wilk, Wei Liu, Francois Romieu,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <932118392.20140206152722@eikelenboom.it>

[-- Attachment #1: Type: text/plain, Size: 667 bytes --]

On Thu, Feb 6, 2014 at 6:27 AM, Sander Eikelenboom <linux@eikelenboom.it> wrote:
>>>> Not using it seems to prevent the warning, but before 3.14 i have never seen this (with r8169.use_dac=1)
>
>> If you are still hitting this with the patch:
>
>>   59f2e7df574c dma-debug: fix overlap detection
>
>> ...then I'm more inclined to think it is an actual positive report.
>
>> If you don't mind I'll send some debug patches to narrow this down.
>
> Please do .. sounds better than bisecting :-)
>

Hi, attached is a patch that should give some insight whether the
driver is triggering many overlapping mappings.  Try it on top of
3.14-rc1.

Thank you for the debug help!

[-- Attachment #2: debug-overlap --]
[-- Type: application/octet-stream, Size: 1965 bytes --]

debug overlap overflow

From: Dan Williams <dan.j.williams@intel.com>


---
 drivers/net/ethernet/realtek/r8169.c |    6 ++++++
 lib/dma-debug.c                      |    7 ++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 91a67ae8f17b..3bb2c1e000be 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5809,6 +5809,9 @@ static void rtl8169_unmap_tx_skb(struct device *d, struct ring_info *tx_skb,
 {
 	unsigned int len = tx_skb->len;
 
+	trace_printk("%s %s: unmap addr: %#llx len: %u\n",
+		     dev_driver_string(d), dev_name(d),
+		     (unsigned long long) le64_to_cpu(desc->addr), len);
 	dma_unmap_single(d, le64_to_cpu(desc->addr), len, DMA_TO_DEVICE);
 
 	desc->opts1 = 0x00;
@@ -5999,6 +6002,9 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	len = skb_headlen(skb);
 	mapping = dma_map_single(d, skb->data, len, DMA_TO_DEVICE);
+	trace_printk("%s %s: map addr: %p dma: %#llx len: %d\n",
+		     dev_driver_string(d), dev_name(d),
+		     skb->data, (unsigned long long) mapping, len);
 	if (unlikely(dma_mapping_error(d, mapping))) {
 		if (net_ratelimit())
 			netif_err(tp, drv, dev, "Failed to map TX DMA!\n");
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 2defd1308b04..7b22c8f5928e 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -486,9 +486,10 @@ static void active_pfn_inc_overlap(unsigned long pfn)
 	 * debug_dma_assert_idle() as the pfn may be marked idle
 	 * prematurely.
 	 */
-	WARN_ONCE(overlap > ACTIVE_PFN_MAX_OVERLAP,
-		  "DMA-API: exceeded %d overlapping mappings of pfn %lx\n",
-		  ACTIVE_PFN_MAX_OVERLAP, pfn);
+	if (WARN_ONCE(overlap > ACTIVE_PFN_MAX_OVERLAP,
+		      "DMA-API: exceeded %d overlapping mappings of pfn %lx\n",
+		      ACTIVE_PFN_MAX_OVERLAP, pfn))
+		ftrace_dump(DUMP_ALL);
 }
 
 static int active_pfn_dec_overlap(unsigned long pfn)


* [RFC PATCH] udp4: Don't take socket reference in receive path
From: Tom Herbert @ 2014-02-06 19:58 UTC (permalink / raw)
  To: davem, netdev, edumazet

The reference counting in the UDP receive path is quite expensive for
a socket that is shared amongst CPUs. This is probably true for normal
sockets, but really is painful when just using the socket for
receive encapsulation.

udp4_lib_lookup always takes a socket reference, and we also put back
the reference after calling udp_queue_rcv_skb in the normal receive
path, so the need for taking the reference seems to be to hold the
socket after doing rcu_read_unlock. This patch modifies udp_lib_lookup
to optionally take a reference and is always called with rcu_read_lock.
In udp4_lib_rcv we call lib_lookup and udp_queue_rcv under the
rcu_read_lock but without having taken the reference.

Requesting comments because I suspect there are nuances to this!

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/ipv4/udp.c | 90 ++++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 62 insertions(+), 28 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 48d8cb2..6043a2f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -424,7 +424,8 @@ static unsigned int udp_ehashfn(struct net *net, const __be32 laddr,
 static struct sock *udp4_lib_lookup2(struct net *net,
 		__be32 saddr, __be16 sport,
 		__be32 daddr, unsigned int hnum, int dif,
-		struct udp_hslot *hslot2, unsigned int slot2)
+		struct udp_hslot *hslot2, unsigned int slot2,
+		bool get_ref)
 {
 	struct sock *sk, *result;
 	struct hlist_nulls_node *node;
@@ -461,12 +462,20 @@ begin:
 	if (get_nulls_value(node) != slot2)
 		goto begin;
 	if (result) {
-		if (unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2)))
-			result = NULL;
-		else if (unlikely(compute_score2(result, net, saddr, sport,
-				  daddr, hnum, dif) < badness)) {
-			sock_put(result);
-			goto begin;
+		if (get_ref) {
+			if (unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2)))
+				result = NULL;
+			else if (unlikely(compute_score2(result, net, saddr, sport,
+					  daddr, hnum, dif) < badness)) {
+				sock_put(result);
+				goto begin;
+			}
+		} else {
+			if (unlikely(atomic_read(&result->sk_refcnt) == 0))
+				result = NULL;
+			else if (unlikely(compute_score2(result, net, saddr, sport,
+					  daddr, hnum, dif) < badness))
+				goto begin;
 		}
 	}
 	return result;
@@ -475,9 +484,10 @@ begin:
 /* UDP is nearly always wildcards out the wazoo, it makes no sense to try
  * harder than this. -DaveM
  */
-struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
+/* called with rcu_read_lock() */
+struct sock *___udp4_lib_lookup(struct net *net, __be32 saddr,
 		__be16 sport, __be32 daddr, __be16 dport,
-		int dif, struct udp_table *udptable)
+		int dif, struct udp_table *udptable, bool get_ref)
 {
 	struct sock *sk, *result;
 	struct hlist_nulls_node *node;
@@ -487,7 +497,6 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 	int score, badness, matches = 0, reuseport = 0;
 	u32 hash = 0;
 
-	rcu_read_lock();
 	if (hslot->count > 10) {
 		hash2 = udp4_portaddr_hash(net, daddr, hnum);
 		slot2 = hash2 & udptable->mask;
@@ -497,7 +506,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 
 		result = udp4_lib_lookup2(net, saddr, sport,
 					  daddr, hnum, dif,
-					  hslot2, slot2);
+					  hslot2, slot2, get_ref);
 		if (!result) {
 			hash2 = udp4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
 			slot2 = hash2 & udptable->mask;
@@ -507,9 +516,8 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 
 			result = udp4_lib_lookup2(net, saddr, sport,
 						  htonl(INADDR_ANY), hnum, dif,
-						  hslot2, slot2);
+						  hslot2, slot2, get_ref);
 		}
-		rcu_read_unlock();
 		return result;
 	}
 begin:
@@ -543,28 +551,50 @@ begin:
 		goto begin;
 
 	if (result) {
-		if (unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2)))
-			result = NULL;
-		else if (unlikely(compute_score(result, net, saddr, hnum, sport,
-				  daddr, dport, dif) < badness)) {
-			sock_put(result);
-			goto begin;
+		if (get_ref) {
+			if (unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2)))
+				result = NULL;
+			else if (unlikely(compute_score(result, net, saddr, hnum, sport,
+					  daddr, dport, dif) < badness)) {
+				sock_put(result);
+				goto begin;
+			}
+		} else {
+			if (unlikely(atomic_read(&result->sk_refcnt) == 0))
+				result = NULL;
+			else if (unlikely(compute_score(result, net, saddr, hnum, sport,
+					  daddr, dport, dif) < badness))
+				goto begin;
 		}
 	}
+	return result;
+}
+
+struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
+		__be16 sport, __be32 daddr, __be16 dport,
+		int dif, struct udp_table *udptable)
+{
+	struct sock *result;
+
+	rcu_read_lock();
+	result = ___udp4_lib_lookup(net, saddr, sport, daddr, dport, dif, udptable, true);
 	rcu_read_unlock();
+
 	return result;
 }
+
 EXPORT_SYMBOL_GPL(__udp4_lib_lookup);
 
+/* called with rcu_read_lock() */
 static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb,
 						 __be16 sport, __be16 dport,
 						 struct udp_table *udptable)
 {
 	const struct iphdr *iph = ip_hdr(skb);
 
-	return __udp4_lib_lookup(dev_net(skb_dst(skb)->dev), iph->saddr, sport,
+	return ___udp4_lib_lookup(dev_net(skb_dst(skb)->dev), iph->saddr, sport,
 				 iph->daddr, dport, inet_iif(skb),
-				 udptable);
+				 udptable, false);
 }
 
 struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
@@ -1755,19 +1785,21 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		if (ret > 0)
 			return -ret;
 		return 0;
-	} else {
-		if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
-			return __udp4_lib_mcast_deliver(net, skb, uh,
-					saddr, daddr, udptable);
-
-		sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
 	}
 
+
+	if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
+		return __udp4_lib_mcast_deliver(net, skb, uh,
+				saddr, daddr, udptable);
+
+	rcu_read_lock();
+	sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
+
 	if (sk != NULL) {
 		int ret;
 
 		ret = udp_queue_rcv_skb(sk, skb);
-		sock_put(sk);
+		rcu_read_unlock();
 
 		/* a return value > 0 means to resubmit the input, but
 		 * it wants the return to be -protocol, or 0
@@ -1777,6 +1809,8 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		return 0;
 	}
 
+	rcu_read_unlock();
+
 	if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
 		goto drop;
 	nf_reset(skb);
-- 
1.9.0.rc1.175.g0b1dcb5

^ permalink raw reply related

* Re: [PATCH] net: use __GFP_NORETRY for high order allocations
From: Joe Perches @ 2014-02-06 20:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, David Rientjes,
	linux-kernel@vger.kernel.org
In-Reply-To: <1391712162.10160.8.camel@edumazet-glaptop2.roam.corp.google.com>

On Thu, 2014-02-06 at 10:42 -0800, Eric Dumazet wrote:
> sock_alloc_send_pskb() & sk_page_frag_refill()
> have a loop trying high order allocations to prepare
> skb with low number of fragments as this increases performance.
> 
> Problem is that under memory pressure/fragmentation, this can
> trigger OOM while the intent was only to try the high order
> allocations, then fallback to order-0 allocations.
[]
> Call Trace:
>  [<ffffffff8043766c>] dump_header+0xe1/0x23e
>  [<ffffffff80437a02>] oom_kill_process+0x6a/0x323
>  [<ffffffff80438443>] out_of_memory+0x4b3/0x50d
>  [<ffffffff8043a4a6>] __alloc_pages_may_oom+0xa2/0xc7
>  [<ffffffff80236f42>] __alloc_pages_nodemask+0x1002/0x17f0
>  [<ffffffff8024bd23>] alloc_pages_current+0x103/0x2b0
>  [<ffffffff8028567f>] sk_page_frag_refill+0x8f/0x160
[]
> diff --git a/net/core/sock.c b/net/core/sock.c
[]
> @@ -1775,7 +1775,9 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
>  			while (order) {
>  				if (npages >= 1 << order) {
>  					page = alloc_pages(sk->sk_allocation |
> -							   __GFP_COMP | __GFP_NOWARN,
> +							   __GFP_COMP |
> +							   __GFP_NOWARN |
> +							   __GFP_NORETRY,
>  							   order);
>  					if (page)
>  						goto fill_page;
> @@ -1845,7 +1847,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio)
>  		gfp_t gfp = prio;
>  
>  		if (order)
> -			gfp |= __GFP_COMP | __GFP_NOWARN;
> +			gfp |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY;

Perhaps add __GFP_THISNODE too ?

^ permalink raw reply

* Re: [PATCH net] netpoll: fix netconsole IPv6 setup
From: Cong Wang @ 2014-02-06 20:34 UTC (permalink / raw)
  To: Sabrina Dubroca; +Cc: David Miller, netdev
In-Reply-To: <1391708052-11188-1-git-send-email-sd@queasysnail.net>

On Thu, Feb 6, 2014 at 9:34 AM, Sabrina Dubroca <sd@queasysnail.net> wrote:
>  net/core/netpoll.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index c03f3de..a664f78 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -948,6 +948,7 @@ int netpoll_parse_options(struct netpoll *np, char *opt)
>  {
>         char *cur=opt, *delim;
>         int ipv6;
> +       bool ipversion_set = false;
>

Or initialize 'ipv6' to -1 and then check if it is -1?

^ permalink raw reply

* Re: [PATCH 1/2] rtlwifi: rtl8192ce: Fix too long disable of IRQs
From: Larry Finger @ 2014-02-06 20:38 UTC (permalink / raw)
  To: Olivier Langlois, linville, chaoming_li
  Cc: linux-wireless, netdev, linux-kernel, Stable
In-Reply-To: <1391235070-23180-1-git-send-email-olivier@trillion01.com>

On 02/01/2014 12:11 AM, Olivier Langlois wrote:
> rtl8192ce is disabling the local interrupts for too long during hw initialisation when performing scans
>
> The observable symptoms in dmesg can be:
>
> - underruns from ALSA playback
> - clock freezes (tstamps do not change for several dmesg entries until irqs are finally reenabled):
>
> [  250.817669] rtlwifi:rtl_op_config():<0-0-0> 0x100
> [  250.817685] rtl8192ce:_rtl92ce_phy_set_rf_power_state():<0-1-0> IPS Set eRf nic enable
> [  250.817732] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
> [  250.817796] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
> [  250.817910] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
> [  250.818024] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
> [  250.818139] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
> [  250.818253] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
> [  250.818367] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
> [  250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
> [  250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
> [  250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
> [  250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
> [  250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:98053f15:10
> [  250.818472] rtl8192ce:rtl92ce_sw_led_on():<0-1-0> LedAddr:4E ledpin=1
> [  250.818472] rtl8192c_common:rtl92c_download_fw():<0-1-0> Firmware Version(49), Signature(0x88c1),Size(32)
> [  250.818472] rtl8192ce:rtl92ce_enable_hw_security_config():<0-1-0> PairwiseEncAlgorithm = 0 GroupEncAlgorithm = 0
> [  250.818472] rtl8192ce:rtl92ce_enable_hw_security_config():<0-1-0> The SECR-value cc
> [  250.818472] rtl8192c_common:rtl92c_dm_check_txpower_tracking_thermal_meter():<0-1-0> Schedule TxPowerTracking direct call!!
> [  250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> rtl92c_dm_txpower_tracking_callback_thermalmeter
> [  250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> Readback Thermal Meter = 0xe pre thermal meter 0xf eeprom_thermalmeter 0xf
> [  250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> Initial pathA ele_d reg0xc80 = 0x40000000, ofdm_index=0xc
> [  250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> Initial reg0xa24 = 0x90e1317, cck_index=0xc, ch14 0
> [  250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> Readback Thermal Meter = 0xe pre thermal meter 0xf eeprom_thermalmeter 0xf delta 0x1 delta_lck 0x0 delta_iqk 0x0
> [  250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> <===
> [  250.818472] rtl8192c_common:rtl92c_dm_initialize_txpower_tracking_thermalmeter():<0-1-0> pMgntInfo->txpower_tracking = 1
> [  250.818472] rtl8192ce:rtl92ce_led_control():<0-1-0> ledaction 3
> [  250.818472] rtl8192ce:rtl92ce_sw_led_on():<0-1-0> LedAddr:4E ledpin=1
> [  250.818472] rtlwifi:rtl_ips_nic_on():<0-1-0> before spin_unlock_irqrestore
> [  251.154656] PCM: Lost interrupts? [Q]-0 (stream=0, delta=15903, new_hw_ptr=293408, old_hw_ptr=277505)
>
> The exact code flow that causes that is:
>
> 1. wpa_supplicant send a start_scan request to the nl80211 driver
> 2. mac80211 module call rtl_op_config with IEEE80211_CONF_CHANGE_IDLE
> 3.   rtl_ips_nic_on is called which disable local irqs
> 4.     rtl92c_phy_set_rf_power_state() is called
> 5.       rtl_ps_enable_nic() is called and hw_init()is executed and then the interrupts on the device are enabled
>
> A good solution could be to refactor the code to avoid calling rtl92ce_hw_init() with the irqs disabled
> but a quick and dirty solution that has proven to work is
> to reenable the irqs during the function rtl92ce_hw_init().
>
> I think that it is safe doing so since the device interrupt will only be enabled after the init function succeed.
>
> Signed-off-by: Olivier Langlois <olivier@trillion01.com>
> Cc: Stable <stable@vger.kernel.org>

Acked-by: Larry Finger <Larry.Finger@lwfinger.net>

Larry

> ---
>   drivers/net/wireless/rtlwifi/rtl8192ce/hw.c | 18 ++++++++++++++++--
>   1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c b/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
> index a82b30a..2eb0b38 100644
> --- a/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
> +++ b/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
> @@ -937,14 +937,26 @@ int rtl92ce_hw_init(struct ieee80211_hw *hw)
>   	bool is92c;
>   	int err;
>   	u8 tmp_u1b;
> +	unsigned long flags;
>
>   	rtlpci->being_init_adapter = true;
> +
> +	/* Since this function can take a very long time (up to 350 ms)
> +	 * and can be called with irqs disabled, reenable the irqs
> +	 * to let the other devices continue being serviced.
> +	 *
> +	 * It is safe doing so since our own interrupts will only be enabled
> +	 * in a subsequent step.
> +	 */
> +	local_save_flags(flags);
> +	local_irq_enable();
> +
>   	rtlpriv->intf_ops->disable_aspm(hw);
>   	rtstatus = _rtl92ce_init_mac(hw);
>   	if (!rtstatus) {
>   		RT_TRACE(rtlpriv, COMP_ERR, DBG_EMERG, "Init MAC failed\n");
>   		err = 1;
> -		return err;
> +		goto exit;
>   	}
>
>   	err = rtl92c_download_fw(hw);
> @@ -952,7 +964,7 @@ int rtl92ce_hw_init(struct ieee80211_hw *hw)
>   		RT_TRACE(rtlpriv, COMP_ERR, DBG_WARNING,
>   			 "Failed to download FW. Init HW without FW now..\n");
>   		err = 1;
> -		return err;
> +		goto exit;
>   	}
>
>   	rtlhal->last_hmeboxnum = 0;
> @@ -1032,6 +1044,8 @@ int rtl92ce_hw_init(struct ieee80211_hw *hw)
>   		RT_TRACE(rtlpriv, COMP_INIT, DBG_TRACE, "under 1.5V\n");
>   	}
>   	rtl92c_dm_init(hw);
> +exit:
> +	local_irq_restore(flags);
>   	rtlpci->being_init_adapter = false;
>   	return err;
>   }
>

^ permalink raw reply

* Re: [PATCH 2/2] rtlwifi: Fix incorrect return from rtl_ps_enable_nic()
From: Larry Finger @ 2014-02-06 20:39 UTC (permalink / raw)
  To: Olivier Langlois, linville, chaoming_li
  Cc: linux-wireless, netdev, linux-kernel, Stable
In-Reply-To: <1391235070-23180-2-git-send-email-olivier@trillion01.com>

On 02/01/2014 12:11 AM, Olivier Langlois wrote:
> rtl_ps_enable_nic() is called from loops that will loop until this function returns true or a
> maximum number of retries is performed.
>
> hw_init() returns non-zero on error. In that situation return false to
> restore the original design intent to retry hw init when it fails.
>
> Signed-off-by: Olivier Langlois <olivier@trillion01.com>
> Cc: Stable <stable@vger.kernel.org>

Acked-by: Larry Finger <Larry.Finger@lwfinger.net>

Larry

> ---
>   drivers/net/wireless/rtlwifi/ps.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/rtlwifi/ps.c b/drivers/net/wireless/rtlwifi/ps.c
> index 0d81f76..a56e9b3 100644
> --- a/drivers/net/wireless/rtlwifi/ps.c
> +++ b/drivers/net/wireless/rtlwifi/ps.c
> @@ -48,7 +48,7 @@ bool rtl_ps_enable_nic(struct ieee80211_hw *hw)
>
>   	/*<2> Enable Adapter */
>   	if (rtlpriv->cfg->ops->hw_init(hw))
> -		return 1;
> +		return false;
>   	RT_CLEAR_PS_LEVEL(ppsc, RT_RF_OFF_LEVL_HALT_NIC);
>
>   	/*<3> Enable Interrupt */
>

^ permalink raw reply

* Re: [RFC PATCH] udp4: Don't take socket reference in receive path
From: Eric Dumazet @ 2014-02-06 20:58 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, edumazet
In-Reply-To: <alpine.DEB.2.02.1402061154580.17707@tomh.mtv.corp.google.com>

On Thu, 2014-02-06 at 11:58 -0800, Tom Herbert wrote:
> The reference counting in the UDP receive path is quite expensive for
> a socket that is shared amongst CPUs. This is probably true for normal
> sockets, but really is painful when just using the socket for
> receive encapsulation.
> 
> udp4_lib_lookup always takes a socket reference, and we also put back
> the reference after calling udp_queue_rcv_skb in the normal receive
> path, so the need for taking the reference seems to be to hold the
> socket after doing rcu_read_unlock. This patch modifies udp_lib_lookup
> to optionally take a reference and is always called with rcu_read_lock.
> In udp4_lib_rcv we call lib_lookup and udp_queue_rcv under the
> rcu_read_lock but without having taken the reference.
> 
> Requesting comments because I suspect there are nuances to this!
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---

Unfortunately this can't work.

When I did the RCU implementation for TCP/UDP, we chose to use
SLAB_DESTROY_BY_RCU.

This means we have to take a reference, then check the lookup keys
again.

If we remove SLAB_DESTROY_BY_RCU, we kill performance for short-lived
sessions, because of the latencies added by call_rcu().

(One UDP socket is about 1024 bytes in memory, call_rcu() grace period
is throwing away 1024 bytes from cpu caches)

Sure, in your case you know your UDP sessions are not short-lived,
but many applications use UDP for DNS lookups, with only a few packets
per socket.
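
The "take a reference, then check the keys again" rule can be
illustrated with a minimal userspace sketch in C11. It is not the kernel
code: struct demo_sock, its single integer key, and all demo_* helpers
are hypothetical stand-ins, and the real lookup restarts the hash chain
walk on a mismatch rather than returning NULL. The point it shows is
that with SLAB_DESTROY_BY_RCU the memory can be recycled for another
socket at any time, so a match seen before the reference was taken has
to be confirmed afterwards.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket living in a type-stable slab. */
struct demo_sock {
	atomic_int refcnt;	/* 0 means the object is being freed */
	int key;		/* stand-in for the (addr, port) lookup keys */
};

/* Take a reference unless the count has already dropped to zero. */
bool demo_get_unless_zero(struct demo_sock *sk)
{
	int old = atomic_load(&sk->refcnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&sk->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

void demo_put(struct demo_sock *sk)
{
	atomic_fetch_sub(&sk->refcnt, 1);
}

/* Lookup step: match, pin the object, then match again. */
struct demo_sock *demo_lookup(struct demo_sock *candidate, int key)
{
	if (candidate->key != key)
		return NULL;
	if (!demo_get_unless_zero(candidate))
		return NULL;
	/* The slot may have been reused for a different socket meanwhile. */
	if (candidate->key != key) {
		demo_put(candidate);
		return NULL;	/* the kernel restarts the walk here */
	}
	return candidate;
}

/* Usage example. */
void demo_self_check(void)
{
	struct demo_sock s;

	atomic_init(&s.refcnt, 1);
	s.key = 7;
	assert(demo_lookup(&s, 7) == &s);	/* match, reference taken */
	assert(demo_lookup(&s, 8) == NULL);	/* key mismatch */
	atomic_store(&s.refcnt, 0);
	assert(demo_lookup(&s, 7) == NULL);	/* object already dying */
}
```

Skipping the reference, as the RFC proposes, removes the step that makes
the second key check meaningful.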

^ permalink raw reply

* Re: [PATCH net] netpoll: fix netconsole IPv6 setup
From: Sabrina Dubroca @ 2014-02-06 20:58 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Miller, netdev
In-Reply-To: <CAHA+R7OQYZ5RLn3v0nucKB=CVpDCZqp=HyNL=qgC9bViVKEuAQ@mail.gmail.com>

2014-02-06, 12:34:10 -0800, Cong Wang wrote:
> On Thu, Feb 6, 2014 at 9:34 AM, Sabrina Dubroca <sd@queasysnail.net> wrote:
> >  net/core/netpoll.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> > index c03f3de..a664f78 100644
> > --- a/net/core/netpoll.c
> > +++ b/net/core/netpoll.c
> > @@ -948,6 +948,7 @@ int netpoll_parse_options(struct netpoll *np, char *opt)
> >  {
> >         char *cur=opt, *delim;
> >         int ipv6;
> > +       bool ipversion_set = false;
> >
> 
> Or initialize 'ipv6' to -1 and then check if it is -1?

It's overwritten when we parse the remote address. And np->ipv6 is a
bool, so we can't store it there either.
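
A small standalone C sketch of why a bool cannot carry a -1 "unset"
marker, and of the separate-flag approach. The struct and function
names (demo_np, demo_parse_state, demo_record_family) are hypothetical
and only mirror the shape of the idea, not the actual netpoll code or
the exact check in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a struct with a bool field like np->ipv6. */
struct demo_np {
	bool ipv6;
};

/* Any non-zero value stored in a bool becomes 1, so -1 is lost. */
int demo_store_sentinel(void)
{
	struct demo_np np;
	int sentinel = -1;

	np.ipv6 = sentinel;
	return np.ipv6;
}

/* Separate flag that remembers whether a family was already recorded. */
struct demo_parse_state {
	int ipv6;
	bool ipversion_set;
};

/* Returns false when a later address disagrees with an earlier one. */
bool demo_record_family(struct demo_parse_state *st, int ipv6)
{
	if (st->ipversion_set && st->ipv6 != ipv6)
		return false;
	st->ipv6 = ipv6;
	st->ipversion_set = true;
	return true;
}

/* Usage example. */
void demo_self_check(void)
{
	struct demo_parse_state st = { 0, false };

	assert(demo_store_sentinel() == 1);	/* not -1 */
	assert(demo_record_family(&st, 1));	/* first address: IPv6 */
	assert(!demo_record_family(&st, 0));	/* second address: IPv4 */
}
```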

-- 
Sabrina

^ permalink raw reply

* RTNL: assertion failed at net/core/dev.c (4494) and RTNL: assertion failed at net/core/rtnetlink.c (940)
From: Thomas Glanzmann @ 2014-02-06 20:51 UTC (permalink / raw)
  To: Eric Dumazet, netdev, fubar, vfalico, andy, jiri

Hello,
This morning I checked out Linus' tip and compiled it. After booting, my
dmesg is full of:

[    8.944991] RTNL: assertion failed at net/core/dev.c (4494)
[    8.950640] CPU: 3 PID: 388 Comm: kworker/u24:4 Not tainted 3.14.0-rc1+ #3
[    8.950642] Hardware name: Supermicro X9SRD-F/X9SRD-F, BIOS 1.0a 10/15/2012
[    8.950654] Workqueue: bond0 bond_3ad_state_machine_handler [bonding]
[    8.950658]  0000000000000000 ffff881020c88000 ffffffff8138e219 ffff881020c88000
[    8.950664]  ffffffff812d3091 ffff881023961040 ffffffff812e3132 0000000000000246
[    8.950670]  0000000000000020 ffff881020ab1be8 0000000020ab1ba8 0000000000000000
[    8.950675] Call Trace:
[    8.950686]  [<ffffffff8138e219>] ? dump_stack+0x41/0x51
[    8.950694]  [<ffffffff812d3091>] ? netdev_master_upper_dev_get+0x2a/0x4d
[    8.950699]  [<ffffffff812e3132>] ? rtnl_fill_ifinfo+0x2c/0xac4
[    8.950707]  [<ffffffff81072211>] ? print_time.part.5+0x50/0x54
[    8.950715]  [<ffffffff812caf94>] ? __kmalloc_reserve.isra.42+0x2a/0x6d
[    8.950721]  [<ffffffff81102040>] ? ksize+0x12/0x1e
[    8.950726]  [<ffffffff812cb2b7>] ? __alloc_skb+0xb5/0x1a9
[    8.950731]  [<ffffffff812e4626>] ? rtmsg_ifinfo+0x6c/0xd6
[    8.950739]  [<ffffffffa035f4f9>] ? __enable_port.isra.17+0x51/0x5a [bonding]
[    8.950747]  [<ffffffffa0360463>] ? ad_agg_selection_logic+0x3d3/0x3ed [bonding]
[    8.950754]  [<ffffffffa0360d40>] ? bond_3ad_state_machine_handler+0x555/0x918 [bonding]
[    8.950761]  [<ffffffff8104db2d>] ? process_one_work+0x191/0x293
[    8.950766]  [<ffffffff8104dfde>] ? worker_thread+0x121/0x1e7
[    8.950770]  [<ffffffff8104debd>] ? rescuer_thread+0x269/0x269
[    8.950777]  [<ffffffff810527b6>] ? kthread+0x99/0xa1
[    8.950782]  [<ffffffff8105271d>] ? __kthread_parkme+0x59/0x59
[    8.950789]  [<ffffffff8139733c>] ? ret_from_fork+0x7c/0xb0
[    8.950794]  [<ffffffff8105271d>] ? __kthread_parkme+0x59/0x59
[    8.950797] RTNL: assertion failed at net/core/rtnetlink.c (940)
[    8.956863] CPU: 3 PID: 388 Comm: kworker/u24:4 Not tainted 3.14.0-rc1+ #3
[    8.956871] Hardware name: Supermicro X9SRD-F/X9SRD-F, BIOS 1.0a 10/15/2012
[    8.956877] Workqueue: bond0 bond_3ad_state_machine_handler [bonding]
[    8.956879]  0000000000000000 ffff881020c88000 ffffffff8138e219 ffff881023961040
[    8.956884]  ffffffff812e315e 0000000000000246 0000000000000020 ffff881020ab1be8
[    8.956890]  0000000020ab1ba8 0000000000000000 ffffffff817ef530 0000000000000008
[    8.956895] Call Trace:
[    8.956899]  [<ffffffff8138e219>] ? dump_stack+0x41/0x51
[    8.956903]  [<ffffffff812e315e>] ? rtnl_fill_ifinfo+0x58/0xac4
[    8.956909]  [<ffffffff81072211>] ? print_time.part.5+0x50/0x54
[    8.956915]  [<ffffffff812caf94>] ? __kmalloc_reserve.isra.42+0x2a/0x6d
[    8.956920]  [<ffffffff81102040>] ? ksize+0x12/0x1e
[    8.956925]  [<ffffffff812cb2b7>] ? __alloc_skb+0xb5/0x1a9
[    8.956929]  [<ffffffff812e4626>] ? rtmsg_ifinfo+0x6c/0xd6
[    8.956936]  [<ffffffffa035f4f9>] ? __enable_port.isra.17+0x51/0x5a [bonding]
[    8.956942]  [<ffffffffa0360463>] ? ad_agg_selection_logic+0x3d3/0x3ed [bonding]
[    8.956949]  [<ffffffffa0360d40>] ? bond_3ad_state_machine_handler+0x555/0x918 [bonding]
[    8.956954]  [<ffffffff8104db2d>] ? process_one_work+0x191/0x293
[    8.956958]  [<ffffffff8104dfde>] ? worker_thread+0x121/0x1e7
[    8.956962]  [<ffffffff8104debd>] ? rescuer_thread+0x269/0x269
[    8.956968]  [<ffffffff810527b6>] ? kthread+0x99/0xa1
[    8.956973]  [<ffffffff8105271d>] ? __kthread_parkme+0x59/0x59
[    8.956978]  [<ffffffff8139733c>] ? ret_from_fork+0x7c/0xb0
[    8.956983]  [<ffffffff8105271d>] ? __kthread_parkme+0x59/0x59

Full dmesg is at: http://pbot.rmdir.de/dJZsX0d71RFPaZAAjcTBWQ

I'm running Debian Wheezy and my interface configuration is:

auto lo
iface lo inet loopback

auto bond0
iface bond0 inet static
       address 10.100.4.62
       netmask 255.255.0.0
       gateway 10.100.0.1
       slaves eth0 eth1
       bond-mode 802.3ad
       bond-miimon 100

auto bond0.101
iface bond0.101 inet static
       address 10.101.99.4
       netmask 255.255.0.0

auto bond0.102
iface bond0.102 inet static
       address 10.102.99.4
       netmask 255.255.0.0

auto bond0.103
iface bond0.103 inet static
       address 10.103.99.4
       netmask 255.255.0.0

auto bond1
iface bond1 inet static
       address 10.100.5.62
       netmask 255.255.0.0
       slaves eth2 eth3
       bond-mode 802.3ad
       bond-miimon 100

auto bond1.101
iface bond1.101 inet static
       address 10.101.99.5
       netmask 255.255.0.0

auto bond1.102
iface bond1.102 inet static
       address 10.102.99.5
       netmask 255.255.0.0

auto bond1.103
iface bond1.103 inet static
       address 10.103.99.5
       netmask 255.255.0.0

My IPv4 interface configuration is:

(node-62) [~/work/linux-2.6] ip -4 a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 10.100.4.62/16 brd 10.100.255.255 scope global bond0
       valid_lft forever preferred_lft forever
7: bond0.101@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 10.101.99.4/16 brd 10.101.255.255 scope global bond0.101
       valid_lft forever preferred_lft forever
8: bond0.102@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 10.102.99.4/16 brd 10.102.255.255 scope global bond0.102
       valid_lft forever preferred_lft forever
9: bond0.103@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 10.103.99.4/16 brd 10.103.255.255 scope global bond0.103
       valid_lft forever preferred_lft forever
10: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 10.100.5.62/16 brd 10.100.255.255 scope global bond1
       valid_lft forever preferred_lft forever
11: bond1.101@bond1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 10.101.99.5/16 brd 10.101.255.255 scope global bond1.101
       valid_lft forever preferred_lft forever
12: bond1.102@bond1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 10.102.99.5/16 brd 10.102.255.255 scope global bond1.102
       valid_lft forever preferred_lft forever
13: bond1.103@bond1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 10.103.99.5/16 brd 10.103.255.255 scope global bond1.103
       valid_lft forever preferred_lft forever

I also have IPv6 with link-local and global addresses configured.

Kernel Config is at: http://pbot.rmdir.de/VTAnhVv8ECP7a7SPaxMsFA

I'm available for testing fixes or providing ssh access to the system if you
want to do further tests. I have the same config running on 3.13.0 on the same
machine without any problems whatsoever. If I should bisect it, let me know.

Cheers,
        Thomas

^ permalink raw reply

* Re: [PATCH] net: use __GFP_NORETRY for high order allocations
From: Eric Dumazet @ 2014-02-06 21:00 UTC (permalink / raw)
  To: Joe Perches
  Cc: David Miller, netdev, David Rientjes,
	linux-kernel@vger.kernel.org
In-Reply-To: <1391718270.15777.20.camel@joe-AO722>

On Thu, 2014-02-06 at 12:24 -0800, Joe Perches wrote:

> Perhaps add __GFP_THISNODE too ?

Why ?

^ permalink raw reply

* Re: [PATCH] net: use __GFP_NORETRY for high order allocations
From: David Rientjes @ 2014-02-06 21:03 UTC (permalink / raw)
  To: Joe Perches
  Cc: Eric Dumazet, David Miller, netdev, linux-kernel@vger.kernel.org
In-Reply-To: <1391718270.15777.20.camel@joe-AO722>

On Thu, 6 Feb 2014, Joe Perches wrote:

> On Thu, 2014-02-06 at 10:42 -0800, Eric Dumazet wrote:
> > sock_alloc_send_pskb() & sk_page_frag_refill()
> > have a loop trying high order allocations to prepare
> > skb with low number of fragments as this increases performance.
> > 
> > Problem is that under memory pressure/fragmentation, this can
> > trigger OOM while the intent was only to try the high order
> > allocations, then fallback to order-0 allocations.
> []
> > Call Trace:
> >  [<ffffffff8043766c>] dump_header+0xe1/0x23e
> >  [<ffffffff80437a02>] oom_kill_process+0x6a/0x323
> >  [<ffffffff80438443>] out_of_memory+0x4b3/0x50d
> >  [<ffffffff8043a4a6>] __alloc_pages_may_oom+0xa2/0xc7
> >  [<ffffffff80236f42>] __alloc_pages_nodemask+0x1002/0x17f0
> >  [<ffffffff8024bd23>] alloc_pages_current+0x103/0x2b0
> >  [<ffffffff8028567f>] sk_page_frag_refill+0x8f/0x160
> []
> > diff --git a/net/core/sock.c b/net/core/sock.c
> []
> > @@ -1775,7 +1775,9 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
> >  			while (order) {
> >  				if (npages >= 1 << order) {
> >  					page = alloc_pages(sk->sk_allocation |
> > -							   __GFP_COMP | __GFP_NOWARN,
> > +							   __GFP_COMP |
> > +							   __GFP_NOWARN |
> > +							   __GFP_NORETRY,
> >  							   order);
> >  					if (page)
> >  						goto fill_page;
> > @@ -1845,7 +1847,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio)
> >  		gfp_t gfp = prio;
> >  
> >  		if (order)
> > -			gfp |= __GFP_COMP | __GFP_NOWARN;
> > +			gfp |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY;
> 
> Perhaps add __GFP_THISNODE too ?
> 

How does __GFP_THISNODE have anything to do with avoiding oom killing due 
to high-order fragmentation?  If they absolutely require local memory to 
current's cpu node then that would make sense, but the fallback still 
allocates order-0 memory remotely and with __GFP_THISNODE on this attempt 
we wouldn't even attempt remote reclaim.

^ permalink raw reply

* Re: [PATCH] inet: defines IPPROTO_* needed for module alias generation
From: Carlos O'Donell @ 2014-02-06 21:33 UTC (permalink / raw)
  To: Jan Moskyto Matejka, David S. Miller; +Cc: netdev, linux-kernel
In-Reply-To: <1391685000-7346-1-git-send-email-mq@suse.cz>

On 02/06/2014 06:10 AM, Jan Moskyto Matejka wrote:
> Commit cfd280c91253 ("net: sync some IP headers with glibc") changed a set of
> define's to an enum (with no explanation why) which introduced a bug
> in module mip6 where aliases are generated using the IPPROTO_* defines;
> mip6 doesn't load if require_module called with the aliases from
> xfrm_get_type().

I wrote that code and I apologize for not giving a reason at
the time.

There are two reasons:

* It makes the debuginfo better and debugging easier via the enum.

* It harmonizes those headers with what is already in glibc.

Harmonizing this header with glibc makes it easier for userspace
to synchronize changes and perhaps eventually use the UAPI headers
directly.

> Reverting this change back to define's to fix the aliases.
> 
> modinfo mip6 (before this change)
> alias:          xfrm-type-10-IPPROTO_DSTOPTS
> alias:          xfrm-type-10-IPPROTO_ROUTING
> 
> modinfo mip6 (after this change)
> alias:          xfrm-type-10-43
> alias:          xfrm-type-10-60

Instead of reverting these changes I suggest someone fix
whatever is processing that information.

I do not condone the application of this patch for the
above two reasons. Though you might argue that I should
just make all debuggers and compilers better at dealing
with DW_AT_macro_info/DW_MACINFO_* debug info... and 
you also would not be wrong.

I hope that answers your question.
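
For anyone following along, the difference in the generated aliases can
be reproduced with a small standalone C sketch. The DEMO_* names are
hypothetical; DEMO_STR is a two-level stringify macro written here in
the same spirit as the kernel's __stringify(), and the "xfrm-type-10-"
prefix is copied from the modinfo output quoted above. With the enum
style the macro expands to its own name, so the name is what gets
stringified; with a plain numeric define the number is.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification: the argument is macro-expanded first. */
#define DEMO_STR_1(x) #x
#define DEMO_STR(x) DEMO_STR_1(x)

/* Style A: enum constant plus a self-referential macro. */
enum {
	DEMO_PROTO_A = 60,
#define DEMO_PROTO_A DEMO_PROTO_A
};

/* Style B: plain numeric macro. */
#define DEMO_PROTO_B 60

const char *demo_alias_enum(void)
{
	return "xfrm-type-10-" DEMO_STR(DEMO_PROTO_A);
}

const char *demo_alias_define(void)
{
	return "xfrm-type-10-" DEMO_STR(DEMO_PROTO_B);
}

/* Usage example. */
void demo_self_check(void)
{
	/* The enum style keeps the identifier in the string. */
	assert(strcmp(demo_alias_enum(), "xfrm-type-10-DEMO_PROTO_A") == 0);
	/* The define style yields the number. */
	assert(strcmp(demo_alias_define(), "xfrm-type-10-60") == 0);
}
```

Both constants still compare equal as integers, which is why only
string-based consumers such as alias generation notice the change.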

> Signed-off-by: Jan Moskyto Matejka <mq@suse.cz>
> ---
>  include/uapi/linux/in6.h | 23 +++++++----------------
>  1 file changed, 7 insertions(+), 16 deletions(-)
> 
> diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
> index 633b93c..e9a1d2d97 100644
> --- a/include/uapi/linux/in6.h
> +++ b/include/uapi/linux/in6.h
> @@ -128,22 +128,13 @@ struct in6_flowlabel_req {
>   *	IPV6 extension headers
>   */
>  #if __UAPI_DEF_IPPROTO_V6
> -enum {
> -  IPPROTO_HOPOPTS = 0,		/* IPv6 hop-by-hop options      */
> -#define IPPROTO_HOPOPTS		IPPROTO_HOPOPTS
> -  IPPROTO_ROUTING = 43,		/* IPv6 routing header          */
> -#define IPPROTO_ROUTING		IPPROTO_ROUTING
> -  IPPROTO_FRAGMENT = 44,	/* IPv6 fragmentation header    */
> -#define IPPROTO_FRAGMENT	IPPROTO_FRAGMENT
> -  IPPROTO_ICMPV6 = 58,		/* ICMPv6                       */
> -#define IPPROTO_ICMPV6		IPPROTO_ICMPV6
> -  IPPROTO_NONE = 59,		/* IPv6 no next header          */
> -#define IPPROTO_NONE		IPPROTO_NONE
> -  IPPROTO_DSTOPTS = 60,		/* IPv6 destination options     */
> -#define IPPROTO_DSTOPTS		IPPROTO_DSTOPTS
> -  IPPROTO_MH = 135,		/* IPv6 mobility header         */
> -#define IPPROTO_MH		IPPROTO_MH
> -};
> +#define IPPROTO_HOPOPTS		0	/* IPv6 hop-by-hop options	*/
> +#define IPPROTO_ROUTING		43	/* IPv6 routing header		*/
> +#define IPPROTO_FRAGMENT	44	/* IPv6 fragmentation header	*/
> +#define IPPROTO_ICMPV6		58	/* ICMPv6			*/
> +#define IPPROTO_NONE		59	/* IPv6 no next header		*/
> +#define IPPROTO_DSTOPTS		60	/* IPv6 destination options	*/
> +#define IPPROTO_MH		135	/* IPv6 mobility header		*/
>  #endif /* __UAPI_DEF_IPPROTO_V6 */
>  
>  /*
> 

Cheers,
Carlos.

^ permalink raw reply

* Re: [PATCH] net: use __GFP_NORETRY for high order allocations
From: Joe Perches @ 2014-02-06 21:34 UTC (permalink / raw)
  To: David Rientjes
  Cc: Eric Dumazet, David Miller, netdev, linux-kernel@vger.kernel.org
In-Reply-To: <alpine.DEB.2.02.1402061302020.9567@chino.kir.corp.google.com>

On Thu, 2014-02-06 at 13:03 -0800, David Rientjes wrote:
> On Thu, 6 Feb 2014, Joe Perches wrote:
> 
> > On Thu, 2014-02-06 at 10:42 -0800, Eric Dumazet wrote:
> > > sock_alloc_send_pskb() & sk_page_frag_refill()
> > > have a loop trying high order allocations to prepare
> > > skb with low number of fragments as this increases performance.
> > > 
> > > Problem is that under memory pressure/fragmentation, this can
> > > trigger OOM while the intent was only to try the high order
> > > allocations, then fallback to order-0 allocations.
> > []
> > > Call Trace:
> > >  [<ffffffff8043766c>] dump_header+0xe1/0x23e
> > >  [<ffffffff80437a02>] oom_kill_process+0x6a/0x323
> > >  [<ffffffff80438443>] out_of_memory+0x4b3/0x50d
> > >  [<ffffffff8043a4a6>] __alloc_pages_may_oom+0xa2/0xc7
> > >  [<ffffffff80236f42>] __alloc_pages_nodemask+0x1002/0x17f0
> > >  [<ffffffff8024bd23>] alloc_pages_current+0x103/0x2b0
> > >  [<ffffffff8028567f>] sk_page_frag_refill+0x8f/0x160
> > []
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > []
> > > @@ -1775,7 +1775,9 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
> > >  			while (order) {
> > >  				if (npages >= 1 << order) {
> > >  					page = alloc_pages(sk->sk_allocation |
> > > -							   __GFP_COMP | __GFP_NOWARN,
> > > +							   __GFP_COMP |
> > > +							   __GFP_NOWARN |
> > > +							   __GFP_NORETRY,
> > >  							   order);
> > >  					if (page)
> > >  						goto fill_page;
> > > @@ -1845,7 +1847,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio)
> > >  		gfp_t gfp = prio;
> > >  
> > >  		if (order)
> > > -			gfp |= __GFP_COMP | __GFP_NOWARN;
> > > +			gfp |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY;
> > 
> > Perhaps add __GFP_THISNODE too ?
> > 
> 
> How does __GFP_THISNODE have anything to do with avoiding oom killing due 
> to high-order fragmentation?

I don't think it does.

> If they absolutely require local memory to 
> current's cpu node then that would make sense,

I presumed THISNODE would be used only with NORETRY

> but the fallback still 
> allocates order-0 memory remotely and with __GFP_THISNODE on this attempt 
> we wouldn't even attempt remote reclaim.

Any other alloc attempt could work on other cpus.

It was just a thought, ignore it if it's a dumb thought.

^ permalink raw reply

