Netdev List
* Re: OOM when adding ipv6 route:  How to make available more per-cpu memory?
From: Ben Greear @ 2010-11-06  0:07 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: NetDev, linux-kernel, Tejun Heo
In-Reply-To: <1288995103.2665.653.camel@edumazet-laptop>

On 11/05/2010 03:11 PM, Eric Dumazet wrote:
> On Friday, 05 November 2010 at 21:20 +0100, Eric Dumazet wrote:
>> Your vmalloc space is very fragmented. pcpu_get_vm_areas() wants
>> hugepages (4MB on your machine, 2MB on mine because I have
>> CONFIG_HIGHMEM64G=y)
>
> Well, this is wrong. We use normal (4KB) pages, unfortunately.
>
> I have a NUMA machine, with two nodes, so pcpu_get_vm_areas() allocates
> two zones, one for each node, with a 'known' offset between them.
> Then, 4KB pages are allocated to populate the zone when needed.
>
> # grep pcpu_get_vm_areas /proc/vmallocinfo
> 0xffffe8ffa0400000-0xffffe8ffa0600000 2097152 pcpu_get_vm_areas+0x0/0x740 vmalloc
> 0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x740 vmalloc
>
> BTW, we don't have the number of pages currently allocated in each
> 'vmalloc' zone, and/or node information.
>
> Tejun, do you have plans to use hugepages eventually ?
> (and fallback to 4KB pages, but most percpu data are allocated right
> after boot)

We just tried creating 1000 macvlans with IPv6 addrs on a 64-bit machine
with 12GB RAM.  Only around 520 interfaces properly set their IPs, and
again there are out-of-memory errors from 'ip', but no obvious
splats in dmesg.

'top' shows 10G or so free.

It will take some time to figure out what exactly is returning
the ENOMEM....

Thanks,
Ben

>
> Thanks
>
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: radvd and auto-ipv6 address regression from 2.6.31 to 2.6.34+
From: Mikael Abrahamsson @ 2010-11-06  6:17 UTC (permalink / raw)
  To: Ben Greear; +Cc: NetDev
In-Reply-To: <4CD48FB4.2050903@candelatech.com>

On Fri, 5 Nov 2010, Ben Greear wrote:

>> On hacked 2.6.34 and 2.6.36 kernels, however, the veth0 gains a new 
>> address that appears to be generated similar to other IPs associated 
>> with auto-creation via radvd.
>> 
>> I have not yet tested intervening kernels or physical interfaces between
>> two machines.
>> 
>> So, the question is: Is the new behaviour on purpose, or is it a
>> regression bug?
>
> Actually, this doesn't seem to work for 2.6.31 either, so I guess it
> isn't a regression.
>
> Is it expected behaviour, however?

I would not be surprised if the IPv6 stack on Linux gained IPv6 
addresses using SLAAC on any interface where it sees RAs, regardless 
of whether these are locally generated or not.

I guess any Linux device running radvd should turn off autoconf on those 
interfaces that radvd is acting on?

net.ipv6.conf.veth0.accept_ra=0 should do the trick?
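
As a rough illustration of the sysctl suggested above: on Linux, net.ipv6.conf.veth0.accept_ra is normally backed by the file /proc/sys/net/ipv6/conf/veth0/accept_ra, so a program can apply the same setting by writing "0" there. The sketch below assumes that procfs layout and root privileges. The helper names accept_ra_path() and disable_accept_ra() are made up for this example and are not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the procfs path behind net.ipv6.conf.<ifname>.accept_ra.
 * Returns 0 on success, -1 if the name is invalid or the path
 * does not fit in buf. (Hypothetical helper for illustration.) */
int accept_ra_path(char *buf, size_t len, const char *ifname)
{
	int n;

	assert(buf != NULL && ifname != NULL);
	/* Interface names are never empty and never contain '/'. */
	if (ifname[0] == '\0' || strchr(ifname, '/') != NULL)
		return -1;
	n = snprintf(buf, len, "/proc/sys/net/ipv6/conf/%s/accept_ra", ifname);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "0" to the accept_ra file so the interface stops accepting RAs.
 * Needs root. Returns 0 on success, -1 on any error. */
int disable_accept_ra(const char *ifname)
{
	char path[128];
	FILE *fp;
	int ok;

	if (accept_ra_path(path, sizeof(path), ifname) != 0)
		return -1;
	fp = fopen(path, "w");
	if (fp == NULL)
		return -1;
	ok = (fputs("0\n", fp) >= 0);
	if (fclose(fp) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling disable_accept_ra("veth0") before starting radvd would be one way to try the idea. Whether it actually stops the extra address from appearing is exactly the open question in this thread.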

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply

* Re: 2.6.36 vlans & bridge crash - status?
From: Nikola Ciprich @ 2010-11-06  7:11 UTC (permalink / raw)
  To: Jesse Gross; +Cc: netdev list, nikola.ciprich
In-Reply-To: <AANLkTin130xsLRQRMH2W6Yy=gJSD-qS29vR-E6Qvfgpg@mail.gmail.com>

Hello Jesse,
It really fixes the problem, thanks!
Was the patch submitted to Greg for 2.6.36.y ?
n.


On Fri, Nov 05, 2010 at 08:05:17AM -0700, Jesse Gross wrote:
> On Fri, Nov 5, 2010 at 3:35 AM, Nikola Ciprich <extmaillist@linuxbox.cz> wrote:
> > Hello,
> > I would like to ask if somebody is trying to solve
> > https://bugzilla.kernel.org/show_bug.cgi?id=20462
> > and if needed, could I provide some help, testing etc?
> 
> I sent out a patch for something related:
> http://permalink.gmane.org/gmane.linux.network/176566
> 
> My guess is that it will also fix your issue.  Can you test it?

-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:    +420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------

^ permalink raw reply

* Re: OOM when adding ipv6 route:  How to make available more per-cpu memory?
From: Eric Dumazet @ 2010-11-06  7:26 UTC (permalink / raw)
  To: Ben Greear; +Cc: NetDev, linux-kernel, Tejun Heo
In-Reply-To: <4CD49C2F.3060904@candelatech.com>

On Friday, 05 November 2010 at 17:07 -0700, Ben Greear wrote:

> We just tried creating 1000 macvlans with IPv6 addrs on a 64-bit machine
> with 12GB RAM.  Only around 520 interfaces properly set their IPs, and
> again there are out-of-memory errors from 'ip', but no obvious
> splats in dmesg.
> 
> 'top' shows 10G or so free.
> 
> It will take some time to figure out what exactly is returning
> the ENOMEM....

At least, this has nothing to do with percpu stuff?

On my 4GB machine, with 16 'cpus' (but 32 possible cpus), I was able to
allocate 8192 percpu structures of 8192 bytes each

(total: 32 * 8192 * 8192 = 2 Gbytes)

setup_percpu: NR_CPUS:32 nr_cpumask_bits:32 nr_cpu_ids:32 nr_node_ids:2
PERCPU: Embedded 26 pages/cpu @ffff88007fc00000 s76032 r8192 d22272 u131072
pcpu-alloc: s76032 r8192 d22272 u131072 alloc=1*2097152
pcpu-alloc: [0] 00 02 04 06 08 10 12 14 17 19 21 23 25 27 29 31 
pcpu-alloc: [1] 01 03 05 07 09 11 13 15 16 18 20 22 24 26 28 30 

grep Vmalloc /proc/meminfo 
VmallocTotal:   34359738367 kB
VmallocUsed:     2202592 kB
VmallocChunk:   34356996456 kB
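
The figures in this message can be cross-checked with a few lines of arithmetic. The sketch below assumes 4KB pages and reads the boot log fields as s = static, r = reserved, d = dynamic and u = unit size, which is how these percpu messages are usually interpreted. It also assumes the 2097152-byte allocation covers the 16 units of one group. The function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u	/* assumption: 4KB pages */

/* Bytes used by nr_allocs percpu objects of obj_size bytes when one
 * copy exists for every possible CPU. */
uint64_t percpu_total_bytes(uint64_t possible_cpus, uint64_t nr_allocs,
			    uint64_t obj_size)
{
	assert(possible_cpus > 0);
	return possible_cpus * nr_allocs * obj_size;
}

/* Pages per CPU needed for the first chunk, given the s/r/d sizes
 * from the "PERCPU: Embedded ..." line (rounded up to whole pages). */
uint64_t first_chunk_pages(uint64_t s, uint64_t r, uint64_t d)
{
	return (s + r + d + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* Address space taken by one group of units of unit_size bytes each. */
uint64_t group_alloc_bytes(uint64_t units, uint64_t unit_size)
{
	return units * unit_size;
}
```

With the numbers above, percpu_total_bytes(32, 8192, 8192) gives 2147483648 bytes (2 Gbytes), first_chunk_pages(76032, 8192, 22272) gives the 26 pages/cpu from the log, and group_alloc_bytes(16, 131072) gives 2097152, the size of each pcpu_get_vm_areas entry.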

Make sure udev / hotplug is not the problem, if you create your devices
very fast.

(modprobe dummy numdummies=2000) can be very slow because of that.
All tasks are fighting for RTNL or sysfs mutex.




^ permalink raw reply

* Re: OOM when adding ipv6 route:  How to make available more per-cpu memory?
From: Tejun Heo @ 2010-11-06  9:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ben Greear, NetDev, linux-kernel
In-Reply-To: <1288995103.2665.653.camel@edumazet-laptop>

Hello,

On 11/05/2010 11:11 PM, Eric Dumazet wrote:
> On Friday, 05 November 2010 at 21:20 +0100, Eric Dumazet wrote:
>> Your vmalloc space is very fragmented. pcpu_get_vm_areas() wants
>> hugepages (4MB on your machine, 2MB on mine because I have
>> CONFIG_HIGHMEM64G=y)
> 
> Well, this is wrong. We use normal (4KB) pages, unfortunately.
>
> I have a NUMA machine, with two nodes, so pcpu_get_vm_areas() allocates
> two zones, one for each node, with a 'known' offset between them.
> Then, 4KB pages are allocated to populate the zone when needed.
> 
> # grep pcpu_get_vm_areas /proc/vmallocinfo 
> 0xffffe8ffa0400000-0xffffe8ffa0600000 2097152 pcpu_get_vm_areas+0x0/0x740 vmalloc
> 0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x740 vmalloc
> 
> BTW, we don't have the number of pages currently allocated in each
> 'vmalloc' zone, and/or node information.
> 
> Tejun, do you have plans to use hugepages eventually ?
> (and fallback to 4KB pages, but most percpu data are allocated right
> after boot)

Well, it's rather complicated.  Till now, percpu usage hasn't
justified allocating hugepages, though it might someday.  More
importantly, the reason those big chunks of address space are used
is to keep the first chunk embedded in the regular linear kernel
address space to avoid extra TLB pressure.

On configurations where vmalloc area is a scarce resource,
percpu_alloc=page can be specified to use page-mapped allocation.
This will use much smaller chunks in vmalloc area at the cost of
additional 4k page TLB pressure for percpu memory in the first chunk
(all the static percpu variables and then some).

pcpu_embed_first_chunk() contains a heuristic which makes it yield to
the page allocator, but the parameter is pretty generous (maximum
distance between chunks > 75% of vmalloc area).  It's there just to
avoid completely crazy cases.  Also, x86 setup_per_cpu_areas() chooses
the page allocator on 32bit NUMAs.  This case didn't trigger either.
We probably need to add another condition.

That large machine on 32bit is bound to be flaky.  To be short on
virtual address space is a pretty silly and stupid thing.  Anyways,
any good idea on what criteria we could test?

Thanks.

-- 
tejun

^ permalink raw reply

* Re: bonding: flow control regression [was Re: bridging: flow control regression]
From: Simon Horman @ 2010-11-06  9:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Jay Vosburgh, David S. Miller
In-Reply-To: <1288690185.2832.8.camel@edumazet-laptop>

On Tue, Nov 02, 2010 at 10:29:45AM +0100, Eric Dumazet wrote:
> On Tuesday, 02 November 2010 at 17:46 +0900, Simon Horman wrote:
> 
> > Thanks Eric, that seems to resolve the problem that I was seeing.
> > 
> > With your patch I see:
> > 
> > No bonding
> > 
> > # netperf -c -4 -t UDP_STREAM -H 172.17.60.216 -l 30 -- -m 1472
> > UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
> > Socket  Message  Elapsed      Messages                   CPU      Service
> > Size    Size     Time         Okay Errors   Throughput   Util     Demand
> > bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
> > 
> > 116736    1472   30.00     2438413      0      957.2     8.52     1.458 
> > 129024           30.00     2438413             957.2     -1.00    -1.000
> > 
> > With bonding (one slave, the interface used in the test above)
> > 
> > netperf -c -4 -t UDP_STREAM -H 172.17.60.216 -l 30 -- -m 1472
> > UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
> > Socket  Message  Elapsed      Messages                   CPU      Service
> > Size    Size     Time         Okay Errors   Throughput   Util     Demand
> > bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
> > 
> > 116736    1472   30.00     2438390      0      957.1     8.97     1.535 
> > 129024           30.00     2438390             957.1     -1.00    -1.000
> > 
> 
> 
> Sure the patch helps when not too many flows are involved, but this is a
> hack.
> 
> Say the device queue is 1000 packets, and you run a workload with 2000
> sockets, it won't work...
> 
> Or device queue is 1000 packets, one flow, and socket send queue size
> allows for more than 1000 packets to be 'in flight' (echo 2000000
> >/proc/sys/net/core/wmem_default), it won't work with bonding either,
> only with devices with a qdisc sitting in the first device met after
> the socket.

True, thanks for pointing that out.

The scenario that I am actually interested in is virtualisation.
And I believe that your patch helps the vhostnet case (I don't see
flow control problems with bonding + virtio without vhostnet). However,
I am unsure if there are also some easy work-arounds to degrade
flow control in the vhostnet case too.


^ permalink raw reply

* [PATCH] iproute2: remove useless use of buffer
From: Andreas Schwab @ 2010-11-06  9:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

Print directly to the file instead of going through a buffer.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
---
 ip/ipaddress.c |   16 +++++++---------
 1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 19b3d6e..fc306e6 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -613,23 +613,21 @@ int print_addrinfo(const struct sockaddr_nl *who, struct nlmsghdr *n,
 		fprintf(fp, "%s", (char*)RTA_DATA(rta_tb[IFA_LABEL]));
 	if (rta_tb[IFA_CACHEINFO]) {
 		struct ifa_cacheinfo *ci = RTA_DATA(rta_tb[IFA_CACHEINFO]);
-		char buf[128];
 		fprintf(fp, "%s", _SL_);
+		fprintf(fp, "       valid_lft ");
 		if (ci->ifa_valid == INFINITY_LIFE_TIME)
-			sprintf(buf, "valid_lft forever");
+			fprintf(fp, "forever");
 		else
-			sprintf(buf, "valid_lft %usec", ci->ifa_valid);
+			fprintf(fp, "%usec", ci->ifa_valid);
+		fprintf(fp, " preferred_lft ");
 		if (ci->ifa_prefered == INFINITY_LIFE_TIME)
-			sprintf(buf+strlen(buf), " preferred_lft forever");
+			fprintf(fp, "forever");
 		else {
 			if (deprecated)
-				sprintf(buf+strlen(buf), " preferred_lft %dsec",
-					ci->ifa_prefered);
+				fprintf(fp, "%dsec", ci->ifa_prefered);
 			else
-				sprintf(buf+strlen(buf), " preferred_lft %usec",
-					ci->ifa_prefered);
+				fprintf(fp, "%usec", ci->ifa_prefered);
 		}
-		fprintf(fp, "       %s", buf);
 	}
 	fprintf(fp, "\n");
 	fflush(fp);
-- 
1.7.3.2


-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply related

* Re: ping -I eth1 ....
From: Joakim Tjernlund @ 2010-11-06  9:42 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Eric Dumazet, netdev, Thomas Graf
In-Reply-To: <20101105203150.GA12118@canuck.infradead.org>

Thomas Graf <tgr@infradead.org> wrote on 2010/11/05 21:31:50:
>
> On Fri, Nov 05, 2010 at 04:54:18PM +0100, Joakim Tjernlund wrote:
> > Eric Dumazet <eric.dumazet@gmail.com> wrote on 2010/11/05 16:06:54:
> > >
> > > > Hopefully most of that is legacy or just plain wrong? Unless
> > > > someone can say why only test IFF_UP one should consider changing them.
> > > >
> > >
> > > Most of the places are hot path.
> > >
> > > You dont want to replace one test by four tests.
> > >
> > > _This_ would be wrong :)
> >
> > Wrong is wrong, even if it is in the hot path :)
> > Perhaps it is time to define an internal IFF_OPERATIONAL flag
> > which is the sum of IFF_UP, IFF_RUNNING etc.? That
> > way you still get one test in the hot path and can abstract
> > what defines an operational link.
>
> You definitely don't want to have your send() call fail simply because
> the carrier was off for a few msec or the routing daemon has put a link
> down temporarily. Also, the outgoing interface looked up at routing
> decision is not necessarily the interface used for sending in the end.
> The packet may get mangled and rerouted by netfilter or tc on the way.

But do you handle the case when the link is non operational for a long time?

>
> Personally I'm even ok with the current behaviour of sendto() while the
> socket is bound to an interface but if we choose to return an error
> if the interface is down we might as well do so based on the operational
> status.

Perhaps there is a better way. This all started when pppd hung because
of ping -I <ppp interface>, then someone pulled the cable on the link.

This is a strace where we have two ping -I,
ping -I p1-2-1-2-2 .. and ping -I p1-2-3-2-4 ..
Notice how pppd hangs for a long time in PPPIOCDETACH.
As far as I can tell this is because ping -I has claimed the ppp interfaces
and doesn't notice that the link is down. Ideally ping should receive
an ENODEV as soon as pppd calls PPPIOCDETACH.


   0.000908 write(0, "Connection terminated.\n", 23) = 23
     0.000481 gettimeofday({1288952770, 566048}, NULL) = 0
     0.001553 ioctl(7, PPPIOCDETACH
Message from syslogd@Brazil at Fri Nov  5 11:26:20 2010 ...
Brazil kernel: unregister_netdevice: waiting for p1-2-1-2-2 to become free. Usage count = 3
Message from syslogd@Brazil at Fri Nov  5 11:26:20 2010 ...
Brazil kernel: unregister_netdevice: waiting for p1-2-3-2-4 to become free. Usage count = 3
Message from syslogd@Brazil at Fri Nov  5 11:26:51 2010 ...
Brazil last message repeated 3 times
, 0xbfbc3398) = 0
    66.559216 connect(9, {sa_family=AF_PPPOX, sa_data="\0\0\0\0\0\0\0\252\273\314\335\356hd"}, 30) = 0
     0.000693 close(10)                 = 0
     0.000449 close(7)                  = 0
     0.009801 close(9)                  = 0


^ permalink raw reply

* Re: [RFC v2] ipvs: allow transmit of GRO aggregated skbs
From: Julian Anastasov @ 2010-11-06 14:18 UTC (permalink / raw)
  To: Simon Horman; +Cc: lvs-devel, netdev
In-Reply-To: <20101105235411.GA2443@verge.net.au>


 	Hello,

On Sat, 6 Nov 2010, Simon Horman wrote:

> This is a first attempt at allowing LVS to transmit
> skbs of greater than MTU length that have been aggregated by GRO.
>
> I have lightly tested the ip_vs_dr_xmit() portion of this patch and
> although it seems to work I am unsure that netif_needs_gso() is the correct
> test to use.

 	ip_forward() uses !skb_is_gso(skb), so maybe it is
enough to check for GRO instead of using netif_needs_gso?
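
The check suggested above can be sketched outside the kernel as follows. struct mock_skb is a hypothetical stand-in that carries only the fields the decision needs. In the real kernel skb_is_gso() tests gso_size in the skb's shared info, which this mock imitates. Treat it as an illustration of the condition, not as the final patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few sk_buff fields used here. */
struct mock_skb {
	unsigned int len;	 /* total packet length in bytes */
	unsigned short gso_size; /* non-zero for a GSO/GRO aggregate */
	bool df;		 /* IP_DF set in the IPv4 header */
};

/* Mirrors the idea of skb_is_gso(): aggregated skbs have gso_size set. */
static bool mock_skb_is_gso(const struct mock_skb *skb)
{
	return skb->gso_size != 0;
}

/* True when ICMP "fragmentation needed" should be sent: DF is set,
 * the packet exceeds the MTU, and it is not an aggregate that will
 * be segmented again on output. */
bool frag_needed(const struct mock_skb *skb, unsigned int mtu)
{
	assert(skb != NULL);
	return skb->df && skb->len > mtu && !mock_skb_is_gso(skb);
}
```

With an MTU of 1500, a plain 3000-byte packet with DF set triggers the ICMP error, while a 3000-byte GRO aggregate with gso_size 1448 passes through.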

> Signed-off-by: Simon Horman <horms@verge.net.au>
>
> ---
>
> * LRO is still an outstanding issue, but as it's deprecated in favour
>  of GRO perhaps it doesn't need to be solved.
>
> * v1
>  - Based on 2.6.35
>
> * v2
>  - Rebase on current nf-next-2.6 tree (~2.6.37-rc1)

> --- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_xmit.c	2010-10-30 11:43:37.000000000 +0900
> +++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_xmit.c	2010-11-06 08:09:17.000000000 +0900

> @@ -1008,7 +1012,8 @@ ip_vs_dr_xmit(struct sk_buff *skb, struc
>
> 	/* MTU checking */
> 	mtu = dst_mtu(&rt->dst);
> -	if ((iph->frag_off & htons(IP_DF)) && skb->len > mtu) {
> +	if ((iph->frag_off & htons(IP_DF)) && skb->len > mtu &&
> +	    !netif_needs_gso(rt->dst.dev, skb)) {
> 		icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
> 		ip_rt_put(rt);
> 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [RFC v2] ipvs: allow transmit of GRO aggregated skbs
From: Simon Horman @ 2010-11-06 14:28 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: lvs-devel, netdev
In-Reply-To: <alpine.LFD.2.00.1011061608510.4009@ja.ssi.bg>

On Sat, Nov 06, 2010 at 04:18:21PM +0200, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Sat, 6 Nov 2010, Simon Horman wrote:
> 
> >This is a first attempt at allowing LVS to transmit
> >skbs of greater than MTU length that have been aggregated by GRO.
> >
> >I have lightly tested the ip_vs_dr_xmit() portion of this patch and
> >although it seems to work I am unsure that netif_needs_gso() is the correct
> >test to use.
> 
> 	ip_forward() uses !skb_is_gso(skb), so maybe it is
> enough to check for GRO instead of using netif_needs_gso?

Thanks, I'll look into that.

^ permalink raw reply

* Re: [PATCH 2/3] net: packet: fix information leak to userland
From: Vasiliy Kulikov @ 2010-11-06 14:39 UTC (permalink / raw)
  To: walter harms
  Cc: kernel-janitors, David S. Miller, Jiri Pirko, Eric Dumazet,
	netdev, linux-kernel
In-Reply-To: <4CCE84E2.9050801@bfs.de>

On Mon, Nov 01, 2010 at 10:14 +0100, walter harms wrote:
> 
> 
> Vasiliy Kulikov wrote:
> > packet_getname_spkt() doesn't initialize all members of sa_data field of
> > sockaddr struct if strlen(dev->name) < 13.  This structure is then copied
> > to userland.  It leads to leaking of contents of kernel stack memory.
> > We have to fully fill sa_data with strncpy() instead of strlcpy().
> > 
> > The same with packet_getname(): it doesn't initialize sll_pkttype field of
> > sockaddr_ll.  Set it to zero.
> > 
> > Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
> > ---
> >  net/packet/af_packet.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> > 
> > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > index 3616f27..0856a13 100644
> > --- a/net/packet/af_packet.c
> > +++ b/net/packet/af_packet.c
> > @@ -1719,7 +1719,7 @@ static int packet_getname_spkt(struct socket *sock, struct sockaddr *uaddr,
> >  	rcu_read_lock();
> >  	dev = dev_get_by_index_rcu(sock_net(sk), pkt_sk(sk)->ifindex);
> >  	if (dev)
> > -		strlcpy(uaddr->sa_data, dev->name, 15);
> > +		strncpy(uaddr->sa_data, dev->name, 14);
> >  	else
> >  		memset(uaddr->sa_data, 0, 14);
> 
> If I understand the code correctly, the max size for dev->name is IFNAMSIZ.

For dev->name - IFNAMSIZ, for uaddr->sa_data - 14.

> You can simplify that part:
> 
> memset(uaddr->sa_data, 0, IFNAMSIZ);
> dev = dev_get_by_index_rcu(sock_net(sk), pkt_sk(sk)->ifindex);
> if (dev)
> 	strlcpy(uaddr->sa_data, dev->name, IFNAMSIZ);

This will overflow uaddr->sa_data.  Also, I don't see any difficulty in
filling the array only once.
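
For readers who want to see why strncpy() closes the leak, here is a small userspace demonstration. It poisons a 14-byte buffer with 0xAA to stand in for stale stack contents, copies a short name, and counts how many poisoned bytes survive. demo_strlcpy() is a local stand-in written for this example with strlcpy() semantics (copy, terminate, no padding), since strlcpy() is not available in every libc. All helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SA_DATA_LEN 14	/* size of sockaddr.sa_data */

/* Local stand-in with strlcpy() semantics: copies at most size-1
 * bytes, NUL-terminates, and does NOT pad the rest of dst. */
static size_t demo_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = (len >= size) ? size - 1 : len;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/* Count bytes that still hold the 0xAA poison pattern. */
static size_t stale_bytes(const char *buf, size_t len)
{
	size_t i, n = 0;

	for (i = 0; i < len; i++)
		if ((unsigned char)buf[i] == 0xAA)
			n++;
	return n;
}

/* Poisoned bytes left behind after a strlcpy-style copy of name. */
size_t leak_after_strlcpy(const char *name)
{
	char sa_data[SA_DATA_LEN];

	assert(name != NULL);
	memset(sa_data, 0xAA, sizeof(sa_data));
	demo_strlcpy(sa_data, name, sizeof(sa_data));
	return stale_bytes(sa_data, sizeof(sa_data));
}

/* Poisoned bytes left behind after strncpy(), which zero-pads. */
size_t leak_after_strncpy(const char *name)
{
	char sa_data[SA_DATA_LEN];

	assert(name != NULL);
	memset(sa_data, 0xAA, sizeof(sa_data));
	strncpy(sa_data, name, sizeof(sa_data));
	return stale_bytes(sa_data, sizeof(sa_data));
}
```

For a name such as "eth0", the strlcpy-style copy writes 5 bytes and leaves 9 bytes untouched, while strncpy() zero-fills the remainder and leaves none. Note that strncpy() does not NUL-terminate when the name is 14 characters or longer.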

> you should send that as separate patch.
> re,
>  wh
> 
> 
> >  	rcu_read_unlock();
> > @@ -1742,6 +1742,7 @@ static int packet_getname(struct socket *sock, struct sockaddr *uaddr,
> >  	sll->sll_family = AF_PACKET;
> >  	sll->sll_ifindex = po->ifindex;
> >  	sll->sll_protocol = po->num;
> > +	sll->sll_pkttype = 0;
> >  	rcu_read_lock();
> >  	dev = dev_get_by_index_rcu(sock_net(sk), po->ifindex);
> >  	if (dev) {

Thanks,

-- 
Vasiliy

^ permalink raw reply

* [PATCH  kernel 2.6.37-rc1] axnet_cs: fix resume problem for some Ax88790 chip
From: Ken Kawasaki @ 2010-11-06 15:11 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20101030071751.63611ede.ken_kawasaki@spring.nifty.jp>


axnet_cs:
    Some Ax88790 chips need the CISREG_CCSR register to be
    reinitialized after resume.

Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>

---

--- linux-2.6.37-rc1/drivers/net/pcmcia/axnet_cs.c.orig	2010-11-06 07:48:17.000000000 +0900
+++ linux-2.6.37-rc1/drivers/net/pcmcia/axnet_cs.c	2010-11-06 08:08:40.000000000 +0900
@@ -111,13 +111,14 @@ static irqreturn_t ax_interrupt(int irq,
 CISREG_CCSR
 typedef struct axnet_dev_t {
 	struct pcmcia_device	*p_dev;
-    caddr_t		base;
-    struct timer_list	watchdog;
-    int			stale, fast_poll;
-    u_short		link_status;
-    u_char		duplex_flag;
-    int			phy_id;
-    int			flags;
+	caddr_t	base;
+	struct timer_list	watchdog;
+	int	stale, fast_poll;
+	u_short	link_status;
+	u_char	duplex_flag;
+	int	phy_id;
+	int	flags;
+	int	active_low;
 } axnet_dev_t;
 
 static inline axnet_dev_t *PRIV(struct net_device *dev)
@@ -322,6 +323,8 @@ static int axnet_config(struct pcmcia_de
     if (info->flags & IS_AX88790)
 	outb(0x10, dev->base_addr + AXNET_GPIO);  /* select Internal PHY */
 
+    info->active_low = 0;
+
     for (i = 0; i < 32; i++) {
 	j = mdio_read(dev->base_addr + AXNET_MII_EEP, i, 1);
 	j2 = mdio_read(dev->base_addr + AXNET_MII_EEP, i, 2);
@@ -329,15 +332,18 @@ static int axnet_config(struct pcmcia_de
 	if ((j != 0) && (j != 0xffff)) break;
     }
 
-    /* Maybe PHY is in power down mode. (PPD_SET = 1) 
-       Bit 2 of CCSR is active low. */ 
     if (i == 32) {
+	/* Maybe PHY is in power down mode. (PPD_SET = 1)
+	   Bit 2 of CCSR is active low. */
 	pcmcia_write_config_byte(link, CISREG_CCSR, 0x04);
 	for (i = 0; i < 32; i++) {
 	    j = mdio_read(dev->base_addr + AXNET_MII_EEP, i, 1);
 	    j2 = mdio_read(dev->base_addr + AXNET_MII_EEP, i, 2);
 	    if (j == j2) continue;
-	    if ((j != 0) && (j != 0xffff)) break;
+	    if ((j != 0) && (j != 0xffff)) {
+		info->active_low = 1;
+		break;
+	    }
 	}
     }
 
@@ -383,8 +389,12 @@ static int axnet_suspend(struct pcmcia_d
 static int axnet_resume(struct pcmcia_device *link)
 {
 	struct net_device *dev = link->priv;
+	axnet_dev_t *info = PRIV(dev);
 
 	if (link->open) {
+		if (info->active_low == 1)
+			pcmcia_write_config_byte(link, CISREG_CCSR, 0x04);
+
 		axnet_reset_8390(dev);
 		AX88190_init(dev, 1);
 		netif_device_attach(dev);

^ permalink raw reply

* [PATCH 1/2] skge: Remove tx queue stopping in skge_devinit()
From: Guillaume Chazarain @ 2010-11-06 16:39 UTC (permalink / raw)
  To: davem, netdev; +Cc: Guillaume Chazarain

After commit e6484930d7c73d324bccda7d43d131088da697b9 (net: allocate tx
queues in register_netdevice), stopping the tx queue in skge_devinit()
causes an Oops at skge_probe() time.

Signed-off-by: Guillaume Chazarain <guichaz@gmail.com>
---
 drivers/net/skge.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index bfec2e0..220e039 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -3858,7 +3858,6 @@ static struct net_device *skge_devinit(struct skge_hw *hw, int port,
 
 	/* device is off until link detection */
 	netif_carrier_off(dev);
-	netif_stop_queue(dev);
 
 	return dev;
 }
-- 
1.7.1


^ permalink raw reply related

* [PATCH 2/2] net: Detect and ignore netif_stop_queue() calls before register_netdev()
From: Guillaume Chazarain @ 2010-11-06 16:39 UTC (permalink / raw)
  To: davem, netdev; +Cc: Guillaume Chazarain
In-Reply-To: <1289061572-5029-1-git-send-email-guichaz@gmail.com>

After commit e6484930d7c73d324bccda7d43d131088da697b9 (net: allocate tx
queues in register_netdevice), these calls make net drivers oops at load
time, so let's spare people from git-bisect'ing known problems.

Signed-off-by: Guillaume Chazarain <guichaz@gmail.com>
---
 include/linux/netdevice.h |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 072652d..d8fd2c2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1554,6 +1554,11 @@ static inline void netif_tx_wake_all_queues(struct net_device *dev)
 
 static inline void netif_tx_stop_queue(struct netdev_queue *dev_queue)
 {
+	if (WARN_ON(!dev_queue)) {
+		printk(KERN_INFO "netif_stop_queue() cannot be called before "
+		       "register_netdev()");
+		return;
+	}
 	set_bit(__QUEUE_STATE_XOFF, &dev_queue->state);
 }
 
-- 
1.7.1


^ permalink raw reply related

* Re: OOM when adding ipv6 route:  How to make available more per-cpu memory?
From: Ben Greear @ 2010-11-06 17:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: NetDev, linux-kernel, Tejun Heo
In-Reply-To: <1289028392.2665.2418.camel@edumazet-laptop>

On 11/06/2010 12:26 AM, Eric Dumazet wrote:
> On Friday, 05 November 2010 at 17:07 -0700, Ben Greear wrote:
>
>> We just tried creating 1000 macvlans with IPv6 addrs on a 64-bit machine
>> with 12GB RAM.  Only around 520 interfaces properly set their IPs, and
>> again there are out-of-memory errors from 'ip', but no obvious
>> splats in dmesg.
>>
>> 'top' shows 10G or so free.
>>
>> It will take some time to figure out what exactly is returning
>> the ENOMEM....
>
> At least, nothing to do with percpu stuff ?

At least I don't see any percpu dumps in dmesg.  I vaguely remember
someone posting some ipv6 address scalability patches some time back.
I think they had to hack on /proc fs as well.  I'll see if I can
dig those up.

> Make sure udev / hotplug is not the problem, if you create your devices
> very fast.

We can create the macvlans w/out problem, though I'm sure that could
be sped up.  The problem is when we try to add IPv6 addresses to
them.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Sky2 2.6.36-09934-g2aab243 DMAR error with tcp timestamp enabled
From: Michael Breuer @ 2010-11-06 16:57 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Jarek Poplawski, David Miller, netdev
In-Reply-To: <4B69AD5C.5030601@majjas.com>

Basically, if I enable tcp timestamps (now disabled) I get a sky2 hang. 
As with the earlier issue the effects are not seen until after a couple 
days of uptime and seem exacerbated by load.

I can't 100% confirm that the problem is not occurring without tcp 
timestamps, but will leave the system up for a while to try to confirm. 
This didn't occur previously without tcp timestamps enabled, but I also 
pulled git changes between the two events.

I'm now also on 2.6.37-rc1.... I did a quick scan and didn't see any 
obvious commits between 2.6.36-09934 and -rc1 that would have affected this.

From the log:
Nov  2 05:41:54 mail kernel: DRHD: handling fault status reg 2
Nov  2 05:41:54 mail kernel: DMAR:[DMA Read] Request device [06:00.0] 
fault addr ffea3000
Nov  2 05:41:54 mail kernel: DMAR:[fault reason 06] PTE Read access is 
not set
Nov  2 05:41:54 mail kernel: sky2 0000:06:00.0: error interrupt 
status=0x80000000
Nov  2 05:41:54 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
Nov  2 05:42:01 mail clamd[9755]: SelfCheck: Database status OK.
Nov  2 05:42:11 mail root: ping of potter failed
Nov  2 05:42:16 mail kernel: ------------[ cut here ]------------
Nov  2 05:42:16 mail kernel: WARNING: at net/sched/sch_generic.c:258 
dev_watchdog+0x251/0x260()
Nov  2 05:42:16 mail kernel: Hardware name: System Product Name
Nov  2 05:42:16 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit 
queue 0 timed out
Nov  2 05:42:16 mail kernel: Modules linked in: cpufreq_stats 
ip6table_filter ip6table_mangle ip6_tables ipt_MASQUERADE iptable_nat 
nf_nat iptable_mangle iptable_raw ebtable_nat ebtables bridge stp 
appletalk psnap llc nfsd lockd nfs_acl auth_rpcgss exportfs coretemp 
sunrpc acpi_cpufreq mperf sit tunnel4 ipt_LOG nf_conntrack_netbios_ns 
nf_conntrack_ftp xt_DSCP xt_dscp xt_mark nf_conntrack_ipv6 
nf_defrag_ipv6 xt_state xt_multiport ipv6 kvm_intel kvm 
snd_hda_codec_analog snd_ens1371 gameport snd_rawmidi snd_ac97_codec 
snd_hda_intel snd_hda_codec ac97_bus snd_hwdep snd_seq snd_seq_device 
snd_pcm gspca_spca505 gspca_main snd_timer videodev snd v4l1_compat 
i2c_i801 sky2 v4l2_compat_ioctl32 iTCO_wdt pcspkr asus_atk0110 
i7core_edac edac_core soundcore iTCO_vendor_support snd_page_alloc 
microcode raid456 async_raid6_recov async_pq raid6_pq async_xor xor 
async_memcpy async_tx raid1 ata_generic firewire_ohci pata_acpi 
firewire_core crc_itu_t pata_marvell nouveau ttm drm_kms_helper drm 
i2c_algo_bit i2c_core video output [
Nov  2 05:42:16 mail kernel: last unloaded: ip6_tables]
Nov  2 05:42:16 mail kernel: Pid: 0, comm: swapper Tainted: G        W   2.6.36-09934-g2aab243 #44
Nov  2 05:42:16 mail kernel: Call Trace:
Nov  2 05:42:16 mail kernel: <IRQ>  [<ffffffff81058a4f>] warn_slowpath_common+0x7f/0xc0
Nov  2 05:42:16 mail kernel: [<ffffffff81058b46>] warn_slowpath_fmt+0x46/0x50
Nov  2 05:42:16 mail kernel: [<ffffffff814603d1>] dev_watchdog+0x251/0x260
Nov  2 05:42:16 mail kernel: [<ffffffff8108a4a6>] ? tick_program_event+0x26/0x30
Nov  2 05:42:16 mail kernel: [<ffffffff8107eed4>] ? hrtimer_interrupt+0x134/0x240
Nov  2 05:42:16 mail kernel: [<ffffffff81068ab0>] run_timer_softirq+0x160/0x390
Nov  2 05:42:16 mail kernel: [<ffffffff8108a368>] ? tick_dev_program_event+0x48/0x110
Nov  2 05:42:16 mail kernel: [<ffffffff81460180>] ? dev_watchdog+0x0/0x260
Nov  2 05:42:16 mail kernel: [<ffffffff8105f981>] __do_softirq+0xb1/0x220
Nov  2 05:42:16 mail kernel: [<ffffffff8100cfdc>] call_softirq+0x1c/0x30
Nov  2 05:42:16 mail kernel: [<ffffffff8100ea15>] do_softirq+0x65/0xa0
Nov  2 05:42:16 mail kernel: [<ffffffff8105f845>] irq_exit+0x85/0x90
Nov  2 05:42:16 mail kernel: [<ffffffff81511d61>] do_IRQ+0x71/0xf0
Nov  2 05:42:16 mail kernel: [<ffffffff8150a7d3>] ret_from_intr+0x0/0x11
Nov  2 05:42:16 mail kernel: <EOI>  [<ffffffff812e4165>] ? intel_idle+0xd5/0x170
Nov  2 05:42:16 mail kernel: [<ffffffff812e4148>] ? intel_idle+0xb8/0x170
Nov  2 05:42:16 mail kernel: [<ffffffff81425b51>] cpuidle_idle_call+0x91/0x150
Nov  2 05:42:16 mail kernel: [<ffffffff8100aa8b>] cpu_idle+0xbb/0x150
Nov  2 05:42:16 mail kernel: [<ffffffff814f1785>] rest_init+0x75/0x80
Nov  2 05:42:16 mail kernel: [<ffffffff81b4ae9b>] start_kernel+0x3dc/0x3e7
Nov  2 05:42:16 mail kernel: [<ffffffff81b4a346>] x86_64_start_reservations+0x131/0x135
Nov  2 05:42:16 mail kernel: [<ffffffff81b4a450>] x86_64_start_kernel+0x106/0x115
Nov  2 05:42:16 mail kernel: ---[ end trace d9d3a1889f8925bf ]---
Nov  2 05:42:16 mail kernel: sky2 0000:06:00.0: eth0: tx timeout
Nov  2 05:42:16 mail kernel: sky2 0000:06:00.0: eth0: transmit ring 29 .. 117 report=29 done=29



* [SECURITY] Fix leaking of kernel heap addresses via /proc
From: Dan Rosenberg @ 2010-11-06 20:11 UTC (permalink / raw)
  To: chas, davem, kuznet, pekkas, jmorris, yoshfuji, kaber,
	remi.denis-courmont
  Cc: netdev, security

Maintainers,

I am planning on submitting a series of patches to resolve the leakage
of kernel heap addresses to userspace via network protocol /proc
interfaces.  Revealing this information is a bad idea from a security
perspective for a number of reasons, the most obvious of which is that it
provides unprivileged users a mechanism by which to create a structure
in the kernel heap containing function pointers, obtain the address of
that structure, and overwrite those function pointers by leveraging
other vulnerabilities.  It is my hope that by eliminating this
information leakage, in conjunction with making statically-declared
function pointer tables read-only (to be done in a separate patch
series), we can at least add a small hurdle for the exploitation of a
subset of kernel vulnerabilities.

Here's the list of protocols that I've identified as leaking socket
structure addresses via their /proc interfaces:

atm
ipv4
ipv6
key
netlink
packet
phonet
unix

There may be others, and I will continue looking for more examples, but
this is at least a good start.

Clearly, in most cases we cannot just remove the field from the /proc
output, as this would break a number of userspace programs that rely on
consistency.  However, I propose that we replace the address with a "0"
rather than leaking this information.
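
As a rough userspace illustration of the proposal above (not the kernel patch itself), the sketch below takes a row shaped like an entry in /proc/net/unix, replaces the leading address with "0", and checks that the number of whitespace-separated columns is unchanged. The sample row, its address and inode values, and the helper name `zero_address` are all made up for this example.

```python
import re

# Invented sample row in the general shape of a /proc/net/unix entry.
# The address, inode and path are illustrative values, not real output.
SAMPLE = ("ffff88003b5d1c00: 00000002 00000000 00010000 "
          "0001 01 12345 /tmp/example.sock")


def zero_address(row):
    """Replace the leading hex address with "0" but keep the column itself."""
    return re.sub(r"^[0-9a-fA-F]+(?=:)", "0", row, count=1)


def column_count(row):
    """Number of whitespace-separated fields, as a simple parser would see them."""
    return len(row.split())


masked = zero_address(SAMPLE)
print(masked)
print(column_count(SAMPLE) == column_count(masked))
```

A parser that splits on whitespace still sees the same number of fields after masking, which is the consistency argument made above; a parser that uses the address as a unique key would no longer be able to tell rows apart.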

I'm writing this email before submitting a patch to ask if anyone would
have any objections to this change, and if anyone can foresee anything
breaking as a result of this.  Please let me know if you're on board,
have some doubts, or are totally opposed, and I can get this patch ready
for submission.

Thanks,
Dan Rosenberg



* Re: [Security] [SECURITY] Fix leaking of kernel heap addresses via /proc
From: Linus Torvalds @ 2010-11-06 20:50 UTC (permalink / raw)
  To: Dan Rosenberg
  Cc: chas@cmf.nrl.navy.mil, davem@davemloft.net, kuznet@ms2.inr.ac.ru,
	pekkas@netcore.fi, jmorris@namei.org, yoshfuji@linux-ipv6.org,
	kaber@trash.net, remi.denis-courmont@nokia.com,
	netdev@vger.kernel.org, security@kernel.org
In-Reply-To: <1289074307.3090.100.camel@Dan>

On Saturday, November 6, 2010, Dan Rosenberg <drosenberg@vsecurity.com> wrote:
>
> Clearly, in most cases we cannot just remove the field from the /proc
> output, as this would break a number of userspace programs that rely on
> consistency.  However, I propose that we replace the address with a "0"
> rather than leaking this information.

I really think it would be much better to use the unidentified number
or similar.

Just replacing with zeroes is annoying, and has the potential of
losing actual information.

     Linus


* Re: [Security] [SECURITY] Fix leaking of kernel heap addresses via /proc
From: Ted Ts'o @ 2010-11-06 23:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dan Rosenberg, chas@cmf.nrl.navy.mil, davem@davemloft.net,
	kuznet@ms2.inr.ac.ru, pekkas@netcore.fi, jmorris@namei.org,
	yoshfuji@linux-ipv6.org, kaber@trash.net,
	remi.denis-courmont@nokia.com, netdev@vger.kernel.org,
	security@kernel.org
In-Reply-To: <AANLkTimC01SViEOObgf2N2VcLTYO59rOAbHaKf2RM54w@mail.gmail.com>

On Sat, Nov 06, 2010 at 01:50:32PM -0700, Linus Torvalds wrote:
> On Saturday, November 6, 2010, Dan Rosenberg <drosenberg@vsecurity.com> wrote:
> > Clearly, in most cases we cannot just remove the field from the /proc
> > output, as this would break a number of userspace programs that rely on
> > consistency.  However, I propose that we replace the address with a "0"
> > rather than leaking this information.
> 
> I really think it would be much better to use the unidentified number
> or similar.
> 
> Just replacing with zeroes is annoying, and has the potential of
> losing actual information.

Are there any userspace programs that might be reasonably expected to
_use_ this information?  If there are, we could just pick a random
number at boot time, and then XOR the heap address with that random
number.
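
A minimal userspace model of that XOR idea, under stated assumptions: `BOOT_SECRET` stands in for a value chosen once at boot, `obfuscate` is a hypothetical helper name, and the two addresses are invented. It is only meant to show which properties the masking keeps.

```python
import secrets

MASK64 = 0xFFFFFFFFFFFFFFFF

# Stand-in for a random number picked once at boot (assumption for this sketch).
BOOT_SECRET = secrets.randbits(64)


def obfuscate(addr, secret=BOOT_SECRET):
    """XOR a 64-bit address with the per-boot secret."""
    return (addr ^ secret) & MASK64


# Two invented heap addresses.
a = 0xFFFF88003B5D1C00
b = 0xFFFF88003B5D2000

# Distinct addresses stay distinct, so the value still works as an identifier.
print(obfuscate(a) != obfuscate(b))
# Applying the same XOR twice gives the original address back.
print(obfuscate(obfuscate(a)) == a)
# The XOR difference between two masked values equals that of the originals.
print(obfuscate(a) ^ obfuscate(b) == a ^ b)
```

The masked values remain usable as unique identifiers, unlike a constant "0". Note that XOR with a single constant preserves the bitwise difference between any two addresses, and anyone who learns one real address can recover the secret, so this is a hurdle rather than a guarantee.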

						- Ted


* Re: [Security] [SECURITY] Fix leaking of kernel heap addresses via /proc
From: David Miller @ 2010-11-06 23:57 UTC (permalink / raw)
  To: torvalds
  Cc: drosenberg, chas, kuznet, pekkas, jmorris, yoshfuji, kaber,
	remi.denis-courmont, netdev, security
In-Reply-To: <AANLkTimC01SViEOObgf2N2VcLTYO59rOAbHaKf2RM54w@mail.gmail.com>

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sat, 6 Nov 2010 13:50:32 -0700

> On Saturday, November 6, 2010, Dan Rosenberg <drosenberg@vsecurity.com> wrote:
>>
>> Clearly, in most cases we cannot just remove the field from the /proc
>> output, as this would break a number of userspace programs that rely on
>> consistency.  However, I propose that we replace the address with a "0"
>> rather than leaking this information.
> 
> I really think it would be much better to use the unidentified number
> or similar.
> 
> Just replacing with zeroes is annoying, and has the potential of
> losing actual information.

I would really like to see the specific examples of where this is
happening, it sounds like something very silly to me.


* Re: [PATCH] firewire: net: rate-limit log spam at transmit failure
From: Maxim Levitsky @ 2010-11-07  3:26 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, netdev@vger.kernel.org
In-Reply-To: <tkrat.276aeae22ec60090@s5r6.in-berlin.de>

On Sun, 2010-11-07 at 00:23 +0100, Stefan Richter wrote:
> On  6 Nov, Stefan Richter wrote:
> > Then I tried an XIO2213A card in the AMD PC (again the Intel PC as peer)
> > and got 243 times "failed: 12" i.e. RCODE_BUSY and 81 times "failed: 10"
> > i.e. RCODE_SEND_ERROR during ftp transfer of a >500 MB large file from
> > XIO2213A to FW323.
> 

I am also getting strange results (though very good compared to what I
had recently).

With all your patches, I get very stable TCP and UDP streams from laptop
to desktop at 180~190 Mbits/s.

However, the opposite direction (desktop->laptop) still suffers from
tlabel exhaustion.
I added some printks, and I can clearly see that netif_stop_queue doesn't
always work (perhaps this is intended?).

If I replace == with >= in inc_queue_packets and similar in
dec_queued_packets, then tlabel exhaustion disappears, and I get ~240
Mbit/s on TCP and UDP.
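
To illustrate why the comparison operator can matter, here is a small userspace model, not the firewire-net code. `MAX_QUEUED`, the `TxQueue` class and the `wake()` call are hypothetical; the scenario assumes the queue gets woken while the counter is still at the limit, which is only one way an exact-match test could be skipped.

```python
MAX_QUEUED = 4  # hypothetical limit on in-flight packets


class TxQueue:
    """Toy model of a transmit queue that stops when too much is queued."""

    def __init__(self, use_ge):
        self.queued = 0
        self.stopped = False
        self.use_ge = use_ge  # True: compare with >=, False: compare with ==

    def inc_queued(self):
        self.queued += 1
        if self.use_ge:
            hit = self.queued >= MAX_QUEUED
        else:
            hit = self.queued == MAX_QUEUED
        if hit:
            self.stopped = True  # models stopping the queue

    def wake(self):
        self.stopped = False  # models the queue being woken from elsewhere


def run_scenario(use_ge):
    """Fill the queue, wake it while still full, then queue one more packet."""
    q = TxQueue(use_ge)
    for _ in range(MAX_QUEUED):
        q.inc_queued()
    q.wake()
    q.inc_queued()  # counter is now MAX_QUEUED + 1
    return q.stopped


print(run_scenario(use_ge=False))
print(run_scenario(use_ge=True))
```

With the exact-match test, the counter has already passed the limit and the queue is never stopped again; with `>=`, any increment at or above the limit stops it. Whether this is what actually happens in the driver is not established here.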

UDP transfers work quite well, tested for a few minutes.
TCP transfers unfortunately trigger a bug (probably a hardware one) in
the notebook's OHCI controller (I have seen that many times so far).

The transfer just stops, and the controller goes south.
If I unload firewire-ohci and then load it again, I get:

[ 2062.632532] firewire_ohci 0000:07:00.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 2072.650173] firewire_ohci: Failed to reset ohci card.
[ 2072.650267] firewire_ohci 0000:07:00.0: PCI INT A disabled
[ 2072.650314] firewire_ohci: probe of 0000:07:00.0 failed with error -16


Only suspend to ram helps bring it back from that state.

Best regards,
	Maxim Levitsky



