Netdev List


* Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets
From: Felipe W Damasio @ 2010-07-16 15:41 UTC
  To: Eric Dumazet
  Cc: Avi Kivity, David Miller, Patrick McHardy, linux-kernel, netdev
In-Reply-To: <AANLkTikkcaionMp3SJLgo3JmCjxQ7rPZv3-3_RbxCx_4@mail.gmail.com>

Hi All,

2010/7/14 Felipe W Damasio <felipewd@gmail.com>:
> Hi Mr. Dumazet,
>
> 2010/7/14 Eric Dumazet <eric.dumazet@gmail.com>:
>> RDX being the sk pointer (and sk+0x38 contains the corrupted "sk_prot" value)
>> , we notice RBP contains same "sk" value + 0x200000  (2 Mbytes).
>>
>> (same remark on your initial bug report)
>>
>> Could you enable CONFIG_FRAME_POINTER=y in your config ?

I did, this is the new bug:

general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.3/i2c-0/name
CPU 2
Modules linked in: e1000e

Pid: 4209, comm: squid Not tainted 2.6.34 #4 DX58SO/
RIP: 0010:[<ffffffff8137a887>]  [<ffffffff8137a887>] sock_rfree+0x2a/0x3c
RSP: 0018:ffff88042d781ba8  EFLAGS: 00010282
RAX: 9a7e7f4602400d48 RBX: ffff88034c918e00 RCX: 0000000000000720
RDX: ffff880413a82e00 RSI: ffff8804161e5e2a RDI: ffff88034c918e00
RBP: ffff88042d781ba8 R08: ffff88042d781b98 R09: 0000000000000000
R10: 0000000000040570 R11: 0000000000000000 R12: ffff880413882e00
R13: 00000000000005a8 R14: 00000000000005a8 R15: 000000000000a84d
FS:  00007f9aa0007710(0000) GS:ffff880001a80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f11f831f020 CR3: 000000042d5f8000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process squid (pid: 4209, threadinfo ffff88042d780000, task ffff88042e325620)
Stack:
 ffff88042d781bc8 ffffffff8137fda0 ffff880413882e00 ffff88034c918e00
<0> ffff88042d781be8 ffffffff8137fb3b ffff88034c918e00 ffff88034c918e00
<0> ffff88042d781cd8 ffffffff813be69b ffff88042d781c38 ffffffff813c76e4
Call Trace:
 [<ffffffff8137fda0>] skb_release_head_state+0x75/0xc0
 [<ffffffff8137fb3b>] __kfree_skb+0x11/0x86
 [<ffffffff813be69b>] tcp_recvmsg+0x6b9/0x8be
 [<ffffffff813c76e4>] ? tcp_current_mss+0x46/0x65
 [<ffffffff8137ab89>] sock_common_recvmsg+0x32/0x47
 [<ffffffff811bafb6>] ? selinux_socket_recvmsg+0x1d/0x1f
 [<ffffffff81378752>] __sock_recvmsg+0x6a/0x76
 [<ffffffff81378847>] sock_aio_read+0xe9/0x102
 [<ffffffff811b9d68>] ? avc_has_perm+0x4e/0x60
 [<ffffffff810b63f6>] do_sync_read+0xc7/0x10d
 [<ffffffff811bdb17>] ? selinux_file_permission+0xa5/0xb2
 [<ffffffff811b7917>] ? security_file_permission+0x11/0x13
 [<ffffffff810b6e7b>] vfs_read+0xbb/0x102
 [<ffffffff810b6f86>] sys_read+0x47/0x70
 [<ffffffff810029eb>] system_call_fastpath+0x16/0x1b
Code: c3 48 8b 57 18 55 8b 87 d8 00 00 00 48 89 e5 48 8d 8a ac 00 00
00 f0 29 82 ac 00 00 00 48 8b 57 18 8b 8f d8 00 00 00 48 8b 42 38 <48>
83 b8 b0 00 00 00 00 74 06 01 8a f4 00 00 00 c9 c3 55 48 89
RIP  [<ffffffff8137a887>] sock_rfree+0x2a/0x3c
 RSP <ffff88042d781ba8>
---[ end trace 8932efc1ba58ce6e ]---

Does this tell you anything?

Cheers,

Felipe Damasio
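
For readers following the register dump: the bytes marked at the faulting RIP (48 83 b8 b0 00 00 00 00) decode to a compare against memory at 0xb0(%rax), and the preceding 48 8b 42 38 loads RAX from 0x38(%rdx), i.e. the sk_prot value discussed above. RAX holds 9a7e7f4602400d48, which is not a canonical x86-64 address, and dereferencing a non-canonical address raises a general protection fault rather than a page fault. A minimal sketch of that check, assuming 48-bit virtual addresses (the helper name is made up for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumes 48-bit virtual addressing: an address is canonical when
 * bits 63..47 are either all zero or all one.
 */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the 17 bits 63..47 */

    return top == 0 || top == 0x1ffff;
}
```

With the values from the dump, is_canonical_48() is false for RAX (0x9a7e7f4602400d48) and true for RDX (0xffff880413a82e00), which fits a valid sk pointer whose sk_prot field was overwritten.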


* Re: [PATCH 1/2] Remove REDWOOD_[456] config options and conditional code
From: Milton Miller @ 2010-07-16 15:45 UTC
  To: Christian Dietrich
  Cc: Milton Miller, Josh Boyer, Matt Porter, Benjamin Herrenschmidt,
	Paul Mackerras, Solomon Peachy, David Woodhouse, Mike Frysinger,
	Jiri Kosina, Artem Bityutskiy, Alexander Kurz, David S. Miller,
	Randy Dunlap, John Linn, Florian Fainelli, Nicolas Pitre,
	Joe Perches, Ladislav Michl, David Brown, linuxppc-dev,
	linux-kernel, linux-mtd, netdev, vamos-dev
In-Reply-To: <20100716142055.GA11736@zod.rchland.ibm.com>


On Fri, 16 Jul 2010 at about 08:20:55 -0600 Josh Boyer wrote:
> On Fri, Jul 16, 2010 at 02:29:02PM +0200, Christian Dietrich wrote: 
> > The config options for REDWOOD_[456] were commented out in the powerpc
> > Kconfig. The ifdefs referencing these options therefore are dead and all
> > references to this can be removed (Also dependencies in other KConfig
> > files).

> This seems fine with me.
> 
> The only question is which tree it comes through. I'm happy to take it
> in via mine if the netdev and MTD people are fine with that. Otherwise,
> my ack is below.


> On Fri, 16 Jul 2010 around 14:29:08 +0200 Christian Dietrich wrote:
> > diff --git a/drivers/mtd/maps/Kconfig b/drivers/mtd/maps/Kconfig
> > index f22bc9f..6629d09 100644
> > --- a/drivers/mtd/maps/Kconfig
> > +++ b/drivers/mtd/maps/Kconfig
> > @@ -321,7 +321,7 @@ config MTD_CFI_FLAGADM
> > 
> >  config MTD_REDWOOD
> >  	tristate "CFI Flash devices mapped on IBM Redwood"
> > -	depends on MTD_CFI && ( REDWOOD_4 || REDWOOD_5 || REDWOOD_6 )
> > +	depends on MTD_CFI
> >  	help
> >  	  This enables access routines for the flash chips on the IBM
> >  	  Redwood board. If you have one of these boards and would like to
> > diff --git a/drivers/mtd/maps/redwood.c b/drivers/mtd/maps/redwood.c
> > index 933c0b6..d2c9db0 100644
> > --- a/drivers/mtd/maps/redwood.c
> > +++ b/drivers/mtd/maps/redwood.c
> > @@ -22,8 +22,6 @@

The patches are unnecessarily coupled: the first removes the REDWOOD_*
symbols in the MTD area before the second removes the files and Kconfig
entries completely.  This could easily be avoided by moving the
two MTD fragments into the second patch.

milton


* Re: [PATCH 01/11] Removing dead RT2800PCI_SOC
From: Gertjan van Wingerde @ 2010-07-16 15:46 UTC
  To: Helmut Schaa
  Cc: Bartlomiej Zolnierkiewicz, Felix Fietkau, John W. Linville,
	Ivo Van Doorn, Christoph Egger, linux-wireless, users, netdev,
	linux-kernel, vamos-dev, Luis Correia
In-Reply-To: <AANLkTik8M0gNipi_rwVJYbmo_3FfdHs-H4H_XDgTcbhI@mail.gmail.com>

On 07/16/10 12:08, Helmut Schaa wrote:
> On Fri, Jul 16, 2010 at 9:18 AM, Gertjan van Wingerde
> <gwingerde@gmail.com> wrote:
>>
>> On 07/16/10 08:57, Helmut Schaa wrote:
>>> On Thu, Jul 15, 2010 at 10:41 AM, Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
>>>
>>>     On Wednesday 14 July 2010 04:44:44 pm Felix Fietkau wrote:
>>>     > On 2010-07-14 3:15 PM, John W. Linville wrote:
>>>     > > On Wed, Jul 14, 2010 at 02:52:14PM +0200, Ivo Van Doorn wrote:
>>>     > >> On Wed, Jul 14, 2010 at 2:46 PM, Luis Correia <luis.f.correia@gmail.com> wrote:
>>>     > >> > On Wed, Jul 14, 2010 at 13:39, Christoph Egger <siccegge@cs.fau.de> wrote:
>>>     > >> >> While RT2800PCI_SOC exists in Kconfig, it depends on either
>>>     > >> >> RALINK_RT288X or RALINK_RT305X which are both not available in Kconfig
>>>     > >> >> so all Code depending on that can't ever be selected and, if there's
>>>     > >> >> no plan to add these options, should be cleaned up
>>>     > >> >>
>>>     > >> >> Signed-off-by: Christoph Egger <siccegge@cs.fau.de>
>>>     > >> >
>>>     > >> > NAK,
>>>     > >> >
>>>     > >> > this is not dead code, it is needed for the Ralink System-on-Chip
>>>     > >> > Platform devices.
>>>     > >> >
>>>     > >> > While I can't fix Kconfig errors and the current KConfig file may be
>>>     > >> > wrong, this code cannot and will not be deleted.
>>>     > >>
>>>     > >> When the config option was introduced, the config options RALINK_RT288X and
>>>     > >> RALINK_RT305X were supposed to be merged as well soon after by somebody (Felix?)
>>>     > >>
>>>     > >> But since testing is done on SoC boards by Helmut and Felix, I assume the code
>>>     > >> isn't dead but actually in use.
>>>     > >
>>>     > > Perhaps Helmut and Felix can send us the missing code?
>>>     > The missing code is a MIPS platform port, which is currently being
>>>     > maintained in OpenWrt, but is not ready for upstream submission yet.
>>>     > I'm not working on this code at the moment, but I think it will be
>>>     > submitted once it's ready.
>>>
>>>     People are using automatic scripts to catch unused config options nowadays
>>>     so the issue is quite likely to come back again sooner or later..
>>>
>>>     Would it be possible to improve situation somehow till the missing parts
>>>     get merged?  Maybe by adding a tiny comment documenting RT2800PCI_SOC
>>>     situation to Kconfig (if the config option itself really cannot be removed)
>>>     until all code is ready etc.?
>>>
>>>
>>> Or we could just remove RT2800PCI_SOC completely and build the soc specific
>>> parts always as part of rt2800pci. I mean it's not much code, just the platform
>>> driver stuff and the eeprom access.
>>>
>>
>> I'm not sure if that is feasible. Sure, we can reduce the usage of the variable by
>> unconditionally compiling in the generic SOC code, but we should not unconditionally
>> register the SOC platform device, which is currently also under the scope of this
>> Kconfig variable.
> 
> Ehm, no, the platform device is not registered in rt2800pci at all,
> it's just the platform
> driver that gets registered there. The platform device will be
> registered in the according
> board init code (that only resides in openwrt at the moment).
> 

OK. Didn't know that. Sounds good then.

However, I've tried this in my local tree, and now compilation fails on the x86 platform
due to a missing KSEG1ADDR macro. How do you suggest handling the potentially missing
macro?

---
Gertjan.
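
One common way to cope with an architecture-only macro, sketched here as an assumption and not as what the rt2x00 maintainers settled on, is to confine its use to a small helper that falls back to a failure value when the macro is not defined. The helper name soc_map_base() is hypothetical, and the sketch is plain userspace C in which KSEG1ADDR is undefined, as it would be on x86:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * KSEG1ADDR() comes from MIPS headers only. It is deliberately not
 * defined in this sketch, so the fallback branch is what gets compiled.
 */
static void *soc_map_base(uintptr_t phys)   /* hypothetical helper */
{
#ifdef KSEG1ADDR
    return (void *)KSEG1ADDR(phys);
#else
    (void)phys;     /* no uncached KSEG1 window on this architecture */
    return NULL;    /* caller treats NULL as "SoC access unavailable" */
#endif
}
```

The caller would then fail the probe cleanly when NULL comes back, so the SoC code can be built everywhere without ever being reachable on non-MIPS platforms.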


* Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets
From: Eric Dumazet @ 2010-07-16 15:52 UTC
  To: Felipe W Damasio
  Cc: Avi Kivity, David Miller, Patrick McHardy, linux-kernel, netdev
In-Reply-To: <AANLkTik3Lf9IbgFiagZ-oTDsqzHhhuH2B0DL6PFyZVeV@mail.gmail.com>

Le vendredi 16 juillet 2010 à 12:41 -0300, Felipe W Damasio a écrit :
> Hi All,
> 
> 2010/7/14 Felipe W Damasio <felipewd@gmail.com>:
> > Hi Mr. Dumazet,
> >
> > 2010/7/14 Eric Dumazet <eric.dumazet@gmail.com>:
> >> RDX being the sk pointer (and sk+0x38 contains the corrupted "sk_prot" value)
> >> , we notice RBP contains same "sk" value + 0x200000  (2 Mbytes).
> >>
> >> (same remark on your initial bug report)
> >>
> >> Could you enable CONFIG_FRAME_POINTER=y in your config ?
> 
> I did, this is the new bug:
> 
> general protection fault: 0000 [#1] SMP
> last sysfs file: /sys/devices/pci0000:00/0000:00:1f.3/i2c-0/name
> CPU 2
> Modules linked in: e1000e
> 
> Pid: 4209, comm: squid Not tainted 2.6.34 #4 DX58SO/
> RIP: 0010:[<ffffffff8137a887>]  [<ffffffff8137a887>] sock_rfree+0x2a/0x3c
> RSP: 0018:ffff88042d781ba8  EFLAGS: 00010282
> RAX: 9a7e7f4602400d48 RBX: ffff88034c918e00 RCX: 0000000000000720
> RDX: ffff880413a82e00 RSI: ffff8804161e5e2a RDI: ffff88034c918e00
> RBP: ffff88042d781ba8 R08: ffff88042d781b98 R09: 0000000000000000
> R10: 0000000000040570 R11: 0000000000000000 R12: ffff880413882e00
> R13: 00000000000005a8 R14: 00000000000005a8 R15: 000000000000a84d
> FS:  00007f9aa0007710(0000) GS:ffff880001a80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f11f831f020 CR3: 000000042d5f8000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process squid (pid: 4209, threadinfo ffff88042d780000, task ffff88042e325620)
> Stack:
>  ffff88042d781bc8 ffffffff8137fda0 ffff880413882e00 ffff88034c918e00
> <0> ffff88042d781be8 ffffffff8137fb3b ffff88034c918e00 ffff88034c918e00
> <0> ffff88042d781cd8 ffffffff813be69b ffff88042d781c38 ffffffff813c76e4
> Call Trace:
>  [<ffffffff8137fda0>] skb_release_head_state+0x75/0xc0
>  [<ffffffff8137fb3b>] __kfree_skb+0x11/0x86
>  [<ffffffff813be69b>] tcp_recvmsg+0x6b9/0x8be
>  [<ffffffff813c76e4>] ? tcp_current_mss+0x46/0x65
>  [<ffffffff8137ab89>] sock_common_recvmsg+0x32/0x47
>  [<ffffffff811bafb6>] ? selinux_socket_recvmsg+0x1d/0x1f
>  [<ffffffff81378752>] __sock_recvmsg+0x6a/0x76
>  [<ffffffff81378847>] sock_aio_read+0xe9/0x102
>  [<ffffffff811b9d68>] ? avc_has_perm+0x4e/0x60
>  [<ffffffff810b63f6>] do_sync_read+0xc7/0x10d
>  [<ffffffff811bdb17>] ? selinux_file_permission+0xa5/0xb2
>  [<ffffffff811b7917>] ? security_file_permission+0x11/0x13
>  [<ffffffff810b6e7b>] vfs_read+0xbb/0x102
>  [<ffffffff810b6f86>] sys_read+0x47/0x70
>  [<ffffffff810029eb>] system_call_fastpath+0x16/0x1b
> Code: c3 48 8b 57 18 55 8b 87 d8 00 00 00 48 89 e5 48 8d 8a ac 00 00
> 00 f0 29 82 ac 00 00 00 48 8b 57 18 8b 8f d8 00 00 00 48 8b 42 38 <48>
> 83 b8 b0 00 00 00 00 74 06 01 8a f4 00 00 00 c9 c3 55 48 89
> RIP  [<ffffffff8137a887>] sock_rfree+0x2a/0x3c
>  RSP <ffff88042d781ba8>
> ---[ end trace 8932efc1ba58ce6e ]---
> 
> Does this tell you anything?
> 

Could you privately send me the vmlinux file?


* [RFC] LSM hook for post recvmsg.
From: Tetsuo Handa @ 2010-07-16 16:14 UTC
  To: davem; +Cc: netdev

Hello, David. Thank you for giving me suggestions at Japan Linux Symposium 2009.
As TOMOYO is getting functional and AppArmor is about to join mainline, I'd like
to resume discussions regarding LSM hooks for post accept()/recvmsg() operations.

Below is a patch for the post-recvmsg() operation. I modified the patch to call
skb_recv_datagram() again (for udp_recvmsg(), raw_recvmsg(), udpv6_recvmsg())
if the LSM decided to drop the message. (Regarding rawv6_recvmsg(), I didn't do so
in accordance with the comment at "csum_copy_err:".)
What do you think about this version?

Regards.

diff --git a/include/linux/security.h b/include/linux/security.h
index 723a93d..409c44d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -879,6 +879,12 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
  *	@size contains the size of message structure.
  *	@flags contains the operational flags.
  *	Return 0 if permission is granted.
+ * @socket_post_recvmsg:
+ *	Check permission after receiving a message from a socket.
+ *	The message is discarded if permission is not granted.
+ *	@sk contains the sock structure.
+ *	@skb contains the sk_buff structure.
+ *	Return 0 if permission is granted.
  * @socket_getsockname:
  *	Check permission before the local address (name) of the socket object
  *	@sock is retrieved.
@@ -1575,6 +1581,7 @@ struct security_operations {
 			       struct msghdr *msg, int size);
 	int (*socket_recvmsg) (struct socket *sock,
 			       struct msghdr *msg, int size, int flags);
+	int (*socket_post_recvmsg) (struct sock *sk, struct sk_buff *skb);
 	int (*socket_getsockname) (struct socket *sock);
 	int (*socket_getpeername) (struct socket *sock);
 	int (*socket_getsockopt) (struct socket *sock, int level, int optname);
@@ -2526,6 +2533,7 @@ int security_socket_accept(struct socket *sock, struct socket *newsock);
 int security_socket_sendmsg(struct socket *sock, struct msghdr *msg, int size);
 int security_socket_recvmsg(struct socket *sock, struct msghdr *msg,
 			    int size, int flags);
+int security_socket_post_recvmsg(struct sock *sk, struct sk_buff *skb);
 int security_socket_getsockname(struct socket *sock);
 int security_socket_getpeername(struct socket *sock);
 int security_socket_getsockopt(struct socket *sock, int level, int optname);
@@ -2617,6 +2625,12 @@ static inline int security_socket_recvmsg(struct socket *sock,
 	return 0;
 }
 
+static inline int security_socket_post_recvmsg(struct sock *sk,
+					       struct sk_buff *skb)
+{
+	return 0;
+}
+
 static inline int security_socket_getsockname(struct socket *sock)
 {
 	return 0;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 2c7a163..69652d4 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -676,9 +676,15 @@ static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		goto out;
 	}
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
-	if (!skb)
-		goto out;
+	for (;;) {
+		skb = skb_recv_datagram(sk, flags, noblock, &err);
+		if (!skb)
+			goto out;
+		err = security_socket_post_recvmsg(sk, skb);
+		if (likely(!err))
+			break;
+		skb_kill_datagram(sk, skb, flags);
+	}
 
 	copied = skb->len;
 	if (len < copied) {
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 5858574..9145685 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1125,6 +1125,7 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	int err;
 	int is_udplite = IS_UDPLITE(sk);
 	bool slow;
+	bool update_stat;
 
 	/*
 	 *	Check any passed addresses
@@ -1140,6 +1141,12 @@ try_again:
 				  &peeked, &err);
 	if (!skb)
 		goto out;
+	err = security_socket_post_recvmsg(sk, skb);
+	if (err) {
+		update_stat = false;
+		goto csum_copy_err;
+	}
+	update_stat = true;
 
 	ulen = skb->len - sizeof(struct udphdr);
 	if (len > ulen)
@@ -1200,7 +1207,7 @@ out:
 
 csum_copy_err:
 	slow = lock_sock_fast(sk);
-	if (!skb_kill_datagram(sk, skb, flags))
+	if (!skb_kill_datagram(sk, skb, flags) && update_stat)
 		UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
 	unlock_sock_fast(sk, slow);
 
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 4a4dcbe..135d4ed 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -467,6 +467,9 @@ static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 	skb = skb_recv_datagram(sk, flags, noblock, &err);
 	if (!skb)
 		goto out;
+	err = security_socket_post_recvmsg(sk, skb);
+	if (unlikely(err))
+		goto csum_copy_err;
 
 	copied = skb->len;
 	if (copied > len) {
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 87be586..6cae276 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -329,6 +329,7 @@ int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 	int is_udplite = IS_UDPLITE(sk);
 	int is_udp4;
 	bool slow;
+	bool update_stat;
 
 	if (addr_len)
 		*addr_len=sizeof(struct sockaddr_in6);
@@ -344,6 +345,12 @@ try_again:
 				  &peeked, &err);
 	if (!skb)
 		goto out;
+	err = security_socket_post_recvmsg(sk, skb);
+	if (err) {
+		update_stat = false;
+		goto csum_copy_err;
+	}
+	update_stat = true;
 
 	ulen = skb->len - sizeof(struct udphdr);
 	if (len > ulen)
@@ -426,7 +433,7 @@ out:
 
 csum_copy_err:
 	slow = lock_sock_fast(sk);
-	if (!skb_kill_datagram(sk, skb, flags)) {
+	if (!skb_kill_datagram(sk, skb, flags) && update_stat) {
 		if (is_udp4)
 			UDP_INC_STATS_USER(sock_net(sk),
 					UDP_MIB_INERRORS, is_udplite);
diff --git a/security/capability.c b/security/capability.c
index 4aeb699..709aea3 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -597,6 +597,11 @@ static int cap_socket_recvmsg(struct socket *sock, struct msghdr *msg,
 	return 0;
 }
 
+static int cap_socket_post_recvmsg(struct sock *sk, struct sk_buff *skb)
+{
+	return 0;
+}
+
 static int cap_socket_getsockname(struct socket *sock)
 {
 	return 0;
@@ -1001,6 +1006,7 @@ void __init security_fixup_ops(struct security_operations *ops)
 	set_to_cap_if_null(ops, socket_accept);
 	set_to_cap_if_null(ops, socket_sendmsg);
 	set_to_cap_if_null(ops, socket_recvmsg);
+	set_to_cap_if_null(ops, socket_post_recvmsg);
 	set_to_cap_if_null(ops, socket_getsockname);
 	set_to_cap_if_null(ops, socket_getpeername);
 	set_to_cap_if_null(ops, socket_setsockopt);
diff --git a/security/security.c b/security/security.c
index e8c87b8..4291bd7 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1037,6 +1037,12 @@ int security_socket_recvmsg(struct socket *sock, struct msghdr *msg,
 	return security_ops->socket_recvmsg(sock, msg, size, flags);
 }
 
+int security_socket_post_recvmsg(struct sock *sk, struct sk_buff *skb)
+{
+	return security_ops->socket_post_recvmsg(sk, skb);
+}
+EXPORT_SYMBOL(security_socket_post_recvmsg);
+
 int security_socket_getsockname(struct socket *sock)
 {
 	return security_ops->socket_getsockname(sock);


* [PATCH] rt2x00: Fix lockdep warning in rt2x00lib_probe_dev()
From: Stephen Boyd @ 2010-07-16 16:50 UTC
  To: users-poMEt7QlJxcwIE2E9O76wjtx2kNaKg5H
  Cc: Ivo van Doorn, John W. Linville,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

The rt2x00dev->intf_work work item is never initialized when a driver is
probed for a non-existent device (in this case rt2500usb). On that path
rt2x00lib_remove_dev() is called to free any resources initialized
during the probe, but this happens before INIT_WORK() has initialized
the work item. Lockdep then gets confused, since the lock used by the
work item hasn't been initialized yet but is now being acquired during
cancel_work_sync() called by rt2x00lib_remove_dev().

Fix this by initializing the work item first, before we attempt to probe
the device. This should make lockdep happy and avoid breaking any
assumptions about how the library cleans up after a probe fails.

phy0 -> rt2x00lib_probe_dev: Error - Failed to allocate device.
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
Pid: 2027, comm: modprobe Not tainted 2.6.35-rc5+ #60
Call Trace:
 [<ffffffff8105fe59>] register_lock_class+0x152/0x31f
 [<ffffffff81344a00>] ? usb_control_msg+0xd5/0x111
 [<ffffffff81061bde>] __lock_acquire+0xce/0xcf4
 [<ffffffff8105f6fd>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffff81492aef>] ?  _raw_spin_unlock_irqrestore+0x33/0x41
 [<ffffffff810628d5>] lock_acquire+0xd1/0xf7
 [<ffffffff8104f037>] ? __cancel_work_timer+0x99/0x17e
 [<ffffffff8104f06e>] __cancel_work_timer+0xd0/0x17e
 [<ffffffff8104f037>] ? __cancel_work_timer+0x99/0x17e
 [<ffffffff8104f136>] cancel_work_sync+0xb/0xd
 [<ffffffffa0096675>] rt2x00lib_remove_dev+0x25/0xb0 [rt2x00lib]
 [<ffffffffa0096bf7>] rt2x00lib_probe_dev+0x380/0x3ed [rt2x00lib]
 [<ffffffff811d78a7>] ? __raw_spin_lock_init+0x31/0x52
 [<ffffffffa00bbd2c>] ? T.676+0xe/0x10 [rt2x00usb]
 [<ffffffffa00bbe4f>] rt2x00usb_probe+0x121/0x15e [rt2x00usb]
 [<ffffffff813468bd>] usb_probe_interface+0x151/0x19e
 [<ffffffff812ea08e>] driver_probe_device+0xa7/0x136
 [<ffffffff812ea167>] __driver_attach+0x4a/0x66
 [<ffffffff812ea11d>] ? __driver_attach+0x0/0x66
 [<ffffffff812e96ca>] bus_for_each_dev+0x54/0x89
 [<ffffffff812e9efd>] driver_attach+0x19/0x1b
 [<ffffffff812e9b64>] bus_add_driver+0xb4/0x204
 [<ffffffff812ea41b>] driver_register+0x98/0x109
 [<ffffffff813465dd>] usb_register_driver+0xb2/0x173
 [<ffffffffa00ca000>] ? rt2500usb_init+0x0/0x20 [rt2500usb]
 [<ffffffffa00ca01e>] rt2500usb_init+0x1e/0x20 [rt2500usb]
 [<ffffffff81000203>] do_one_initcall+0x6d/0x17a
 [<ffffffff8106cae8>] sys_init_module+0x9c/0x1e0
 [<ffffffff8100296b>] system_call_fastpath+0x16/0x1b

Signed-off-by: Stephen Boyd <bebarino-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 drivers/net/wireless/rt2x00/rt2x00dev.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/rt2x00/rt2x00dev.c b/drivers/net/wireless/rt2x00/rt2x00dev.c
index 3ae468c..f20d3ee 100644
--- a/drivers/net/wireless/rt2x00/rt2x00dev.c
+++ b/drivers/net/wireless/rt2x00/rt2x00dev.c
@@ -854,6 +854,11 @@ int rt2x00lib_probe_dev(struct rt2x00_dev *rt2x00dev)
 		    BIT(NL80211_IFTYPE_WDS);
 
 	/*
+	 * Initialize configuration work.
+	 */
+	INIT_WORK(&rt2x00dev->intf_work, rt2x00lib_intf_scheduled);
+
+	/*
 	 * Let the driver probe the device to detect the capabilities.
 	 */
 	retval = rt2x00dev->ops->lib->probe_hw(rt2x00dev);
@@ -863,11 +868,6 @@ int rt2x00lib_probe_dev(struct rt2x00_dev *rt2x00dev)
 	}
 
 	/*
-	 * Initialize configuration work.
-	 */
-	INIT_WORK(&rt2x00dev->intf_work, rt2x00lib_intf_scheduled);
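
The ordering issue in this patch can be modelled outside the kernel. The sketch below is an illustration only: struct work, struct dev, init_work(), cancel_work(), remove_dev() and probe_dev() are made-up stand-ins, and cancel_work() simply reports whether it was handed an initialized item, which is the condition lockdep complained about.

```c
#include <stdbool.h>

struct work { bool initialized; bool pending; };        /* stand-in for work_struct */
struct dev  { struct work intf_work; bool hw_present; };

static void init_work(struct work *w)
{
    w->initialized = true;
    w->pending = false;
}

/* Returns false when asked to cancel work that was never initialized. */
static bool cancel_work(struct work *w)
{
    if (!w->initialized)
        return false;
    w->pending = false;
    return true;
}

/* Common teardown path, used both on normal removal and on probe failure. */
static bool remove_dev(struct dev *d)
{
    return cancel_work(&d->intf_work);
}

/*
 * Returns 0 on success, -1 when the hardware probe fails.
 * *cleanup_ok tells whether the teardown path ran on initialized state.
 * init_first selects the patched ordering (true) or the old one (false).
 */
static int probe_dev(struct dev *d, bool init_first, bool *cleanup_ok)
{
    if (init_first)
        init_work(&d->intf_work);

    if (!d->hw_present) {
        *cleanup_ok = remove_dev(d);
        return -1;
    }

    if (!init_first)
        init_work(&d->intf_work);

    *cleanup_ok = true;
    return 0;
}
```

With hw_present set to false, the old ordering reports an unsafe cleanup while the patched ordering does not; the probe still fails in both cases, as it should.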


* Re: [RFC] Enhance dev_ioctl to return <hwaddr>:<if_name::if_index> mapping
From: Ben Hutchings @ 2010-07-16 17:10 UTC
  To: Chetan Loke; +Cc: netdev, Loke, Chetan
In-Reply-To: <AANLkTinxLy2noL0GcZyVtMsEndxseCr6kkxuvyBNIfHr@mail.gmail.com>

On Fri, 2010-07-16 at 09:18 -0400, Chetan Loke wrote:
[...]
> Requirement:
> R1) Ability to address NICs/interfaces using a mac-addr in ioctls. This
> is required because we don't have a consistent naming scheme for
> Ethernet devices. Asking customers and/or field-engineers to change
> udev rules and other config files is not feasible.

I don't know why they would need to change those.  It might be useful to
have a simple userland tool that will look up a device name by MAC
address.  But it's not like a MAC address is any more user-friendly than
a net device name!

> Existing pain-points:
> P1) ioctl needs either i) if-name or ii) if-index before we can invoke
> bind() etc. This works fine if you know your configuration and it is not going
> to change. However, if we hot-add a NIC and if you have adapters from multiple
> vendors (think: driver load order) then upon a reboot, the 'eth'
> interfaces can be re-mapped.

As you well know, udev makes the name/MAC-address association
persistent, so this is no longer a problem.

[...]
>   W2.1) If renaming were to even succeed then none of the existing
> drivers re-register their msix-vectors.

There is no need to do that, since the IRQ handler names are not copied
but are held by the driver.  So the driver only needs to rewrite the
names when the device is renamed.  However that does require registering
a netdev notifier.  It might be worth adding a netdev operation for
renaming, to make this easier for driver maintainers.

[...]
> But there is no programmatic way of deriving the 'ethX' name.
[...]

Deriving it from what?  Are you aiming to determine what the name of a
device will be before the physical device is installed?

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
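
The "simple userland tool" idea mentioned above can be sketched with getifaddrs(3) and the AF_PACKET link-layer addresses it reports on Linux. This is a minimal, Linux-only sketch and not an existing tool; parse_mac() and ifname_by_mac() are names invented for the example, and only 6-byte (Ethernet-style) addresses are handled.

```c
#include <ifaddrs.h>
#include <net/if.h>
#include <netpacket/packet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Parse "aa:bb:cc:dd:ee:ff" into six bytes. Returns 0 on success, -1 on bad input. */
static int parse_mac(const char *text, unsigned char mac[6])
{
    unsigned int b[6];
    char tail;
    int i;

    /* The trailing %c only converts if junk follows the sixth byte. */
    if (sscanf(text, "%2x:%2x:%2x:%2x:%2x:%2x%c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &tail) != 6)
        return -1;
    for (i = 0; i < 6; i++)
        mac[i] = (unsigned char)b[i];
    return 0;
}

/*
 * Copy the name of the first interface whose hardware address equals
 * mac into name. Returns 1 if found, 0 if not, -1 if getifaddrs() fails.
 */
static int ifname_by_mac(const unsigned char mac[6], char name[IF_NAMESIZE])
{
    struct ifaddrs *list, *ifa;
    int found = 0;

    if (getifaddrs(&list) != 0)
        return -1;

    for (ifa = list; ifa != NULL && !found; ifa = ifa->ifa_next) {
        const struct sockaddr_ll *ll;

        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_PACKET)
            continue;
        ll = (const struct sockaddr_ll *)ifa->ifa_addr;
        if (ll->sll_halen == 6 && memcmp(ll->sll_addr, mac, 6) == 0) {
            snprintf(name, IF_NAMESIZE, "%s", ifa->ifa_name);
            found = 1;
        }
    }

    freeifaddrs(list);
    return found;
}
```

A small main() around these two helpers would take a MAC address on the command line and print the matching interface name, which covers the lookup without any new ioctl.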


* Re: Raise initial congestion window size / speedup slow start?
From: Patrick McManus @ 2010-07-16 17:01 UTC
  To: H.K. Jerry Chu; +Cc: David Miller, davidsen, lists, linux-kernel, netdev
In-Reply-To: <AANLkTil3MVGwJUG12dikut9X1s9Ozc_3GwMjx_gErvxI@mail.gmail.com>

On Wed, 2010-07-14 at 21:51 -0700, H.K. Jerry Chu wrote:
>  except there are indeed bugs in the code today in that the
> code in various places assumes initcwnd as per RFC3390. So when
> initcwnd is raised, that actual value may be limited unnecessarily by
> the initial wmem/sk_sndbuf.

Thanks for the discussion!

can you tell us more about the impl concerns of initcwnd stored on the
route?

and while I'm asking for info, can you expand on the conclusion
regarding poor cache hit rates for reusing learned cwnds? (ok, I admit I
only read the slides.. maybe the paper has more info?)

article and slides much appreciated and very interesting. I've long been
of the opinion that the downsides of being too aggressive once in a
while aren't all that serious anymore.. as someone else said, in a
non-reservation world you are always trying to predict the future anyhow
and therefore overflowing a queue is always possible no matter how
conservative.
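
For reference, the RFC 3390 limit that the quoted text says the code assumes is min(4*MSS, max(2*MSS, 4380 bytes)). A small sketch of that arithmetic follows; the function names are made up for the example, and mss must be nonzero.

```c
#include <stdint.h>

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* RFC 3390 upper bound on the initial window, in bytes. */
static uint32_t rfc3390_initial_window(uint32_t mss)
{
    return min_u32(4 * mss, max_u32(2 * mss, 4380));
}

/* The same bound expressed in whole segments (mss must be nonzero). */
static uint32_t rfc3390_initial_segments(uint32_t mss)
{
    return rfc3390_initial_window(mss) / mss;
}
```

For a typical Ethernet MSS of 1460 bytes this gives 4380 bytes, i.e. 3 segments, so any buffer sizing derived from that figure is too small once initcwnd is raised beyond it.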


* Re: Badness with the kernel version 2.6.35-rc1-git1 running on P6 box
From: Dave Hansen @ 2010-07-16 17:35 UTC
  To: Eric Dumazet
  Cc: divya, LKML, linuxppc-dev, sachinp, benh, netdev, David Miller,
	Jan-Bernd Themann
In-Reply-To: <1279274185.2549.14.camel@edumazet-laptop>

On Fri, 2010-07-16 at 11:56 +0200, Eric Dumazet wrote:
> 
> > SLUB: Unable to allocate memory on node -1 (gfp=0x20)
> >    cache: kmalloc-16384, object size: 16384, buffer size: 16384, default order: 2, min order: 0
> >    node 0: slabs: 28, objs: 292, free: 0
> > ip: page allocation failure. order:0, mode:0x8020
> > Call Trace:
> > [c000000006a0eb40] [c000000000011c30] .show_stack+0x6c/0x16c (unreliable)
> > [c000000006a0ebf0] [c00000000012129c] .__alloc_pages_nodemask+0x6a0/0x75c
> > [c000000006a0ed70] [c0000000001527cc] .alloc_pages_current+0xc4/0x104
> > [c000000006a0ee10] [c00000000011fca4] .__get_free_pages+0x18/0x90
> > [c000000006a0ee90] [c0000000004f7058] .ehea_get_stats+0x4c/0x1bc
> > [c000000006a0ef30] [c0000000005a0a04] .dev_get_stats+0x38/0x64
> > [c000000006a0efc0] [c0000000005b456c] .rtnl_fill_ifinfo+0x35c/0x85c
> > [c000000006a0f150] [c0000000005b5920] .rtmsg_ifinfo+0x164/0x204
> > [c000000006a0f210] [c0000000005a6d6c] .dev_change_flags+0x4c/0x7c
> > [c000000006a0f2a0] [c0000000005b50b4] .do_setlink+0x31c/0x750
> > [c000000006a0f3b0] [c0000000005b6724] .rtnl_newlink+0x388/0x618
> > [c000000006a0f5f0] [c0000000005b6350] .rtnetlink_rcv_msg+0x268/0x2b4
> > [c000000006a0f6a0] [c0000000005cfdc0] .netlink_rcv_skb+0x74/0x108
> > [c000000006a0f730] [c0000000005b60c4] .rtnetlink_rcv+0x38/0x5c
> > [c000000006a0f7c0] [c0000000005cf8c8] .netlink_unicast+0x318/0x3f4
> > [c000000006a0f890] [c0000000005d05b4] .netlink_sendmsg+0x2d0/0x310
> > [c000000006a0f970] [c00000000058e1e8] .sock_sendmsg+0xd4/0x110
> > [c000000006a0fb50] [c00000000058e514] .SyS_sendmsg+0x1f4/0x288
> > [c000000006a0fd70] [c00000000058c2b8] .SyS_socketcall+0x214/0x280
> > [c000000006a0fe30] [c0000000000085b4] syscall_exit+0x0/0x40
> > Mem-Info:
> > Node 0 DMA per-cpu:
> > CPU    0: hi:    0, btch:   1 usd:   0
> > CPU    1: hi:    0, btch:   1 usd:   0
> > CPU    2: hi:    0, btch:   1 usd:   0
> > CPU    3: hi:    0, btch:   1 usd:   0
> > 
> > The mainline 2.6.35-rc5 worked fine.
> 
> Maybe you were lucky with 2.6.35-rc5
> 
> Anyway ehea should not use GFP_ATOMIC in its ehea_get_stats() method,
> called in process context, but GFP_KERNEL.
> 
> Another patch is needed for ehea_refill_rq_def() as well.

You're right that this is abusing GFP_ATOMIC.

But is this just a normal "GFP_ATOMIC" allocation failure?  "SLUB:
Unable to allocate memory on node -1" seems like a somewhat
inappropriate error message for that.

It isn't immediately obvious where the -1 is coming from.  Does it truly
mean "allocate from any node" here, or is that a buglet in and of
itself?

-- Dave
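
As an aid for reading the two masks in the log (gfp=0x20 in the SLUB line, mode:0x8020 in the page allocation failure), here is a small decoder sketch. The bit values are an assumption taken from memory of 2.6.35-era include/linux/gfp.h and should be checked against the tree in question; only a handful of flags are listed, and gfp_may_sleep() and gfp_describe() are names invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed 2.6.35-era values; verify against include/linux/gfp.h. */
struct gfp_name { unsigned int bit; const char *name; };

static const struct gfp_name gfp_names[] = {
    { 0x10u,   "__GFP_WAIT" },
    { 0x20u,   "__GFP_HIGH" },
    { 0x40u,   "__GFP_IO"   },
    { 0x80u,   "__GFP_FS"   },
    { 0x8000u, "__GFP_ZERO" },
};

/* An allocation may sleep only if __GFP_WAIT is set. */
static bool gfp_may_sleep(unsigned int mask)
{
    return (mask & 0x10u) != 0;
}

/* Write the known flag names in mask into out, separated by '|'. */
static void gfp_describe(unsigned int mask, char *out, size_t outlen)
{
    size_t used = 0;
    size_t i;

    if (outlen == 0)
        return;
    out[0] = '\0';

    for (i = 0; i < sizeof(gfp_names) / sizeof(gfp_names[0]); i++) {
        int n;

        if (!(mask & gfp_names[i].bit))
            continue;
        n = snprintf(out + used, outlen - used, "%s%s",
                     used ? "|" : "", gfp_names[i].name);
        if (n < 0 || (size_t)n >= outlen - used)
            break;              /* output truncated: stop appending */
        used += (size_t)n;
    }
}
```

Under those assumed values, 0x8020 reads as __GFP_HIGH|__GFP_ZERO with no __GFP_WAIT, i.e. a zeroed-page request made with GFP_ATOMIC, whereas GFP_KERNEL (0xd0) would be allowed to sleep and reclaim.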



* Re: Raise initial congestion window size / speedup slow start?
From: Ed W @ 2010-07-16 17:41 UTC
  To: Patrick McManus
  Cc: H.K. Jerry Chu, David Miller, davidsen, linux-kernel, netdev
In-Reply-To: <1279299709.2156.5814.camel@tng>

> and while I'm asking for info, can you expand on the conclusion
> regarding poor cache hit rates for reusing learned cwnds? (ok, I admit I
> only read the slides.. maybe the paper has more info?)
>    

My guess is that this result is specific to Google and their servers?

I guess we can probably stereotype the world into two pools of devices:

1) Devices in a pool of fast networking, but connected to the rest of 
the world through a relatively slow router
2) Devices connected via a high speed network and largely the bottleneck 
device is many hops down the line and well away from us

I'm thinking here 1) client users behind broadband routers, wireless, 
3G, dialup, etc and 2) public servers that have obviously been 
deliberately placed in locations with high levels of interconnectivity.

I think history information could be more useful for clients in category 
1) because there is a much higher probability that their most 
restrictive device is one hop away and hence affects all connections and 
relatively occasionally the bottleneck is multiple hops away.  For 
devices in category 2) it's much harder because the restriction will 
usually be lots of hops away and effectively you are trying to figure 
out and cache the speed of every ADSL router out there...  For sure you 
can probably figure out how to cluster this stuff and say that pool 
there is 56K dialup, that pool there is "broadband", that pool is cell 
phone, etc, but probably it's hard to do better than that?

So my guess is this is why Google has had poor results investigating
cwnd caching?

However, I would suggest that whilst it's of little value for the server 
side, it still remains a very interesting idea for the client side and 
the cache hit ratio would seem to be dramatically higher here?

I haven't studied the code, but given that userspace can already 
change the initial cwnd through the ip utility, it would seem likely that 
relatively little coding would now be required to implement some kind of 
limited cwnd caching and experiment with whether this is a valuable 
addition?  I would have thought that if you are only fiddling with devices 
behind a broadband router then there is little chance of you "crashing 
the internet" with these kinds of experiments?
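
The client/server asymmetry argued above can be pictured with a toy cache of learned cwnd values keyed by whatever identifies the bottleneck. This is only an illustrative sketch: the CwndCache class, the key names and the window values are all hypothetical, nothing here talks to the kernel, and it is not the method used in the work being discussed. Feeding a learned value back would presumably go through the ip utility's per-route initcwnd option.

```python
from collections import OrderedDict


class CwndCache:
    """Toy LRU cache of learned congestion windows, keyed by bottleneck."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.lookups = 0

    def lookup(self, key, default=3):
        """Return the cached cwnd for key, or a placeholder default on a miss."""
        self.lookups += 1
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key]
        return default

    def learn(self, key, cwnd):
        """Remember the cwnd observed for key, evicting the oldest entry."""
        self.entries[key] = cwnd
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)

    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0


def replay(cache, keys, learned_cwnd=10):
    """Look up each key in turn, then record a (made-up) learned cwnd."""
    for key in keys:
        cache.lookup(key)
        cache.learn(key, learned_cwnd)
    return cache.hit_ratio()


# Client behind one slow router: every connection shares the same bottleneck,
# so a single key (the first hop) covers all of them.
client_ratio = replay(CwndCache(capacity=4), ["first-hop"] * 8)

# Server: each remote access link is its own bottleneck, so keys rarely repeat.
server_ratio = replay(CwndCache(capacity=4), [f"remote-{i}" for i in range(8)])
```

With these made-up traces the client-side cache misses only on its first lookup, while the server-side cache never hits; real traffic would fall somewhere in between.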

Good luck

Ed W

^ permalink raw reply

* Re: [PATCH 01/11] Removing dead RT2800PCI_SOC
From: Helmut Schaa @ 2010-07-16 17:44 UTC (permalink / raw)
  To: Gertjan van Wingerde
  Cc: Bartlomiej Zolnierkiewicz, Felix Fietkau, John W. Linville,
	Ivo Van Doorn, Christoph Egger, linux-wireless, users, netdev,
	linux-kernel, vamos-dev, Luis Correia
In-Reply-To: <4C407ED4.6000002@gmail.com>

On Friday 16 July 2010, Gertjan van Wingerde wrote:
> On 07/16/10 12:08, Helmut Schaa wrote:
> > On Fri, Jul 16, 2010 at 9:18 AM, Gertjan van Wingerde
> > <gwingerde@gmail.com> wrote:
> >>
> >> On 07/16/10 08:57, Helmut Schaa wrote:
> >>> On Thu, Jul 15, 2010 at 10:41 AM, Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
> >>>
> >>>     On Wednesday 14 July 2010 04:44:44 pm Felix Fietkau wrote:
> >>>     > On 2010-07-14 3:15 PM, John W. Linville wrote:
> >>>     > > On Wed, Jul 14, 2010 at 02:52:14PM +0200, Ivo Van Doorn wrote:
> >>>     > >> On Wed, Jul 14, 2010 at 2:46 PM, Luis Correia <luis.f.correia@gmail.com> wrote:
> >>>     > >> > On Wed, Jul 14, 2010 at 13:39, Christoph Egger <siccegge@cs.fau.de> wrote:
> >>>     > >> >> While RT2800PCI_SOC exists in Kconfig, it depends on either
> >>>     > >> >> RALINK_RT288X or RALINK_RT305X which are both not available in Kconfig
> >>>     > >> >> so all Code depending on that can't ever be selected and, if there's
> >>>     > >> >> no plan to add these options, should be cleaned up
> >>>     > >> >>
> >>>     > >> >> Signed-off-by: Christoph Egger <siccegge@cs.fau.de>
> >>>     > >> >
> >>>     > >> > NAK,
> >>>     > >> >
> >>>     > >> > this is not dead code, it is needed for the Ralink System-on-Chip
> >>>     > >> > Platform devices.
> >>>     > >> >
> >>>     > >> > While I can't fix Kconfig errors and the current KConfig file may be
> >>>     > >> > wrong, this code cannot and will not be deleted.
> >>>     > >>
> >>>     > >> When the config option was introduced, the config options RALINK_RT288X and
> >>>     > >> RALINK_RT305X were supposed to be merged as well soon after by somebody (Felix?)
> >>>     > >>
> >>>     > >> But since testing is done on SoC boards by Helmut and Felix, I assume the code
> >>>     > >> isn't dead but actually in use.
> >>>     > >
> >>>     > > Perhaps Helmut and Felix can send us the missing code?
> >>>     > The missing code is a MIPS platform port, which is currently being
> >>>     > maintained in OpenWrt, but is not ready for upstream submission yet.
> >>>     > I'm not working on this code at the moment, but I think it will be
> >>>     > submitted once it's ready.
> >>>
> >>>     People are using automatic scripts to catch unused config options nowadays
> >>>     so the issue is quite likely to come back again sooner or later..
> >>>
> >>>     Would it be possible to improve situation somehow till the missing parts
> >>>     get merged?  Maybe by adding a tiny comment documenting RT2800PCI_SOC
> >>>     situation to Kconfig (if the config option itself really cannot be removed)
> >>>     until all code is ready etc.?
> >>>
> >>>
> >>> Or we could just remove RT2800PCI_SOC completely and build the soc specific
> >>> parts always as part of rt2800pci. I mean it's not much code, just the platform
> >>> driver stuff and the eeprom access.
> >>>
> >>
> >> I'm not sure if that is feasible. Sure, we can reduce the usage of the variable by
> >> unconditionally compiling in the generic SOC code, but we should not unconditionally
> >> register the SOC platform device, which is currently also under the scope of this
> >> Kconfig variable.
> > 
> > Ehm, no, the platform device is not registered in rt2800pci at all,
> > it's just the platform
> > driver that gets registered there. The platform device will be
> > registered in the according
> > board init code (that only resides in openwrt at the moment).
> > 
> 
> OK. Didn't know that. Sounds good then.
> 
> However, I've tried this in my local tree, and now compilation fails on the x86 platform
> due to a missing KSEG1ADDR macro. How do you suggest to handle the potentially missing
> macro?

We can convert it to an ioremap call; that should be available on all platforms.

Helmut

^ permalink raw reply

* Re: Question about way that NICs deliver packets to the kernel
From: Rick Jones @ 2010-07-16 17:58 UTC (permalink / raw)
  To: Junchang Wang; +Cc: Ben Hutchings, romieu, netdev
In-Reply-To: <AANLkTild5SQvt7Uw7AAV7BrrSdvPdoglgbm1uJQnigdM@mail.gmail.com>

Junchang Wang wrote:
>>You should also compare the CPU usage.
>>
>>Ben.
>>
> 
> Hi Ben,
> I added options -c -C to netperf's command line. Result is as follows:
>                     scheme 1    scheme 2    Imp.
> Throughput:     683M        718M       5%
> CPU usage:     47.8%       45.6%
> 
> That really surprised me because "top" command showed the CPU usage
> was fluctuating between 0.5% and 1.5% rather that between 45% and 50%.

Can you tell us a bit more about the system, and which version of netperf you 
are using?  Any chance that the CPU utilization you were looking at in top was 
just what was being charged to the netperf process?  "Network processing" is 
often not charged to the responsible process, so netperf reports system-wide CPU 
utilization on the assumption that it is the only thing causing the CPUs to be utilized.
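
To make the per-process versus system-wide distinction concrete, here is a minimal sketch that derives system-wide utilization from two samples of the aggregate "cpu" line in /proc/stat. It is an assumption-laden illustration, not a description of how netperf measures CPU on any given platform; the function names are hypothetical.

```python
def parse_cpu_line(line):
    """Split the aggregate 'cpu' line of /proc/stat into integer tick counts."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep user, nice, system, idle, iowait, irq, softirq, steal only; the
    # later guest fields are already included in user/nice.
    return [int(v) for v in fields[1:9]]


def system_utilization(before, after):
    """Percentage of non-idle time across all CPUs between two samples."""
    t0, t1 = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [b - a for a, b in zip(t0, t1)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # idle is field 3, iowait is field 4; both count as "not busy" here.
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return 100.0 * (total - idle) / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which aggregates all CPUs."""
    with open(path) as f:
        return f.readline()
```

Taking read_cpu_line() before and after a test run and passing both to system_utilization() includes irq and softirq time, which is where much receive-side network processing lands and which top's per-process column does not attribute to the benchmark.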

happy benchmarking,

rick jones

^ permalink raw reply

* Re: [RFC] Enhance dev_ioctl to return <hwaddr>:<if_name::if_index> mapping
From: Stephen Hemminger @ 2010-07-16 18:04 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Chetan Loke, netdev, Loke, Chetan
In-Reply-To: <1279300205.2097.9.camel@achroite.uk.solarflarecom.com>

On Fri, 16 Jul 2010 18:10:05 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:

> On Fri, 2010-07-16 at 09:18 -0400, Chetan Loke wrote:
> [...]
> > Requirement:
> > R1)Ability to address NICs/interfaces using a mac-addr in ioctls. This
> > is required because we don't have a consistent naming scheme for
> > Ethernet devices.Asking customers and/or field-engineers to change
> > udev rules and
> > other config files is not feasible.
> 
> I don't know why they would need to change those.  It might be useful to
> have a simple userland tool that will look up a device name by MAC
> address.  But it's not like a MAC address is any more user-friendly than
> a net device name!
> 
> > Existing pain-points:
> > P1) ioctl needs either i) if-name or ii) if-index before we can invoke
> > bind() etc.This works fine if you know your configuration and it is not going
> > to change.However,if we hot-add a NIC and if you have adapters from multiple
> > vendors(think:driver load order) then upon a reboot,the 'eth'
> > interfaces can be re-mapped.
> 
> As you well know, udev makes the name/MAC-address association
> persistent, so this is no longer a problem.
> 
> [...]
> >   W2.1) If renaming were to even succeed then none of the existing
> > drivers re-register their msix-vectors.
> 
> There is no need to do that, since the IRQ handler names are not copied
> but are held by the driver.  So the driver only needs to rewrite the
> names when the device is renamed.  However that does require registering
> a netdev notifier.  It might be worth adding a netdev operation for
> renaming, to make this easier for driver maintainers.
> 
> [...]
> > But there is no programmatic way of deriving the 'ethX' name.
> [...]
> 
> Deriving it from what?  Are you aiming to determine what the name of a
> device will be before the physical device is installed?
> 
> Ben.
> 

The additional API is not needed. It is trivial to find the address for a device
and do the reverse mapping, either with ioctls or /sys/class/net/XXX/address
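
As a rough sketch of the sysfs route, the reverse mapping from hardware address to interface name can be done by scanning /sys/class/net, where the per-interface attribute file is named "address". The function name and the sysfs_root parameter are hypothetical, the latter added only so the lookup can be pointed at a test directory.

```python
import os


def ifnames_by_mac(mac, sysfs_root="/sys/class/net"):
    """Return the sorted names of all interfaces whose address equals mac."""
    wanted = mac.strip().lower()
    matches = []
    for name in os.listdir(sysfs_root):
        path = os.path.join(sysfs_root, name, "address")
        try:
            with open(path) as f:
                current = f.read().strip().lower()
        except OSError:
            continue  # entry without a readable address attribute
        if current == wanted:
            matches.append(name)
    return sorted(matches)
```

A list is returned rather than a single name because several interfaces can report the same address, for example a bond and its slaves.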

^ permalink raw reply

* RE: [RFC] Enhance dev_ioctl to return <hwaddr>:<if_name::if_index>  mapping
From: Loke, Chetan @ 2010-07-16 18:12 UTC (permalink / raw)
  To: Stephen Hemminger, Ben Hutchings; +Cc: Chetan Loke, netdev
In-Reply-To: <20100716110407.316eb8f1@s6510>

> -----Original Message-----
> From: Stephen Hemminger [mailto:shemminger@vyatta.com]
> Sent: July 16, 2010 2:04 PM
 
> The additional API is not needed. It is trivial to find address for
> device and do reverse mapping. Either with ioctl's 
Sorry, I might have missed it. But which ioctl would that be?

> or /sys/class/net/XXX/addr
So, is reading /sys/ nodes preferred over get-calls?

regards
Chetan Loke

^ permalink raw reply

* Re: [RFC] Enhance dev_ioctl to return <hwaddr>:<if_name::if_index> mapping
From: Stephen Hemminger @ 2010-07-16 18:31 UTC (permalink / raw)
  To: Loke, Chetan; +Cc: Ben Hutchings, Chetan Loke, netdev
In-Reply-To: <D3F292ADF945FB49B35E96C94C2061B90C55C202@nsmail.netscout.com>

On Fri, 16 Jul 2010 14:12:24 -0400
"Loke, Chetan" <Chetan.Loke@netscout.com> wrote:

> > -----Original Message-----
> > From: Stephen Hemminger [mailto:shemminger@vyatta.com]
> > Sent: July 16, 2010 2:04 PM
>  
> > The additional API is not needed. It is trivial to find address for
> > device and do reverse mapping. Either with ioctl's 
> Sorry, I might have missed it. But which ioctl would that be?

Simple way:
   Use SIOCGIFCONF to get list of interfaces
   Use SIOCGIFHWADDR to read device address

If you want to handle the case where device address is set
by bonding or other protocols, use ETHTOOL_GPERMADDR to read
the original ethernet address.

> > or /sys/class/net/XXX/addr
> So, is reading /sys/ nodes preferred over get-calls?

No difference
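
The ioctl route described above might look roughly like this from userspace. It is a sketch under stated assumptions: enumeration uses Python's socket.if_nameindex() instead of SIOCGIFCONF purely for brevity, the 40-byte buffer assumes the 64-bit struct ifreq layout (which is also large enough on 32-bit), and the helper names are hypothetical. It reads the current address, not the permanent one that ETHTOOL_GPERMADDR would return.

```python
import fcntl
import socket
import struct

SIOCGIFHWADDR = 0x8927  # value from <linux/sockios.h>


def format_mac(raw):
    """Render raw address bytes as colon-separated lowercase hex."""
    return ":".join(f"{b:02x}" for b in raw)


def build_ifreq(ifname):
    """Pack a struct ifreq: 16-byte NUL-padded name plus 24 bytes of union."""
    return struct.pack("16s24x", ifname.encode()[:15])


def hwaddr(sock, ifname):
    """Read an interface's current hardware address with SIOCGIFHWADDR."""
    res = fcntl.ioctl(sock.fileno(), SIOCGIFHWADDR, build_ifreq(ifname))
    # ifr_hwaddr is a struct sockaddr: 2-byte family, then the address bytes.
    return format_mac(res[18:24])


def mac_to_names():
    """Map each hardware address to the interface names that report it."""
    mapping = {}
    # Any socket will do; it only provides a file descriptor for the ioctl.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        for _index, name in socket.if_nameindex():
            try:
                mac = hwaddr(s, name)
            except OSError:
                continue  # interface vanished or has no hardware address
            mapping.setdefault(mac, []).append(name)
    return mapping
```

Calling mac_to_names() on a Linux host returns a dictionary such as one keyed by "00:00:00:00:00:00" for the loopback device; the exact contents depend on the machine.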

^ permalink raw reply

* Re: Badness with the kernel version 2.6.35-rc1-git1 running on P6 box
From: David Rientjes @ 2010-07-16 19:19 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Eric Dumazet, divya, LKML, linuxppc-dev, sachinp, benh, netdev,
	David Miller, Jan-Bernd Themann
In-Reply-To: <1279301731.9207.239.camel@nimitz>

On Fri, 16 Jul 2010, Dave Hansen wrote:

> > > SLUB: Unable to allocate memory on node -1 (gfp=0x20)
> > >    cache: kmalloc-16384, object size: 16384, buffer size: 16384,
> > default order: 2, min order: 0
> > >    node 0: slabs: 28, objs: 292, free: 0
> > > ip: page allocation failure. order:0, mode:0x8020
> > > Call Trace:
> > > [c000000006a0eb40] [c000000000011c30] .show_stack+0x6c/0x16c (unreliable)
> > > [c000000006a0ebf0] [c00000000012129c] .__alloc_pages_nodemask+0x6a0/0x75c
> > > [c000000006a0ed70] [c0000000001527cc] .alloc_pages_current+0xc4/0x104
> > > [c000000006a0ee10] [c00000000011fca4] .__get_free_pages+0x18/0x90
> > > [c000000006a0ee90] [c0000000004f7058] .ehea_get_stats+0x4c/0x1bc
> > > [c000000006a0ef30] [c0000000005a0a04] .dev_get_stats+0x38/0x64
> > > [c000000006a0efc0] [c0000000005b456c] .rtnl_fill_ifinfo+0x35c/0x85c
> > > [c000000006a0f150] [c0000000005b5920] .rtmsg_ifinfo+0x164/0x204
> > > [c000000006a0f210] [c0000000005a6d6c] .dev_change_flags+0x4c/0x7c
> > > [c000000006a0f2a0] [c0000000005b50b4] .do_setlink+0x31c/0x750
> > > [c000000006a0f3b0] [c0000000005b6724] .rtnl_newlink+0x388/0x618
> > > [c000000006a0f5f0] [c0000000005b6350] .rtnetlink_rcv_msg+0x268/0x2b4
> > > [c000000006a0f6a0] [c0000000005cfdc0] .netlink_rcv_skb+0x74/0x108
> > > [c000000006a0f730] [c0000000005b60c4] .rtnetlink_rcv+0x38/0x5c
> > > [c000000006a0f7c0] [c0000000005cf8c8] .netlink_unicast+0x318/0x3f4
> > > [c000000006a0f890] [c0000000005d05b4] .netlink_sendmsg+0x2d0/0x310
> > > [c000000006a0f970] [c00000000058e1e8] .sock_sendmsg+0xd4/0x110
> > > [c000000006a0fb50] [c00000000058e514] .SyS_sendmsg+0x1f4/0x288
> > > [c000000006a0fd70] [c00000000058c2b8] .SyS_socketcall+0x214/0x280
> > > [c000000006a0fe30] [c0000000000085b4] syscall_exit+0x0/0x40
> > > Mem-Info:
> > > Node 0 DMA per-cpu:
> > > CPU    0: hi:    0, btch:   1 usd:   0
> > > CPU    1: hi:    0, btch:   1 usd:   0
> > > CPU    2: hi:    0, btch:   1 usd:   0
> > > CPU    3: hi:    0, btch:   1 usd:   0
> > > 
> > > The mainline 2.6.35-rc5 worked fine.
> > 
> > Maybe you were lucky with 2.6.35-rc5
> > 
> > Anyway ehea should not use GFP_ATOMIC in its ehea_get_stats() method,
> > called in process context, but GFP_KERNEL.
> > 
> > Another patch is needed for ehea_refill_rq_def() as well.
> 
> You're right that this is abusing GFP_ATOMIC.
> 
> But is, this is just a normal "GFP_ATOMIC" allocation failure?  "SLUB:
> Unable to allocate memory on node -1" seems like a somewhat
> inappropriate error message for that.  
> 

The slub message is separate and doesn't generate a call trace, even 
though it is a (minimum) order-0 GFP_ATOMIC allocation as well.  The page 
allocation failure is a separate instance that is calling the page 
allocator, not the slab allocator.

> It isn't immediately obvious where the -1 is coming from.  Does it truly
> mean "allocate from any node" here, or is that a buglet in and of
> itself?
> 

Yes, slub uses -1 to indicate that the allocation need not come from a 
specific node.

^ permalink raw reply

* RE: [REGRESSION] e1000e stopped working [MANUALLY BISECTED]
From: Maxim Levitsky @ 2010-07-16 19:25 UTC (permalink / raw)
  To: Tantilov, Emil S
  Cc: Kirsher, Jeffrey T, netdev@vger.kernel.org, Allan, Bruce W,
	Pieper, Jeffrey E
In-Reply-To: <1279220945.4411.6.camel@localhost.localdomain>

On Thu, 2010-07-15 at 22:09 +0300, Maxim Levitsky wrote:
> On Thu, 2010-07-15 at 13:02 -0600, Tantilov, Emil S wrote:
> > Maxim Levitsky wrote:
> > > On Thu, 2010-07-15 at 02:33 +0300, Maxim Levitsky wrote:
> > >> On Wed, 2010-07-14 at 16:56 -0600, Tantilov, Emil S wrote:
> > >>> Maxim Levitsky wrote:
> > >>>> On Mon, 2010-07-12 at 15:23 -0600, Tantilov, Emil S wrote:
> > >>>>> Maxim Levitsky wrote:
> > >>>>>> On Mon, 2010-07-05 at 12:58 +0300, Maxim Levitsky wrote:
> > >>>>>>> On Mon, 2010-07-05 at 01:13 -0700, Jeff Kirsher wrote:
> > >>>>>>>> On Sun, Jul 4, 2010 at 15:48, Maxim Levitsky
> > >>>>>>>> <maximlevitsky@gmail.com> wrote:
> > >>>>>>>>> Did few guesses, and now I see that reverting the below
> > >>>>>>>>> commit fixes the problem. 
> > >>>>>>>>> 
> > >>>>>>>>> "e1000e: Fix/cleanup PHY reset code for ICHx/PCHx"
> > >>>>>>>>> e98cac447cc1cc418dff1d610a5c79c4f2bdec7f.
> > >>>>>>>>> 
> > >>>>>>>>> 
> > >>>>>>>>> Best regards,
> > >>>>>>>>>        Maxim Levitsky
> > >>>>>>>>> 
> > >>>>>>>>> --
> > >>>>>>>> 
> > >>>>>>>> Can you give us till Tuesday to respond?  I know that there are
> > >>>>>>>> some additional e1000e patches in my queue, which may resolve
> > >>>>>>>> the issue, but this weekend the power is down to do some
> > >>>>>>>> infrastructure upgrades which prevents us from doing any
> > >>>>>>>> investigation.debugging until Tuesday.
> > >>>>>>>> 
> > >>>>>>> 
> > >>>>>>> Sure.
> > >>>>>>> 
> > >>>>>>> Best regards,
> > >>>>>>> 	Maxim Levitsky
> > >>>>>>> 
> > >>>>>> 
> > >>>>>> Updates?
> > >>>>> 
> > >>>>> We are working on reproducing the issue. So far we have not seen
> > >>>>> the problem when testing with net-next.
> > >>>>> 
> > >>>>> I asked in previous email about some additional info from ethtool
> > >>>>> (-d, -e, -S) and kernel config. That would help us to narrow it
> > >>>>> down. 
> > >>>>> 
> > >>>>> Thanks,
> > >>>>> Emil
> > >>>> I did send -e and -d output.
> > >>> 
> > >>> Sorry, looks like I lost the email with the attachements.
> > >>> 
> > >>> Could you provide the output of dmesg after the failure occurs?
> > >>> 
> > >>>> Since you probably want -S output during failure, I need to
> > >>>> recompile kernel for that. I will do that soon.
> > >>>> 
> > >>>> 
> > >>>> One question, in two weeks I hope 2.6.35 won't be released?
> > >>>> If so, I will have enough free time then to narrow down this issue.
> > >>>> 
> > >>>> Other solution, is to revert this commit.
> > >>>> (I have never seen this problem with it reverted).
> > >>> 
> > >>> We have been running reboot tests on 2 separate systems with recent
> > >>> net-next kernels using your config and so far no luck in
> > >>> reproducing this issue. 
> > >>> 
> > >>> What is the make model of your system (or MB)?
> > >> 
> > >> the motherboard is Intel DG965RY.
> > >> 
> > >> However, I am using vanilla kernel.
> > >> net-next might contain further fixes.
> > >> 
> > >> I see if net-next works here.
> > > 
> > > Yep, net-next works here.
> > > 
> > > 
> > > I have the problem on vanilla kernel.
> > > Last revision of it, I tested is 2.6.35-rc4 exactly
> > > (815c4163b6c8ebf8152f42b0a5fd015cfdcedc78)
> > > 
> > > 
> > > Maybe vanilla git master works, I test it too soon.
> > 
> > Thanks for the information! Good to know that this issue does not exist in the latest branch.
> > 
> > Have you by any chance tested a stable branch (2.6.34.x)?
> 
> I only did test plain 2.6.34 (v2.6.34)
And I forgot to add that it did work.

> 
> Also I repeat that revert of e98cac447cc1cc418dff1d610a5c79c4f2bdec7f 
> (e1000e: Fix/cleanup PHY reset code for ICHx/PCHx) fixes the bug on
> vanilla kernel.
> 
> Also I just pulled latest vanilla git, and I according to diffstat I see
> no changes in e1000e, so its likely that bug remains there.
> I will test that soon.
Tested, broken as expected.

Best regards,
	Maxim Levitsky

^ permalink raw reply

* Re: [RFC] Enhance dev_ioctl to return <hwaddr>:<if_name::if_index> mapping
From: Chetan Loke @ 2010-07-16 19:29 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Loke, Chetan, Ben Hutchings, netdev
In-Reply-To: <20100716113113.1e2b51c7@nehalam>

On Fri, Jul 16, 2010 at 2:31 PM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
> On Fri, 16 Jul 2010 14:12:24 -0400
> "Loke, Chetan" <Chetan.Loke@netscout.com> wrote:
>
>> > -----Original Message-----
>> > From: Stephen Hemminger [mailto:shemminger@vyatta.com]
>> > Sent: July 16, 2010 2:04 PM
>>
>> > The additional API is not needed. It is trivial to find address for
>> > device and do reverse mapping. Either with ioctl's
>> Sorry, I might have missed it. But which ioctl would that be?
>
> Simple way:
>   Use SIOCGIFCONF to get list of interfaces
>   Use SIOCGIFHWADDR to read device address
>
And interfaces that don't have IP's?

^ permalink raw reply

* Re: [RFC] Enhance dev_ioctl to return <hwaddr>:<if_name::if_index> mapping
From: David Miller @ 2010-07-16 19:33 UTC (permalink / raw)
  To: chetanloke; +Cc: shemminger, Chetan.Loke, bhutchings, netdev
In-Reply-To: <AANLkTin3V7LtmhDAOktNF1UITM86sJwButxCQ3vo31lv@mail.gmail.com>

From: Chetan Loke <chetanloke@gmail.com>
Date: Fri, 16 Jul 2010 15:29:06 -0400

> On Fri, Jul 16, 2010 at 2:31 PM, Stephen Hemminger
> <shemminger@vyatta.com> wrote:
>> Simple way:
>>   Use SIOCGIFCONF to get list of interfaces
>>   Use SIOCGIFHWADDR to read device address
>>
> And interfaces that don't have IP's?

There is no requirement that an interface have configured IP addresses
in order to use those ioctl()'s.

You simply create an arbitrary socket, and use the resulting 'fd' to
run the ioctl's mentioned to fetch information about any and all
interfaces which exist in the system.

^ permalink raw reply

* Re: [RFC] LSM hook for post recvmsg.
From: David Miller @ 2010-07-16 19:35 UTC (permalink / raw)
  To: penguin-kernel; +Cc: netdev
In-Reply-To: <201007170114.GFC57373.SQJHOVtLFMOFFO@I-love.SAKURA.ne.jp>

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sat, 17 Jul 2010 01:14:38 +0900

> Below is a patch for post recvmsg() operation. I modified the patch to call
> skb_recv_datagram() again (for udp_recvmsg(), raw_recvmsg(), udpv6_recvmsg())
> if LSM decided to drop the message. (Regarding rawv6_recvmsg(), I didn't do so
> in accordance with the comment at "csum_copy_err:".)
> What do you think about this version?

This looks fine, but regardless of that comment I think the IPV6 raw recvmsg()
should loop just as the IPV4 one does in your patch.

^ permalink raw reply

* Re: [RFC] Enhance dev_ioctl to return <hwaddr>:<if_name::if_index> mapping
From: Chetan Loke @ 2010-07-16 19:35 UTC (permalink / raw)
  To: David Miller; +Cc: shemminger, Chetan.Loke, bhutchings, netdev
In-Reply-To: <20100716.123306.232886890.davem@davemloft.net>

On Fri, Jul 16, 2010 at 3:33 PM, David Miller <davem@davemloft.net> wrote:
> From: Chetan Loke <chetanloke@gmail.com>
> Date: Fri, 16 Jul 2010 15:29:06 -0400
>
>> On Fri, Jul 16, 2010 at 2:31 PM, Stephen Hemminger
>> <shemminger@vyatta.com> wrote:
>>> Simple way:
>>>   Use SIOCGIFCONF to get list of interfaces
>>>   Use SIOCGIFHWADDR to read device address
>>>
>> And interfaces that don't have IP's?
>
> There is no requirement that an interface have configured IP addresses
> in order to use those ioctl()'s.
>
> You simply create an arbitrary socket, and use the resulting 'fd' to
> run the ioctl's mentioned to fetch information about any and all
> interfaces which exist in the system.
>

Yes, I opened a socket and then sent the SIOCGIFCONF ioctl. What I meant was
that interfaces that didn't have an IP weren't returned.

^ permalink raw reply

* Re: [RFC] Enhance dev_ioctl to return <hwaddr>:<if_name::if_index> mapping
From: David Miller @ 2010-07-16 19:40 UTC (permalink / raw)
  To: chetanloke; +Cc: shemminger, Chetan.Loke, bhutchings, netdev
In-Reply-To: <AANLkTilhSSh1kE5ZfqVDbBJJWdP3hDcXRffpQwdyMQua@mail.gmail.com>

From: Chetan Loke <chetanloke@gmail.com>
Date: Fri, 16 Jul 2010 15:35:48 -0400

> Yes, I opened a socket and then sent the IFCONF ioctl. What I meant was
> that interfaces that didn't have an IP weren't returned.

If you use the correct set of ioctl()'s it will, just as
"/sbin/ifconfig -a" lists all interfaces regardless of whether they
have IP addresses.

Run strace on 'ifconfig', see what it does :-)

And then you also have the option of using netlink as well.
This is how "ip l l" lists all interfaces, also regardless of
configuration.
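
For illustration, one address-independent way to enumerate interfaces is to parse /proc/net/dev, which lists every net device in the current namespace; on typical net-tools builds this is what strace shows "ifconfig -a" reading, though that is stated here as an assumption rather than a guarantee. The function names below are hypothetical.

```python
def parse_proc_net_dev(text):
    """Return interface names from the contents of /proc/net/dev."""
    names = []
    for line in text.splitlines()[2:]:  # first two lines are column headers
        name, sep, _counters = line.partition(":")
        if sep:
            names.append(name.strip())
    return names


def list_all_interfaces(path="/proc/net/dev"):
    """List every interface, whether or not it has an IP address."""
    with open(path) as f:
        return parse_proc_net_dev(f.read())
```

Each returned name can then be handed to SIOCGIFHWADDR to build the hardware-address mapping.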

^ permalink raw reply

* Re: [PATCH 0/1] Reviewing batman-adv for net/
From: David Miller @ 2010-07-16 19:41 UTC (permalink / raw)
  To: sven.eckelmann; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <1279291156-5297-1-git-send-email-sven.eckelmann@gmx.de>

So, what's the point of saying "PATCH 0/1" here if you don't
say "PATCH 1/1" in the actual patch posting? :-)

^ permalink raw reply

* Re: [PATCH 0/1] Reviewing batman-adv for net/
From: Sven Eckelmann @ 2010-07-16 19:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <20100716.124105.124016801.davem@davemloft.net>

David Miller wrote:
> So, what's the point of saying "PATCH 0/1" here if you don't
> say "PATCH 1/1" in the actual patch posting? :-)

Sorry, my fault. It happened when I removed another patch that wasn't meant for this list.

Best regards,
	Sven

^ permalink raw reply

* Re: [GIT PULL] vhost-net fixes
From: David Miller @ 2010-07-16 19:57 UTC (permalink / raw)
  To: mst; +Cc: kvm, virtualization, netdev, linux-kernel
In-Reply-To: <20100716122530.GA29478@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Fri, 16 Jul 2010 15:25:30 +0300

> David, please pull the following fixes for 2.6.35.
> Thanks!
> 
> The following changes since commit 91a72a70594e5212c97705ca6a694bd307f7a26b:
> 
>   net/core: neighbour update Oops (2010-07-14 18:02:16 -0700)
> 
> are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net
> 
> Michael S. Tsirkin (2):
>       vhost-net: avoid flush under lock
>       vhost: avoid pr_err on condition guest can trigger

Pulled, thanks!

^ permalink raw reply
