Netdev List
* [PATCH] net: Fix wrong sizeof
From: Jean Delvare @ 2009-10-02  9:30 UTC
  To: LKML, netdev; +Cc: linux-doc, Randy Dunlap, stable

Which is why I have always preferred sizeof(struct foo) over
sizeof(var).
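
To make the bug class concrete, here is a minimal userspace sketch. It is not
part of the patch. The struct demo_config and the helper
demo_bytes_left_dirty() are hypothetical stand-ins for hwconfig and
cnx->cap_ack_event. The point is only that sizeof(&var) evaluates to the size
of a pointer, so the memset() clears just the first few bytes of the object.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a config struct such as hwconfig. */
struct demo_config {
	int flags;
	int tx_type;
	int rx_filter;
	char reserved[20];
};

enum demo_clear_mode {
	DEMO_CLEAR_BUGGY,	/* sizeof(&cfg): size of a pointer */
	DEMO_CLEAR_VAR,		/* sizeof(cfg): the form used by the fix */
	DEMO_CLEAR_TYPE,	/* sizeof(struct demo_config): the form preferred above */
};

/* Fill a config with 0xff (simulating stack garbage), "clear" it using the
 * chosen sizeof form, and return how many bytes are still non-zero. */
static size_t demo_bytes_left_dirty(enum demo_clear_mode mode)
{
	struct demo_config cfg;
	const unsigned char *p = (const unsigned char *)&cfg;
	size_t i, dirty = 0;

	memset(&cfg, 0xff, sizeof(cfg));

	switch (mode) {
	case DEMO_CLEAR_BUGGY:
		memset(&cfg, 0, sizeof(&cfg));	/* the bug: clears pointer-size bytes */
		break;
	case DEMO_CLEAR_VAR:
		memset(&cfg, 0, sizeof(cfg));
		break;
	case DEMO_CLEAR_TYPE:
		memset(&cfg, 0, sizeof(struct demo_config));
		break;
	}

	for (i = 0; i < sizeof(cfg); i++)
		if (p[i] != 0)
			dirty++;
	return dirty;
}
```

On a typical 64-bit build the buggy form clears only the first 8 bytes, while
sizeof(var) and sizeof(struct foo) both cover the whole object. Newer GCC
releases can flag the buggy pattern with -Wsizeof-pointer-memaccess.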

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Randy Dunlap <rdunlap@xenotime.net>
---
Stable team, the non-documentation part of this fix applies to 2.6.31,
2.6.30 and 2.6.27.

 Documentation/networking/timestamping/timestamping.c |    2 +-
 drivers/net/iseries_veth.c                           |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- linux-2.6.32-rc1.orig/Documentation/networking/timestamping/timestamping.c	2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6.32-rc1/Documentation/networking/timestamping/timestamping.c	2009-10-02 11:07:19.000000000 +0200
@@ -381,7 +381,7 @@ int main(int argc, char **argv)
 	memset(&hwtstamp, 0, sizeof(hwtstamp));
 	strncpy(hwtstamp.ifr_name, interface, sizeof(hwtstamp.ifr_name));
 	hwtstamp.ifr_data = (void *)&hwconfig;
-	memset(&hwconfig, 0, sizeof(&hwconfig));
+	memset(&hwconfig, 0, sizeof(hwconfig));
 	hwconfig.tx_type =
 		(so_timestamping_flags & SOF_TIMESTAMPING_TX_HARDWARE) ?
 		HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF;
--- linux-2.6.32-rc1.orig/drivers/net/iseries_veth.c	2009-09-28 10:28:42.000000000 +0200
+++ linux-2.6.32-rc1/drivers/net/iseries_veth.c	2009-10-02 11:07:15.000000000 +0200
@@ -495,7 +495,7 @@ static void veth_take_cap_ack(struct vet
 			   cnx->remote_lp);
 	} else {
 		memcpy(&cnx->cap_ack_event, event,
-		       sizeof(&cnx->cap_ack_event));
+		       sizeof(cnx->cap_ack_event));
 		cnx->state |= VETH_STATE_GOTCAPACK;
 		veth_kick_statemachine(cnx);
 	}


-- 
Jean Delvare

* Re: [PATCH 03/31] mm: expose gfp_to_alloc_flags()
From: David Rientjes @ 2009-10-02  9:30 UTC
  To: Neil Brown
  Cc: Suresh Jayaraman, Linus Torvalds, Andrew Morton, linux-kernel,
	linux-mm, netdev, Miklos Szeredi, Wouter Verhelst, Peter Zijlstra,
	trond.myklebust
In-Reply-To: <19141.35274.513790.845711@notabene.brown>

On Fri, 2 Oct 2009, Neil Brown wrote:

> So something like this?
> Then change every occurrence of
> +		if (!(gfp_to_alloc_flags(gfpflags) & ALLOC_NO_WATERMARKS))
> to
> +		if (!(gfp_has_no_watermarks(gfpflags)))
> 
> ??
> 

No, it's not even necessary to call gfp_to_alloc_flags() at all, just 
create a globally exported function such as can_alloc_use_reserves() and 
use it in gfp_to_alloc_flags().

 [ Using 'p' in gfp_to_alloc_flags() is actually wrong since
   test_thread_flag() only works on current anyway, so it would be
   inconsistent if p were set to anything other than current; we can
   get rid of that auto variable. ]

Something like the following, which you can fold into this patch proposal 
and modify later for GFP_MEMALLOC.

Signed-off-by: David Rientjes <rientjes@google.com>
---
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 557bdad..7dd62a0 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -265,6 +265,8 @@ static inline void arch_free_page(struct page *page, int order) { }
 static inline void arch_alloc_page(struct page *page, int order) { }
 #endif
 
+int can_alloc_use_reserves(void);
+
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 		       struct zonelist *zonelist, nodemask_t *nodemask);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bf72055..cf1d765 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1744,10 +1744,19 @@ void wake_all_kswapd(unsigned int order, struct zonelist *zonelist,
 		wakeup_kswapd(zone, order);
 }
 
+/*
+ * Does the current context allow the allocation to utilize memory reserves
+ * by ignoring watermarks for all zones?
+ */
+int can_alloc_use_reserves(void)
+{
+	return !in_interrupt() && ((current->flags & PF_MEMALLOC) ||
+				   unlikely(test_thread_flag(TIF_MEMDIE)));
+}
+
 static inline int
 gfp_to_alloc_flags(gfp_t gfp_mask)
 {
-	struct task_struct *p = current;
 	int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
 	const gfp_t wait = gfp_mask & __GFP_WAIT;
 
@@ -1769,15 +1778,12 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 		 * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
 		 */
 		alloc_flags &= ~ALLOC_CPUSET;
-	} else if (unlikely(rt_task(p)))
+	} else if (unlikely(rt_task(current)))
 		alloc_flags |= ALLOC_HARDER;
 
-	if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
-		if (!in_interrupt() &&
-		    ((p->flags & PF_MEMALLOC) ||
-		     unlikely(test_thread_flag(TIF_MEMDIE))))
+	if (likely(!(gfp_mask & __GFP_NOMEMALLOC)))
+		if (can_alloc_use_reserves())
 			alloc_flags |= ALLOC_NO_WATERMARKS;
-	}
 
 	return alloc_flags;
 }
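
The reserve check factored out above can be exercised in userspace with a
mocked task context. This is a minimal sketch, not kernel code: struct
demo_ctx and the DEMO_* masks are hypothetical stand-ins for in_interrupt(),
current->flags and the TIF_MEMDIE thread flag. The boolean logic follows
can_alloc_use_reserves() and the ALLOC_NO_WATERMARKS condition in
gfp_to_alloc_flags() as rewritten in this patch.

```c
#include <stdbool.h>

/* Hypothetical mask values for this model only; the kernel's real
 * __GFP_NOMEMALLOC, PF_MEMALLOC and TIF_MEMDIE definitions differ
 * (TIF_MEMDIE, for instance, is a bit number tested via test_thread_flag()). */
#define DEMO_GFP_NOMEMALLOC	0x1u
#define DEMO_PF_MEMALLOC	0x1u
#define DEMO_TIF_MEMDIE		0x1u

/* Stand-in for the "current" context consulted by the kernel helper. */
struct demo_ctx {
	bool in_interrupt;		/* models in_interrupt() */
	unsigned int task_flags;	/* models current->flags */
	unsigned int thread_flags;	/* models the thread_info flag word */
};

/* Same decision as can_alloc_use_reserves() in the patch. */
static bool demo_can_alloc_use_reserves(const struct demo_ctx *c)
{
	return !c->in_interrupt &&
	       ((c->task_flags & DEMO_PF_MEMALLOC) ||
		(c->thread_flags & DEMO_TIF_MEMDIE));
}

/* Same condition under which gfp_to_alloc_flags() sets ALLOC_NO_WATERMARKS. */
static bool demo_no_watermarks(unsigned int gfp_mask, const struct demo_ctx *c)
{
	return !(gfp_mask & DEMO_GFP_NOMEMALLOC) &&
	       demo_can_alloc_use_reserves(c);
}
```

Keeping the context test in its own function means callers outside the page
allocator can ask "may this context dip into reserves?" without seeing the
internal ALLOC_* flags, which is what the review asked for.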

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [RFC] netlink: add socket destruction notification
From: Johannes Berg @ 2009-10-02  8:44 UTC
  To: netdev; +Cc: Jouni Malinen, Thomas Graf

When we want to keep track of resources associated with applications, we
need to know when an app is going away. Add a notification function to
netlink that tells us that, and also hook it up to generic netlink so
generic netlink can notify the families. Due to the way generic netlink
works though, we need to notify all families and they have to sort out
whatever resources some commands associated with the socket themselves.
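
The fan-out described above (every family is told about every destroyed
socket and must decide for itself whether it cares) can be sketched in plain
C. All demo_* names are hypothetical stand-ins for struct genl_family,
family_ht[] and genl_destruct(). The kernel version also takes genl_lock()
around the walk, which this single-threaded sketch omits.

```c
#include <stddef.h>

#define DEMO_FAM_TAB_SIZE 4	/* hypothetical; stands in for GENL_FAM_TAB_SIZE */

struct demo_sock {
	int id;
};

/* Stand-in for struct genl_family with the proposed destruct_sk hook. */
struct demo_family {
	const char *name;
	void (*destruct_sk)(struct demo_sock *sk);	/* optional, may be NULL */
	struct demo_family *next;			/* hash-bucket chain */
};

static struct demo_family *demo_family_ht[DEMO_FAM_TAB_SIZE];

static void demo_register(struct demo_family *f, unsigned int bucket)
{
	bucket %= DEMO_FAM_TAB_SIZE;
	f->next = demo_family_ht[bucket];
	demo_family_ht[bucket] = f;
}

/* Walk every bucket and call every non-NULL hook; returns how many
 * families were notified. */
static int demo_genl_destruct(struct demo_sock *sk)
{
	struct demo_family *f;
	int idx, notified = 0;

	for (idx = 0; idx < DEMO_FAM_TAB_SIZE; idx++)
		for (f = demo_family_ht[idx]; f != NULL; f = f->next)
			if (f->destruct_sk) {
				f->destruct_sk(sk);
				notified++;
			}
	return notified;
}
```

This is the cost the changelog mentions: a family cannot register interest in
one specific socket, so each destruct_sk hook has to look up its own
per-socket state and ignore sockets it never saw.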

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
---
 drivers/connector/connector.c       |    2 +-
 drivers/scsi/scsi_netlink.c         |    2 +-
 drivers/scsi/scsi_transport_iscsi.c |    2 +-
 include/linux/netlink.h             |    1 +
 include/net/genetlink.h             |    3 +++
 kernel/audit.c                      |    3 ++-
 lib/kobject_uevent.c                |    2 +-
 net/bridge/netfilter/ebt_ulog.c     |    2 +-
 net/core/rtnetlink.c                |    3 ++-
 net/decnet/netfilter/dn_rtmsg.c     |    2 +-
 net/ipv4/fib_frontend.c             |    2 +-
 net/ipv4/inet_diag.c                |    2 +-
 net/ipv4/netfilter/ip_queue.c       |    2 +-
 net/ipv4/netfilter/ipt_ULOG.c       |    6 +++---
 net/ipv6/netfilter/ip6_queue.c      |    2 +-
 net/netfilter/nfnetlink.c           |    2 +-
 net/netlink/af_netlink.c            |    6 ++++++
 net/netlink/genetlink.c             |   18 ++++++++++++++++--
 net/xfrm/xfrm_user.c                |    2 +-
 security/selinux/netlink.c          |    3 ++-
 20 files changed, 47 insertions(+), 20 deletions(-)

--- wireless-testing.orig/net/xfrm/xfrm_user.c	2009-09-23 10:10:41.000000000 +0200
+++ wireless-testing/net/xfrm/xfrm_user.c	2009-09-29 14:45:33.000000000 +0200
@@ -2605,7 +2605,7 @@ static int __net_init xfrm_user_net_init
 	struct sock *nlsk;
 
 	nlsk = netlink_kernel_create(net, NETLINK_XFRM, XFRMNLGRP_MAX,
-				     xfrm_netlink_rcv, NULL, THIS_MODULE);
+				     xfrm_netlink_rcv, NULL, NULL, THIS_MODULE);
 	if (nlsk == NULL)
 		return -ENOMEM;
 	rcu_assign_pointer(net->xfrm.nlsk, nlsk);
--- wireless-testing.orig/drivers/connector/connector.c	2009-09-29 12:26:17.000000000 +0200
+++ wireless-testing/drivers/connector/connector.c	2009-09-29 14:45:33.000000000 +0200
@@ -451,7 +451,7 @@ static int __devinit cn_init(void)
 
 	dev->nls = netlink_kernel_create(&init_net, NETLINK_CONNECTOR,
 					 CN_NETLINK_USERS + 0xf,
-					 dev->input, NULL, THIS_MODULE);
+					 dev->input, NULL, NULL, THIS_MODULE);
 	if (!dev->nls)
 		return -EIO;
 
--- wireless-testing.orig/drivers/scsi/scsi_netlink.c	2009-09-23 10:10:42.000000000 +0200
+++ wireless-testing/drivers/scsi/scsi_netlink.c	2009-09-29 14:45:33.000000000 +0200
@@ -496,7 +496,7 @@ scsi_netlink_init(void)
 
 	scsi_nl_sock = netlink_kernel_create(&init_net, NETLINK_SCSITRANSPORT,
 				SCSI_NL_GRP_CNT, scsi_nl_rcv_msg, NULL,
-				THIS_MODULE);
+				NULL, THIS_MODULE);
 	if (!scsi_nl_sock) {
 		printk(KERN_ERR "%s: register of recieve handler failed\n",
 				__func__);
--- wireless-testing.orig/drivers/scsi/scsi_transport_iscsi.c	2009-09-29 12:26:46.000000000 +0200
+++ wireless-testing/drivers/scsi/scsi_transport_iscsi.c	2009-09-29 14:45:33.000000000 +0200
@@ -2082,7 +2082,7 @@ static __init int iscsi_transport_init(v
 		goto unregister_conn_class;
 
 	nls = netlink_kernel_create(&init_net, NETLINK_ISCSI, 1, iscsi_if_rx,
-				    NULL, THIS_MODULE);
+				    NULL, NULL, THIS_MODULE);
 	if (!nls) {
 		err = -ENOBUFS;
 		goto unregister_session_class;
--- wireless-testing.orig/kernel/audit.c	2009-09-29 12:27:01.000000000 +0200
+++ wireless-testing/kernel/audit.c	2009-09-29 14:45:33.000000000 +0200
@@ -970,7 +970,8 @@ static int __init audit_init(void)
 	printk(KERN_INFO "audit: initializing netlink socket (%s)\n",
 	       audit_default ? "enabled" : "disabled");
 	audit_sock = netlink_kernel_create(&init_net, NETLINK_AUDIT, 0,
-					   audit_receive, NULL, THIS_MODULE);
+					   audit_receive, NULL, NULL,
+					   THIS_MODULE);
 	if (!audit_sock)
 		audit_panic("cannot initialize netlink socket");
 	else
--- wireless-testing.orig/lib/kobject_uevent.c	2009-09-23 10:10:42.000000000 +0200
+++ wireless-testing/lib/kobject_uevent.c	2009-09-29 14:45:33.000000000 +0200
@@ -322,7 +322,7 @@ EXPORT_SYMBOL_GPL(add_uevent_var);
 static int __init kobject_uevent_init(void)
 {
 	uevent_sock = netlink_kernel_create(&init_net, NETLINK_KOBJECT_UEVENT,
-					    1, NULL, NULL, THIS_MODULE);
+					    1, NULL, NULL, NULL, THIS_MODULE);
 	if (!uevent_sock) {
 		printk(KERN_ERR
 		       "kobject_uevent: unable to create netlink socket!\n");
--- wireless-testing.orig/net/bridge/netfilter/ebt_ulog.c	2009-09-29 12:27:03.000000000 +0200
+++ wireless-testing/net/bridge/netfilter/ebt_ulog.c	2009-09-29 14:45:33.000000000 +0200
@@ -304,7 +304,7 @@ static int __init ebt_ulog_init(void)
 
 	ebtulognl = netlink_kernel_create(&init_net, NETLINK_NFLOG,
 					  EBT_ULOG_MAXNLGROUPS, NULL, NULL,
-					  THIS_MODULE);
+					  NULL, THIS_MODULE);
 	if (!ebtulognl) {
 		printk(KERN_WARNING KBUILD_MODNAME ": out of memory trying to "
 		       "call netlink_kernel_create\n");
--- wireless-testing.orig/net/core/rtnetlink.c	2009-09-29 12:27:04.000000000 +0200
+++ wireless-testing/net/core/rtnetlink.c	2009-09-29 14:45:33.000000000 +0200
@@ -1360,7 +1360,8 @@ static int rtnetlink_net_init(struct net
 {
 	struct sock *sk;
 	sk = netlink_kernel_create(net, NETLINK_ROUTE, RTNLGRP_MAX,
-				   rtnetlink_rcv, &rtnl_mutex, THIS_MODULE);
+				   rtnetlink_rcv, NULL,
+				   &rtnl_mutex, THIS_MODULE);
 	if (!sk)
 		return -ENOMEM;
 	net->rtnl = sk;
--- wireless-testing.orig/net/decnet/netfilter/dn_rtmsg.c	2009-09-23 10:10:41.000000000 +0200
+++ wireless-testing/net/decnet/netfilter/dn_rtmsg.c	2009-09-29 14:45:33.000000000 +0200
@@ -128,7 +128,7 @@ static int __init dn_rtmsg_init(void)
 
 	dnrmg = netlink_kernel_create(&init_net,
 				      NETLINK_DNRTMSG, DNRNG_NLGRP_MAX,
-				      dnrmg_receive_user_skb,
+				      dnrmg_receive_user_skb, NULL,
 				      NULL, THIS_MODULE);
 	if (dnrmg == NULL) {
 		printk(KERN_ERR "dn_rtmsg: Cannot create netlink socket");
--- wireless-testing.orig/net/ipv4/fib_frontend.c	2009-09-23 10:10:42.000000000 +0200
+++ wireless-testing/net/ipv4/fib_frontend.c	2009-09-29 14:45:33.000000000 +0200
@@ -879,7 +879,7 @@ static int nl_fib_lookup_init(struct net
 {
 	struct sock *sk;
 	sk = netlink_kernel_create(net, NETLINK_FIB_LOOKUP, 0,
-				   nl_fib_input, NULL, THIS_MODULE);
+				   nl_fib_input, NULL, NULL, THIS_MODULE);
 	if (sk == NULL)
 		return -EAFNOSUPPORT;
 	net->ipv4.fibnl = sk;
--- wireless-testing.orig/net/ipv4/inet_diag.c	2009-09-23 10:10:42.000000000 +0200
+++ wireless-testing/net/ipv4/inet_diag.c	2009-09-29 14:45:33.000000000 +0200
@@ -924,7 +924,7 @@ static int __init inet_diag_init(void)
 		goto out;
 
 	idiagnl = netlink_kernel_create(&init_net, NETLINK_INET_DIAG, 0,
-					inet_diag_rcv, NULL, THIS_MODULE);
+					inet_diag_rcv, NULL, NULL, THIS_MODULE);
 	if (idiagnl == NULL)
 		goto out_free_table;
 	err = 0;
--- wireless-testing.orig/net/ipv4/netfilter/ip_queue.c	2009-09-23 10:10:42.000000000 +0200
+++ wireless-testing/net/ipv4/netfilter/ip_queue.c	2009-09-29 14:45:33.000000000 +0200
@@ -578,7 +578,7 @@ static int __init ip_queue_init(void)
 
 	netlink_register_notifier(&ipq_nl_notifier);
 	ipqnl = netlink_kernel_create(&init_net, NETLINK_FIREWALL, 0,
-				      ipq_rcv_skb, NULL, THIS_MODULE);
+				      ipq_rcv_skb, NULL, NULL, THIS_MODULE);
 	if (ipqnl == NULL) {
 		printk(KERN_ERR "ip_queue: failed to create netlink socket\n");
 		goto cleanup_netlink_notifier;
--- wireless-testing.orig/net/ipv4/netfilter/ipt_ULOG.c	2009-09-23 10:10:42.000000000 +0200
+++ wireless-testing/net/ipv4/netfilter/ipt_ULOG.c	2009-09-29 14:45:33.000000000 +0200
@@ -400,9 +400,9 @@ static int __init ulog_tg_init(void)
 	for (i = 0; i < ULOG_MAXNLGROUPS; i++)
 		setup_timer(&ulog_buffers[i].timer, ulog_timer, i);
 
-	nflognl = netlink_kernel_create(&init_net,
-					NETLINK_NFLOG, ULOG_MAXNLGROUPS, NULL,
-					NULL, THIS_MODULE);
+	nflognl = netlink_kernel_create(&init_net, NETLINK_NFLOG,
+					ULOG_MAXNLGROUPS, NULL,
+					NULL, NULL, THIS_MODULE);
 	if (!nflognl)
 		return -ENOMEM;
 
--- wireless-testing.orig/net/ipv6/netfilter/ip6_queue.c	2009-09-23 10:10:42.000000000 +0200
+++ wireless-testing/net/ipv6/netfilter/ip6_queue.c	2009-09-29 14:45:33.000000000 +0200
@@ -580,7 +580,7 @@ static int __init ip6_queue_init(void)
 
 	netlink_register_notifier(&ipq_nl_notifier);
 	ipqnl = netlink_kernel_create(&init_net, NETLINK_IP6_FW, 0,
-			              ipq_rcv_skb, NULL, THIS_MODULE);
+				      ipq_rcv_skb, NULL, NULL, THIS_MODULE);
 	if (ipqnl == NULL) {
 		printk(KERN_ERR "ip6_queue: failed to create netlink socket\n");
 		goto cleanup_netlink_notifier;
--- wireless-testing.orig/net/netfilter/nfnetlink.c	2009-09-29 12:27:12.000000000 +0200
+++ wireless-testing/net/netfilter/nfnetlink.c	2009-09-29 14:45:33.000000000 +0200
@@ -196,7 +196,7 @@ static int __init nfnetlink_init(void)
 	printk("Netfilter messages via NETLINK v%s.\n", nfversion);
 
 	nfnl = netlink_kernel_create(&init_net, NETLINK_NETFILTER, NFNLGRP_MAX,
-				     nfnetlink_rcv, NULL, THIS_MODULE);
+				     nfnetlink_rcv, NULL, NULL, THIS_MODULE);
 	if (!nfnl) {
 		printk(KERN_ERR "cannot initialize nfnetlink!\n");
 		return -ENOMEM;
--- wireless-testing.orig/net/netlink/genetlink.c	2009-09-29 12:27:12.000000000 +0200
+++ wireless-testing/net/netlink/genetlink.c	2009-09-29 14:45:33.000000000 +0200
@@ -561,6 +561,20 @@ static void genl_rcv(struct sk_buff *skb
 	genl_unlock();
 }
 
+static void genl_destruct(struct sock *sk)
+{
+	struct genl_family *f;
+	int idx;
+
+	genl_lock();
+
+	for (idx = 0; idx < GENL_FAM_TAB_SIZE; idx++)
+		list_for_each_entry(f, &family_ht[idx], family_list)
+			if (f->destruct_sk)
+				f->destruct_sk(sk);
+	genl_unlock();
+}
+
 /**************************************************************************
  * Controller
  **************************************************************************/
@@ -852,8 +866,8 @@ static int __net_init genl_pernet_init(s
 {
 	/* we'll bump the group number right afterwards */
 	net->genl_sock = netlink_kernel_create(net, NETLINK_GENERIC, 0,
-					       genl_rcv, &genl_mutex,
-					       THIS_MODULE);
+					       genl_rcv, genl_destruct,
+					       &genl_mutex, THIS_MODULE);
 
 	if (!net->genl_sock && net_eq(net, &init_net))
 		panic("GENL: Cannot initialize generic netlink\n");
--- wireless-testing.orig/security/selinux/netlink.c	2009-09-23 10:10:42.000000000 +0200
+++ wireless-testing/security/selinux/netlink.c	2009-09-29 14:45:33.000000000 +0200
@@ -106,7 +106,8 @@ void selnl_notify_policyload(u32 seqno)
 static int __init selnl_init(void)
 {
 	selnl = netlink_kernel_create(&init_net, NETLINK_SELINUX,
-				      SELNLGRP_MAX, NULL, NULL, THIS_MODULE);
+				      SELNLGRP_MAX, NULL, NULL, NULL,
+				      THIS_MODULE);
 	if (selnl == NULL)
 		panic("SELinux:  Cannot create netlink socket.");
 	netlink_set_nonroot(NETLINK_SELINUX, NL_NONROOT_RECV);
--- wireless-testing.orig/include/linux/netlink.h	2009-09-29 12:26:58.000000000 +0200
+++ wireless-testing/include/linux/netlink.h	2009-09-29 14:45:33.000000000 +0200
@@ -182,6 +182,7 @@ extern void netlink_table_ungrab(void);
 extern struct sock *netlink_kernel_create(struct net *net,
 					  int unit,unsigned int groups,
 					  void (*input)(struct sk_buff *skb),
+					  void (*destruct)(struct sock *sk),
 					  struct mutex *cb_mutex,
 					  struct module *module);
 extern void netlink_kernel_release(struct sock *sk);
--- wireless-testing.orig/include/net/genetlink.h	2009-09-23 10:10:42.000000000 +0200
+++ wireless-testing/include/net/genetlink.h	2009-09-29 14:45:33.000000000 +0200
@@ -30,6 +30,8 @@ struct genl_multicast_group
  * @maxattr: maximum number of attributes supported
  * @netnsok: set to true if the family can handle network
  *	namespaces and should be presented in all of them
+ * @destruct_sk: called when any generic netlink socket
+ *	is destroyed (e.g. by the application closing it)
  * @attrbuf: buffer to store parsed attributes
  * @ops_list: list of all assigned operations
  * @family_list: family list
@@ -43,6 +45,7 @@ struct genl_family
 	unsigned int		version;
 	unsigned int		maxattr;
 	bool			netnsok;
+	void			(*destruct_sk)(struct sock *sk);
 	struct nlattr **	attrbuf;	/* private */
 	struct list_head	ops_list;	/* private */
 	struct list_head	family_list;	/* private */
--- wireless-testing.orig/net/netlink/af_netlink.c	2009-09-29 12:27:12.000000000 +0200
+++ wireless-testing/net/netlink/af_netlink.c	2009-09-29 14:45:33.000000000 +0200
@@ -80,6 +80,7 @@ struct netlink_sock {
 	struct mutex		*cb_mutex;
 	struct mutex		cb_def_mutex;
 	void			(*netlink_rcv)(struct sk_buff *skb);
+	void			(*destruct)(struct sock *sk);
 	struct module		*module;
 };
 
@@ -166,6 +167,9 @@ static void netlink_sock_destruct(struct
 		return;
 	}
 
+	if (nlk->destruct)
+		nlk->destruct(sk);
+
 	WARN_ON(atomic_read(&sk->sk_rmem_alloc));
 	WARN_ON(atomic_read(&sk->sk_wmem_alloc));
 	WARN_ON(nlk_sk(sk)->groups);
@@ -1464,6 +1468,7 @@ static void netlink_data_ready(struct so
 struct sock *
 netlink_kernel_create(struct net *net, int unit, unsigned int groups,
 		      void (*input)(struct sk_buff *skb),
+		      void (*destruct)(struct sock *sk),
 		      struct mutex *cb_mutex, struct module *module)
 {
 	struct socket *sock;
@@ -1502,6 +1507,7 @@ netlink_kernel_create(struct net *net, i
 	sk->sk_data_ready = netlink_data_ready;
 	if (input)
 		nlk_sk(sk)->netlink_rcv = input;
+	nlk_sk(sk)->destruct = destruct;
 
 	if (netlink_insert(sk, net, 0))
 		goto out_sock_release;

* Re: [PATCH] ipvs: Add boundary check on ioctl arguments
From: Julian Anastasov @ 2009-10-02  8:35 UTC
  To: Arjan van de Ven
  Cc: Hannes Eder, Wensong Zhang, netdev, linux-kernel, Simon Horman
In-Reply-To: <20090930171833.5ce0011d@infradead.org>


	Hello,

On Wed, 30 Sep 2009, Arjan van de Ven wrote:

> fair enough; updated patch below

	OK, you can add my signed-off line after changing
'cmd > ...MAX + 1' to 'cmd > ...MAX' at both
places, because nf_sockopt_ops ranges are [optmin ... optmax).
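
The off-by-one is easiest to see with a small table. In this userspace sketch
the DEMO_* constants and demo_set_arglen[] are hypothetical, not the real
IP_VS_* values, and it assumes, as in the sketch, that the length table has
exactly one entry per command. The posted check still lets MAX + 1 through,
and that index is one past the end of the table.

```c
#include <stddef.h>

/* Hypothetical values; not the real IP_VS_* constants. */
#define DEMO_BASE_CTL		1000
#define DEMO_SO_SET_MAX		(DEMO_BASE_CTL + 3)
#define DEMO_SET_CMDID(cmd)	((cmd) - DEMO_BASE_CTL)

/* One length per command: valid indices are 0..DEMO_SET_CMDID(DEMO_SO_SET_MAX). */
static const unsigned char demo_set_arglen[DEMO_SET_CMDID(DEMO_SO_SET_MAX) + 1] = {
	0, 8, 16, 24,
};

/* Check as posted: also accepts DEMO_SO_SET_MAX + 1, one past the table. */
static int demo_cmd_ok_as_posted(int cmd)
{
	return !(cmd < DEMO_BASE_CTL || cmd > DEMO_SO_SET_MAX + 1);
}

/* Check as requested: [optmin, optmax) means BASE..MAX inclusive. */
static int demo_cmd_ok(int cmd)
{
	return !(cmd < DEMO_BASE_CTL || cmd > DEMO_SO_SET_MAX);
}

/* Expected argument length, or -1 where the kernel code returns -EINVAL. */
static int demo_set_arglen_for(int cmd)
{
	if (!demo_cmd_ok(cmd))
		return -1;
	return demo_set_arglen[DEMO_SET_CMDID(cmd)];
}
```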

Maybe the comments should be changed, because:

- I'm not the author, but after inspection we do not see any holes,
and we do not want users to upgrade just for this change
- the cmd checks are just there to help code-checking tools
- the len checks should help programmers (maybe BUG_ON is
better; the user does not deserve EINVAL for a wrong set_arglen/get_arglen).
Checks for *len and len are not needed.

	For example, for len checks this should be enough, before
copy_from_user():

in do_ip_vs_get_ctl the check can be
	BUG_ON(get_arglen[GET_CMDID(cmd)] > sizeof(arg));

in do_ip_vs_set_ctl the check can be
	BUG_ON(set_arglen[SET_CMDID(cmd)] > sizeof(arg));
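
These BUG_ON() checks encode an invariant of the static length tables rather
than something user input can violate. A minimal userspace sketch of that
invariant follows; demo_get_arglen[] is a hypothetical table standing in for
get_arglen[], and the 128-byte buffer size matches arg[128] in the quoted
patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical per-command copy lengths standing in for get_arglen[]. */
static const unsigned char demo_get_arglen[] = { 8, 16, 64, 128 };

/* True if every entry fits in a buffer of bufsize bytes, i.e. the
 * condition the suggested BUG_ON() would never see violated. */
static bool demo_arglen_table_fits(const unsigned char *tbl, size_t n,
				   size_t bufsize)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i] > bufsize)
			return false;
	return true;
}
```

Because the table never changes at runtime, a violation is a programming
error, which is why a BUG_ON() fits better than returning -EINVAL to the user.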

Acked-by: Julian Anastasov <ja@ssi.bg>

> >From 28ae217858e683c0c94c02219d46a9a9c87f61c6 Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <arjan@linux.intel.com>
> Date: Wed, 30 Sep 2009 13:05:51 +0200
> Subject: [PATCH] ipvs: Add boundary check on ioctl arguments
> 
> The ipvs code has a nifty system for doing the size of ioctl command copies;
> it defines an array with values into which it indexes the cmd to find the
> right length.
> 
> Unfortunately, the ipvs code forgot to check if the cmd was in the range
> that the array provides, allowing for an index outside of the array,
> which then gives a "garbage" result into the length, which then gets
> used for copying into a stack buffer.
> 
> Fix this by adding sanity checks on these as well as the copy size.
> 
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  net/netfilter/ipvs/ip_vs_ctl.c |   14 +++++++++++++-
>  1 files changed, 13 insertions(+), 1 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index ac624e5..7adc876 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -2077,6 +2077,10 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
>  	if (!capable(CAP_NET_ADMIN))
>  		return -EPERM;
>  
> +	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_SET_MAX + 1)
> +		return -EINVAL;
> +	if (len < 0 || len >  sizeof(arg))
> +		return -EINVAL;
>  	if (len != set_arglen[SET_CMDID(cmd)]) {
>  		pr_err("set_ctl: len %u != %u\n",
>  		       len, set_arglen[SET_CMDID(cmd)]);
> @@ -2353,17 +2357,25 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
>  {
>  	unsigned char arg[128];
>  	int ret = 0;
> +	unsigned int copylen;
>  
>  	if (!capable(CAP_NET_ADMIN))
>  		return -EPERM;
>  
> +	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_GET_MAX + 1)
> +		return -EINVAL;
> +
>  	if (*len < get_arglen[GET_CMDID(cmd)]) {
>  		pr_err("get_ctl: len %u < %u\n",
>  		       *len, get_arglen[GET_CMDID(cmd)]);
>  		return -EINVAL;
>  	}
>  
> -	if (copy_from_user(arg, user, get_arglen[GET_CMDID(cmd)]) != 0)
> +	copylen = get_arglen[GET_CMDID(cmd)];
> +	if (copylen > sizeof(arg))
> +		return -EINVAL;
> +
> +	if (copy_from_user(arg, user, copylen) != 0)
>  		return -EFAULT;
>  
>  	if (mutex_lock_interruptible(&__ip_vs_mutex))

Regards

--
Julian Anastasov <ja@ssi.bg>

* Re: [PATCH 00/31] Swap over NFS -v20
From: Suresh Jayaraman @ 2009-10-02  8:21 UTC
  To: Christoph Hellwig
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, linux-mm, netdev,
	Neil Brown, Miklos Szeredi, Wouter Verhelst, Peter Zijlstra,
	trond.myklebust
In-Reply-To: <20091001174201.GA30068@infradead.org>

Christoph Hellwig wrote:
> On Thu, Oct 01, 2009 at 07:34:18PM +0530, Suresh Jayaraman wrote:
> 
> The other really big one is adding a proper method for safe, page-backed
> kernelspace I/O on files.  That is not something like the grotty
> swap-tied address_space operations in this patch, but more something in

I'm not sure I understood about what problems you see with the proposed
address_space operations. Could you please elaborate a bit more?

> the direction of the kernel direct I/O patches from Jenx Axboe he did
> for using in the loop driver.  But even those aren't complete as they
> don't touch the locking issue yet.
> 

Thanks,

-- 
Suresh Jayaraman

* Re: Network hangs with 2.6.30.5
From: Ilpo Järvinen @ 2009-10-02  8:11 UTC
  To: David Miller
  Cc: jarkao2, holger.hoffstaette, Netdev, eric.dumazet,
	Evgeniy Polyakov
In-Reply-To: <20091001.154913.88345178.davem@davemloft.net>

On Thu, 1 Oct 2009, David Miller wrote:

> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Mon, 7 Sep 2009 07:21:43 +0000
> 
> > While Eric is analyzing your data, I guess you could try reverting
> > some stuff around this tcp_tw_recycle, and my tcp ignorance would
> > point these commits for the beginning:
> > 
> > http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.30.y.git;a=commitdiff;h=fc1ad92dfc4e363a055053746552cdb445ba5c57
> > http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.30.y.git;a=commitdiff;h=c887e6d2d9aee56ee7c9f2af4cec3a5efdcc4c72
> 
> Ilpo's cleanup (the second commit listed) looks most likely to
> be a possibility.
> 
> But I surely cannot find any bugs in it, even after studying it
> a few times.
> 
> Ilpo could you audit it one more time for us just in case?

Argh, not that one... the jungle of negations. But I'll try to go through it
once more, though I tell you I went through those negations multiple times
already before submitting it :-).

> I also looked through all the TCP commits in 2.6.29 to 2.6.30
> and I could not find anything else that might cause stalls with
> time-wait recycled connections.

What about the more than 64k connections change a9d8f9110d7e953c2f2 (or 
its fixes), it might be another possibility? ...It certainly does 
something related to reuse and happens to be in the correct time frame... 
(I've added Evgeniy).

-- 
 i.

* Re: [PATCH 03/31] mm: expose gfp_to_alloc_flags()
From: Suresh Jayaraman @ 2009-10-02  8:11 UTC
  To: David Rientjes
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, linux-mm, netdev,
	Neil Brown, Miklos Szeredi, Wouter Verhelst, Peter Zijlstra,
	trond.myklebust
In-Reply-To: <alpine.DEB.1.00.0910011355230.32006@chino.kir.corp.google.com>

David Rientjes wrote:
> On Thu, 1 Oct 2009, Suresh Jayaraman wrote:
> 
>> From: Peter Zijlstra <a.p.zijlstra@chello.nl> 
>>
>> Expose the gfp to alloc_flags mapping, so we can use it in other parts
>> of the vm.
>>
>> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
> 
> Nack, these flags are internal to the page allocator and exporting them to 
> generic VM code is unnecessary.

Yes, you're right.

> The only bit you actually use in your patchset is ALLOC_NO_WATERMARKS to 
> determine whether a particular allocation can use memory reserves.  I'd 
> suggest adding a bool function that returns whether the current context is 
> given access to reserves including your new __GFP_MEMALLOC flag and 
> exporting that instead.

Makes sense, and Neil has already posted a patch with the suggested
changes; I will incorporate it.

Thanks,

-- 
Suresh Jayaraman

* Re: SPLICE_F_NONBLOCK semantics...
From: Jens Axboe @ 2009-10-02  7:47 UTC
  To: David Miller
  Cc: torvalds, eric.dumazet, jgunthorpe, vl, opurdila, netdev,
	linux-kernel
In-Reply-To: <20091001.152717.187318570.davem@davemloft.net>

On Thu, Oct 01 2009, David Miller wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Thu, 1 Oct 2009 15:21:44 -0700 (PDT)
> 
> > On Thu, 1 Oct 2009, David Miller wrote:
> >> 
> >> It depends upon our interpretation of how you intended the
> >> SPLICE_F_NONBLOCK flag to work when you added it way back
> >> when.
> >> 
> >> Linus introduced  SPLICE_F_NONBLOCK in commit 29e350944fdc2dfca102500790d8ad6d6ff4f69d
> >> (splice: add SPLICE_F_NONBLOCK flag )
> >> 
> >>   It doesn't make the splice itself necessarily nonblocking (because the
> >>   actual file descriptors that are spliced from/to may block unless they
> >>   have the O_NONBLOCK flag set), but it makes the splice pipe operations
> >>   nonblocking.
> >> 
> >> Linus intention was clear : let SPLICE_F_NONBLOCK control the splice pipe mode only
> > 
> > Ack. The original intent was for the flag to affect the buffering, not the 
> > end points.
> 
> Great, thanks for reviewing.
> 
> > Although the more I think about it, the more I suspect that the
> > whole NONBLOCK thing should probably have been two bits, and simply
> > been about "nonblocking input" vs "nonblocking output" (so that you
> > could control both sides on a call-by-call basis).
> 
> I think we could still extend things in this way if we wanted to.
> So if you specify the explicit input and/or output nonblock flag,
> it takes precedence over the SPLICE_F_NONBLOCK thing.

Yes I agree, thank god for having a 'flags' parameter for the syscalls
:-). I'll make a note to add and test bidirectional nonblock hints.
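
One possible reading of the proposed extension, sketched in userspace: if
either explicit per-side flag is present, those flags decide each side, and
otherwise SPLICE_F_NONBLOCK keeps its original meaning of affecting only the
pipe side(s). DEMO_SPLICE_F_NONBLOCK_IN and DEMO_SPLICE_F_NONBLOCK_OUT are
hypothetical bits that exist in no kernel ABI; DEMO_SPLICE_F_NONBLOCK uses the
same value (0x02) as the real SPLICE_F_NONBLOCK. The model ignores O_NONBLOCK
on the file descriptors themselves.

```c
#include <stdbool.h>

#define DEMO_SPLICE_F_NONBLOCK		0x02u	/* same value as SPLICE_F_NONBLOCK */
#define DEMO_SPLICE_F_NONBLOCK_IN	0x100u	/* hypothetical */
#define DEMO_SPLICE_F_NONBLOCK_OUT	0x200u	/* hypothetical */

/* Whether each side of the splice is nonblocking because of the flags. */
struct demo_side_nb {
	bool in;
	bool out;
};

static struct demo_side_nb demo_resolve_nonblock(unsigned int flags,
						 bool in_is_pipe,
						 bool out_is_pipe)
{
	struct demo_side_nb nb;

	if (flags & (DEMO_SPLICE_F_NONBLOCK_IN | DEMO_SPLICE_F_NONBLOCK_OUT)) {
		/* Explicit per-side flags take precedence. */
		nb.in = flags & DEMO_SPLICE_F_NONBLOCK_IN;
		nb.out = flags & DEMO_SPLICE_F_NONBLOCK_OUT;
	} else {
		/* Original intent: only the pipe side(s) become nonblocking. */
		bool pipe_nb = flags & DEMO_SPLICE_F_NONBLOCK;

		nb.in = pipe_nb && in_is_pipe;
		nb.out = pipe_nb && out_is_pipe;
	}
	return nb;
}
```

For a socket-to-pipe splice, plain SPLICE_F_NONBLOCK leaves the socket side
blocking, while an explicit input bit would make the socket side nonblocking
call by call.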

The net patch looks fine and correct to me, feel free to add my acked-by
if you want.

-- 
Jens Axboe

* Re: 2.6.32-rc1-git2: Reported regressions from 2.6.31
From: Jaswinder Singh Rajput @ 2009-10-02  7:38 UTC
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, Kernel Testers List,
	Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
	Linux Wireless List, DRI
In-Reply-To: <9UCePxij8cB.A.VCG.-3SxKB@chimera>

Hello Rafael,

On Thu, 2009-10-01 at 21:26 +0200, Rafael J. Wysocki wrote:
> [Notes:
> 
>  * Here's the first summary report of known regressions from 2.6.31.  There's
>    not too many of them at the moment, which is nice.
> 
>  * We're still getting quite a number of reports of regressions from 2.6.30 and
>    it's been that way since 2.6.31 was released.  For details please see the
>    summary report of regressions 2.6.30 -> 2.6.31 that will follow shortly.]
> 
> This message contains a list of some regressions from 2.6.31, for which there
> are no fixes in the mainline I know of.  If any of them have been fixed already,
> please let me know.
> 
> If you know of any other unresolved regressions from 2.6.31, please let me know
> either and I'll add them to the list.  Also, please let me know if any of the
> entries below are invalid.
> 
> Each entry from the list will be sent additionally in an automatic reply to
> this message with CCs to the people involved in reporting and handling the
> issue.
> 
> 
> Listed regressions statistics:
> 
>   Date          Total  Pending  Unresolved
>   ----------------------------------------
>   2009-10-02       22       15           9
> 
> 
> Unresolved regressions
> ----------------------
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=14299
> Subject		: oops in wireless, iwl3945 related?
> Submitter	: Pavel Machek <pavel@ucw.cz>
> Date		: 2009-09-29 17:12 (3 days old)
> References	: http://marc.info/?l=linux-kernel&m=125424439725743&w=4
> 

If you add one more entry, say "Suspected commit :", that would be great and
would help resolve regressions much faster. You can ask the submitter to
identify the suspected commit with git bisect, and also point them to git
bisect documentation (for more information about git bisect, see
http://kerneltrap.org/node/11753).

Thanks,
--
JSR

* Re: [RFC] pkt_sched: gen_estimator: Dont report fake rate estimators
From: Jarek Poplawski @ 2009-10-02  7:32 UTC
  To: David Miller; +Cc: eric.dumazet, kaber, netdev
In-Reply-To: <20091002070819.GA9694@ff.dom.local>

On Fri, Oct 02, 2009 at 07:08:19AM +0000, Jarek Poplawski wrote:
> On 01-10-2009 23:21, Jarek Poplawski wrote:
...
> To make my point clare: [...]

Am I clair? ;-)

Jarek P.

* Re: [RFC] pkt_sched: gen_estimator: Dont report fake rate estimators
From: Jarek Poplawski @ 2009-10-02  7:17 UTC
  To: Eric Dumazet; +Cc: David Miller, kaber, netdev
In-Reply-To: <4AC5A7F9.3000005@gmail.com>

On Fri, Oct 02, 2009 at 09:12:57AM +0200, Eric Dumazet wrote:
> Jarek Poplawski a écrit :
> 
> > To make my point clare: why not something like this?:
> > 
> > static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
> >                          u32 pid, u32 seq, u16 flags, int event)
> > {
> > 	...
> > 	if (gnet_stats_copy_basic(&d, &q->bstats) < 0 ||
> > 	    (gen_estimator_active(&q->bstats, &q->rate_est) &&
> >              gnet_stats_copy_rate_est(&d, &q->rate_est) < 0) ||
> >             gnet_stats_copy_queue(&d, &q->qstats) < 0)
> >                 goto nla_put_failure;
> > 
> > BTW, I'm not sure we need to change the user-visible API for this.
> > (Is it really expected to work after updating gen_stats.h only in
> > iproute?)
> > 
> 
> That would be better indeed; do you want to work on it, or should I do it?

I want you to work on it.

Thanks,
Jarek P.

^ permalink raw reply

* [net-2.6 PATCH] e1000e/igb/ixgbe: Don't report an error if devices don't support AER
From: Jeff Kirsher @ 2009-10-02  7:15 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Frans Pop, Jeff Kirsher

From: Frans Pop <elendil@planet.nl>

The only error returned by pci_{en,dis}able_pcie_error_reporting() is
-EIO which simply means that Advanced Error Reporting is not supported.
There is no need to report that, so remove the error check from e1000e,
igb and ixgbe.

Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e1000e/netdev.c    |   13 ++-----------
 drivers/net/igb/igb_main.c     |   13 ++-----------
 drivers/net/ixgbe/ixgbe_main.c |   13 ++-----------
 3 files changed, 6 insertions(+), 33 deletions(-)

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 16c193a..0687c6a 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -4982,12 +4982,7 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
 		goto err_pci_reg;
 
 	/* AER (Advanced Error Reporting) hooks */
-	err = pci_enable_pcie_error_reporting(pdev);
-	if (err) {
-		dev_err(&pdev->dev, "pci_enable_pcie_error_reporting failed "
-		        "0x%x\n", err);
-		/* non-fatal, continue */
-	}
+	pci_enable_pcie_error_reporting(pdev);
 
 	pci_set_master(pdev);
 	/* PCI config space info */
@@ -5263,7 +5258,6 @@ static void __devexit e1000_remove(struct pci_dev *pdev)
 {
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct e1000_adapter *adapter = netdev_priv(netdev);
-	int err;
 
 	/*
 	 * flush_scheduled work may reschedule our watchdog task, so
@@ -5299,10 +5293,7 @@ static void __devexit e1000_remove(struct pci_dev *pdev)
 	free_netdev(netdev);
 
 	/* AER disable */
-	err = pci_disable_pcie_error_reporting(pdev);
-	if (err)
-		dev_err(&pdev->dev,
-		        "pci_disable_pcie_error_reporting failed 0x%x\n", err);
+	pci_disable_pcie_error_reporting(pdev);
 
 	pci_disable_device(pdev);
 }
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 5d6c153..714c3a4 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -1246,12 +1246,7 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 	if (err)
 		goto err_pci_reg;
 
-	err = pci_enable_pcie_error_reporting(pdev);
-	if (err) {
-		dev_err(&pdev->dev, "pci_enable_pcie_error_reporting failed "
-		        "0x%x\n", err);
-		/* non-fatal, continue */
-	}
+	pci_enable_pcie_error_reporting(pdev);
 
 	pci_set_master(pdev);
 	pci_save_state(pdev);
@@ -1628,7 +1623,6 @@ static void __devexit igb_remove(struct pci_dev *pdev)
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct igb_adapter *adapter = netdev_priv(netdev);
 	struct e1000_hw *hw = &adapter->hw;
-	int err;
 
 	/* flush_scheduled work may reschedule our watchdog task, so
 	 * explicitly disable watchdog tasks from being rescheduled  */
@@ -1682,10 +1676,7 @@ static void __devexit igb_remove(struct pci_dev *pdev)
 
 	free_netdev(netdev);
 
-	err = pci_disable_pcie_error_reporting(pdev);
-	if (err)
-		dev_err(&pdev->dev,
-		        "pci_disable_pcie_error_reporting failed 0x%x\n", err);
+	pci_disable_pcie_error_reporting(pdev);
 
 	pci_disable_device(pdev);
 }
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 1cbc6a3..28fbb9d 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -5507,12 +5507,7 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev,
 		goto err_pci_reg;
 	}
 
-	err = pci_enable_pcie_error_reporting(pdev);
-	if (err) {
-		dev_err(&pdev->dev, "pci_enable_pcie_error_reporting failed "
-		                    "0x%x\n", err);
-		/* non-fatal, continue */
-	}
+	pci_enable_pcie_error_reporting(pdev);
 
 	pci_set_master(pdev);
 	pci_save_state(pdev);
@@ -5821,7 +5816,6 @@ static void __devexit ixgbe_remove(struct pci_dev *pdev)
 {
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
-	int err;
 
 	set_bit(__IXGBE_DOWN, &adapter->state);
 	/* clear the module not found bit to make sure the worker won't
@@ -5872,10 +5866,7 @@ static void __devexit ixgbe_remove(struct pci_dev *pdev)
 
 	free_netdev(netdev);
 
-	err = pci_disable_pcie_error_reporting(pdev);
-	if (err)
-		dev_err(&pdev->dev,
-		        "pci_disable_pcie_error_reporting failed 0x%x\n", err);
+	pci_disable_pcie_error_reporting(pdev);
 
 	pci_disable_device(pdev);
 }


^ permalink raw reply related

* Re: [RFC] pkt_sched: gen_estimator: Dont report fake rate estimators
From: Eric Dumazet @ 2009-10-02  7:12 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, kaber, netdev
In-Reply-To: <20091002070819.GA9694@ff.dom.local>

Jarek Poplawski a écrit :

> To make my point clare: why not something like this?:
> 
> static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
>                          u32 pid, u32 seq, u16 flags, int event)
> {
> 	...
> 	if (gnet_stats_copy_basic(&d, &q->bstats) < 0 ||
> 	    (gen_estimator_active(&q->bstats, &q->rate_est) &&
>              gnet_stats_copy_rate_est(&d, &q->rate_est) < 0) ||
>             gnet_stats_copy_queue(&d, &q->qstats) < 0)
>                 goto nla_put_failure;
> 
> BTW, I'm not sure we need to change the user-visible API for this.
> (Is it really expected to work after updating gen_stats.h only in
> iproute?)
> 

That would be better indeed; do you want to work on it, or should I do it?

Thanks

^ permalink raw reply

* Re: [RFC] pkt_sched: gen_estimator: Dont report fake rate estimators
From: Jarek Poplawski @ 2009-10-02  7:08 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, kaber, netdev
In-Reply-To: <4AC51D3D.8010700@gmail.com>

On 01-10-2009 23:21, Jarek Poplawski wrote:
> David Miller wrote, On 10/01/2009 11:14 PM:
> 
>> From: Jarek Poplawski <jarkao2@gmail.com>
>> Date: Thu, 01 Oct 2009 23:05:53 +0200
>>
>>> Since you ask... I wonder about this whole int plus quite a bit of
>>> struct unreadability for one flag only. Maybe it could be queried
>>> on qdisc level (with a flag if necessary), and additional parameter
>>> of gnet_stats_copy_rate_est()? (Qdiscs should have no problem with
>>> setting this param for their classes too.)
>> Certainly, that's another approach to this problem.
>>
>> But logically, just like we wouldn't emit a block of RED scheduler
>> data to 'tc' unless RED is actually configured, it seems consistent to
>> not emit estimator data when no estimator is even there.
> 
> Sure! I've exaggerated with this additional parameter. ;-)

To make my point clare: why not something like this?:

static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
                         u32 pid, u32 seq, u16 flags, int event)
{
	...
	if (gnet_stats_copy_basic(&d, &q->bstats) < 0 ||
	    (gen_estimator_active(&q->bstats, &q->rate_est) &&
             gnet_stats_copy_rate_est(&d, &q->rate_est) < 0) ||
            gnet_stats_copy_queue(&d, &q->qstats) < 0)
                goto nla_put_failure;

BTW, I'm not sure we need to change the user-visible API for this.
(Is it really expected to work after updating gen_stats.h only in
iproute?)

Jarek P.

^ permalink raw reply

* Re: [Question]: reqsk table size limited to 16?
From: Eric Dumazet @ 2009-10-02  6:50 UTC (permalink / raw)
  To: Gerrit Renker, netdev
In-Reply-To: <20091002062532.GA15755@gerrit.erg.abdn.ac.uk>

Gerrit Renker a écrit :
> Please forget the posting, this is correct; the clamping is
> 
>   8 <= nr_table_entries <=  sysctl_max_syn_backlog,
> 
> i.e. the minimum table size is 16.
>

Yes, agreed, 8+1 -> 16


^ permalink raw reply

* Re: [Question]: reqsk table size limited to 16?
From: Eric Dumazet @ 2009-10-02  6:49 UTC (permalink / raw)
  To: Gerrit Renker, netdev
In-Reply-To: <20091002061134.GC5646@gerrit.erg.abdn.ac.uk>

Gerrit Renker a écrit :
> Can someone please have a look, it may be that I am missing something?
> 
> It seems that in the following the maximum number of table entries is set
> to always 16, despite sysctl_max_syn_backlog (tcp_max_syn_backlog), 
> overriding the 'backlog' parameter to listen(2).

False alarm ;)

> 
> net/core/request_sock.c
> -----------------------
> 
> int reqsk_queue_alloc(struct request_sock_queue *queue,
>                       unsigned int nr_table_entries)
> {
>         size_t lopt_size = sizeof(struct listen_sock);
>         struct listen_sock *lopt;
> 
> 	nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);

Here we take the _minimum_ value.
If you have  nr_table_entries=4096 and sysctl_max_syn_backlog=1024,
result is 1024

>         nr_table_entries = max_t(u32, nr_table_entries, 8);

Here we take the _maximum_ value of nr_table_entries and 8

-> 1024

The point is: we want at least 8 slots, even if the user called listen(fd, 1).

(Later, the user can change its mind and call listen(fd, 1024).)

We don't resize the hash table yet, so we guarantee at least 8 slots for pathological cases.

>         nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
> 
> 	//...
> 	for (lopt->max_qlen_log = 3;
>              (1 << lopt->max_qlen_log) < nr_table_entries;
>              lopt->max_qlen_log++);
> 
>  	//...
> 	lopt->nr_table_entries = nr_table_entries;
> 	
> 	//...
> 	return 0;
> }
> 
> The function is called with an argument 'nr_table_entries', which is then clamped as
> 
>    sysctl_max_syn_backlog <= nr_table_entries <= 8
> 
> If nr_table_entries = 8, then roundup_pow_of_two(8 + 1) = 16.
> 
> The sysctl value is set to a much higher value (default 128 or 1024, net/ipv4/tcp.c).
> 
> The reqsk_queue_alloc() gets 'nr_table_entries' passed directly from inet_csk_listen_start(),
> which in turn gets its 'nr_table_entries' as the 'backlog' argument to listen(2) via
>  * net/dccp/proto.c   (dccp_listen_start) or
>  * net/ipv4/af_inet.c (inet_listen).


^ permalink raw reply

* [BUG net-2.6] bluetooth/rfcomm : sleeping function called from invalid context at mm/slub.c:1719
From: Oliver Hartkopp @ 2009-10-02  6:28 UTC (permalink / raw)
  To: Marcel Holtmann; +Cc: Linux Netdev List, linux-bluetooth-u79uwXL29TY76Z2rM5mHXA

Hello Marcel,

with current net-2.6 tree ...

While starting my PPP Bluetooth dialup networking, I got this:

[  722.461549] PPP generic driver version 2.4.2
[  722.477519] BUG: sleeping function called from invalid context at mm/slub.c:1719
[  722.477530] in_atomic(): 1, irqs_disabled(): 0, pid: 4677, name: pppd
[  722.477537] 3 locks held by pppd/4677:
[  722.477542]  #0:  (rfcomm_mutex){+.+.+.}, at: [<fa5df2a1>] rfcomm_dlc_open+0x28/0x2d6 [rfcomm]
[  722.477568]  #1:  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.+.}, at: [<fa5414f8>] l2cap_sock_connect+0x62/0x2c6 [l2cap]
[  722.477589]  #2:  (&hdev->lock){+...+.}, at: [<fa5415b4>] l2cap_sock_connect+0x11e/0x2c6 [l2cap]
[  722.477613] Pid: 4677, comm: pppd Not tainted 2.6.31-08939-gdb8abec-dirty #21
[  722.477619] Call Trace:
[  722.477633]  [<c1042a2b>] ? __debug_show_held_locks+0x1e/0x20
[  722.477644]  [<c10212a1>] __might_sleep+0xc9/0xce
[  722.477655]  [<c1078b62>] __kmalloc+0x6d/0xfb
[  722.477666]  [<c119e739>] ? kzalloc+0xb/0xd
[  722.477674]  [<c119e739>] kzalloc+0xb/0xd
[  722.477683]  [<c119ef1a>] device_private_init+0x15/0x3d
[  722.477693]  [<c11a0e1b>] dev_set_drvdata+0x18/0x26
[  722.477718]  [<f8b7ca1b>] hci_conn_init_sysfs+0x3d/0xc7 [bluetooth]
[  722.477737]  [<f8b791b3>] hci_conn_add+0x1c0/0x1d5 [bluetooth]
[  722.477756]  [<f8b79360>] hci_connect+0x71/0x17d [bluetooth]
[  722.477769]  [<fa54162c>] l2cap_sock_connect+0x196/0x2c6 [l2cap]
[  722.477782]  [<c1246e3d>] kernel_connect+0xd/0x12
[  722.477795]  [<fa5df3c3>] rfcomm_dlc_open+0x14a/0x2d6 [rfcomm]
[  722.477810]  [<fa5e10fa>] ? rfcomm_tty_open+0x73/0x227 [rfcomm]
[  722.477825]  [<fa5e1130>] rfcomm_tty_open+0xa9/0x227 [rfcomm]
[  722.477836]  [<c1022e3f>] ? default_wake_function+0x0/0xd
[  722.477847]  [<c1180c79>] tty_open+0x29e/0x399
[  722.477858]  [<c107e9bd>] chrdev_open+0x13f/0x156
[  722.477868]  [<c107b0d3>] __dentry_open+0x11b/0x20f
[  722.477878]  [<c107b261>] nameidata_to_filp+0x2c/0x43
[  722.477888]  [<c107e87e>] ? chrdev_open+0x0/0x156
[  722.477898]  [<c1084e9e>] do_filp_open+0x3c6/0x70a
[  722.477910]  [<c108d3e4>] ? alloc_fd+0xc8/0xd2
[  722.477920]  [<c108d3e4>] ? alloc_fd+0xc8/0xd2
[  722.477930]  [<c107aebc>] do_sys_open+0x4a/0xe7
[  722.477940]  [<c1002acc>] ? restore_all_notrace+0x0/0x18
[  722.477950]  [<c107af9b>] sys_open+0x1e/0x26
[  722.477959]  [<c1002a18>] sysenter_do_call+0x12/0x36
[  729.658613] PPP BSD Compression module registered
[  729.684789] PPP Deflate Compression module registered

Any idea?

Regards,
Oliver

^ permalink raw reply

* Re: [Question]: reqsk table size limited to 16?
From: Gerrit Renker @ 2009-10-02  6:25 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20091002061134.GC5646@gerrit.erg.abdn.ac.uk>

Please forget the posting, this is correct; the clamping is

  8 <= nr_table_entries <=  sysctl_max_syn_backlog,

i.e. the minimum table size is 16.

Quoting Gerrit:
| Can someone please have a look, it may be that I am missing something?
| 
| It seems that in the following the maximum number of table entries is set
| to always 16, despite sysctl_max_syn_backlog (tcp_max_syn_backlog), 
| overriding the 'backlog' parameter to listen(2).
| 
| net/core/request_sock.c
| -----------------------
| 
| int reqsk_queue_alloc(struct request_sock_queue *queue,
|                       unsigned int nr_table_entries)
| {
|         size_t lopt_size = sizeof(struct listen_sock);
|         struct listen_sock *lopt;
| 
| 	nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
|         nr_table_entries = max_t(u32, nr_table_entries, 8);
|         nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
| 
| 	//...
| 	for (lopt->max_qlen_log = 3;
|              (1 << lopt->max_qlen_log) < nr_table_entries;
|              lopt->max_qlen_log++);
| 
|  	//...
| 	lopt->nr_table_entries = nr_table_entries;
| 	
| 	//...
| 	return 0;
| }
| 
| The function is called with an argument 'nr_table_entries', which is then clamped as
| 
|    sysctl_max_syn_backlog <= nr_table_entries <= 8
| 
| If nr_table_entries = 8, then roundup_pow_of_two(8 + 1) = 16.
| 
| The sysctl value is set to a much higher value (default 128 or 1024, net/ipv4/tcp.c).
| 
| The reqsk_queue_alloc() gets 'nr_table_entries' passed directly from inet_csk_listen_start(),
| which in turn gets its 'nr_table_entries' as the 'backlog' argument to listen(2) via
|  * net/dccp/proto.c   (dccp_listen_start) or
|  * net/ipv4/af_inet.c (inet_listen).

-- 

^ permalink raw reply

* Re: [PATCH] connector: Fix regression introduced by sid connector
From: Christian Borntraeger @ 2009-10-02  6:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: oleg, scott, zbr, linux-kernel, matthltc, davem, netdev
In-Reply-To: <20091001141426.2c1a0139.akpm@linux-foundation.org>

Sorry about that. I don't know how this escaped. It was probably hiding among
all the sparse warnings I get in kernel/*, plus my lack of knowledge in this
area.
Here is a new version:



Since commit 02b51df1b07b4e9ca823c89284e704cadb323cd1 (proc connector: add event
for process becoming session leader), we have the following warning:
Badness at kernel/softirq.c:143
[...]
Krnl PSW : 0404c00180000000 00000000001481d4 (local_bh_enable+0xb0/0xe0)
[...]
Call Trace:
([<000000013fe04100>] 0x13fe04100)
 [<000000000048a946>] sk_filter+0x9a/0xd0
 [<000000000049d938>] netlink_broadcast+0x2c0/0x53c
 [<00000000003ba9ae>] cn_netlink_send+0x272/0x2b0
 [<00000000003baef0>] proc_sid_connector+0xc4/0xd4
 [<0000000000142604>] __set_special_pids+0x58/0x90
 [<0000000000159938>] sys_setsid+0xb4/0xd8
 [<00000000001187fe>] sysc_noemu+0x10/0x16
 [<00000041616cb266>] 0x41616cb266

The warning is
--->    WARN_ON_ONCE(in_irq() || irqs_disabled());

The network code must not be called with interrupts disabled, but
sys_setsid holds tasklist_lock via write_lock_irq() while calling
the connector.
After a discussion, we agreed that we can move proc_sid_connector
from __set_special_pids to sys_setsid.
We also agreed that it is sufficient to change the check from
task_session(curr) != pid to err > 0: if the session does not change,
we were already the session leader and setsid returns -EPERM.

One last thing:
There is also daemonize(), and some people might want to get a
notification in that case. Since daemonize() is only needed if a user
space does kernel_thread this does not look important (and there seems
to be no consensus if this connector should be called in daemonize). If
we really want this, we can add proc_sid_connector to daemonize() in an
additional patch (Scott?)

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CCed: Scott James Remnant <scott@ubuntu.com>
CCed: Matt Helsley <matthltc@us.ibm.com>
CCed: David S. Miller <davem@davemloft.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
---
 kernel/exit.c |    4 +---
 kernel/sys.c  |    2 ++
 2 files changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6/kernel/exit.c
===================================================================
--- linux-2.6.orig/kernel/exit.c
+++ linux-2.6/kernel/exit.c
@@ -359,10 +359,8 @@ void __set_special_pids(struct pid *pid)
 {
 	struct task_struct *curr = current->group_leader;
 
-	if (task_session(curr) != pid) {
+	if (task_session(curr) != pid)
 		change_pid(curr, PIDTYPE_SID, pid);
-		proc_sid_connector(curr);
-	}
 
 	if (task_pgrp(curr) != pid)
 		change_pid(curr, PIDTYPE_PGID, pid);
Index: linux-2.6/kernel/sys.c
===================================================================
--- linux-2.6.orig/kernel/sys.c
+++ linux-2.6/kernel/sys.c
@@ -1110,6 +1110,8 @@ SYSCALL_DEFINE0(setsid)
 	err = session;
 out:
 	write_unlock_irq(&tasklist_lock);
+	if (err > 0)
+		proc_sid_connector(group_leader);
 	return err;
 }
 

^ permalink raw reply

* [Question]: reqsk table size limited to 16?
From: Gerrit Renker @ 2009-10-02  6:11 UTC (permalink / raw)
  To: netdev

Can someone please have a look, it may be that I am missing something?

It seems that in the following the maximum number of table entries is set
to always 16, despite sysctl_max_syn_backlog (tcp_max_syn_backlog), 
overriding the 'backlog' parameter to listen(2).

net/core/request_sock.c
-----------------------

int reqsk_queue_alloc(struct request_sock_queue *queue,
                      unsigned int nr_table_entries)
{
        size_t lopt_size = sizeof(struct listen_sock);
        struct listen_sock *lopt;

	nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
        nr_table_entries = max_t(u32, nr_table_entries, 8);
        nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);

	//...
	for (lopt->max_qlen_log = 3;
             (1 << lopt->max_qlen_log) < nr_table_entries;
             lopt->max_qlen_log++);

 	//...
	lopt->nr_table_entries = nr_table_entries;
	
	//...
	return 0;
}

The function is called with an argument 'nr_table_entries', which is then clamped as

   sysctl_max_syn_backlog <= nr_table_entries <= 8

If nr_table_entries = 8, then roundup_pow_of_two(8 + 1) = 16.

The sysctl value is set to a much higher value (default 128 or 1024, net/ipv4/tcp.c).

The reqsk_queue_alloc() gets 'nr_table_entries' passed directly from inet_csk_listen_start(),
which in turn gets its 'nr_table_entries' as the 'backlog' argument to listen(2) via
 * net/dccp/proto.c   (dccp_listen_start) or
 * net/ipv4/af_inet.c (inet_listen).

^ permalink raw reply

* [PATCH] cnic: Fix NETDEV_UP event processing.
From: Michael Chan @ 2009-10-02  6:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, michaelc, Michael Chan, Benjamin Li

This fixes the problem of not handling the NETDEV_UP event properly
during hot-plug or modprobe of bnx2 after cnic.  The handling was
skipped by mistakenly using "else if" to check for the event.

Also update version to 2.0.1.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
---
 drivers/net/cnic.c    |    3 ++-
 drivers/net/cnic_if.h |    4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/cnic.c b/drivers/net/cnic.c
index 211c8e9..46c87ec 100644
--- a/drivers/net/cnic.c
+++ b/drivers/net/cnic.c
@@ -2733,7 +2733,8 @@ static int cnic_netdev_event(struct notifier_block *this, unsigned long event,
 			cnic_ulp_init(dev);
 		else if (event == NETDEV_UNREGISTER)
 			cnic_ulp_exit(dev);
-		else if (event == NETDEV_UP) {
+
+		if (event == NETDEV_UP) {
 			if (cnic_register_netdev(dev) != 0) {
 				cnic_put(dev);
 				goto done;
diff --git a/drivers/net/cnic_if.h b/drivers/net/cnic_if.h
index a492357..d8b09ef 100644
--- a/drivers/net/cnic_if.h
+++ b/drivers/net/cnic_if.h
@@ -12,8 +12,8 @@
 #ifndef CNIC_IF_H
 #define CNIC_IF_H
 
-#define CNIC_MODULE_VERSION	"2.0.0"
-#define CNIC_MODULE_RELDATE	"May 21, 2009"
+#define CNIC_MODULE_VERSION	"2.0.1"
+#define CNIC_MODULE_RELDATE	"Oct 01, 2009"
 
 #define CNIC_ULP_RDMA		0
 #define CNIC_ULP_ISCSI		1
-- 
1.6.4.GIT



^ permalink raw reply related

* Re: [PATCH] Use sk_mark for routing lookup in more places
From: Eric Dumazet @ 2009-10-02  6:08 UTC (permalink / raw)
  Cc: David Miller, atis, panther, netdev
In-Reply-To: <4AC58C46.8080408@gmail.com>

Eric Dumazet a écrit :
> Here is a followup on this area, thanks.
> 
> [RFC] af_packet: fill skb->mark at xmit
> 
> skb->mark may be used by classifiers, so fill it in case the user
> set the SO_MARK option on the socket.
> 

Maybe a more generic way to handle this for various protocols
would be to fill skb->mark in sock_alloc_send_pskb()



^ permalink raw reply

* Re: query: adding a sysctl
From: Stephen Hemminger @ 2009-10-02  5:57 UTC (permalink / raw)
  To: William Allen Simpson; +Cc: netdev
In-Reply-To: <4AC57AC5.3080703@gmail.com>

On Fri, 02 Oct 2009 00:00:05 -0400
William Allen Simpson <william.allen.simpson@gmail.com> wrote:

> [My first post here, hopefully not a FAQ, as I've googled it, but cannot find
> the definitive answer.]
> 
> I've been trying to add a sysctl, and I've noticed this message:
> 
> sysctl table check failed: /net/ipv4/tcp_cookie_size .3.5.126 Unknown sysctl binary path
> 
> I modeled the code on sysctl_tcp_syncookies, and apparently I'm missing some
> additional magic?  Or does something need to be done other than C?

The sysctl table check code is in kernel/sysctl.c; it maps numerical
sysctl values to /proc paths so that the permission checks on the numeric
sysctl match those of the /proc file involved.

Hint: the easiest way to find things out is to use git grep
to see how any related sysctl is implemented.

BUT numbered sysctl values are deprecated and should no longer be added.
The current way is to use CTL_UNNUMBERED instead; if you use CTL_UNNUMBERED,
the table does not need to be changed.

-- 

^ permalink raw reply

* Re: [PATCH 00/31] Swap over NFS -v20
From: Neil Brown @ 2009-10-02  5:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Suresh Jayaraman, Linus Torvalds, Andrew Morton, linux-kernel,
	linux-mm, netdev, Miklos Szeredi, Wouter Verhelst, Peter Zijlstra,
	trond.myklebust
In-Reply-To: <20091001174201.GA30068@infradead.org>

On Thursday October 1, hch@infradead.org wrote:
> 
> The other really big one is adding a proper method for safe, page-backed
> kernelspace I/O on files.  That is not something like the grotty
> swap-tied address_space operations in this patch, but more something in
> the direction of the kernel direct I/O patches that Jens Axboe did
> for use in the loop driver.  But even those aren't complete as they
> don't touch the locking issue yet.

Do you have a problem with the proposed address_space operations apart
from their names including the word "swap"?  Would something like:
  direct_on, direct_off, direct_read, direct_write
be better?
Semantics being that the read and write:
  - bypass the page cache (invalidation is up to caller)
  - must not make a blocking non-emergency memory allocation
direct_on does any pre-allocation and pre-reading to ensure those
semantics can be provided.

I have wondered if an extra flag along the lines of "I don't care
about this data after a crash" would be useful.
It would be set for swap, but not set for other users.  Thus
e.g. RAID1 could easily avoid resyncing an area that was used only for
swap.

The only thing of Jens' that I could find used bmap - is there
something more recent I should look for?

> 
> Especially the latter is an absolutely essential step to make any
> progress here, and an excellent patch series of its own as there are
> multiple users for this, like making swap safe on btrfs files, making
> the MD bitmap code actually safe or improving the loop driver.

100% agree.

Thanks,
NeilBrown


^ permalink raw reply

* Re: [PATCH 2/5] Implement loss counting on TFRC-SP receiver
From: Gerrit Renker @ 2009-10-01 20:40 UTC (permalink / raw)
  To: Ivo Calado; +Cc: dccp, netdev
In-Reply-To: <cb00fa210909231843q7f13b2c3i32672e883a017b7b@mail.gmail.com>

| >> The following code would be correct then?
| >>
| >>	 if ((len <= 0) ||
| >>	     (!tfrc_lh_closed_check(cur, cong_evt->tfrchrx_ccval))) {
| >> +		 cur->li_losses += rh->num_losses;
| >> + 		 rh->num_losses  = 0;
| >> 		 return false;
| >> With this change I suppose the problem could be fixed. With that,
| >> rh->num_losses couldn't be added twice. Am I correct?
| >>
| >>
| > The function tfrc_lh_interval_add() is called when
| >  * __two_after_loss() returns true (a new loss is detected) or
| >  * a data packet is ECN-CE marked.
| >
| > I am still not sure about the 'len <= 0' case; this would be true
| > if an ECN-marked packet arrives whose sequence number is 'before'
| > the start of the current loss interval, or if a loss is detected
| > which is older than the start of the current loss interval.
| >
| > The other case (tfrc_lh_closed_check) returns 1 if the current loss
| > interval is 'closed' according to RFC 4342, 10.2.
| >
| > Intuitively, in the first case it refers to the preceding loss
| > interval (i.e. not cur->...), in the second case it seems correct.
| >
| > Doing the first case is complicated due to going back in history.
| > The simplest solution I can think of at the moment is to ignore
| > the exception-case of reordered packets and do something like
| >
| >  if (len <= 0) {
| >     /* FIXME: this belongs into the previous loss interval */
| >     tfrc_pr_debug("Warning: ignoring loss due to reordering");
| > 	return false;
| > }
| >  if (!tfrc_lh_closed_check(...)) {
| >     // your code from above
| > }
| 
| Okay, I'll add your suggestion. But I'm not sure this will fix it completely.
|
If it doesn't, we will just do another iteration and fix it.



| > So it is necessary to decide whether to go the full way, which means
| >  * support Loss Intervals and Dropped Packets alike
| >  * modify TFRC library (it will be a redesign)
| >  * modify receiver code
| >  * modify sender code,
| >    or to use the present approach where
| >  * the receiver computes the Loss Rate and
| >  * a Mandatory Send Loss Event Rate feature is present during feature
| >    negotiation, to avoid problems with incompatible senders
| >   (there is a comment explaining this, in net/dccp/feat.c).
| >
| > Thoughts?
| 
<snip>

| I believe that the first way is better (to "support Loss Intervals and
| Dropped Packets alike..."), because the RFC requires the Loss Intervals
| option to be sent. So we should proceed and implement the Dropped Packets
| option for TFRC-SP. You are right, this would need a redesign and rewrite
| of sender and receiver code.
| 
Agree, then let's do that. It requires some coordination on how to arrange
the patches, but we can simplify the process by using the test tree to 
store all intermediate results (i.e. use a separate tree for the rewrite
until it is sufficiently stable/useful).

^ permalink raw reply

