Netdev List


* How does SACK or FACK determine the time to start fast retransmission?
From: LovelyLich @ 2012-06-20 14:31 UTC
  To: netdev; +Cc: linux-kernel

Hi all,
    When TCP uses Reno as its congestion control algorithm, it uses
tp->sacked_out as the dup-ACK counter. When the third dup-ACK arrives
(under the default settings), TCP initiates fast retransmission.
    But what about SACK?
    According to the kernel source code comments, when the SACK or FACK
TCP option is enabled, there is no dup-ACK counter. See the comment for
the function tcp_dupack_heuristics():
http://lxr.linux.no/linux+v2.6.37/net/ipv4/tcp_input.c#L2300
    So how does TCP know that the current dup-ACK is the one that
triggers fast retransmission?

    According to RFC 3517, section 5:
    "Upon the receipt of the first (DupThresh - 1) duplicate ACKs, the
scoreboard is to be updated as normal."
    "When a TCP sender receives the duplicate ACK corresponding to
DupThresh ACKs, the scoreboard MUST be updated with the new SACK
information (via Update ()). If no previous loss event has occurred
on the connection or the cumulative acknowledgment point is beyond
the last value of RecoveryPoint, a loss recovery phase SHOULD be
initiated, per the fast retransmit algorithm outlined in [RFC2581]."

    But these sentences don't describe how TCP knows that the current ACK
is the dup-ACK that reaches the duplicate threshold.

    According to RFC 3517, section 4, the IsLost (SeqNum) function:
    "The routine returns true when either DupThresh discontiguous SACKed
sequences have arrived above 'SeqNum' or (DupThresh * SMSS) bytes with
sequence numbers greater than 'SeqNum' have been SACKed. Otherwise, the
routine returns false."
    I think this is just what I am looking for, but I still don't know
which lines of code in the Linux TCP implementation perform this check.
    Can anyone help me? Thanks in advance.
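
    For reference, the IsLost() rule quoted from RFC 3517 can be sketched
in plain C. This is only an illustration of the RFC text, not the Linux
implementation: the sack_block structure, the is_lost() name, and the
assumptions that the scoreboard is a sorted array of disjoint, non-adjacent
blocks and that sequence numbers do not wrap are all made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scoreboard entry: one contiguous SACKed range. */
struct sack_block {
	uint32_t start;	/* first SACKed sequence number */
	uint32_t end;	/* one past the last SACKed sequence number */
};

/*
 * Sketch of RFC 3517 IsLost(SeqNum). Assumes sb[] is sorted, that its
 * blocks are disjoint and non-adjacent, and that no sequence number
 * wraps around (a real stack must handle wraparound).
 */
static bool is_lost(const struct sack_block *sb, size_t n, uint32_t seq,
		    unsigned int dupthresh, uint32_t smss)
{
	uint64_t first_above = (uint64_t)seq + 1;
	uint64_t bytes_above = 0;
	unsigned int runs_above = 0;
	size_t i;

	assert(sb != NULL || n == 0);

	for (i = 0; i < n; i++) {
		uint64_t lo = sb[i].start;

		if ((uint64_t)sb[i].end <= first_above)
			continue;	/* nothing SACKed above seq here */

		/* A block that starts above seq is a discontiguous run. */
		if (sb[i].start > seq)
			runs_above++;

		/* Count only the SACKed bytes strictly above seq. */
		if (lo < first_above)
			lo = first_above;
		bytes_above += (uint64_t)sb[i].end - lo;
	}

	return runs_above >= dupthresh ||
	       bytes_above >= (uint64_t)dupthresh * smss;
}
```

    With DupThresh = 3 and SMSS = 1000, three separate SACKed ranges above
a hole mark it as lost, and so does a single SACKed range of 3000 bytes,
while two ranges of 1000 bytes each do not.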


* Re: [PATCH net-next 2/6] bnx2x: link cleanup
From: Joe Perches @ 2012-06-20 14:53 UTC
  To: Yuval Mintz; +Cc: davem, netdev, eilong, Yaniv Rosner
In-Reply-To: <1340182175-916-3-git-send-email-yuvalmin@broadcom.com>

On Wed, 2012-06-20 at 11:49 +0300, Yuval Mintz wrote:
> This patch does several things:
[]
>  3. Change msleep(1) --> usleep_range(1000, 1000)

I believe replacing msleep(small) with
usleep_range(small * 1000, small * 1000) is
not generally a good idea.

Please give usleep_range an actual range to
work with and not a repeated single value.

Please think a little more about what a
good upper range for the maximum time to
sleep should be.

usleep_range(small * 1000, small * 2000)
or something similar maybe.
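
As a rough illustration of that suggestion, the bounds can be computed
like this. The helper and structure names are hypothetical and exist only
for this sketch; in a driver the two values would simply be passed to
usleep_range(min, max).

```c
#include <assert.h>

/* Hypothetical container for a sleep window, in microseconds. */
struct sleep_range_us {
	unsigned long min_us;
	unsigned long max_us;
};

/*
 * Turn a nominal delay in milliseconds into a real range:
 * lower bound = the nominal delay, upper bound = twice that,
 * mirroring usleep_range(small * 1000, small * 2000).
 */
static struct sleep_range_us sleep_range_from_ms(unsigned long ms)
{
	struct sleep_range_us r;

	assert(ms > 0);
	r.min_us = ms * 1000;
	r.max_us = ms * 2000;
	return r;
}
```

For the msleep(1) case in the patch this yields a window of 1000 to
2000 microseconds, which leaves the timer code room to coalesce wakeups.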


* [PATCH net-next] ipv4: tcp: dont cache output dst for syncookies
From: Eric Dumazet @ 2012-06-20 15:02 UTC
  To: David Miller; +Cc: netdev, Hans Schillstrom

From: Eric Dumazet <edumazet@google.com>

Don't cache output dst for syncookies, as this adds pressure on IP route
cache and rcu subsystem for no gain.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/net/flow.h                 |    1 +
 include/net/inet_connection_sock.h |    3 ++-
 net/dccp/ipv4.c                    |    2 +-
 net/ipv4/inet_connection_sock.c    |    8 ++++++--
 net/ipv4/route.c                   |    5 ++++-
 net/ipv4/tcp_ipv4.c                |   12 +++++++-----
 6 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/include/net/flow.h b/include/net/flow.h
index 6c469db..bd524f5 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -22,6 +22,7 @@ struct flowi_common {
 #define FLOWI_FLAG_ANYSRC		0x01
 #define FLOWI_FLAG_PRECOW_METRICS	0x02
 #define FLOWI_FLAG_CAN_SLEEP		0x04
+#define FLOWI_FLAG_RT_NOCACHE		0x08
 	__u32	flowic_secid;
 };
 
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index e1b7734..af3c743 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -251,7 +251,8 @@ extern int inet_csk_get_port(struct sock *sk, unsigned short snum);
 
 extern struct dst_entry* inet_csk_route_req(struct sock *sk,
 					    struct flowi4 *fl4,
-					    const struct request_sock *req);
+					    const struct request_sock *req,
+					    bool nocache);
 extern struct dst_entry* inet_csk_route_child_sock(struct sock *sk,
 						   struct sock *newsk,
 						   const struct request_sock *req);
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 07f5579..3eb76b5 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -504,7 +504,7 @@ static int dccp_v4_send_response(struct sock *sk, struct request_sock *req,
 	struct dst_entry *dst;
 	struct flowi4 fl4;
 
-	dst = inet_csk_route_req(sk, &fl4, req);
+	dst = inet_csk_route_req(sk, &fl4, req, false);
 	if (dst == NULL)
 		goto out;
 
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index f9ee741..034ddbe 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -368,17 +368,21 @@ EXPORT_SYMBOL(inet_csk_reset_keepalive_timer);
 
 struct dst_entry *inet_csk_route_req(struct sock *sk,
 				     struct flowi4 *fl4,
-				     const struct request_sock *req)
+				     const struct request_sock *req,
+				     bool nocache)
 {
 	struct rtable *rt;
 	const struct inet_request_sock *ireq = inet_rsk(req);
 	struct ip_options_rcu *opt = inet_rsk(req)->opt;
 	struct net *net = sock_net(sk);
+	int flags = inet_sk_flowi_flags(sk) & ~FLOWI_FLAG_PRECOW_METRICS;
 
+	if (nocache)
+		flags |= FLOWI_FLAG_RT_NOCACHE;
 	flowi4_init_output(fl4, sk->sk_bound_dev_if, sk->sk_mark,
 			   RT_CONN_FLAGS(sk), RT_SCOPE_UNIVERSE,
 			   sk->sk_protocol,
-			   inet_sk_flowi_flags(sk) & ~FLOWI_FLAG_PRECOW_METRICS,
+			   flags,
 			   (opt && opt->opt.srr) ? opt->opt.faddr : ireq->rmt_addr,
 			   ireq->loc_addr, ireq->rmt_port, inet_sk(sk)->inet_sport);
 	security_req_classify_flow(req, flowi4_to_flowi(fl4));
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index a91f6d3..8d62d85 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1156,7 +1156,7 @@ restart:
 	candp = NULL;
 	now = jiffies;
 
-	if (!rt_caching(dev_net(rt->dst.dev))) {
+	if (!rt_caching(dev_net(rt->dst.dev)) || (rt->dst.flags & DST_NOCACHE)) {
 		/*
 		 * If we're not caching, just tell the caller we
 		 * were successful and don't touch the route.  The
@@ -2582,6 +2582,9 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 
 	rt_set_nexthop(rth, fl4, res, fi, type, 0);
 
+	if (fl4->flowi4_flags & FLOWI_FLAG_RT_NOCACHE)
+		rth->dst.flags |= DST_NOCACHE;
+
 	return rth;
 }
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 13857df..6abc0fd 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -825,7 +825,8 @@ static void tcp_v4_reqsk_send_ack(struct sock *sk, struct sk_buff *skb,
 static int tcp_v4_send_synack(struct sock *sk, struct dst_entry *dst,
 			      struct request_sock *req,
 			      struct request_values *rvp,
-			      u16 queue_mapping)
+			      u16 queue_mapping,
+			      bool nocache)
 {
 	const struct inet_request_sock *ireq = inet_rsk(req);
 	struct flowi4 fl4;
@@ -833,7 +834,7 @@ static int tcp_v4_send_synack(struct sock *sk, struct dst_entry *dst,
 	struct sk_buff * skb;
 
 	/* First, grab a route. */
-	if (!dst && (dst = inet_csk_route_req(sk, &fl4, req)) == NULL)
+	if (!dst && (dst = inet_csk_route_req(sk, &fl4, req, nocache)) == NULL)
 		return -1;
 
 	skb = tcp_make_synack(sk, dst, req, rvp);
@@ -855,7 +856,7 @@ static int tcp_v4_rtx_synack(struct sock *sk, struct request_sock *req,
 			      struct request_values *rvp)
 {
 	TCP_INC_STATS_BH(sock_net(sk), TCP_MIB_RETRANSSEGS);
-	return tcp_v4_send_synack(sk, NULL, req, rvp, 0);
+	return tcp_v4_send_synack(sk, NULL, req, rvp, 0, false);
 }
 
 /*
@@ -1388,7 +1389,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 		 */
 		if (tmp_opt.saw_tstamp &&
 		    tcp_death_row.sysctl_tw_recycle &&
-		    (dst = inet_csk_route_req(sk, &fl4, req)) != NULL &&
+		    (dst = inet_csk_route_req(sk, &fl4, req, want_cookie)) != NULL &&
 		    fl4.daddr == saddr &&
 		    (peer = rt_get_peer((struct rtable *)dst, fl4.daddr)) != NULL) {
 			inet_peer_refcheck(peer);
@@ -1424,7 +1425,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 
 	if (tcp_v4_send_synack(sk, dst, req,
 			       (struct request_values *)&tmp_ext,
-			       skb_get_queue_mapping(skb)) ||
+			       skb_get_queue_mapping(skb),
+			       want_cookie) ||
 	    want_cookie)
 		goto drop_and_free;
 


* Re: [PATCH 01/12] netvm: Prevent a stream-specific deadlock
From: Rik van Riel @ 2012-06-20 15:22 UTC
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	David Miller, Trond Myklebust, Neil Brown, Christoph Hellwig,
	Peter Zijlstra, Mike Christie, Eric B Munson
In-Reply-To: <1340185081-22525-2-git-send-email-mgorman@suse.de>

On 06/20/2012 05:37 AM, Mel Gorman wrote:
> It could happen that all !SOCK_MEMALLOC sockets have buffered so
> much data that we're over the global rmem limit. This will prevent
> SOCK_MEMALLOC buffers from receiving data, which will prevent userspace
> from running, which is needed to reduce the buffered data.
>
> Fix this by exempting the SOCK_MEMALLOC sockets from the rmem limit.
> Once this change it applied, it is important that sockets that set
> SOCK_MEMALLOC do not clear the flag until the socket is being torn down.
> If this happens, a warning is generated and the tokens reclaimed to
> avoid accounting errors until the bug is fixed.
>
> [davem@davemloft.net: Warning about clearing SOCK_MEMALLOC]
> Signed-off-by: Peter Zijlstra<a.p.zijlstra@chello.nl>
> Signed-off-by: Mel Gorman<mgorman@suse.de>
> Acked-by: David S. Miller<davem@davemloft.net>

Acked-by: Rik van Riel<riel@redhat.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [RFC net-next 07/14] Fix intel/ixgbe
From: John Fastabend @ 2012-06-20 15:30 UTC
  To: Eric Dumazet
  Cc: Alexander Duyck, Yuval Mintz, netdev, davem, eilong, Jeff Kirsher
In-Reply-To: <1340182514.4604.843.camel@edumazet-glaptop>

On 6/20/2012 1:55 AM, Eric Dumazet wrote:
> On Tue, 2012-06-19 at 08:54 -0700, Alexander Duyck wrote:
>
>> This patch doesn't limit the number of queues.  It is limiting the
>> number of interrupts.  The two are not directly related as we can
>> support multiple queues per interrupt.
>>
>> Also this change assumes we are only using receive side scaling.  We
>> have other features such as DCB, FCoE, and Flow Director which require
>> additional queues.
>>
>
> Yet, it would be good if ixgbe doesnt allocate 36 queues on a 4 cpu
> machine.
>
> "tc -s class show dev eth0" output is full of not used classes.
>
>
>

We do this for the DCB/FCoE/RSS/Flow Director case where we want to
use multiple queues per traffic class (802.1Qaz). As it is now we
have to set the max queues at alloc_etherdev_mq() time so we use a
max of
	(num_cpu * max traffic classes) + num_cpu

The last num_cpu is in error and I have a patch in JeffK's tree to
remove this. In many cases it seems excessive but sometimes it is
helpful.

.John
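
A small worked example of the formula above. It assumes 8 traffic
classes (the maximum in 802.1Qaz); with 4 CPUs that would be consistent
with the 36 queues Eric mentions. The function names are hypothetical and
only illustrate the arithmetic, they are not taken from the ixgbe driver.

```c
#include <assert.h>

/* Current upper bound: (num_cpu * max traffic classes) + num_cpu. */
static unsigned int max_queues_current(unsigned int num_cpu,
				       unsigned int max_tc)
{
	assert(num_cpu > 0 && max_tc > 0);
	return num_cpu * max_tc + num_cpu;
}

/* Upper bound once the erroneous trailing num_cpu term is dropped. */
static unsigned int max_queues_fixed(unsigned int num_cpu,
				     unsigned int max_tc)
{
	assert(num_cpu > 0 && max_tc > 0);
	return num_cpu * max_tc;
}
```

For num_cpu = 4 and max_tc = 8 the current bound is 4 * 8 + 4 = 36,
and the bound without the extra term is 4 * 8 = 32.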


* [patch net-next 0/2] team: two RCU fixups
From: Jiri Pirko @ 2012-06-20 15:31 UTC
  To: netdev; +Cc: davem, eric.dumazet, jbrouer, paulmck, wfg

Jiri Pirko (2):
  team: use rcu_access_pointer to access RCU pointer by writer
  team: use RCU_INIT_POINTER for NULL assignment of RCU pointer

 drivers/net/team/team_mode_activebackup.c |    7 +++++--
 drivers/net/team/team_mode_loadbalance.c  |   10 ++++++----
 2 files changed, 11 insertions(+), 6 deletions(-)

-- 
1.7.10.4


* [patch net-next 2/2] team: use RCU_INIT_POINTER for NULL assignment of RCU pointer
From: Jiri Pirko @ 2012-06-20 15:32 UTC
  To: netdev; +Cc: davem, eric.dumazet, jbrouer, paulmck, wfg
In-Reply-To: <1340206321-5986-1-git-send-email-jpirko@redhat.com>

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/team/team_mode_loadbalance.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/team/team_mode_loadbalance.c b/drivers/net/team/team_mode_loadbalance.c
index b4475a5..c385b45 100644
--- a/drivers/net/team/team_mode_loadbalance.c
+++ b/drivers/net/team/team_mode_loadbalance.c
@@ -97,7 +97,7 @@ static void lb_tx_hash_to_port_mapping_null_port(struct team *team,
 
 		pm = &lb_priv->ex->tx_hash_to_port_mapping[i];
 		if (rcu_access_pointer(pm->port) == port) {
-			rcu_assign_pointer(pm->port, NULL);
+			RCU_INIT_POINTER(pm->port, NULL);
 			team_option_inst_set_change(pm->opt_inst_info);
 			changed = true;
 		}
-- 
1.7.10.4


* [patch net-next 1/2] team: use rcu_access_pointer to access RCU pointer by writer
From: Jiri Pirko @ 2012-06-20 15:32 UTC
  To: netdev; +Cc: davem, eric.dumazet, jbrouer, paulmck, wfg
In-Reply-To: <1340206321-5986-1-git-send-email-jpirko@redhat.com>

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/team/team_mode_activebackup.c |    7 +++++--
 drivers/net/team/team_mode_loadbalance.c  |    8 +++++---
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/team/team_mode_activebackup.c b/drivers/net/team/team_mode_activebackup.c
index 2fe02a8..c9e7621 100644
--- a/drivers/net/team/team_mode_activebackup.c
+++ b/drivers/net/team/team_mode_activebackup.c
@@ -61,8 +61,11 @@ static void ab_port_leave(struct team *team, struct team_port *port)
 
 static int ab_active_port_get(struct team *team, struct team_gsetter_ctx *ctx)
 {
-	if (ab_priv(team)->active_port)
-		ctx->data.u32_val = ab_priv(team)->active_port->dev->ifindex;
+	struct team_port *active_port;
+
+	active_port = rcu_access_pointer(ab_priv(team)->active_port);
+	if (active_port)
+		ctx->data.u32_val = active_port->dev->ifindex;
 	else
 		ctx->data.u32_val = 0;
 	return 0;
diff --git a/drivers/net/team/team_mode_loadbalance.c b/drivers/net/team/team_mode_loadbalance.c
index 45cc095..b4475a5 100644
--- a/drivers/net/team/team_mode_loadbalance.c
+++ b/drivers/net/team/team_mode_loadbalance.c
@@ -96,7 +96,7 @@ static void lb_tx_hash_to_port_mapping_null_port(struct team *team,
 		struct lb_port_mapping *pm;
 
 		pm = &lb_priv->ex->tx_hash_to_port_mapping[i];
-		if (pm->port == port) {
+		if (rcu_access_pointer(pm->port) == port) {
 			rcu_assign_pointer(pm->port, NULL);
 			team_option_inst_set_change(pm->opt_inst_info);
 			changed = true;
@@ -292,7 +292,7 @@ static int lb_bpf_func_set(struct team *team, struct team_gsetter_ctx *ctx)
 	if (lb_priv->ex->orig_fprog) {
 		/* Clear old filter data */
 		__fprog_destroy(lb_priv->ex->orig_fprog);
-		sk_unattached_filter_destroy(lb_priv->fp);
+		sk_unattached_filter_destroy(rcu_access_pointer(lb_priv->fp));
 	}
 
 	rcu_assign_pointer(lb_priv->fp, fp);
@@ -303,9 +303,11 @@ static int lb_bpf_func_set(struct team *team, struct team_gsetter_ctx *ctx)
 static int lb_tx_method_get(struct team *team, struct team_gsetter_ctx *ctx)
 {
 	struct lb_priv *lb_priv = get_lb_priv(team);
+	lb_select_tx_port_func_t *func;
 	char *name;
 
-	name = lb_select_tx_port_get_name(lb_priv->select_tx_port_func);
+	func = rcu_access_pointer(lb_priv->select_tx_port_func);
+	name = lb_select_tx_port_get_name(func);
 	BUG_ON(!name);
 	ctx->data.str_val = name;
 	return 0;
-- 
1.7.10.4


* Re: [PATCH 02/12] selinux: tag avc cache alloc as non-critical
From: Rik van Riel @ 2012-06-20 15:33 UTC
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	David Miller, Trond Myklebust, Neil Brown, Christoph Hellwig,
	Peter Zijlstra, Mike Christie, Eric B Munson
In-Reply-To: <1340185081-22525-3-git-send-email-mgorman@suse.de>

On 06/20/2012 05:37 AM, Mel Gorman wrote:
> Failing to allocate a cache entry will only harm performance not
> correctness.  Do not consume valuable reserve pages for something
> like that.
>
> Signed-off-by: Peter Zijlstra<a.p.zijlstra@chello.nl>
> Signed-off-by: Mel Gorman<mgorman@suse.de>
> Acked-by: Eric Paris<eparis@redhat.com>

Acked-by: Rik van Riel<riel@redhat.com>


* Re: [PATCH 03/12] mm: Methods for teaching filesystems about PG_swapcache pages
From: Rik van Riel @ 2012-06-20 15:49 UTC
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	David Miller, Trond Myklebust, Neil Brown, Christoph Hellwig,
	Peter Zijlstra, Mike Christie, Eric B Munson
In-Reply-To: <1340185081-22525-4-git-send-email-mgorman@suse.de>

On 06/20/2012 05:37 AM, Mel Gorman wrote:
> In order to teach filesystems to handle swap cache pages, three new
> page functions are introduced:
>
>    pgoff_t page_file_index(struct page *);
>    loff_t page_file_offset(struct page *);
>    struct address_space *page_file_mapping(struct page *);
>
> page_file_index() - gives the offset of this page in the file in
> PAGE_CACHE_SIZE blocks. Like page->index is for mapped pages, this
> function also gives the correct index for PG_swapcache pages.
>
> page_file_offset() - uses page_file_index(), so that it will give
> the expected result, even for PG_swapcache pages.
>
> page_file_mapping() - gives the mapping backing the actual page;
> that is for swap cache pages it will give swap_file->f_mapping.
>
> Signed-off-by: Peter Zijlstra<a.p.zijlstra@chello.nl>
> Signed-off-by: Mel Gorman<mgorman@suse.de>

Reviewed-by: Rik van Riel<riel@redhat.com>


* [PATCH] can: c_can_pci: fix compilation on non HAVE_CLK archs
From: Marc Kleine-Budde @ 2012-06-20 15:58 UTC
  To: davem; +Cc: netdev, linux-can, Marc Kleine-Budde, Federico Vaga

In commit:

  5b92da0 c_can_pci: generic module for C_CAN/D_CAN on PCI

the c_can_pci driver has been added. It uses clk_*() functions,
resulting in a link error on archs without clock support. This
patch removes these clk_*() functions, as these parts of the driver
are not tested.

Cc: Federico Vaga <federico.vaga@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/c_can/c_can_pci.c |   29 +++++++----------------------
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/drivers/net/can/c_can/c_can_pci.c b/drivers/net/can/c_can/c_can_pci.c
index 914aecf..1011146 100644
--- a/drivers/net/can/c_can/c_can_pci.c
+++ b/drivers/net/can/c_can/c_can_pci.c
@@ -13,7 +13,6 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/netdevice.h>
-#include <linux/clk.h>
 #include <linux/pci.h>
 
 #include <linux/can/dev.h>
@@ -30,7 +29,7 @@ struct c_can_pci_data {
 	enum c_can_dev_id type;
 	/* Set the register alignment in the memory */
 	enum c_can_pci_reg_align reg_align;
-	/* Set the frequency if clk is not usable */
+	/* Set the frequency */
 	unsigned int freq;
 };
 
@@ -71,7 +70,6 @@ static int __devinit c_can_pci_probe(struct pci_dev *pdev,
 	struct c_can_priv *priv;
 	struct net_device *dev;
 	void __iomem *addr;
-	struct clk *clk;
 	int ret;
 
 	ret = pci_enable_device(pdev);
@@ -113,18 +111,11 @@ static int __devinit c_can_pci_probe(struct pci_dev *pdev,
 	priv->base = addr;
 
 	if (!c_can_pci_data->freq) {
-		/* get the appropriate clk */
-		clk = clk_get(&pdev->dev, NULL);
-		if (IS_ERR(clk)) {
-			dev_err(&pdev->dev, "no clock defined\n");
-			ret = -ENODEV;
-			goto out_free_c_can;
-		}
-		priv->can.clock.freq = clk_get_rate(clk);
-		priv->priv = clk;
+		dev_err(&pdev->dev, "no clock frequency defined\n");
+		ret = -ENODEV;
+		goto out_free_c_can;
 	} else {
 		priv->can.clock.freq = c_can_pci_data->freq;
-		priv->priv = NULL;
 	}
 
 	/* Configure CAN type */
@@ -138,7 +129,7 @@ static int __devinit c_can_pci_probe(struct pci_dev *pdev,
 		break;
 	default:
 		ret = -EINVAL;
-		goto out_free_clock;
+		goto out_free_c_can;
 	}
 
 	/* Configure access to registers */
@@ -153,14 +144,14 @@ static int __devinit c_can_pci_probe(struct pci_dev *pdev,
 		break;
 	default:
 		ret = -EINVAL;
-		goto out_free_clock;
+		goto out_free_c_can;
 	}
 
 	ret = register_c_can_dev(dev);
 	if (ret) {
 		dev_err(&pdev->dev, "registering %s failed (err=%d)\n",
 			KBUILD_MODNAME, ret);
-		goto out_free_clock;
+		goto out_free_c_can;
 	}
 
 	dev_dbg(&pdev->dev, "%s device registered (regs=%p, irq=%d)\n",
@@ -168,9 +159,6 @@ static int __devinit c_can_pci_probe(struct pci_dev *pdev,
 
 	return 0;
 
-out_free_clock:
-	if (priv->priv)
-		clk_put(priv->priv);
 out_free_c_can:
 	pci_set_drvdata(pdev, NULL);
 	free_c_can_dev(dev);
@@ -193,9 +181,6 @@ static void __devexit c_can_pci_remove(struct pci_dev *pdev)
 
 	unregister_c_can_dev(dev);
 
-	if (priv->priv)
-		clk_put(priv->priv);
-
 	pci_set_drvdata(pdev, NULL);
 	free_c_can_dev(dev);
 
-- 
1.7.10


* Re: [patch net-next 1/2] team: use rcu_access_pointer to access RCU pointer by writer
From: Eric Dumazet @ 2012-06-20 16:01 UTC
  To: Jiri Pirko; +Cc: netdev, davem, jbrouer, paulmck, wfg
In-Reply-To: <1340206321-5986-2-git-send-email-jpirko@redhat.com>

On Wed, 2012-06-20 at 17:32 +0200, Jiri Pirko wrote:
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> ---
>  drivers/net/team/team_mode_activebackup.c |    7 +++++--
>  drivers/net/team/team_mode_loadbalance.c  |    8 +++++---
>  2 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/team/team_mode_activebackup.c b/drivers/net/team/team_mode_activebackup.c
> index 2fe02a8..c9e7621 100644
> --- a/drivers/net/team/team_mode_activebackup.c
> +++ b/drivers/net/team/team_mode_activebackup.c
> @@ -61,8 +61,11 @@ static void ab_port_leave(struct team *team, struct team_port *port)
>  
>  static int ab_active_port_get(struct team *team, struct team_gsetter_ctx *ctx)
>  {
> -	if (ab_priv(team)->active_port)
> -		ctx->data.u32_val = ab_priv(team)->active_port->dev->ifindex;
> +	struct team_port *active_port;
> +
> +	active_port = rcu_access_pointer(ab_priv(team)->active_port);

This is not the correct fix.

You can't safely dereference active_port if you got it from
rcu_access_pointer().

You should use rcu_dereference(), rcu_dereference_protected(),
rcu_dereference_bh() or a similar variant, depending on the context.

> +	if (active_port)
> +		ctx->data.u32_val = active_port->dev->ifindex;
>  	else
>  		ctx->data.u32_val = 0;
>  	return 0;
> diff --git a/drivers/net/team/team_mode_loadbalance.c b/drivers/net/team/team_mode_loadbalance.c
> index 45cc095..b4475a5 100644
> --- a/drivers/net/team/team_mode_loadbalance.c
> +++ b/drivers/net/team/team_mode_loadbalance.c
> @@ -96,7 +96,7 @@ static void lb_tx_hash_to_port_mapping_null_port(struct team *team,
>  		struct lb_port_mapping *pm;
>  
>  		pm = &lb_priv->ex->tx_hash_to_port_mapping[i];
> -		if (pm->port == port) {
> +		if (rcu_access_pointer(pm->port) == port) {

This one is OK

>  			rcu_assign_pointer(pm->port, NULL);

I don't understand why you submit two patches...

>  			team_option_inst_set_change(pm->opt_inst_info);
>  			changed = true;
> @@ -292,7 +292,7 @@ static int lb_bpf_func_set(struct team *team, struct team_gsetter_ctx *ctx)
>  	if (lb_priv->ex->orig_fprog) {
>  		/* Clear old filter data */
>  		__fprog_destroy(lb_priv->ex->orig_fprog);
> -		sk_unattached_filter_destroy(lb_priv->fp);
> +		sk_unattached_filter_destroy(rcu_access_pointer(lb_priv->fp));
>  	}


* [PATCH v2] can: c_can_pci: fix compilation on non HAVE_CLK archs
From: Marc Kleine-Budde @ 2012-06-20 16:04 UTC
  To: davem; +Cc: netdev, linux-can, Marc Kleine-Budde, Federico Vaga

In commit:

  5b92da0 c_can_pci: generic module for C_CAN/D_CAN on PCI

the c_can_pci driver has been added. It uses clk_*() functions,
resulting in a link error on archs without clock support. This
patch removes these clk_*() functions, as these parts of the driver
are not tested.

Cc: Federico Vaga <federico.vaga@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
Resending with the missing "v2".

Marc

 drivers/net/can/c_can/c_can_pci.c |   29 +++++++----------------------
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/drivers/net/can/c_can/c_can_pci.c b/drivers/net/can/c_can/c_can_pci.c
index 914aecf..1011146 100644
--- a/drivers/net/can/c_can/c_can_pci.c
+++ b/drivers/net/can/c_can/c_can_pci.c
@@ -13,7 +13,6 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/netdevice.h>
-#include <linux/clk.h>
 #include <linux/pci.h>
 
 #include <linux/can/dev.h>
@@ -30,7 +29,7 @@ struct c_can_pci_data {
 	enum c_can_dev_id type;
 	/* Set the register alignment in the memory */
 	enum c_can_pci_reg_align reg_align;
-	/* Set the frequency if clk is not usable */
+	/* Set the frequency */
 	unsigned int freq;
 };
 
@@ -71,7 +70,6 @@ static int __devinit c_can_pci_probe(struct pci_dev *pdev,
 	struct c_can_priv *priv;
 	struct net_device *dev;
 	void __iomem *addr;
-	struct clk *clk;
 	int ret;
 
 	ret = pci_enable_device(pdev);
@@ -113,18 +111,11 @@ static int __devinit c_can_pci_probe(struct pci_dev *pdev,
 	priv->base = addr;
 
 	if (!c_can_pci_data->freq) {
-		/* get the appropriate clk */
-		clk = clk_get(&pdev->dev, NULL);
-		if (IS_ERR(clk)) {
-			dev_err(&pdev->dev, "no clock defined\n");
-			ret = -ENODEV;
-			goto out_free_c_can;
-		}
-		priv->can.clock.freq = clk_get_rate(clk);
-		priv->priv = clk;
+		dev_err(&pdev->dev, "no clock frequency defined\n");
+		ret = -ENODEV;
+		goto out_free_c_can;
 	} else {
 		priv->can.clock.freq = c_can_pci_data->freq;
-		priv->priv = NULL;
 	}
 
 	/* Configure CAN type */
@@ -138,7 +129,7 @@ static int __devinit c_can_pci_probe(struct pci_dev *pdev,
 		break;
 	default:
 		ret = -EINVAL;
-		goto out_free_clock;
+		goto out_free_c_can;
 	}
 
 	/* Configure access to registers */
@@ -153,14 +144,14 @@ static int __devinit c_can_pci_probe(struct pci_dev *pdev,
 		break;
 	default:
 		ret = -EINVAL;
-		goto out_free_clock;
+		goto out_free_c_can;
 	}
 
 	ret = register_c_can_dev(dev);
 	if (ret) {
 		dev_err(&pdev->dev, "registering %s failed (err=%d)\n",
 			KBUILD_MODNAME, ret);
-		goto out_free_clock;
+		goto out_free_c_can;
 	}
 
 	dev_dbg(&pdev->dev, "%s device registered (regs=%p, irq=%d)\n",
@@ -168,9 +159,6 @@ static int __devinit c_can_pci_probe(struct pci_dev *pdev,
 
 	return 0;
 
-out_free_clock:
-	if (priv->priv)
-		clk_put(priv->priv);
 out_free_c_can:
 	pci_set_drvdata(pdev, NULL);
 	free_c_can_dev(dev);
@@ -193,9 +181,6 @@ static void __devexit c_can_pci_remove(struct pci_dev *pdev)
 
 	unregister_c_can_dev(dev);
 
-	if (priv->priv)
-		clk_put(priv->priv);
-
 	pci_set_drvdata(pdev, NULL);
 	free_c_can_dev(dev);
 
-- 
1.7.10


* Re: [PATCH 04/12] mm: Add support for a filesystem to activate swap files and use direct_IO for writing swap pages
From: Rik van Riel @ 2012-06-20 16:08 UTC
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	David Miller, Trond Myklebust, Neil Brown, Christoph Hellwig,
	Peter Zijlstra, Mike Christie, Eric B Munson
In-Reply-To: <1340185081-22525-5-git-send-email-mgorman@suse.de>

On 06/20/2012 05:37 AM, Mel Gorman wrote:
> Currently swapfiles are managed entirely by the core VM by using ->bmap
> to allocate space and write to the blocks directly. This effectively
> ensures that the underlying blocks are allocated and avoids the need
> for the swap subsystem to locate what physical blocks store offsets
> within a file.
>
> If the swap subsystem is to use the filesystem information to locate
> the blocks, it is critical that information such as block groups,
> block bitmaps and the block descriptor table that map the swap file
> were resident in memory. This patch adds address_space_operations that
> the VM can call when activating or deactivating swap backed by a file.
>
>    int swap_activate(struct file *);
>    int swap_deactivate(struct file *);

Acked-by: Rik van Riel<riel@redhat.com>


* Re: [PATCH 05/12] mm: swap: Implement generic handler for swap_activate
From: Rik van Riel @ 2012-06-20 16:09 UTC
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	David Miller, Trond Myklebust, Neil Brown, Christoph Hellwig,
	Peter Zijlstra, Mike Christie, Eric B Munson
In-Reply-To: <1340185081-22525-6-git-send-email-mgorman@suse.de>

On 06/20/2012 05:37 AM, Mel Gorman wrote:
> The version of swap_activate introduced is sufficient for swap-over-NFS
> but would not provide enough information to implement a generic handler.
> This patch shuffles things slightly to ensure the same information is
> available for aops->swap_activate() as is available to the core.
>
> No functionality change.
>
> Signed-off-by: Mel Gorman<mgorman@suse.de>

Acked-by: Rik van Riel<riel@redhat.com>


* Re: [PATCH 06/12] mm: Add get_kernel_page[s] for pinning of kernel addresses for I/O
From: Rik van Riel @ 2012-06-20 16:11 UTC
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	David Miller, Trond Myklebust, Neil Brown, Christoph Hellwig,
	Peter Zijlstra, Mike Christie, Eric B Munson
In-Reply-To: <1340185081-22525-7-git-send-email-mgorman@suse.de>

On 06/20/2012 05:37 AM, Mel Gorman wrote:
> This patch adds two new APIs get_kernel_pages() and get_kernel_page()
> that may be used to pin a vector of kernel addresses for IO. The initial
> user is expected to be NFS for allowing pages to be written to swap
> using aops->direct_IO(). Strictly speaking, swap-over-NFS only needs
> to pin one page for IO but it makes sense to express the API in terms
> of a vector and add a helper for pinning single pages.
>
> Signed-off-by: Mel Gorman<mgorman@suse.de>

Reviewed-by: Rik van Riel<riel@redhat.com>


* Re: [PATCH v2] ipv4: Early TCP socket demux.
From: Rick Jones @ 2012-06-20 16:21 UTC
  To: David Miller; +Cc: eric.dumazet, shemminger, netdev
In-Reply-To: <20120619.231401.2278176068934152926.davem@davemloft.net>

On 06/19/2012 11:14 PM, David Miller wrote:
> From: Eric Dumazet<eric.dumazet@gmail.com>
> Date: Wed, 20 Jun 2012 07:51:29 +0200
>
>> On Wed, 2012-06-20 at 07:49 +0200, Eric Dumazet wrote:
>>
>>> 2) small lived tcp sessions
>>>
>>>     input dst is now dirtied because of the additional
>>> dst_clone()/dst_release()
>>
>> Not realy a concern because we dirty cache line anyway
>>
>> dst_use_noref()
>> {
>> 	dst->__use++;
>> 	dst->lastuse = time;
>> }
>
> Right, the costs probably even out for short TCP flows.
>
> But better to do real tests than to believe what any of
> us say. :-)

netperf -c -C  -t TCP_CC ...  #just connect/close

or

netperf -c -C -t TCP_CRR ...  # with a request/response pair in there

For some definition of "real" anyway :)

rick jones

^ permalink raw reply

* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Alexander Duyck @ 2012-06-20 16:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, jeffrey.t.kirsher
In-Reply-To: <1340170590.4604.784.camel@edumazet-glaptop>

On 06/19/2012 10:36 PM, Eric Dumazet wrote:
> On Tue, 2012-06-19 at 17:43 -0700, Alexander Duyck wrote:
>> This patch is meant to help improve system performance when
>> netdev_alloc_frag is used in scenarios in which buffers are short lived.
>> This is accomplished by allowing the page offset to be reset in the event
>> that the page count is 1.  I also reordered the direction in which we give
>> out sections of the page so that we start at the end of the page and end at
>> the start.  The main motivation being that I preferred to have offset
>> represent the amount of page remaining to be used.
>>
>> My primary test case was using ixgbe in combination with TCP.  With this
>> patch applied I saw CPU utilization drop from 3.4% to 3.0% for a single
>> thread of netperf receiving a TCP stream via ixgbe.
>>
>> I also tested several scenarios in which the page reuse would not be
>> possible such as UDP flows and routing.  In both of these scenarios I saw
>> no noticeable performance degradation compared to the kernel without this
>> patch.
>>
>> Cc: Eric Dumazet <edumazet@google.com>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> ---
>>
>>  net/core/skbuff.c |   15 +++++++++++----
>>  1 files changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 5b21522..eb3853c 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -317,15 +317,22 @@ void *netdev_alloc_frag(unsigned int fragsz)
>>  	if (unlikely(!nc->page)) {
>>  refill:
>>  		nc->page = alloc_page(GFP_ATOMIC | __GFP_COLD);
>> -		nc->offset = 0;
>>  	}
>>  	if (likely(nc->page)) {
>> -		if (nc->offset + fragsz > PAGE_SIZE) {
>> +		unsigned int offset = PAGE_SIZE;
>> +
>> +		if (page_count(nc->page) != 1)
>> +			offset = nc->offset;
>> +
>> +		if (offset < fragsz) {
>>  			put_page(nc->page);
>>  			goto refill;
>>  		}
>> -		data = page_address(nc->page) + nc->offset;
>> -		nc->offset += fragsz;
>> +
>> +		offset -= fragsz;
>> +		nc->offset = offset;
>> +
>> +		data = page_address(nc->page) + offset;
>>  		get_page(nc->page);
>>  	}
>>  	local_irq_restore(flags);
>>
> I tested this idea one month ago and did not get convincing results,
> because the branch was taken half of the time.
>
> The cases where page can be reused is probably specific to ixgbe because
> it uses a different allocator for the frags themselves.
> netdev_alloc_frag() is only used to allocate the skb head.
Actually it is pretty much anywhere a copy-break type setup exists.  I
think ixgbe and a few other drivers have this type of setup where
netdev_alloc_skb is called and the data is just copied into the buffer.
My thought was that if I can improve this one case without hurting the
other cases, I should just go ahead and submit it since it is a net win
performance-wise.

I think one of the biggest advantages of this for ixgbe is that it
allows the buffer to become cache warm so that writing the shared info
and copying the header contents becomes very cheap compared to accessing
a cache cold page.
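
To make the reuse logic concrete, here is a rough user-space sketch of what the patch does. It is not the kernel code: struct sketch_page, struct sketch_cache, sketch_put_page() and sketch_alloc_frag() are made-up names, calloc() stands in for alloc_page(), a plain counter stands in for page_count(), and the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for the sketch */

/* Hypothetical stand-in for a page plus its reference count. */
struct sketch_page {
	unsigned int refcount;    /* 1 means only the cache holds the page */
	unsigned char data[SKETCH_PAGE_SIZE];
};

/* Hypothetical stand-in for the per-CPU allocation cache. */
struct sketch_cache {
	struct sketch_page *page;
	unsigned int offset;      /* bytes of the page still unused */
};

/* Drop one reference; free the page when the last one goes away. */
static void sketch_put_page(struct sketch_page *page)
{
	assert(page->refcount > 0);
	if (--page->refcount == 0)
		free(page);
}

/* Hand out fragsz bytes, working from the end of the page to the start. */
static void *sketch_alloc_frag(struct sketch_cache *nc, unsigned int fragsz)
{
	unsigned int offset;

	if (fragsz > SKETCH_PAGE_SIZE)
		return NULL;

	for (;;) {
		if (!nc->page) {
			nc->page = calloc(1, sizeof(*nc->page));
			if (!nc->page)
				return NULL;
			nc->page->refcount = 1;   /* the cache's own reference */
		}

		/*
		 * If the cache is the only holder, every fragment handed out
		 * earlier has been freed, so the whole page is available again.
		 */
		offset = SKETCH_PAGE_SIZE;
		if (nc->page->refcount != 1)
			offset = nc->offset;

		if (offset >= fragsz)
			break;

		/* Not enough room left: drop the cache's reference and refill. */
		sketch_put_page(nc->page);
		nc->page = NULL;
	}

	offset -= fragsz;
	nc->offset = offset;
	nc->page->refcount++;             /* reference owned by the fragment */
	return nc->page->data + offset;
}
```

In this model, freeing a fragment is just sketch_put_page() on its page. Once every outstanding fragment has been released the count drops back to 1, and the next call hands out the tail of the same page again, which is the page most likely to still be cache warm.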

> For typical nics, we allocate frags to populate the RX ring _way_ before
> packet is received by the NIC.
>
> Then, I played with using order-2 pages instead of order-0 ones if
> PAGE_SIZE < 8192.
>
> No clear win either, but you might try this too.
The biggest issue I see with an order-2 page is that it means the memory
is going to take much longer to cycle out of a shared page.  As a result
changes like the one I just came up with would likely have little to no
benefit because we would run out of room in the frags list before we
could start reusing a fresh page.

Thanks,

Alex

^ permalink raw reply

* Re: [PATCH 07/12] mm: Add support for direct_IO to highmem pages
From: Rik van Riel @ 2012-06-20 17:08 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	David Miller, Trond Myklebust, Neil Brown, Christoph Hellwig,
	Peter Zijlstra, Mike Christie, Eric B Munson
In-Reply-To: <1340185081-22525-8-git-send-email-mgorman@suse.de>

On 06/20/2012 05:37 AM, Mel Gorman wrote:
> The patch "mm: Add support for a filesystem to activate swap files and
> use direct_IO for writing swap pages" added support for using direct_IO
> to write swap pages but it is insufficient for highmem pages.
>
> To support highmem pages, this patch kmaps() the page before calling the
> direct_IO() handler. As direct_IO deals with virtual addresses an
> additional helper is necessary for get_kernel_pages() to lookup the
> struct page for a kmap virtual address.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Acked-by: Rik van Riel <riel@redhat.com>

^ permalink raw reply

* next-20120620 build error in netfilter
From: Valdis Kletnieks @ 2012-06-20 17:14 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netdev; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 473 bytes --]

Today's linux-next fails to build with CONFIG_NF_NAT_NEEDED=y and CONFIG_NF_NAT=m

  LD      init/built-in.o
net/built-in.o:(.data+0x4408): undefined reference to `nf_nat_tcp_seq_adjust'
make: *** [vmlinux] Error 1

Breakage introduced with this commit:

commit 8c88f87cb27ad09086940bdd3e6955e5325ec89a
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Thu Jun 7 13:31:25 2012 +0200

    netfilter: nfnetlink_queue: add NAT TCP sequence adjustment if packet mangled


[-- Attachment #2: Type: application/pgp-signature, Size: 865 bytes --]

^ permalink raw reply

* Re: [PATCH 08/12] nfs: teach the NFS client how to treat PG_swapcache pages
From: Rik van Riel @ 2012-06-20 17:14 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	David Miller, Trond Myklebust, Neil Brown, Christoph Hellwig,
	Peter Zijlstra, Mike Christie, Eric B Munson
In-Reply-To: <1340185081-22525-9-git-send-email-mgorman@suse.de>

On 06/20/2012 05:37 AM, Mel Gorman wrote:
> Replace all relevant occurrences of page->index and page->mapping in
> the NFS client with the new page_file_index() and page_file_mapping()
> functions.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Acked-by: Rik van Riel <riel@redhat.com>

^ permalink raw reply

* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Alexander Duyck @ 2012-06-20 17:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, jeffrey.t.kirsher
In-Reply-To: <4FE1FABF.6040309@intel.com>

On 06/20/2012 09:30 AM, Alexander Duyck wrote:
> On 06/19/2012 10:36 PM, Eric Dumazet wrote:
>> On Tue, 2012-06-19 at 17:43 -0700, Alexander Duyck wrote:
>>> This patch is meant to help improve system performance when
>>> netdev_alloc_frag is used in scenarios in which buffers are short lived.
>>> This is accomplished by allowing the page offset to be reset in the event
>>> that the page count is 1.  I also reordered the direction in which we give
>>> out sections of the page so that we start at the end of the page and end at
>>> the start.  The main motivation being that I preferred to have offset
>>> represent the amount of page remaining to be used.
>>>
>>> My primary test case was using ixgbe in combination with TCP.  With this
>>> patch applied I saw CPU utilization drop from 3.4% to 3.0% for a single
>>> thread of netperf receiving a TCP stream via ixgbe.
>>>
>>> I also tested several scenarios in which the page reuse would not be
>>> possible such as UDP flows and routing.  In both of these scenarios I saw
>>> no noticeable performance degradation compared to the kernel without this
>>> patch.
>>>
>>> Cc: Eric Dumazet <edumazet@google.com>
>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>>> ---
>>>
>>>  net/core/skbuff.c |   15 +++++++++++----
>>>  1 files changed, 11 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>> index 5b21522..eb3853c 100644
>>> --- a/net/core/skbuff.c
>>> +++ b/net/core/skbuff.c
>>> @@ -317,15 +317,22 @@ void *netdev_alloc_frag(unsigned int fragsz)
>>>  	if (unlikely(!nc->page)) {
>>>  refill:
>>>  		nc->page = alloc_page(GFP_ATOMIC | __GFP_COLD);
>>> -		nc->offset = 0;
>>>  	}
>>>  	if (likely(nc->page)) {
>>> -		if (nc->offset + fragsz > PAGE_SIZE) {
>>> +		unsigned int offset = PAGE_SIZE;
>>> +
>>> +		if (page_count(nc->page) != 1)
>>> +			offset = nc->offset;
>>> +
>>> +		if (offset < fragsz) {
>>>  			put_page(nc->page);
>>>  			goto refill;
>>>  		}
>>> -		data = page_address(nc->page) + nc->offset;
>>> -		nc->offset += fragsz;
>>> +
>>> +		offset -= fragsz;
>>> +		nc->offset = offset;
>>> +
>>> +		data = page_address(nc->page) + offset;
>>>  		get_page(nc->page);
>>>  	}
>>>  	local_irq_restore(flags);
>>>
>> I tested this idea one month ago and did not get convincing results,
>> because the branch was taken half of the time.
>>
>> The cases where page can be reused is probably specific to ixgbe because
>> it uses a different allocator for the frags themselves.
>> netdev_alloc_frag() is only used to allocate the skb head.
> Actually it is pretty much anywhere a copy-break type setup exists.  I
> think ixgbe and a few other drivers have this type of setup where
> netdev_alloc_skb is called and the data is just copied into the buffer.
> My thought was that if I can improve this one case without hurting the
> other cases, I should just go ahead and submit it since it is a net win
> performance-wise.
>
> I think one of the biggest advantages of this for ixgbe is that it
> allows the buffer to become cache warm so that writing the shared info
> and copying the header contents becomes very cheap compared to accessing
> a cache cold page.
>
>> For typical nics, we allocate frags to populate the RX ring _way_ before
>> packet is received by the NIC.
>>
>> Then, I played with using order-2 pages instead of order-0 ones if
>> PAGE_SIZE < 8192.
>>
>> No clear win either, but you might try this too.
> The biggest issue I see with an order-2 page is that it means the memory
> is going to take much longer to cycle out of a shared page.  As a result
> changes like the one I just came up with would likely have little to no
> benefit because we would run out of room in the frags list before we
> could start reusing a fresh page.
>
> Thanks,
>
> Alex
>
Actually I think I just realized what the difference is.  I was looking
at things with LRO disabled.  With LRO enabled our hardware RSC feature
kind of defeats the whole point of the GRO or TCP coalescing anyway
since it will stuff 16 fragments into a single packet before we even
hand the packet off to the stack.

Thanks,

Alex

^ permalink raw reply

* Re: [PATCH 09/12] nfs: disable data cache revalidation for swapfiles
From: Rik van Riel @ 2012-06-20 17:16 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	David Miller, Trond Myklebust, Neil Brown, Christoph Hellwig,
	Peter Zijlstra, Mike Christie, Eric B Munson
In-Reply-To: <1340185081-22525-10-git-send-email-mgorman@suse.de>

On 06/20/2012 05:37 AM, Mel Gorman wrote:
> The VM does not like PG_private set on PG_swapcache pages. As suggested
> by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables
> NFS data cache revalidation on swap files, as it does not make
> sense to have other clients change the file while it is being used as
> swap. This avoids setting PG_private on swap pages, since there ought
> to be no further races with invalidate_inode_pages2() to deal with.
>
> Since we cannot set PG_private we cannot use page->private which
> is already used by PG_swapcache pages to store the nfs_page. Thus
> augment the new nfs_page_find_request logic.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Acked-by: Rik van Riel <riel@redhat.com>

^ permalink raw reply

* [ndisc6] Bad NIC causing  IPV6 NDP to stop working
From: Menny_Hamburger @ 2012-06-20 17:09 UTC (permalink / raw)
  To: netdev

Hi,

We have seen several cases where we suspect that a bad NIC on the machine caused neighbour discovery to stop working on all the other NICs; when this happens, ping6 fails on every NIC we try.
From looking at the code, I see that only a single socket is assigned for NDP. Would it make sense to allocate a socket per interface instead of a single global socket?

Thanks,
Menny

^ permalink raw reply

* Q: NET/JME: pci_get_drvdata pointer return check at jme_remove
From: devendra.aaru @ 2012-06-20 17:20 UTC (permalink / raw)
  To: netdev

Hi,

Looking at the jme_init_one error path, the driver data pointer is
set to NULL.

If the driver is then unloaded, the _remove_one callback will be called
and will dereference that pointer, leading to an oops.

So we need a NULL check before doing the netdev_priv in remove_one.

Please correct me if my understanding is wrong.
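
To show what I mean, here is a minimal user-space sketch of the guard, not the jme driver code. All names (struct sketch_pci_dev, struct sketch_netdev, sketch_probe(), sketch_remove() and the drvdata helpers) are hypothetical stand-ins for the PCI and netdev structures. Whether the remove callback can actually run after a failed probe is exactly the open question here; the sketch only illustrates the check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct net_device and struct pci_dev. */
struct sketch_netdev { int registered; };
struct sketch_pci_dev { void *drvdata; };

static void *sketch_get_drvdata(const struct sketch_pci_dev *pdev)
{
	return pdev->drvdata;
}

static void sketch_set_drvdata(struct sketch_pci_dev *pdev, void *data)
{
	pdev->drvdata = data;
}

/* Probe: the error path clears the driver data, as described above. */
static int sketch_probe(struct sketch_pci_dev *pdev, int simulate_failure)
{
	struct sketch_netdev *ndev;

	assert(pdev != NULL);
	ndev = calloc(1, sizeof(*ndev));
	if (!ndev)
		return -1;
	sketch_set_drvdata(pdev, ndev);

	if (simulate_failure) {
		sketch_set_drvdata(pdev, NULL);
		free(ndev);
		return -1;
	}

	ndev->registered = 1;
	return 0;
}

/*
 * Remove with the proposed guard: bail out when no driver data is set.
 * Returns 1 if a device was torn down, 0 if there was nothing to do.
 */
static int sketch_remove(struct sketch_pci_dev *pdev)
{
	struct sketch_netdev *ndev = sketch_get_drvdata(pdev);

	if (!ndev)
		return 0;

	ndev->registered = 0;
	sketch_set_drvdata(pdev, NULL);
	free(ndev);
	return 1;
}
```

Without the early return, the first access through ndev after a failed probe would be the NULL dereference I am worried about.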

Thanks,
Devendra.

^ permalink raw reply
