Netdev List


* [PATCH] netxen : Error return off by one for XG port.
From: santosh nayak @ 2012-06-20 10:52 UTC (permalink / raw)
  To: sony.chacko, rajesh.borundia; +Cc: netdev, kernel-janitors, Santosh Nayak

From: Santosh Nayak <santoshprasadnayak@gmail.com>

There are NETXEN_NIU_MAX_XG_PORTS XG ports.
Port indexing starts from zero, so the last valid index is
NETXEN_NIU_MAX_XG_PORTS - 1.
Hence we should also return an error for 'port == NETXEN_NIU_MAX_XG_PORTS'.

Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com>
---
 .../ethernet/qlogic/netxen/netxen_nic_ethtool.c    |    4 ++--
 drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_ethtool.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_ethtool.c
index d4f179f..9103e3e 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_ethtool.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_ethtool.c
@@ -511,7 +511,7 @@ netxen_nic_get_pauseparam(struct net_device *dev,
 				break;
 		}
 	} else if (adapter->ahw.port_type == NETXEN_NIC_XGBE) {
-		if ((port < 0) || (port > NETXEN_NIU_MAX_XG_PORTS))
+		if ((port < 0) || (port >= NETXEN_NIU_MAX_XG_PORTS))
 			return;
 		pause->rx_pause = 1;
 		val = NXRD32(adapter, NETXEN_NIU_XG_PAUSE_CTL);
@@ -577,7 +577,7 @@ netxen_nic_set_pauseparam(struct net_device *dev,
 		}
 		NXWR32(adapter, NETXEN_NIU_GB_PAUSE_CTL, val);
 	} else if (adapter->ahw.port_type == NETXEN_NIC_XGBE) {
-		if ((port < 0) || (port > NETXEN_NIU_MAX_XG_PORTS))
+		if ((port < 0) || (port >= NETXEN_NIU_MAX_XG_PORTS))
 			return -EIO;
 		val = NXRD32(adapter, NETXEN_NIU_XG_PAUSE_CTL);
 		if (port == 0) {
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c
index de96a94..946160f 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c
@@ -365,7 +365,7 @@ static int netxen_niu_disable_xg_port(struct netxen_adapter *adapter)
 	if (NX_IS_REVISION_P3(adapter->ahw.revision_id))
 		return 0;
 
-	if (port > NETXEN_NIU_MAX_XG_PORTS)
+	if (port >= NETXEN_NIU_MAX_XG_PORTS)
 		return -EINVAL;
 
 	mac_cfg = 0;
@@ -392,7 +392,7 @@ static int netxen_p2_nic_set_promisc(struct netxen_adapter *adapter, u32 mode)
 	u32 port = adapter->physical_port;
 	u16 board_type = adapter->ahw.board_type;
 
-	if (port > NETXEN_NIU_MAX_XG_PORTS)
+	if (port >= NETXEN_NIU_MAX_XG_PORTS)
 		return -EINVAL;
 
 	mac_cfg = NXRD32(adapter, NETXEN_NIU_XGE_CONFIG_0 + (0x10000 * port));
-- 
1.7.4.4

^ permalink raw reply related

* Re: linux-next: build failure after merge of the net-next tree
From: Marc Kleine-Budde @ 2012-06-20 10:33 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: David Miller, viresh.kumar2, bhupesh.sharma, netdev, linux-next,
	linux-kernel, federico.vaga, giancarlo.asnaghi, wg, spear-devel,
	Andrew Morton
In-Reply-To: <20120620202604.407af721f746045ae00c8268@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 1798 bytes --]

On 06/20/2012 12:26 PM, Stephen Rothwell wrote:
> Hi all,
> 
> On Wed, 20 Jun 2012 01:20:37 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
>>
>> From: viresh kumar <viresh.kumar2@arm.com>
>> Date: Wed, 20 Jun 2012 09:08:34 +0100
>>
>>> Please see following patchset from me, that got applied in linux-next
>>>
>>> https://lkml.org/lkml/2012/4/24/154
>>>
>>> Please check if this patchset is present in your build repo. I believe it should be
>>> there. If it is, then you shouldn't get these errors.
>>
>> Well, then Stephen shouldn't get those errors either.
>>
>> But obviously he did.
>>
>> But all of this talk about changes existing only in linux-next is
>> entirely moot.  Because The damn thing MUST build independently inside
>> of net-next which doesn't have those clock layer changes.
>>
>> Someone send me a clean fix for net-next now.
> 
> I get those errors because those patches are in the akpm tree which is
> merged after everything else ...
> 
> One possibility is to put those changes in another (stable) tree and
> merge that into the net-next tree (and any other tree that needs it).

We're about to remove the offending clk_*() functions from the driver,
as they are untested anyway. The hardware the driver was developed for
uses a hardcoded clock rate in the driver anyway, as it cannot be
retrieved from clock tree. As soon as there is hardware available that
will work with the clock tree, we can add those functions back.

Sorry for the noise,
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Stephen Rothwell @ 2012-06-20 10:26 UTC (permalink / raw)
  To: David Miller
  Cc: viresh.kumar2, bhupesh.sharma, netdev, linux-next, linux-kernel,
	federico.vaga, giancarlo.asnaghi, wg, mkl, spear-devel,
	Andrew Morton
In-Reply-To: <20120620.012037.783895812206310043.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 1139 bytes --]

Hi all,

On Wed, 20 Jun 2012 01:20:37 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
>
> From: viresh kumar <viresh.kumar2@arm.com>
> Date: Wed, 20 Jun 2012 09:08:34 +0100
> 
> > Please see following patchset from me, that got applied in linux-next
> > 
> > https://lkml.org/lkml/2012/4/24/154
> > 
> > Please check if this patchset is present in your build repo. I believe it should be
> > there. If it is, then you shouldn't get these errors.
> 
> Well, then Stephen shouldn't get those errors either.
> 
> But obviously he did.
> 
> But all of this talk about changes existing only in linux-next is
> entirely moot.  Because The damn thing MUST build independently inside
> of net-next which doesn't have those clock layer changes.
> 
> Someone send me a clean fix for net-next now.

I get those errors because those patches are in the akpm tree which is
merged after everything else ...

One possibility is to put those changes in another (stable) tree and
merge that into the net-next tree (and any other tree that needs it).
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH] usbnet: Activate halt interrupt endpoint before re-submit URB
From: Oliver Neukum @ 2012-06-20 10:21 UTC (permalink / raw)
  To: Ming Lei; +Cc: Huajun Li, David Miller, stern, linux-usb, netdev
In-Reply-To: <CACVXFVMy6Hrqgw6rmAXv5YXD86sLF+FUPTRRcXKs40uwN6ioCg@mail.gmail.com>

Am Mittwoch, 20. Juni 2012, 12:15:25 schrieb Ming Lei:
> On Wed, Jun 20, 2012 at 4:58 PM, Oliver Neukum <oneukum@suse.de> wrote:
> > Am Mittwoch, 20. Juni 2012, 10:07:55 schrieb Ming Lei:
> >> BTW, maybe it is better to add below
> >>
> >>     usbnet_defer_kevent(dev, EVENT_STS_HALT);
> >>
> >> for -EPIPE returned from usb_urb_submit if it will be resent.
> >
> > Why? If it failed once it'll probably also fail the next time.
> 
> -EPIPE just means the endpoint is halted, either from usb_urb_submit
> or urb->status, so the HALT should be cleared in the situation.

It probably was halted and cleared. However that you cleared
a halt doesn't mean that the reason for stalling went away.
So you must cope with an endpoint being halted again right after
it was cleared.

> > In that case we'd need to do something more intrusive
> > like resetting the device, but that cannot be done well
> > in the generic usbnet part.
> 
> IMO, resetting is not needed for -EPIPE, but may be needed for
> -EPROTO failure.

We don't need it for a single failure, but what else would we do
if we keep getting -EPIPE?

	Regards
		Oliver

^ permalink raw reply

* Re: [PATCH] can: c_can_pci: limit compilation to archs with clock support
From: Federico Vaga @ 2012-06-20 10:18 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: David Miller, netdev, linux-can
In-Reply-To: <4FE1A1A1.3000105@pengutronix.de>

> I think we finally can see the big picture now; I'm preparing a patch
> which removes the clk_*() functions.

Thank you, and sorry for the big trouble

-- 
Federico Vaga

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Federico Vaga @ 2012-06-20 10:17 UTC (permalink / raw)
  To: David Miller
  Cc: mkl, bhupesh.sharma, sfr, netdev, linux-next, linux-kernel,
	giancarlo.asnaghi, wg
In-Reply-To: <20120620.025837.721158723130014230.davem@davemloft.net>

> Why would you try to be generic by using an interface currently
> only available on certain platforms?

I know, I was wrong.

> That is how you make drivers non-portable, and not generic.

Now is fixed in my mind; I learn the lesson.

-- 
Federico Vaga

^ permalink raw reply

* Re: [PATCH] netxen: Error return off by one in 'netxen_nic_set_pauseparam()'.
From: santosh prasad nayak @ 2012-06-20 10:16 UTC (permalink / raw)
  To: Rajesh Borundia
  Cc: Dan Carpenter, Sony Chacko, netdev,
	kernel-janitors@vger.kernel.org
In-Reply-To: <13A253B3F9BEFE43B93C09CF75F63CAA81A886EF19@MNEXMB1.qlogic.org>

On Wed, Jun 20, 2012 at 3:21 PM, Rajesh Borundia
<rajesh.borundia@qlogic.com> wrote:
> _______________________________________
> From: santosh prasad nayak [santoshprasadnayak@gmail.com]
> Sent: Wednesday, June 20, 2012 1:29 PM
> To: Dan Carpenter; Rajesh Borundia
> Cc: Sony Chacko; netdev; kernel-janitors@vger.kernel.org
> Subject: Re: [PATCH] netxen: Error return off by one in 'netxen_nic_set_pauseparam()'.
>
> On Wed, Jun 20, 2012 at 1:14 PM, Dan Carpenter <dan.carpenter@oracle.com> wrote:
>> On Wed, Jun 20, 2012 at 12:57:39PM +0530, santosh nayak wrote:
>>> From: Santosh Nayak <santoshprasadnayak@gmail.com>
>>>
>>> There are 'NETXEN_NIU_MAX_GBE_PORTS'  GBE ports. Port indexing starts
>>> from zero.
>>> Hence we should also return error for "port == NETXEN_NIU_MAX_GBE_PORTS"
>>>
>>
>> I don't know this code well enough to say if you are right or not,
>> but what about for port == NETXEN_NIU_MAX_XG_PORTS a few lines later
>> in both functions?
>
>
> I think "for port == NETXEN_NIU_MAX_XG_PORTS"  error should be returned.
>
>
> @Rajesh,
>
> Can you please comment on it ?
>
>
> regards
> santosh
>
>>
>> regards,
>> dan carpenter
>>
>
> Yes, an error should be returned for both port == NETXEN_NIU_MAX_XG_PORTS and
> port == NETXEN_NIU_MAX_GBE_PORTS.


Ok.

The current patch is for GBE port.
For XG port I will send another patch.

regards
santosh


>
>
> Rajesh
>

^ permalink raw reply

* Re: [PATCH v2] ipv4: Early TCP socket demux.
From: David Miller @ 2012-06-20 10:15 UTC (permalink / raw)
  To: eric.dumazet; +Cc: shemminger, netdev
In-Reply-To: <20120619.231412.1236237191660427779.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Tue, 19 Jun 2012 23:14:12 -0700 (PDT)

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 20 Jun 2012 07:59:00 +0200
> 
>> On Tue, 2012-06-19 at 21:46 -0700, David Miller wrote:
>> 
>>> These numbers can be decreased further, because since we're already
>>> looking at the TCP header we can pre-cook the TCP control block in the
>>> SKB and skip much of the stuff that tcp_v4_rcv() does since we've done
>>> it already in the early demux code.
>> 
>> It could be done at the GRO level, removing yet another demux.
>> 
>> Since routers probably have no use for GRO, there is no need for an
>> additional knob.
> 
> That's a great idea.

Here's what I have so far, the ipv6 implementation we get nearly for
free :-)

Initially I tried to use ->gro_complete() for this as it was more
natural, but we abort before we get there for a lot of cases where we
want to use the early demux and cached route (ACKs, FINs, sub-mss
sized packets, etc.)

diff --git a/include/net/protocol.h b/include/net/protocol.h
index 967b926..a1b1b53 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -37,7 +37,6 @@
 
 /* This is used to register protocols. */
 struct net_protocol {
-	int			(*early_demux)(struct sk_buff *skb);
 	int			(*handler)(struct sk_buff *skb);
 	void			(*err_handler)(struct sk_buff *skb, u32 info);
 	int			(*gso_send_check)(struct sk_buff *skb);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b21522..c1b5626 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2956,6 +2956,12 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		return -ENOMEM;
 
 	__copy_skb_header(nskb, p);
+	if (p->sk) {
+		nskb->sk = p->sk;
+		nskb->destructor = p->destructor;
+		p->sk = NULL;
+		p->destructor = NULL;
+	}
 	nskb->mac_len = p->mac_len;
 
 	skb_reserve(nskb, headroom);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 07a02f6..0aabad7 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1519,7 +1519,6 @@ static const struct net_protocol igmp_protocol = {
 #endif
 
 static const struct net_protocol tcp_protocol = {
-	.early_demux	=	tcp_v4_early_demux,
 	.handler	=	tcp_v4_rcv,
 	.err_handler	=	tcp_v4_err,
 	.gso_send_check	=	tcp_v4_gso_send_check,
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 93b092c..c4fe1d2 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -323,32 +323,19 @@ static int ip_rcv_finish(struct sk_buff *skb)
 	 *	how the packet travels inside Linux networking.
 	 */
 	if (skb_dst(skb) == NULL) {
-		const struct net_protocol *ipprot;
-		int protocol = iph->protocol;
-		int err;
-
-		rcu_read_lock();
-		ipprot = rcu_dereference(inet_protos[protocol]);
-		err = -ENOENT;
-		if (ipprot && ipprot->early_demux)
-			err = ipprot->early_demux(skb);
-		rcu_read_unlock();
-
-		if (err) {
-			err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
-						   iph->tos, skb->dev);
-			if (unlikely(err)) {
-				if (err == -EHOSTUNREACH)
-					IP_INC_STATS_BH(dev_net(skb->dev),
-							IPSTATS_MIB_INADDRERRORS);
-				else if (err == -ENETUNREACH)
-					IP_INC_STATS_BH(dev_net(skb->dev),
-							IPSTATS_MIB_INNOROUTES);
-				else if (err == -EXDEV)
-					NET_INC_STATS_BH(dev_net(skb->dev),
-							 LINUX_MIB_IPRPFILTER);
-				goto drop;
-			}
+		int err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
+					       iph->tos, skb->dev);
+		if (unlikely(err)) {
+			if (err == -EHOSTUNREACH)
+				IP_INC_STATS_BH(dev_net(skb->dev),
+						IPSTATS_MIB_INADDRERRORS);
+			else if (err == -ENETUNREACH)
+				IP_INC_STATS_BH(dev_net(skb->dev),
+						IPSTATS_MIB_INNOROUTES);
+			else if (err == -EXDEV)
+				NET_INC_STATS_BH(dev_net(skb->dev),
+						 LINUX_MIB_IPRPFILTER);
+			goto drop;
 		}
 	}
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 13857df..2a483ad 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1671,52 +1671,6 @@ csum_err:
 }
 EXPORT_SYMBOL(tcp_v4_do_rcv);
 
-int tcp_v4_early_demux(struct sk_buff *skb)
-{
-	struct net *net = dev_net(skb->dev);
-	const struct iphdr *iph;
-	const struct tcphdr *th;
-	struct sock *sk;
-	int err;
-
-	err = -ENOENT;
-	if (skb->pkt_type != PACKET_HOST)
-		goto out_err;
-
-	if (!pskb_may_pull(skb, ip_hdrlen(skb) + sizeof(struct tcphdr)))
-		goto out_err;
-
-	iph = ip_hdr(skb);
-	th = (struct tcphdr *) ((char *)iph + ip_hdrlen(skb));
-
-	if (th->doff < sizeof(struct tcphdr) / 4)
-		goto out_err;
-
-	if (!pskb_may_pull(skb, ip_hdrlen(skb) + th->doff * 4))
-		goto out_err;
-
-	sk = __inet_lookup_established(net, &tcp_hashinfo,
-				       iph->saddr, th->source,
-				       iph->daddr, th->dest,
-				       skb->dev->ifindex);
-	if (sk) {
-		skb->sk = sk;
-		skb->destructor = sock_edemux;
-		if (sk->sk_state != TCP_TIME_WAIT) {
-			struct dst_entry *dst = sk->sk_rx_dst;
-			if (dst)
-				dst = dst_check(dst, 0);
-			if (dst) {
-				skb_dst_set_noref(skb, dst);
-				err = 0;
-			}
-		}
-	}
-
-out_err:
-	return err;
-}
-
 /*
  *	From tcp_input.c
  */
@@ -2576,6 +2530,7 @@ void tcp4_proc_exit(void)
 struct sk_buff **tcp4_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 {
 	const struct iphdr *iph = skb_gro_network_header(skb);
+	struct sk_buff **pp;
 
 	switch (skb->ip_summed) {
 	case CHECKSUM_COMPLETE:
@@ -2591,7 +2546,36 @@ struct sk_buff **tcp4_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		return NULL;
 	}
 
-	return tcp_gro_receive(head, skb);
+	pp = tcp_gro_receive(head, skb);
+
+	if (!NAPI_GRO_CB(skb)->same_flow) {
+		const struct tcphdr *th = tcp_hdr(skb);
+		struct net_device *dev = skb->dev;
+		struct sock *sk;
+
+		sk = __inet_lookup_established(dev_net(dev), &tcp_hashinfo,
+					       iph->saddr, th->source,
+					       iph->daddr, th->dest,
+					       dev->ifindex);
+		if (sk) {
+			skb_orphan(skb);
+			skb->sk = sk;
+			skb->destructor = sock_edemux;
+			if (!skb_dst(skb) &&
+			    sk->sk_state != TCP_TIME_WAIT) {
+				struct dst_entry *dst = sk->sk_rx_dst;
+				if (dst)
+					dst = dst_check(dst, 0);
+				if (dst) {
+					struct rtable *rt = (struct rtable *) dst;
+
+					if (rt->rt_iif == dev->ifindex)
+						skb_dst_set_noref(skb, dst);
+				}
+			}
+		}
+	}
+	return pp;
 }
 
 int tcp4_gro_complete(struct sk_buff *skb)
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 26a8862..b8ea463 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -797,6 +797,7 @@ static struct sk_buff **tcp6_gro_receive(struct sk_buff **head,
 					 struct sk_buff *skb)
 {
 	const struct ipv6hdr *iph = skb_gro_network_header(skb);
+	struct sk_buff **pp;
 
 	switch (skb->ip_summed) {
 	case CHECKSUM_COMPLETE:
@@ -812,7 +813,32 @@ static struct sk_buff **tcp6_gro_receive(struct sk_buff **head,
 		return NULL;
 	}
 
-	return tcp_gro_receive(head, skb);
+	pp = tcp_gro_receive(head, skb);
+
+	if (!NAPI_GRO_CB(skb)->same_flow) {
+		const struct tcphdr *th = tcp_hdr(skb);
+		struct net_device *dev = skb->dev;
+		struct sock *sk;
+
+		sk = __inet6_lookup_established(dev_net(dev), &tcp_hashinfo,
+						&iph->saddr, th->source,
+						&iph->daddr, th->dest,
+						dev->ifindex);
+		if (sk) {
+			skb_orphan(skb);
+			skb->sk = sk;
+			skb->destructor = sock_edemux;
+			if (!skb_dst(skb) &&
+			    sk->sk_state != TCP_TIME_WAIT) {
+				struct dst_entry *dst = sk->sk_rx_dst;
+				if (dst)
+					dst = dst_check(dst, 0);
+				if (dst)
+					skb_dst_set(skb, dst);
+			}
+		}
+	}
+	return pp;
 }
 
 static int tcp6_gro_complete(struct sk_buff *skb)

^ permalink raw reply related

* Re: [PATCH] usbnet: Activate halt interrupt endpoint before re-submit URB
From: Ming Lei @ 2012-06-20 10:15 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Huajun Li, David Miller, stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <201206201058.55519.oneukum-l3A5Bk7waGM@public.gmane.org>

On Wed, Jun 20, 2012 at 4:58 PM, Oliver Neukum <oneukum-l3A5Bk7waGM@public.gmane.org> wrote:
> Am Mittwoch, 20. Juni 2012, 10:07:55 schrieb Ming Lei:
>> BTW, maybe it is better to add below
>>
>>     usbnet_defer_kevent(dev, EVENT_STS_HALT);
>>
>> for -EPIPE returned from usb_urb_submit if it will be resent.
>
> Why? If it failed once it'll probably also fail the next time.

-EPIPE just means the endpoint is halted, either from usb_urb_submit
or urb->status, so the HALT should be cleared in the situation.

> In that case we'd need to do something more intrusive
> like resetting the device, but that cannot be done well
> in the generic usbnet part.

IMO, resetting is not needed for -EPIPE, but may be needed for
-EPROTO failure.

Thanks,
-- 
Ming Lei

^ permalink raw reply

* Re: [PATCH] can: c_can_pci: limit compilation to archs with clock support
From: David Miller @ 2012-06-20 10:12 UTC (permalink / raw)
  To: mkl; +Cc: netdev, linux-can, federico.vaga
In-Reply-To: <4FE1A1A1.3000105@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Wed, 20 Jun 2012 12:10:41 +0200

> I think we finally can see the big picture now; I'm preparing a patch
> which removes the clk_*() functions.

Thank you.

^ permalink raw reply

* Re: [PATCH] can: c_can_pci: limit compilation to archs with clock support
From: Marc Kleine-Budde @ 2012-06-20 10:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-can, federico.vaga
In-Reply-To: <20120620.025452.2203668280120884694.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 1436 bytes --]

On 06/20/2012 11:54 AM, David Miller wrote:
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Date: Wed, 20 Jun 2012 11:48:08 +0200
> 
>> In commit:
>>
>>   5b92da0 c_can_pci: generic module for C_CAN/D_CAN on PCI
>>
>> the c_can_pci driver has been added. It uses clk_*() functions
>> unconditionally, resulting in a link error on archs without
>> clock support. This patch adds a "depends on HAVE_CLK" to the
>> Kconfig symbol.
> 
> This is an unreasonable change and I just explained why in my email to
> Federico, did you not see it?

I sent that mail before I received Federico's and your mail.

> He says that this driver was only tested on an architecture that
> currently doesn't even have clock support in any existing tree, and
> therefore completely relies upon local changes they have to add clock
> support to that platform.
> 
> Which means your change is restricting compilation of this driver to
> platforms the driver was never, ever, tested on.
> 
> Can you see what a complete joke this is?

I think we finally can see the big picture now; I'm preparing a patch
which removes the clk_*() functions.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* Re: 10GBE performance drop with net.ipv4.tcp_timestamps=0
From: Eric Dumazet @ 2012-06-20 10:06 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Linux Netdev List
In-Reply-To: <4FE19CFC.8030408@profihost.ag>

On Wed, 2012-06-20 at 11:50 +0200, Stefan Priebe - Profihost AG wrote:
> Am 20.06.2012 11:47, schrieb Eric Dumazet:
> > On Wed, 2012-06-20 at 11:33 +0200, Stefan Priebe - Profihost AG wrote:
> >
> >> Sure. In that case i get 4Gbit/s in both variants. I also tried two
> >> other different machines same result.
> >>
> >
> > So 3.5 on receiver is the problem, it seems ?
> Yes.
> 
> > And you checked all the stuff about irq affinities, i presume, since a
> > lot of things might have changed between 2.6.32 and 3.5 ?
> 
> It is a single core E5 Xeon - i've set the affinity like this:

And you still have the retransmits in "netstat -s" output ?

Might be a firmware or pci issue, I have same cards but no problem here.

Check LRO is on ?

ethtool -k eth2

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Federico Vaga @ 2012-06-20  9:59 UTC (permalink / raw)
  To: David Miller
  Cc: mkl, bhupesh.sharma, sfr, netdev, linux-next, linux-kernel,
	giancarlo.asnaghi, wg
In-Reply-To: <20120620.020611.1696375957120854262.davem@davemloft.net>

> Then the driver should NEVER have been submitted without the
> required infrastructure in place first.

This particular driver doesn't use the clk framework at the moment. I put
those clk lines in to try to be as generic as possible, but I see that I
made a mistake: I'm sorry. An alternative to the HAVE_CLK dependency
would be to remove the clk_* lines, because actually nobody uses
them. In the future, if our c_can migrates to clk and our clk framework
is accepted in the kernel, we can re-add the clk_* lines.

-- 
Federico Vaga

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: David Miller @ 2012-06-20  9:58 UTC (permalink / raw)
  To: federico.vaga
  Cc: mkl, bhupesh.sharma, sfr, netdev, linux-next, linux-kernel,
	giancarlo.asnaghi, wg
In-Reply-To: <5059693.iaLYtpk3E4@harkonnen>

From: Federico Vaga <federico.vaga@gmail.com>
Date: Wed, 20 Jun 2012 11:59:26 +0200

>> Then the driver should NEVER have been submitted without the
>> required infrastructure in place first.
> 
> This particular driver doesn't use the clk framework at the moment. I put
> those clk lines in to try to be as generic as possible, but I see that I
> made a mistake: I'm sorry.

Why would you try to be generic by using an interface currently
only available on certain platforms?

That is how you make drivers non-portable, and not generic.

^ permalink raw reply

* RE: [PATCH] netxen: Error return off by one in 'netxen_nic_set_pauseparam()'.
From: Rajesh Borundia @ 2012-06-20  9:51 UTC (permalink / raw)
  To: santosh prasad nayak, Dan Carpenter
  Cc: Sony Chacko, netdev, kernel-janitors@vger.kernel.org
In-Reply-To: <CAOD=uF6Oe1KZqmtUpY57u9GJh9BqxoK0DP0n4yP_DrMUjfXbCQ@mail.gmail.com>

_______________________________________
From: santosh prasad nayak [santoshprasadnayak@gmail.com]
Sent: Wednesday, June 20, 2012 1:29 PM
To: Dan Carpenter; Rajesh Borundia
Cc: Sony Chacko; netdev; kernel-janitors@vger.kernel.org
Subject: Re: [PATCH] netxen: Error return off by one in 'netxen_nic_set_pauseparam()'.

On Wed, Jun 20, 2012 at 1:14 PM, Dan Carpenter <dan.carpenter@oracle.com> wrote:
> On Wed, Jun 20, 2012 at 12:57:39PM +0530, santosh nayak wrote:
>> From: Santosh Nayak <santoshprasadnayak@gmail.com>
>>
>> There are 'NETXEN_NIU_MAX_GBE_PORTS'  GBE ports. Port indexing starts
>> from zero.
>> Hence we should also return error for "port == NETXEN_NIU_MAX_GBE_PORTS"
>>
>
> I don't know this code well enough to say if you are right or not,
> but what about for port == NETXEN_NIU_MAX_XG_PORTS a few lines later
> in both functions?

I think "for port == NETXEN_NIU_MAX_XG_PORTS"  error should be returned.

@Rajesh,

Can you please comment on it ?

regards
santosh

>
> regards,
> dan carpenter
>

Yes, an error should be returned for both port == NETXEN_NIU_MAX_XG_PORTS and
port == NETXEN_NIU_MAX_GBE_PORTS.

Rajesh 

^ permalink raw reply

* Re: [PATCH] can: c_can_pci: limit compilation to archs with clock support
From: David Miller @ 2012-06-20  9:54 UTC (permalink / raw)
  To: mkl; +Cc: netdev, linux-can, federico.vaga
In-Reply-To: <1340185688-9454-1-git-send-email-mkl@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Wed, 20 Jun 2012 11:48:08 +0200

> In commit:
> 
>   5b92da0 c_can_pci: generic module for C_CAN/D_CAN on PCI
> 
> the c_can_pci driver has been added. It uses clk_*() functions
> unconditionally, resulting in a link error on archs without
> clock support. This patch adds a "depends on HAVE_CLK" to the
> Kconfig symbol.

This is an unreasonable change and I just explained why in my email to
Federico, did you not see it?

He says that this driver was only tested on an architecture that
currently doesn't even have clock support in any existing tree, and
therefore completely relies upon local changes they have to add clock
support to that platform.

Which means your change is restricting compilation of this driver to
platforms the driver was never, ever, tested on.

Can you see what a complete joke this is?

^ permalink raw reply

* Re: 10GBE performance drop with net.ipv4.tcp_timestamps=0
From: Stefan Priebe - Profihost AG @ 2012-06-20  9:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Netdev List
In-Reply-To: <1340185645.4604.853.camel@edumazet-glaptop>

Am 20.06.2012 11:47, schrieb Eric Dumazet:
> On Wed, 2012-06-20 at 11:33 +0200, Stefan Priebe - Profihost AG wrote:
>
>> Sure. In that case i get 4Gbit/s in both variants. I also tried two
>> other different machines same result.
>>
>
> So 3.5 on receiver is the problem, it seems ?
Yes.

> And you checked all the stuff about irq affinities, i presume, since a
> lot of things might have changed between 2.6.32 and 3.5 ?

It is a single core E5 Xeon - i've set the affinity like this:

eth2 mask=1 for /proc/irq/83/smp_affinity
eth2 mask=2 for /proc/irq/84/smp_affinity
eth2 mask=4 for /proc/irq/85/smp_affinity
eth2 mask=8 for /proc/irq/86/smp_affinity
eth2 mask=10 for /proc/irq/87/smp_affinity
eth2 mask=20 for /proc/irq/88/smp_affinity
eth2 mask=40 for /proc/irq/89/smp_affinity
eth2 mask=80 for /proc/irq/90/smp_affinity

> cat /proc/interrupts

# cat /proc/interrupts
             CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
    0:        141          0          0          0          0          0          0          0   IO-APIC-edge      timer
    1:          1          8          0          0          0          0          0          0   IO-APIC-edge      i8042
    9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
   12:          0          3          0          0          0          0          0          0   IO-APIC-edge      i8042
   14:          0          0          0          0          0          0          0          0   IO-APIC-edge      ide0
   15:          0          0          0          0          0          0          0          0   IO-APIC-edge      ide1
   16:          0          0         26          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
   23:          0          0         30          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2
   64:          0          0          0      81979          0          0          0          0   PCI-MSI-edge      ahci
   65:          0          0          0          1          0          0          0          0   PCI-MSI-edge      eth0
   66:          0          0          0          0       1090          0          0          0   PCI-MSI-edge      eth0-TxRx-0
   67:          0          0          0          0        411          0          0          0   PCI-MSI-edge      eth0-TxRx-1
   68:          0          0          0          0        592          0          0          0   PCI-MSI-edge      eth0-TxRx-2
   69:          0          0          0          0        472          0          0          0   PCI-MSI-edge      eth0-TxRx-3
   70:          0          0          0          0          0       1196          0          0   PCI-MSI-edge      eth0-TxRx-4
   71:          0          0          0          0          0        374          0          0   PCI-MSI-edge      eth0-TxRx-5
   72:          0          0          0          0          0        405          0          0   PCI-MSI-edge      eth0-TxRx-6
   73:          0          0          0          0          0        468          0          0   PCI-MSI-edge      eth0-TxRx-7
   83:      31278          0          0         65          0          0          0          0   PCI-MSI-edge      eth2-TxRx-0
   84:          0      36311          0          0         61          0          0          0   PCI-MSI-edge      eth2-TxRx-1
   85:          0          0      46189          0         61          0          0          0   PCI-MSI-edge      eth2-TxRx-2
   86:          0          0          0      28712         67          0          0          0   PCI-MSI-edge      eth2-TxRx-3
   87:          0          0          0          0      28089          0          0          0   PCI-MSI-edge      eth2-TxRx-4
   88:          0          0          0          0          0      34982          0          0   PCI-MSI-edge      eth2-TxRx-5
   89:          0          0          0          0          0         61      32420          0   PCI-MSI-edge      eth2-TxRx-6
   90:          0          0          0          0          0         61          0      25922   PCI-MSI-edge      eth2-TxRx-7
   91:          0          0          0          0          0          3          0          0   PCI-MSI-edge      eth2
  NMI:         13         12         15         22          5          5          5          5   Non-maskable interrupts
  LOC:      58919      61420      65519      82647      35519      40489      27141      30228   Local timer interrupts
  SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
  PMI:         13         12         15         22          5          5          5          5   Performance monitoring interrupts
  IWI:          0          0          0          0          0          0          0          0   IRQ work interrupts
  RTR:          6          0          0          0          0          0          0          0   APIC ICR read retries
  RES:      15116       4521       2418       1814       2375       1615       1488       1367   Rescheduling interrupts
  CAL:        134        148        100        162        170        172        172        172   Function call interrupts
  TLB:        422        486        415        483        460        460        476        398   TLB shootdowns
  TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
  THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
  MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
  MCP:          4          4          4          4          4          4          4          4   Machine check polls

> what kind of NIC it is ?

# lspci | grep 10-Giga
06:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
06:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)

Stefan

^ permalink raw reply

* [PATCH] can: c_can_pci: limit compilation to archs with clock support
From: Marc Kleine-Budde @ 2012-06-20  9:48 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-can, Marc Kleine-Budde, Federico Vaga

In commit:

  5b92da0 c_can_pci: generic module for C_CAN/D_CAN on PCI

the c_can_pci driver has been added. It uses clk_*() functions
unconditionally, resulting in a link error on archs without
clock support. This patch adds a "depends on HAVE_CLK" to the
Kconfig symbol.

An upcoming patch from Viresh Kumar adds a generic dummy
implementation. As soon as this patch has been merged, this
Kconfig symbol can be removed.

Cc: Federico Vaga <federico.vaga@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/c_can/Kconfig |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/can/c_can/Kconfig b/drivers/net/can/c_can/Kconfig
index 3b83baf..2835277 100644
--- a/drivers/net/can/c_can/Kconfig
+++ b/drivers/net/can/c_can/Kconfig
@@ -17,6 +17,7 @@ config CAN_C_CAN_PLATFORM
 config CAN_C_CAN_PCI
 	tristate "Generic PCI Bus based C_CAN/D_CAN driver"
 	depends on PCI
+	depends on HAVE_CLK
 	---help---
 	  This driver adds support for the C_CAN/D_CAN chips connected
 	  to the PCI bus.
-- 
1.7.10

^ permalink raw reply related

* Re: 10GBE performance drop with net.ipv4.tcp_timestamps=0
From: Eric Dumazet @ 2012-06-20  9:47 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Linux Netdev List
In-Reply-To: <4FE198FE.5050202@profihost.ag>

On Wed, 2012-06-20 at 11:33 +0200, Stefan Priebe - Profihost AG wrote:

> Sure. In that case i get 4Gbit/s in both variants. I also tried two 
> other different machines same result.
> 

So 3.5 on receiver is the problem, it seems ?

And you checked all the stuff about irq affinities, i presume, since a
lot of things might have changed between 2.6.32 and 3.5 ?

cat /proc/interrupts

what kind of NIC it is ?

^ permalink raw reply

* [PATCH 12/12] Avoid dereferencing bd_disk during swap_entry_free for network storage
From: Mel Gorman @ 2012-06-20  9:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
	Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
	Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1340185081-22525-1-git-send-email-mgorman@suse.de>

Commit [b3a27d: swap: Add swap slot free callback to
block_device_operations] dereferences p->bdev->bd_disk but this is a
NULL dereference if using swap-over-NFS. This patch checks SWP_BLKDEV
on the swap_info_struct before dereferencing.

With reference to this callback, Christoph Hellwig stated "Please
just remove the callback entirely.  It has no user outside the staging
tree and was added clearly against the rules for that staging tree".
This would also be my preference but there was not an obvious way of
keeping zram in staging/ happy.
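
The ordering issue described above can be illustrated outside the kernel.
What follows is a minimal userspace sketch, not the kernel code: the
struct layouts, the X_SWP_BLKDEV bit value and the helper names
notify_slot_free() and record_notify() are all hypothetical stand-ins, and
the swap-over-NFS case is modelled (as an assumption) by a NULL bdev. The
point it shows is only that the pointer chain is followed after the flag
test, never before it.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
#define X_SWP_BLKDEV 0x1u	/* hypothetical bit value, not the kernel's */

struct blk_ops {
	void (*swap_slot_free_notify)(void *bdev, unsigned long offset);
};
struct gendisk { const struct blk_ops *fops; };
struct block_device { struct gendisk *bd_disk; };
struct swap_info {
	unsigned int flags;
	struct block_device *bdev;	/* assumed NULL for swap-over-NFS */
};

/* Records the last offset passed to the callback, for demonstration. */
static unsigned long notified_offset = (unsigned long)-1;

static void record_notify(void *bdev, unsigned long offset)
{
	(void)bdev;
	notified_offset = offset;
}

/* Returns 1 if the callback ran, 0 if it was skipped. */
static int notify_slot_free(struct swap_info *p, unsigned long offset)
{
	assert(p != NULL);

	if (p->flags & X_SWP_BLKDEV) {
		/* Only behind the flag test is the pointer chain followed. */
		struct gendisk *disk = p->bdev->bd_disk;

		if (disk->fops->swap_slot_free_notify) {
			disk->fops->swap_slot_free_notify(p->bdev, offset);
			return 1;
		}
	}
	return 0;
}
```

Calling notify_slot_free() on a swap_info with flags == 0 and a NULL bdev
returns 0 without touching the pointer, whereas hoisting the bd_disk load
above the flag test (as the removed line did) would fault in that case.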

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/swapfile.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 7307fc9..e6c4b13 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -549,7 +549,6 @@ static unsigned char swap_entry_free(struct swap_info_struct *p,
 
 	/* free if no reference */
 	if (!usage) {
-		struct gendisk *disk = p->bdev->bd_disk;
 		if (offset < p->lowest_bit)
 			p->lowest_bit = offset;
 		if (offset > p->highest_bit)
@@ -560,9 +559,11 @@ static unsigned char swap_entry_free(struct swap_info_struct *p,
 		nr_swap_pages++;
 		p->inuse_pages--;
 		frontswap_invalidate_page(p->type, offset);
-		if ((p->flags & SWP_BLKDEV) &&
-				disk->fops->swap_slot_free_notify)
-			disk->fops->swap_slot_free_notify(p->bdev, offset);
+		if (p->flags & SWP_BLKDEV) {
+			struct gendisk *disk = p->bdev->bd_disk;
+			if (disk->fops->swap_slot_free_notify)
+				disk->fops->swap_slot_free_notify(p->bdev, offset);
+		}
 	}
 
 	return usage;
-- 
1.7.9.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* [PATCH 11/12] nfs: Prevent page allocator recursions with swap over NFS.
From: Mel Gorman @ 2012-06-20  9:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
	Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
	Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1340185081-22525-1-git-send-email-mgorman@suse.de>

GFP_NOFS is _more_ permissive than GFP_NOIO in that it will initiate
IO, just not of any filesystem data.

The problem is that previously NOFS was correct because that avoids
recursion into the NFS code. With swap-over-NFS, it is no longer
correct as swap IO can lead to this recursion.
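
The relationship between the masks can be sketched as plain bit
arithmetic. This is a userspace model under stated assumptions: the X_*
bit values are made up, the composition of the three masks follows my
understanding of the allocator flags of this era rather than anything
quoted in this patch, and may_recurse_into_nfs() is a hypothetical helper
that only encodes the argument made above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's definitions differ. */
#define X_GFP_WAIT 0x1u		/* caller may sleep and reclaim */
#define X_GFP_IO   0x2u		/* reclaim may start physical IO */
#define X_GFP_FS   0x4u		/* reclaim may call into filesystems */

#define X_GFP_NOIO   (X_GFP_WAIT)
#define X_GFP_NOFS   (X_GFP_WAIT | X_GFP_IO)
#define X_GFP_KERNEL (X_GFP_WAIT | X_GFP_IO | X_GFP_FS)

static bool may_start_io(unsigned int gfp)
{
	return (gfp & X_GFP_IO) != 0;
}

static bool may_enter_fs(unsigned int gfp)
{
	return (gfp & X_GFP_FS) != 0;
}

/*
 * With swap on a local disk, only filesystem entry can lead back into
 * NFS. With swap over NFS, swap IO itself goes through NFS, so any mask
 * that permits IO can recurse as well.
 */
static bool may_recurse_into_nfs(unsigned int gfp, bool swap_over_nfs)
{
	if (may_enter_fs(gfp))
		return true;
	return swap_over_nfs && may_start_io(gfp);
}

/* Sanity check: NOIO is a strict subset of NOFS in this model. */
static void check_mask_ordering(void)
{
	assert((X_GFP_NOIO & ~X_GFP_NOFS) == 0);
	assert(X_GFP_NOIO != X_GFP_NOFS);
}
```

In this model X_GFP_NOFS is safe while swap_over_nfs is false and unsafe
once it is true, and X_GFP_NOIO is safe in both cases, which matches the
substitutions made in the diff below.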

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/nfs/pagelist.c |    2 +-
 fs/nfs/write.c    |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 9ef8b3c..7de1646 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -70,7 +70,7 @@ void nfs_set_pgio_error(struct nfs_pgio_header *hdr, int error, loff_t pos)
 static inline struct nfs_page *
 nfs_page_alloc(void)
 {
-	struct nfs_page	*p = kmem_cache_zalloc(nfs_page_cachep, GFP_KERNEL);
+	struct nfs_page	*p = kmem_cache_zalloc(nfs_page_cachep, GFP_NOIO);
 	if (p)
 		INIT_LIST_HEAD(&p->wb_list);
 	return p;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index f45b9ca..6f90681 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -52,7 +52,7 @@ static mempool_t *nfs_commit_mempool;
 
 struct nfs_commit_data *nfs_commitdata_alloc(void)
 {
-	struct nfs_commit_data *p = mempool_alloc(nfs_commit_mempool, GFP_NOFS);
+	struct nfs_commit_data *p = mempool_alloc(nfs_commit_mempool, GFP_NOIO);
 
 	if (p) {
 		memset(p, 0, sizeof(*p));
@@ -70,7 +70,7 @@ EXPORT_SYMBOL_GPL(nfs_commit_free);
 
 struct nfs_write_header *nfs_writehdr_alloc(void)
 {
-	struct nfs_write_header *p = mempool_alloc(nfs_wdata_mempool, GFP_NOFS);
+	struct nfs_write_header *p = mempool_alloc(nfs_wdata_mempool, GFP_NOIO);
 
 	if (p) {
 		struct nfs_pgio_header *hdr = &p->header;
-- 
1.7.9.2

^ permalink raw reply related

* [PATCH 10/12] nfs: enable swap on NFS
From: Mel Gorman @ 2012-06-20  9:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
	Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
	Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1340185081-22525-1-git-send-email-mgorman@suse.de>

Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This
will set the NFS socket to SOCK_MEMALLOC and run socket reconnect
under PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the
protocol ->connect() method.

PF_MEMALLOC should allow the allocation of struct socket and related
objects and the early (re)setting of SOCK_MEMALLOC should allow us
to receive the packets required for the TCP connection buildup.
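
The save/set/restore pattern that the socket setup paths use for
PF_MEMALLOC can be sketched in userspace as follows. This is a model, not
the kernel code: struct task, the X_PF_MEMALLOC bit value and
setup_socket() are hypothetical, and restore_flags() is written to mirror
what I understand tsk_restore_flags() to do (clear the masked bits, then
copy them back from the saved value).

```c
#include <assert.h>

#define X_PF_MEMALLOC 0x800ul	/* hypothetical bit value */

struct task { unsigned long flags; };

/* Put the bits selected by mask back to the state they had in orig. */
static void restore_flags(struct task *t, unsigned long orig,
			  unsigned long mask)
{
	assert(t);
	t->flags &= ~mask;
	t->flags |= orig & mask;
}

/*
 * Model of a connect worker: raise X_PF_MEMALLOC only when the transport
 * is used for swap, and leave the caller's flags as they were on exit.
 * Returns the flags that were in effect while "connecting".
 */
static unsigned long setup_socket(struct task *t, unsigned int swapper)
{
	unsigned long pflags = t->flags;	/* saved before any change */
	unsigned long during;

	if (swapper)
		t->flags |= X_PF_MEMALLOC;

	during = t->flags;	/* socket creation would happen here */

	restore_flags(t, pflags, X_PF_MEMALLOC);
	return during;
}
```

Restoring from the saved value rather than unconditionally clearing the
bit matters when the worker was already running with the flag set: a
plain clear would drop a flag that some outer context still relies on.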

[dfeng@redhat.com: Fix handling of multiple swap files]
[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/nfs/Kconfig              |    8 +++++
 fs/nfs/direct.c             |   82 ++++++++++++++++++++++++++++---------------
 fs/nfs/file.c               |   22 ++++++++++--
 include/linux/nfs_fs.h      |    4 +--
 include/linux/sunrpc/xprt.h |    3 ++
 net/sunrpc/Kconfig          |    5 +++
 net/sunrpc/clnt.c           |    2 ++
 net/sunrpc/sched.c          |    7 ++--
 net/sunrpc/xprtsock.c       |   53 ++++++++++++++++++++++++++++
 9 files changed, 152 insertions(+), 34 deletions(-)

diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index f90f4f5..af4dedd 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -86,6 +86,14 @@ config NFS_V4
 
 	  If unsure, say Y.
 
+config NFS_SWAP
+	bool "Provide swap over NFS support"
+	default n
+	depends on NFS_FS
+	select SUNRPC_SWAP
+	help
+	  This option enables swapon to work on files located on NFS mounts.
+
 config NFS_V4_1
 	bool "NFS client support for NFSv4.1 (EXPERIMENTAL)"
 	depends on NFS_FS && NFS_V4 && EXPERIMENTAL
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 3168f6e..167fe33 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -115,17 +115,28 @@ static inline int put_dreq(struct nfs_direct_req *dreq)
  * @nr_segs: size of iovec array
  *
  * The presence of this routine in the address space ops vector means
- * the NFS client supports direct I/O.  However, we shunt off direct
- * read and write requests before the VFS gets them, so this method
- * should never be called.
+ * the NFS client supports direct I/O. However, for most direct IO, we
+ * shunt off direct read and write requests before the VFS gets them,
+ * so this method is only ever called for swap.
  */
 ssize_t nfs_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, loff_t pos, unsigned long nr_segs)
 {
+#ifndef CONFIG_NFS_SWAP
 	dprintk("NFS: nfs_direct_IO (%s) off/no(%Ld/%lu) EINVAL\n",
 			iocb->ki_filp->f_path.dentry->d_name.name,
 			(long long) pos, nr_segs);
 
 	return -EINVAL;
+#else
+	VM_BUG_ON(iocb->ki_left != PAGE_SIZE);
+	VM_BUG_ON(iocb->ki_nbytes != PAGE_SIZE);
+
+	if (rw == READ || rw == KERNEL_READ)
+		return nfs_file_direct_read(iocb, iov, nr_segs, pos,
+				rw == READ ? true : false);
+	return nfs_file_direct_write(iocb, iov, nr_segs, pos,
+				rw == WRITE ? true : false);
+#endif /* CONFIG_NFS_SWAP */
 }
 
 static void nfs_direct_release_pages(struct page **pages, unsigned int npages)
@@ -303,7 +314,7 @@ static const struct nfs_pgio_completion_ops nfs_direct_read_completion_ops = {
  */
 static ssize_t nfs_direct_read_schedule_segment(struct nfs_pageio_descriptor *desc,
 						const struct iovec *iov,
-						loff_t pos)
+						loff_t pos, bool uio)
 {
 	struct nfs_direct_req *dreq = desc->pg_dreq;
 	struct nfs_open_context *ctx = dreq->ctx;
@@ -331,12 +342,20 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_pageio_descriptor *de
 					  GFP_KERNEL);
 		if (!pagevec)
 			break;
-		down_read(&current->mm->mmap_sem);
-		result = get_user_pages(current, current->mm, user_addr,
+		if (uio) {
+			down_read(&current->mm->mmap_sem);
+			result = get_user_pages(current, current->mm, user_addr,
 					npages, 1, 0, pagevec, NULL);
-		up_read(&current->mm->mmap_sem);
-		if (result < 0)
-			break;
+			up_read(&current->mm->mmap_sem);
+			if (result < 0)
+				break;
+		} else {
+			WARN_ON(npages != 1);
+			result = get_kernel_page(user_addr, 1, pagevec);
+			if (WARN_ON(result != 1))
+				break;
+		}
+
 		if ((unsigned)result < npages) {
 			bytes = result * PAGE_SIZE;
 			if (bytes <= pgbase) {
@@ -386,7 +405,7 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_pageio_descriptor *de
 static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 					      const struct iovec *iov,
 					      unsigned long nr_segs,
-					      loff_t pos)
+					      loff_t pos, bool uio)
 {
 	struct nfs_pageio_descriptor desc;
 	ssize_t result = -EINVAL;
@@ -400,7 +419,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 
 	for (seg = 0; seg < nr_segs; seg++) {
 		const struct iovec *vec = &iov[seg];
-		result = nfs_direct_read_schedule_segment(&desc, vec, pos);
+		result = nfs_direct_read_schedule_segment(&desc, vec, pos, uio);
 		if (result < 0)
 			break;
 		requested_bytes += result;
@@ -426,7 +445,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 }
 
 static ssize_t nfs_direct_read(struct kiocb *iocb, const struct iovec *iov,
-			       unsigned long nr_segs, loff_t pos)
+			       unsigned long nr_segs, loff_t pos, bool uio)
 {
 	ssize_t result = -ENOMEM;
 	struct inode *inode = iocb->ki_filp->f_mapping->host;
@@ -444,7 +463,7 @@ static ssize_t nfs_direct_read(struct kiocb *iocb, const struct iovec *iov,
 	if (!is_sync_kiocb(iocb))
 		dreq->iocb = iocb;
 
-	result = nfs_direct_read_schedule_iovec(dreq, iov, nr_segs, pos);
+	result = nfs_direct_read_schedule_iovec(dreq, iov, nr_segs, pos, uio);
 	if (!result)
 		result = nfs_direct_wait(dreq);
 	NFS_I(inode)->read_io += result;
@@ -605,7 +624,7 @@ static void nfs_direct_write_complete(struct nfs_direct_req *dreq, struct inode
  */
 static ssize_t nfs_direct_write_schedule_segment(struct nfs_pageio_descriptor *desc,
 						 const struct iovec *iov,
-						 loff_t pos)
+						 loff_t pos, bool uio)
 {
 	struct nfs_direct_req *dreq = desc->pg_dreq;
 	struct nfs_open_context *ctx = dreq->ctx;
@@ -633,12 +652,19 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_pageio_descriptor *d
 		if (!pagevec)
 			break;
 
-		down_read(&current->mm->mmap_sem);
-		result = get_user_pages(current, current->mm, user_addr,
-					npages, 0, 0, pagevec, NULL);
-		up_read(&current->mm->mmap_sem);
-		if (result < 0)
-			break;
+		if (uio) {
+			down_read(&current->mm->mmap_sem);
+			result = get_user_pages(current, current->mm, user_addr,
+						npages, 0, 0, pagevec, NULL);
+			up_read(&current->mm->mmap_sem);
+			if (result < 0)
+				break;
+		} else {
+			WARN_ON(npages != 1);
+			result = get_kernel_page(user_addr, 0, pagevec);
+			if (WARN_ON(result != 1))
+				break;
+		}
 
 		if ((unsigned)result < npages) {
 			bytes = result * PAGE_SIZE;
@@ -769,7 +795,7 @@ static const struct nfs_pgio_completion_ops nfs_direct_write_completion_ops = {
 static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 					       const struct iovec *iov,
 					       unsigned long nr_segs,
-					       loff_t pos)
+					       loff_t pos, bool uio)
 {
 	struct nfs_pageio_descriptor desc;
 	struct inode *inode = dreq->inode;
@@ -785,7 +811,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 
 	for (seg = 0; seg < nr_segs; seg++) {
 		const struct iovec *vec = &iov[seg];
-		result = nfs_direct_write_schedule_segment(&desc, vec, pos);
+		result = nfs_direct_write_schedule_segment(&desc, vec, pos, uio);
 		if (result < 0)
 			break;
 		requested_bytes += result;
@@ -813,7 +839,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 
 static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov,
 				unsigned long nr_segs, loff_t pos,
-				size_t count)
+				size_t count, bool uio)
 {
 	ssize_t result = -ENOMEM;
 	struct inode *inode = iocb->ki_filp->f_mapping->host;
@@ -831,7 +857,7 @@ static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov,
 	if (!is_sync_kiocb(iocb))
 		dreq->iocb = iocb;
 
-	result = nfs_direct_write_schedule_iovec(dreq, iov, nr_segs, pos);
+	result = nfs_direct_write_schedule_iovec(dreq, iov, nr_segs, pos, uio);
 	if (!result)
 		result = nfs_direct_wait(dreq);
 out_release:
@@ -862,7 +888,7 @@ out:
  * cache.
  */
 ssize_t nfs_file_direct_read(struct kiocb *iocb, const struct iovec *iov,
-				unsigned long nr_segs, loff_t pos)
+				unsigned long nr_segs, loff_t pos, bool uio)
 {
 	ssize_t retval = -EINVAL;
 	struct file *file = iocb->ki_filp;
@@ -887,7 +913,7 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, const struct iovec *iov,
 
 	task_io_account_read(count);
 
-	retval = nfs_direct_read(iocb, iov, nr_segs, pos);
+	retval = nfs_direct_read(iocb, iov, nr_segs, pos, uio);
 	if (retval > 0)
 		iocb->ki_pos = pos + retval;
 
@@ -918,7 +944,7 @@ out:
  * is no atomic O_APPEND write facility in the NFS protocol.
  */
 ssize_t nfs_file_direct_write(struct kiocb *iocb, const struct iovec *iov,
-				unsigned long nr_segs, loff_t pos)
+				unsigned long nr_segs, loff_t pos, bool uio)
 {
 	ssize_t retval = -EINVAL;
 	struct file *file = iocb->ki_filp;
@@ -950,7 +976,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, const struct iovec *iov,
 
 	task_io_account_write(count);
 
-	retval = nfs_direct_write(iocb, iov, nr_segs, pos, count);
+	retval = nfs_direct_write(iocb, iov, nr_segs, pos, count, uio);
 	if (retval > 0) {
 		struct inode *inode = mapping->host;
 
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 406caac..d010335 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -194,7 +194,7 @@ nfs_file_read(struct kiocb *iocb, const struct iovec *iov,
 	ssize_t result;
 
 	if (iocb->ki_filp->f_flags & O_DIRECT)
-		return nfs_file_direct_read(iocb, iov, nr_segs, pos);
+		return nfs_file_direct_read(iocb, iov, nr_segs, pos, true);
 
 	dprintk("NFS: read(%s/%s, %lu@%lu)\n",
 		dentry->d_parent->d_name.name, dentry->d_name.name,
@@ -494,6 +494,20 @@ static int nfs_launder_page(struct page *page)
 	return nfs_wb_page(inode, page);
 }
 
+#ifdef CONFIG_NFS_SWAP
+static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
+						sector_t *span)
+{
+	*span = sis->pages;
+	return xs_swapper(NFS_CLIENT(file->f_mapping->host)->cl_xprt, 1);
+}
+
+static void nfs_swap_deactivate(struct file *file)
+{
+	xs_swapper(NFS_CLIENT(file->f_mapping->host)->cl_xprt, 0);
+}
+#endif
+
 const struct address_space_operations nfs_file_aops = {
 	.readpage = nfs_readpage,
 	.readpages = nfs_readpages,
@@ -508,6 +522,10 @@ const struct address_space_operations nfs_file_aops = {
 	.migratepage = nfs_migrate_page,
 	.launder_page = nfs_launder_page,
 	.error_remove_page = generic_error_remove_page,
+#ifdef CONFIG_NFS_SWAP
+	.swap_activate = nfs_swap_activate,
+	.swap_deactivate = nfs_swap_deactivate,
+#endif
 };
 
 /*
@@ -582,7 +600,7 @@ static ssize_t nfs_file_write(struct kiocb *iocb, const struct iovec *iov,
 	size_t count = iov_length(iov, nr_segs);
 
 	if (iocb->ki_filp->f_flags & O_DIRECT)
-		return nfs_file_direct_write(iocb, iov, nr_segs, pos);
+		return nfs_file_direct_write(iocb, iov, nr_segs, pos, true);
 
 	dprintk("NFS: write(%s/%s, %lu@%Ld)\n",
 		dentry->d_parent->d_name.name, dentry->d_name.name,
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index b23cfc1..fae495a 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -477,10 +477,10 @@ extern ssize_t nfs_direct_IO(int, struct kiocb *, const struct iovec *, loff_t,
 			unsigned long);
 extern ssize_t nfs_file_direct_read(struct kiocb *iocb,
 			const struct iovec *iov, unsigned long nr_segs,
-			loff_t pos);
+			loff_t pos, bool uio);
 extern ssize_t nfs_file_direct_write(struct kiocb *iocb,
 			const struct iovec *iov, unsigned long nr_segs,
-			loff_t pos);
+			loff_t pos, bool uio);
 
 /*
  * linux/fs/nfs/dir.c
diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 77d278d..cff40aa 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -174,6 +174,8 @@ struct rpc_xprt {
 	unsigned long		state;		/* transport state */
 	unsigned char		shutdown   : 1,	/* being shut down */
 				resvport   : 1; /* use a reserved port */
+	unsigned int		swapper;	/* we're swapping over this
+						   transport */
 	unsigned int		bind_index;	/* bind function index */
 
 	/*
@@ -316,6 +318,7 @@ void			xprt_release_rqst_cong(struct rpc_task *task);
 void			xprt_disconnect_done(struct rpc_xprt *xprt);
 void			xprt_force_disconnect(struct rpc_xprt *xprt);
 void			xprt_conditional_disconnect(struct rpc_xprt *xprt, unsigned int cookie);
+int			xs_swapper(struct rpc_xprt *xprt, int enable);
 
 /*
  * Reserved bit positions in xprt->state
diff --git a/net/sunrpc/Kconfig b/net/sunrpc/Kconfig
index 9fe8857..03d03e3 100644
--- a/net/sunrpc/Kconfig
+++ b/net/sunrpc/Kconfig
@@ -21,6 +21,11 @@ config SUNRPC_XPRT_RDMA
 
 	  If unsure, say N.
 
+config SUNRPC_SWAP
+	bool
+	depends on SUNRPC
+	select NETVM
+
 config RPCSEC_GSS_KRB5
 	tristate "Secure RPC: Kerberos V mechanism"
 	depends on SUNRPC && CRYPTO
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index f56f045..09e71d1 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -717,6 +717,8 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
 		atomic_inc(&clnt->cl_count);
 		if (clnt->cl_softrtry)
 			task->tk_flags |= RPC_TASK_SOFT;
+		if (task->tk_client->cl_xprt->swapper)
+			task->tk_flags |= RPC_TASK_SWAPPER;
 		/* Add to the client's list of all tasks */
 		spin_lock(&clnt->cl_lock);
 		list_add_tail(&task->tk_task, &clnt->cl_tasks);
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 994cfea..83a4c43 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -812,7 +812,10 @@ static void rpc_async_schedule(struct work_struct *work)
 void *rpc_malloc(struct rpc_task *task, size_t size)
 {
 	struct rpc_buffer *buf;
-	gfp_t gfp = RPC_IS_SWAPPER(task) ? GFP_ATOMIC : GFP_NOWAIT;
+	gfp_t gfp = GFP_NOWAIT;
+
+	if (RPC_IS_SWAPPER(task))
+		gfp |= __GFP_MEMALLOC;
 
 	size += sizeof(struct rpc_buffer);
 	if (size <= RPC_BUFFER_MAXSIZE)
@@ -886,7 +889,7 @@ static void rpc_init_task(struct rpc_task *task, const struct rpc_task_setup *ta
 static struct rpc_task *
 rpc_alloc_task(void)
 {
-	return (struct rpc_task *)mempool_alloc(rpc_task_mempool, GFP_NOFS);
+	return (struct rpc_task *)mempool_alloc(rpc_task_mempool, GFP_NOIO);
 }
 
 /*
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 890b03f..b84df34 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1930,6 +1930,45 @@ out:
 	xprt_wake_pending_tasks(xprt, status);
 }
 
+#ifdef CONFIG_SUNRPC_SWAP
+static void xs_set_memalloc(struct rpc_xprt *xprt)
+{
+	struct sock_xprt *transport = container_of(xprt, struct sock_xprt,
+			xprt);
+
+	if (xprt->swapper)
+		sk_set_memalloc(transport->inet);
+}
+
+/**
+ * xs_swapper - Tag this transport as being used for swap.
+ * @xprt: transport to tag
+ * @enable: enable/disable
+ *
+ */
+int xs_swapper(struct rpc_xprt *xprt, int enable)
+{
+	struct sock_xprt *transport = container_of(xprt, struct sock_xprt,
+			xprt);
+	int err = 0;
+
+	if (enable) {
+		xprt->swapper++;
+		xs_set_memalloc(xprt);
+	} else if (xprt->swapper) {
+		xprt->swapper--;
+		sk_clear_memalloc(transport->inet);
+	}
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(xs_swapper);
+#else
+static void xs_set_memalloc(struct rpc_xprt *xprt)
+{
+}
+#endif
+
 static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 {
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
@@ -1954,6 +1993,8 @@ static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 		transport->sock = sock;
 		transport->inet = sk;
 
+		xs_set_memalloc(xprt);
+
 		write_unlock_bh(&sk->sk_callback_lock);
 	}
 	xs_udp_do_set_buffer_size(xprt);
@@ -1965,11 +2006,15 @@ static void xs_udp_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct socket *sock = transport->sock;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	if (xprt->swapper)
+		current->flags |= PF_MEMALLOC;
+
 	/* Start by resetting any existing state */
 	xs_reset_transport(transport);
 	sock = xs_create_sock(xprt, transport,
@@ -1988,6 +2033,7 @@ static void xs_udp_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_MEMALLOC);
 }
 
 /*
@@ -2078,6 +2124,8 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 	if (!xprt_bound(xprt))
 		goto out;
 
+	xs_set_memalloc(xprt);
+
 	/* Tell the socket layer to start connecting... */
 	xprt->stat.connect_count++;
 	xprt->stat.connect_start = jiffies;
@@ -2108,11 +2156,15 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct socket *sock = transport->sock;
 	struct rpc_xprt *xprt = &transport->xprt;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	if (xprt->swapper)
+		current->flags |= PF_MEMALLOC;
+
 	if (!sock) {
 		clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 		sock = xs_create_sock(xprt, transport,
@@ -2174,6 +2226,7 @@ out_eagain:
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_MEMALLOC);
 }
 
 /**
-- 
1.7.9.2

--

^ permalink raw reply related

* [PATCH 09/12] nfs: disable data cache revalidation for swapfiles
From: Mel Gorman @ 2012-06-20  9:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
	Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
	Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1340185081-22525-1-git-send-email-mgorman@suse.de>

The VM does not like PG_private set on PG_swapcache pages. As suggested
by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables
NFS data cache revalidation on swap files, as it does not make
sense to have other clients change the file while it is being used as
swap. This avoids setting PG_private on swap pages, since there ought
to be no further races with invalidate_inode_pages2() to deal with.

Since we cannot set PG_private we cannot use page->private which
is already used by PG_swapcache pages to store the nfs_page. Thus
augment the new nfs_page_find_request logic.
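
The two-path lookup described above can be sketched in userspace. The
types and names here (struct page with boolean flags, struct request, a
singly linked commit list, find_request()) are simplified, hypothetical
stand-ins for the kernel structures; the sketch only shows the shape of
the logic: use the private pointer when it is available, otherwise fall
back to a linear search for swap cache pages, and take a reference in
either case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page {
	bool has_private;	/* stand-in for PagePrivate(page) */
	bool swap_cache;	/* stand-in for PageSwapCache(page) */
	void *private_data;	/* stand-in for page_private(page) */
};

struct request {
	struct page *wb_page;
	int refcount;
	struct request *next;	/* simplified commit list linkage */
};

static struct request *find_request(struct request *commit_list,
				    struct page *page)
{
	struct request *req = NULL;

	assert(page != NULL);

	if (page->has_private) {
		/* Fast path: the request hangs off the page itself. */
		req = page->private_data;
	} else if (page->swap_cache) {
		/* Slow path: walk the commit list looking for the page. */
		for (struct request *r = commit_list; r; r = r->next) {
			if (r->wb_page == page) {
				req = r;
				break;
			}
		}
	}

	if (req)
		req->refcount++;	/* caller now holds a reference */
	return req;
}
```

The slow path costs time linear in the length of the list, which is
presumably why the patch marks the swap cache branch as unlikely().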

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/nfs/inode.c |    4 ++++
 fs/nfs/write.c |   49 +++++++++++++++++++++++++++++++++++--------------
 2 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index e605d69..30cc0b1 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -883,6 +883,10 @@ int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping)
 	struct nfs_inode *nfsi = NFS_I(inode);
 	int ret = 0;
 
+	/* swapfiles are not supposed to be shared. */
+	if (IS_SWAPFILE(inode))
+		goto out;
+
 	if (nfs_mapping_need_revalidate_inode(inode)) {
 		ret = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
 		if (ret < 0)
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 5fa5516..f45b9ca 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -139,15 +139,28 @@ static void nfs_context_set_write_error(struct nfs_open_context *ctx, int error)
 	set_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags);
 }
 
-static struct nfs_page *nfs_page_find_request_locked(struct page *page)
+static struct nfs_page *
+nfs_page_find_request_locked(struct nfs_inode *nfsi, struct page *page)
 {
 	struct nfs_page *req = NULL;
 
-	if (PagePrivate(page)) {
+	if (PagePrivate(page))
 		req = (struct nfs_page *)page_private(page);
-		if (req != NULL)
-			kref_get(&req->wb_kref);
+	else if (unlikely(PageSwapCache(page))) {
+		struct nfs_page *freq, *t;
+
+		/* Linearly search the commit list for the correct req */
+		list_for_each_entry_safe(freq, t, &nfsi->commit_info.list, wb_list) {
+			if (freq->wb_page == page) {
+				req = freq;
+				break;
+			}
+		}
 	}
+
+	if (req)
+		kref_get(&req->wb_kref);
+
 	return req;
 }
 
@@ -157,7 +170,7 @@ static struct nfs_page *nfs_page_find_request(struct page *page)
 	struct nfs_page *req = NULL;
 
 	spin_lock(&inode->i_lock);
-	req = nfs_page_find_request_locked(page);
+	req = nfs_page_find_request_locked(NFS_I(inode), page);
 	spin_unlock(&inode->i_lock);
 	return req;
 }
@@ -258,7 +271,7 @@ static struct nfs_page *nfs_find_and_lock_request(struct page *page, bool nonblo
 
 	spin_lock(&inode->i_lock);
 	for (;;) {
-		req = nfs_page_find_request_locked(page);
+		req = nfs_page_find_request_locked(NFS_I(inode), page);
 		if (req == NULL)
 			break;
 		if (nfs_lock_request(req))
@@ -412,9 +425,15 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
 	spin_lock(&inode->i_lock);
 	if (!nfsi->npages && nfs_have_delegation(inode, FMODE_WRITE))
 		inode->i_version++;
-	set_bit(PG_MAPPED, &req->wb_flags);
-	SetPagePrivate(req->wb_page);
-	set_page_private(req->wb_page, (unsigned long)req);
+	/*
+	 * Swap-space should not get truncated. Hence no need to plug the race
+	 * with invalidate/truncate.
+	 */
+	if (likely(!PageSwapCache(req->wb_page))) {
+		set_bit(PG_MAPPED, &req->wb_flags);
+		SetPagePrivate(req->wb_page);
+		set_page_private(req->wb_page, (unsigned long)req);
+	}
 	nfsi->npages++;
 	kref_get(&req->wb_kref);
 	spin_unlock(&inode->i_lock);
@@ -431,9 +450,11 @@ static void nfs_inode_remove_request(struct nfs_page *req)
 	BUG_ON (!NFS_WBACK_BUSY(req));
 
 	spin_lock(&inode->i_lock);
-	set_page_private(req->wb_page, 0);
-	ClearPagePrivate(req->wb_page);
-	clear_bit(PG_MAPPED, &req->wb_flags);
+	if (likely(!PageSwapCache(req->wb_page))) {
+		set_page_private(req->wb_page, 0);
+		ClearPagePrivate(req->wb_page);
+		clear_bit(PG_MAPPED, &req->wb_flags);
+	}
 	nfsi->npages--;
 	spin_unlock(&inode->i_lock);
 	nfs_release_request(req);
@@ -729,7 +750,7 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
 	spin_lock(&inode->i_lock);
 
 	for (;;) {
-		req = nfs_page_find_request_locked(page);
+		req = nfs_page_find_request_locked(NFS_I(inode), page);
 		if (req == NULL)
 			goto out_unlock;
 
@@ -1744,7 +1765,7 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
  */
 int nfs_wb_page(struct inode *inode, struct page *page)
 {
-	loff_t range_start = page_offset(page);
+	loff_t range_start = page_file_offset(page);
 	loff_t range_end = range_start + (loff_t)(PAGE_CACHE_SIZE - 1);
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_ALL,
-- 
1.7.9.2


* [PATCH 08/12] nfs: teach the NFS client how to treat PG_swapcache pages
From: Mel Gorman @ 2012-06-20  9:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
	Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
	Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1340185081-22525-1-git-send-email-mgorman@suse.de>

Replace all relevant occurrences of page->index and page->mapping in
the NFS client with the new page_file_index() and page_file_mapping()
functions.
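
As a rough illustration of what the two helpers are for, here is a minimal
user-space sketch. It assumes, which this patch itself does not show, that
the helpers return page->mapping and page->index for ordinary pages and
file-relative values for swap-cache pages. Every toy_* name and field below
is hypothetical and exists only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct address_space. */
struct toy_mapping {
	const char *name;
};

/* Hypothetical stand-in for struct page; the fields are illustrative. */
struct toy_page {
	bool swapcache;                   /* models the PG_swapcache flag */
	struct toy_mapping *mapping;      /* models page->mapping */
	unsigned long index;              /* models page->index */
	struct toy_mapping *swap_mapping; /* mapping of the backing swap file */
	unsigned long swap_index;         /* page offset within the swap file */
};

/* Pick the mapping that file I/O should use for this page. */
static struct toy_mapping *toy_page_file_mapping(const struct toy_page *page)
{
	if (page->swapcache)
		return page->swap_mapping;
	return page->mapping;
}

/* Pick the file-relative page index that file I/O should use. */
static unsigned long toy_page_file_index(const struct toy_page *page)
{
	if (page->swapcache)
		return page->swap_index;
	return page->index;
}
```

With helpers of this shape, a caller that previously dereferenced
page->mapping->host directly gets the swap file's inode for a swap-cache
page without any change for ordinary page-cache pages.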

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/nfs/file.c     |    6 +++---
 fs/nfs/internal.h |    7 ++++---
 fs/nfs/pagelist.c |    2 +-
 fs/nfs/read.c     |    6 +++---
 fs/nfs/write.c    |   38 +++++++++++++++++++-------------------
 5 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index a6708e6b..406caac 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -442,7 +442,7 @@ static void nfs_invalidate_page(struct page *page, unsigned long offset)
 	if (offset != 0)
 		return;
 	/* Cancel any unstarted writes on this page */
-	nfs_wb_page_cancel(page->mapping->host, page);
+	nfs_wb_page_cancel(page_file_mapping(page)->host, page);
 
 	nfs_fscache_invalidate_page(page, page->mapping->host);
 }
@@ -484,7 +484,7 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
  */
 static int nfs_launder_page(struct page *page)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_inode *nfsi = NFS_I(inode);
 
 	dfprintk(PAGECACHE, "NFS: launder_page(%ld, %llu)\n",
@@ -533,7 +533,7 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	nfs_fscache_wait_on_page_write(NFS_I(dentry->d_inode), page);
 
 	lock_page(page);
-	mapping = page->mapping;
+	mapping = page_file_mapping(page);
 	if (mapping != dentry->d_inode->i_mapping)
 		goto out_unlock;
 
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 18f99ef..43ea79a 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -463,13 +463,14 @@ void nfs_super_set_maxbytes(struct super_block *sb, __u64 maxfilesize)
 static inline
 unsigned int nfs_page_length(struct page *page)
 {
-	loff_t i_size = i_size_read(page->mapping->host);
+	loff_t i_size = i_size_read(page_file_mapping(page)->host);
 
 	if (i_size > 0) {
+		pgoff_t page_index = page_file_index(page);
 		pgoff_t end_index = (i_size - 1) >> PAGE_CACHE_SHIFT;
-		if (page->index < end_index)
+		if (page_index < end_index)
 			return PAGE_CACHE_SIZE;
-		if (page->index == end_index)
+		if (page_index == end_index)
 			return ((i_size - 1) & ~PAGE_CACHE_MASK) + 1;
 	}
 	return 0;
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index aed913c..9ef8b3c 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -117,7 +117,7 @@ nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
 	 * long write-back delay. This will be adjusted in
 	 * update_nfs_request below if the region is not locked. */
 	req->wb_page    = page;
-	req->wb_index	= page->index;
+	req->wb_index	= page_file_index(page);
 	page_cache_get(page);
 	req->wb_offset  = offset;
 	req->wb_pgbase	= offset;
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 86ced78..c5b83ce 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -532,11 +532,11 @@ static const struct rpc_call_ops nfs_read_common_ops = {
 int nfs_readpage(struct file *file, struct page *page)
 {
 	struct nfs_open_context *ctx;
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	int		error;
 
 	dprintk("NFS: nfs_readpage (%p %ld@%lu)\n",
-		page, PAGE_CACHE_SIZE, page->index);
+		page, PAGE_CACHE_SIZE, page_file_index(page));
 	nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
 	nfs_add_stats(inode, NFSIOS_READPAGES, 1);
 
@@ -590,7 +590,7 @@ static int
 readpage_async_filler(void *data, struct page *page)
 {
 	struct nfs_readdesc *desc = (struct nfs_readdesc *)data;
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_page *new;
 	unsigned int len;
 	int error;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 4d6861c..5fa5516 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -153,7 +153,7 @@ static struct nfs_page *nfs_page_find_request_locked(struct page *page)
 
 static struct nfs_page *nfs_page_find_request(struct page *page)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_page *req = NULL;
 
 	spin_lock(&inode->i_lock);
@@ -165,16 +165,16 @@ static struct nfs_page *nfs_page_find_request(struct page *page)
 /* Adjust the file length if we're writing beyond the end */
 static void nfs_grow_file(struct page *page, unsigned int offset, unsigned int count)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	loff_t end, i_size;
 	pgoff_t end_index;
 
 	spin_lock(&inode->i_lock);
 	i_size = i_size_read(inode);
 	end_index = (i_size - 1) >> PAGE_CACHE_SHIFT;
-	if (i_size > 0 && page->index < end_index)
+	if (i_size > 0 && page_file_index(page) < end_index)
 		goto out;
-	end = ((loff_t)page->index << PAGE_CACHE_SHIFT) + ((loff_t)offset+count);
+	end = page_file_offset(page) + ((loff_t)offset+count);
 	if (i_size >= end)
 		goto out;
 	i_size_write(inode, end);
@@ -187,7 +187,7 @@ out:
 static void nfs_set_pageerror(struct page *page)
 {
 	SetPageError(page);
-	nfs_zap_mapping(page->mapping->host, page->mapping);
+	nfs_zap_mapping(page_file_mapping(page)->host, page_file_mapping(page));
 }
 
 /* We can set the PG_uptodate flag if we see that a write request
@@ -228,7 +228,7 @@ static int nfs_set_page_writeback(struct page *page)
 	int ret = test_set_page_writeback(page);
 
 	if (!ret) {
-		struct inode *inode = page->mapping->host;
+		struct inode *inode = page_file_mapping(page)->host;
 		struct nfs_server *nfss = NFS_SERVER(inode);
 
 		if (atomic_long_inc_return(&nfss->writeback) >
@@ -242,7 +242,7 @@ static int nfs_set_page_writeback(struct page *page)
 
 static void nfs_end_page_writeback(struct page *page)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_server *nfss = NFS_SERVER(inode);
 
 	end_page_writeback(page);
@@ -252,7 +252,7 @@ static void nfs_end_page_writeback(struct page *page)
 
 static struct nfs_page *nfs_find_and_lock_request(struct page *page, bool nonblock)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_page *req;
 	int ret;
 
@@ -313,13 +313,13 @@ out:
 
 static int nfs_do_writepage(struct page *page, struct writeback_control *wbc, struct nfs_pageio_descriptor *pgio)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	int ret;
 
 	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
 	nfs_add_stats(inode, NFSIOS_WRITEPAGES, 1);
 
-	nfs_pageio_cond_complete(pgio, page->index);
+	nfs_pageio_cond_complete(pgio, page_file_index(page));
 	ret = nfs_page_async_flush(pgio, page, wbc->sync_mode == WB_SYNC_NONE);
 	if (ret == -EAGAIN) {
 		redirty_page_for_writepage(wbc, page);
@@ -336,8 +336,8 @@ static int nfs_writepage_locked(struct page *page, struct writeback_control *wbc
 	struct nfs_pageio_descriptor pgio;
 	int err;
 
-	nfs_pageio_init_write(&pgio, page->mapping->host, wb_priority(wbc),
-			      &nfs_async_write_completion_ops);
+	nfs_pageio_init_write(&pgio, page_file_mapping(page)->host,
+			      wb_priority(wbc), &nfs_async_write_completion_ops);
 	err = nfs_do_writepage(page, wbc, &pgio);
 	nfs_pageio_complete(&pgio);
 	if (err < 0)
@@ -470,7 +470,7 @@ nfs_request_add_commit_list(struct nfs_page *req, struct list_head *dst,
 	spin_unlock(cinfo->lock);
 	if (!cinfo->dreq) {
 		inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
-		inc_bdi_stat(req->wb_page->mapping->backing_dev_info,
+		inc_bdi_stat(page_file_mapping(req->wb_page)->backing_dev_info,
 			     BDI_RECLAIMABLE);
 		__mark_inode_dirty(req->wb_context->dentry->d_inode,
 				   I_DIRTY_DATASYNC);
@@ -537,7 +537,7 @@ static void
 nfs_clear_page_commit(struct page *page)
 {
 	dec_zone_page_state(page, NR_UNSTABLE_NFS);
-	dec_bdi_stat(page->mapping->backing_dev_info, BDI_RECLAIMABLE);
+	dec_bdi_stat(page_file_mapping(page)->backing_dev_info, BDI_RECLAIMABLE);
 }
 
 static void
@@ -788,7 +788,7 @@ out_err:
 static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
 		struct page *page, unsigned int offset, unsigned int bytes)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_page	*req;
 
 	req = nfs_try_to_update_request(inode, page, offset, bytes);
@@ -841,7 +841,7 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
 		nfs_release_request(req);
 		if (!do_flush)
 			return 0;
-		status = nfs_wb_page(page->mapping->host, page);
+		status = nfs_wb_page(page_file_mapping(page)->host, page);
 	} while (status == 0);
 	return status;
 }
@@ -871,7 +871,7 @@ int nfs_updatepage(struct file *file, struct page *page,
 		unsigned int offset, unsigned int count)
 {
 	struct nfs_open_context *ctx = nfs_file_open_context(file);
-	struct inode	*inode = page->mapping->host;
+	struct inode	*inode = page_file_mapping(page)->host;
 	int		status = 0;
 
 	nfs_inc_stats(inode, NFSIOS_VFSUPDATEPAGE);
@@ -879,7 +879,7 @@ int nfs_updatepage(struct file *file, struct page *page,
 	dprintk("NFS:       nfs_updatepage(%s/%s %d@%lld)\n",
 		file->f_path.dentry->d_parent->d_name.name,
 		file->f_path.dentry->d_name.name, count,
-		(long long)(page_offset(page) + offset));
+		(long long)(page_file_offset(page) + offset));
 
 	/* If we're not using byte range locks, and we know the page
 	 * is up to date, it may be more efficient to extend the write
@@ -1475,7 +1475,7 @@ void nfs_retry_commit(struct list_head *page_list,
 		nfs_mark_request_commit(req, lseg, cinfo);
 		if (!cinfo->dreq) {
 			dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
-			dec_bdi_stat(req->wb_page->mapping->backing_dev_info,
+			dec_bdi_stat(page_file_mapping(req->wb_page)->backing_dev_info,
 				     BDI_RECLAIMABLE);
 		}
 		nfs_unlock_and_release_request(req);
-- 
1.7.9.2


* [PATCH 07/12] mm: Add support for direct_IO to highmem pages
From: Mel Gorman @ 2012-06-20  9:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
	Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
	Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1340185081-22525-1-git-send-email-mgorman@suse.de>

The patch "mm: Add support for a filesystem to activate swap files and
use direct_IO for writing swap pages" added support for using direct_IO
to write swap pages but it is insufficient for highmem pages.

To support highmem pages, this patch kmap()s the page before calling the
direct_IO() handler. As direct_IO deals with virtual addresses, an
additional helper is necessary for get_kernel_pages() to look up the
struct page for a kmap virtual address.
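
The lookup described above can be sketched in user space as follows. This
is only a model under stated assumptions: the window base, slot count, and
every toy_* name are hypothetical, and pages are represented by plain
integers rather than struct page. It uses a half-open address range so the
computed slot index always stays inside the table.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT  12                   /* assumed 4 KiB pages */
#define TOY_NR_SLOTS    4                    /* hypothetical window size */
#define TOY_WINDOW_BASE ((uintptr_t)0x10000) /* hypothetical window start */

/* Models the per-slot table: which "page" each window slot maps. */
static const long toy_slot_table[TOY_NR_SLOTS] = { 100, 101, 102, 103 };

/*
 * Translate a virtual address to a page identifier. Addresses inside the
 * temporary-mapping window are resolved through the slot table; anything
 * else is treated as linearly mapped (address >> page shift).
 */
static long toy_addr_to_page(uintptr_t addr)
{
	uintptr_t start = TOY_WINDOW_BASE;
	uintptr_t end = start + ((uintptr_t)TOY_NR_SLOTS << TOY_PAGE_SHIFT);

	if (addr >= start && addr < end)
		return toy_slot_table[(addr - start) >> TOY_PAGE_SHIFT];

	return (long)(addr >> TOY_PAGE_SHIFT);
}
```

An address exactly at the end of the window falls through to the linear
case here, because a slot index equal to TOY_NR_SLOTS would read past the
end of the table.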

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/highmem.h |    7 +++++++
 mm/highmem.c            |   12 ++++++++++++
 mm/memory.c             |    3 +--
 mm/page_io.c            |    3 ++-
 4 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index d3999b4..e186e3c 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -39,10 +39,17 @@ extern unsigned long totalhigh_pages;
 
 void kmap_flush_unused(void);
 
+struct page *kmap_to_page(void *addr);
+
 #else /* CONFIG_HIGHMEM */
 
 static inline unsigned int nr_free_highpages(void) { return 0; }
 
+static inline struct page *kmap_to_page(void *addr)
+{
+	return virt_to_page(addr);
+}
+
 #define totalhigh_pages 0UL
 
 #ifndef ARCH_HAS_KMAP
diff --git a/mm/highmem.c b/mm/highmem.c
index 57d82c6..d517cd1 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -94,6 +94,18 @@ static DECLARE_WAIT_QUEUE_HEAD(pkmap_map_wait);
 		do { spin_unlock(&kmap_lock); (void)(flags); } while (0)
 #endif
 
+struct page *kmap_to_page(void *vaddr)
+{
+	unsigned long addr = (unsigned long)vaddr;
+
+	if (addr >= PKMAP_ADDR(0) && addr < PKMAP_ADDR(LAST_PKMAP)) {
+		int i = (addr - PKMAP_ADDR(0)) >> PAGE_SHIFT;
+		return pte_page(pkmap_page_table[i]);
+	}
+
+	return virt_to_page(addr);
+}
+
 static void flush_all_zero_pkmaps(void)
 {
 	int i;
diff --git a/mm/memory.c b/mm/memory.c
index 2d55992..5948b8f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1855,8 +1855,7 @@ int get_kernel_pages(const struct kvec *kiov, int nr_segs, int write,
 		if (WARN_ON(kiov[seg].iov_len != PAGE_SIZE))
 			return seg;
 
-		/* virt_to_page sanity checks the PFN */
-		pages[seg] = virt_to_page(kiov[seg].iov_base);
+		pages[seg] = kmap_to_page(kiov[seg].iov_base);
 		page_cache_get(pages[seg]);
 	}
 
diff --git a/mm/page_io.c b/mm/page_io.c
index 4a37962..78eee32 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -205,7 +205,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 		struct file *swap_file = sis->swap_file;
 		struct address_space *mapping = swap_file->f_mapping;
 		struct iovec iov = {
-			.iov_base = page_address(page),
+			.iov_base = kmap(page),
 			.iov_len  = PAGE_SIZE,
 		};
 
@@ -218,6 +218,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 		ret = mapping->a_ops->direct_IO(KERNEL_WRITE,
 						&kiocb, &iov,
 						kiocb.ki_pos, 1);
+		kunmap(page);
 		if (ret == PAGE_SIZE) {
 			count_vm_event(PSWPOUT);
 			ret = 0;
-- 
1.7.9.2

