Netdev List

* Re: Netlink for kernel<->user space communication?
From: Stephen Hemminger @ 2012-05-10 16:36 UTC (permalink / raw)
  To: Arvid Brodin; +Cc: netdev@vger.kernel.org
In-Reply-To: <4FAAFE76.9060508@xdin.com>

On Wed, 9 May 2012 23:32:08 +0000
Arvid Brodin <Arvid.Brodin@xdin.com> wrote:

> On 2012-05-08 00:33, Stephen Hemminger wrote:
> > On Mon, 7 May 2012 18:43:23 +0000
> > Arvid Brodin <Arvid.Brodin@xdin.com> wrote:
> > 
> >> On Tue, 24 Apr 2012 16:57:55 -0700
> >> Stephen Hemminger <shemminger@xxxxxxxxxx> wrote:
> >>> On Tue, 24 Apr 2012 23:52:34 +0000
> >>> Arvid Brodin <Arvid.Brodin@xxxxxxxx> wrote:
> >>>
> >>>> Hi.
> >>>>
> >>>> I'm writing a kernel driver for the HSR protocol, a standard for high availability
> >>>> networks. I want to send messages from the kernel to user space about broken network
> >>>> links. I also want user space to be able to ask the kernel about its view of the status of
> >>>> nodes on the network.
> >>>>
> >>>> Netlink seems like a good tool for this. (Is it?)
> >>>
> >>> Yes.
> >>>
> >>>> But do I use raw netlink? (Described here: http://www.linuxjournal.com/article/7356 - but
> >>>> this seems a bit out of date, the kernel API description differs from today's kernel
> >>>> implementation.)
> >>>
> >>> No. Your driver probably looks like a device so you should be
> >>> using rtnetlink messages.
> >>
> >> I'm already using rtnetlink messages to add and remove my device, which works fine (see
> >> e.g. http://www.spinics.net/lists/netdev/msg192817.html - although I didn't think it
> >> meaningful to include the iproute2 patch here, until the kernel part is ready).
> >>
> >> The protocol specifies transmission of "supervision frames" every 2 seconds, e.g. to check
> >> link integrity. Every such frame should be received from two directions in the ring - if
> >> only one is received, then there is a link problem.
> > 
> > Why not just manipulate the carrier or operational state (see Documentation/networking/operstates.txt)
> > and use the existing notification on link changes? If you don't get a heartbeat, then change
> > the state of the device with set_operstate() to indicate that the lower device is down; the necessary
> > link events propagate back as netlink events.
> 
> With HSR, all nodes in the network ring can detect a link problem anywhere in the ring. So
> I need a way to communicate link problems that do not concern the host's devices at all,
> but rather the state of the network as a whole. A typical message might say "Frames from
> node 01:23:45:67:89:AB are only received over Slave Interface 1!" This indicates a problem
> since all frames should be received over both slave interfaces. The broken link can be
> anywhere between this node and the indicated node. If user space is aware of the network
> topology, it can figure out exactly where the damage is by looking at which nodes' frames
> are received over which slave interface.
> 
> (Thanks for the operstates info though, I hadn't discovered IF_OPER_LOWERLAYERDOWN! I will
> use it to indicate a local slave is down.)

Sounds like a message that is specific to the protocol. Maybe just a log
message would suffice, or having a protocol specific event channel.
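
A rough userspace sketch of the bookkeeping behind such a protocol-specific
event is given here, only to make the "frames from node X are only received
over slave interface 1" case concrete. It is not kernel code. The names
struct hsr_node_state, hsr_note_frame() and hsr_silent_slave() are
hypothetical, and the caller is assumed to pick max_age as some multiple of
the 2-second supervision interval mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-node record: when a supervision frame from this
 * node was last seen on each of the two slave interfaces (seconds). */
struct hsr_node_state {
	uint8_t mac[6];
	uint64_t last_seen[2];
	bool seen[2];
};

/* Record a supervision frame received on slave 0 or 1 at time 'now'. */
void hsr_note_frame(struct hsr_node_state *n, int slave, uint64_t now)
{
	if (slave < 0 || slave > 1)
		return;
	n->last_seen[slave] = now;
	n->seen[slave] = true;
}

/* Return the index of the slave on which the node has gone silent
 * while it is still heard on the other one, or -1 if both slaves
 * are fresh or both are stale. */
int hsr_silent_slave(const struct hsr_node_state *n, uint64_t now,
		     uint64_t max_age)
{
	bool fresh[2];
	int i;

	for (i = 0; i < 2; i++)
		fresh[i] = n->seen[i] && now - n->last_seen[i] <= max_age;

	if (fresh[0] && !fresh[1])
		return 1;
	if (fresh[1] && !fresh[0])
		return 0;
	return -1;
}
```

A non-negative return value is the point at which an event naming the node
and the silent slave could be sent to user space, by whatever channel is
chosen.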

> >> I'd like to notify user space about every such occurrence. Is there an rtnetlink message
> >> type that fits this? The stuff in rtnetlink.h seems to be mostly concerned with specific
> >> user space commands (there is something called RTNLGRP_NOTIFY but I couldn't find any
> >> instances of it being used in the kernel, nor any documentation).
> >>
> > 
> > I am trying to steer you to use existing APIs because then existing programs and
> > infrastructure can deal with the new device type.
> 
> I really appreciate that! I want to use existing APIs as far as possible. That's why I
> keep sending you all these questions. :)
> 
> 

^ permalink raw reply

* Re: [PATCH 1/3] drivers/net: Convert compare_ether_addr to ether_addr_equal
From: David Miller @ 2012-05-10 16:33 UTC (permalink / raw)
  To: joe; +Cc: jussi.kivilinna, linville, netdev, linux-kernel, linux-wireless
In-Reply-To: <1336666288.22495.14.camel@joe2Laptop>

From: Joe Perches <joe@perches.com>
Date: Thu, 10 May 2012 09:11:28 -0700

> (cc's trimmed)
> 
> On Thu, 2012-05-10 at 17:32 +0300, Jussi Kivilinna wrote:
>> Quoting Joe Perches <joe@perches.com>:
>> > Use the new bool function ether_addr_equal to add
>> > some clarity and reduce the likelihood for misuse
>> > of compare_ether_addr for sorting.
> []
>> > diff --git a/drivers/net/wireless/rndis_wlan.c
> []
>> > @@ -2139,7 +2139,7 @@ resize_buf:
>> >  	while (check_bssid_list_item(bssid, bssid_len, buf, len)) {
>> >  		if (rndis_bss_info_update(usbdev, bssid) && match_bssid &&
>> >  		    matched) {
>> > -			if (compare_ether_addr(bssid->mac, match_bssid))
>> > +			if (!ether_addr_equal(bssid->mac, match_bssid))
>> 
>> While reviewing this, I noticed that the original code above is wrong. It
>> should be !compare_ether_addr. So do I push a patch fixing this through
>> wireless-testing, although it will later cause a conflict with this patch?
>> 
>> -Jussi
>> 
>> >  				*matched = true;
>> >  		}
>> >
> 
> Up to John.
> 
> Here's the patch I would send against net-next
> updating the test and the style a little.

I think in this specific case it's better to push this one directly
through net-next.  But yes, it's up to John.

^ permalink raw reply

* Re: [PATCH 1/3] drivers/net: Convert compare_ether_addr to ether_addr_equal
From: David Miller @ 2012-05-10 16:30 UTC (permalink / raw)
  To: jussi.kivilinna
  Cc: joe, fubar, andy, benve, roprabhu, neepatel, nistrive, grundler,
	anirban.chakraborty, sony.chacko, linux-driver, linux-net-drivers,
	bhutchings, cmetcalf, linville, jirislaby, mickflemm, mcgrof,
	jouni, vthiagar, senthilb, chunkeey, stas.yakovlev, sgruszka,
	johannes.berg, wey-yi.w.guy, ilw, buytenh, Larry.Finger,
	chaoming_li, netdev, linux-kernel, linux-wireless, ath5k-devel,
	ath9k-devel
In-Reply-To: <20120510173201.16131du3cg90nybo@www.81.fi>

Never, EVER, quote an entire large patch just to make a comment
on one small hunk.

I very nearly missed what you had to say because when scrolling
through it it appeared as if you made no comments at all.

Again, NEVER, EVER, do this.  It's extremely anti-social.  Edit out
the irrelevant quoted content when replying to people, always.

^ permalink raw reply

* Re: [PATCH 1/2 net] 6lowpan: add missing pskb_may_pull() check
From: David Miller @ 2012-05-10 16:28 UTC (permalink / raw)
  To: alex.bluesman.smirnov; +Cc: eric.dumazet, netdev
In-Reply-To: <CAJmB2rCo0CfHP992Jix=mtshSS-wn92rf+3yg-ZK1_KLkXKRGw@mail.gmail.com>

From: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Date: Thu, 10 May 2012 18:05:46 +0400

> Using BUG() macro I just want to indicate that something in the bottom
> of the stack went terribly wrong and you must check your code for
> bugs..

Then you should do something like:

	if (WARN_ON_ONCE(!pskb_may_pull(...))) {
		appropriate_error_handling();
		return;
	}

instead.
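
As a userspace illustration of that pattern (warn once, handle the error,
carry on), a minimal sketch follows. warn_on_once() and parse_header() are
hypothetical stand-ins, not the kernel's WARN_ON_ONCE() or pskb_may_pull();
the only point is that a short buffer gives one diagnostic and an error
return instead of a crash.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define HDR_LEN 2

/* Userspace stand-in for WARN_ON_ONCE(): report the first time 'cond'
 * is true for a given call site and return 'cond' so the caller can
 * branch on it. 'warned' is the call site's private flag. */
bool warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical parser: the HDR_LEN header bytes must be present
 * before they are read. Returns 0 on success, -1 on a short buffer. */
int parse_header(const unsigned char *buf, size_t len, unsigned int *out)
{
	static bool warned;

	if (warn_on_once(len < HDR_LEN, &warned, "short packet"))
		return -1;	/* error handling instead of BUG() */

	*out = ((unsigned int)buf[0] << 8) | buf[1];
	return 0;
}
```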

^ permalink raw reply

* Re: [PATCH 1/3] drivers/net: Convert compare_ether_addr to ether_addr_equal
From: Joe Perches @ 2012-05-10 16:11 UTC (permalink / raw)
  To: Jussi Kivilinna
  Cc: David S. Miller, John W. Linville, netdev, linux-kernel,
	linux-wireless
In-Reply-To: <20120510173201.16131du3cg90nybo@www.81.fi>

(cc's trimmed)

On Thu, 2012-05-10 at 17:32 +0300, Jussi Kivilinna wrote:
> Quoting Joe Perches <joe@perches.com>:
> > Use the new bool function ether_addr_equal to add
> > some clarity and reduce the likelihood for misuse
> > of compare_ether_addr for sorting.
[]
> > diff --git a/drivers/net/wireless/rndis_wlan.c
[]
> > @@ -2139,7 +2139,7 @@ resize_buf:
> >  	while (check_bssid_list_item(bssid, bssid_len, buf, len)) {
> >  		if (rndis_bss_info_update(usbdev, bssid) && match_bssid &&
> >  		    matched) {
> > -			if (compare_ether_addr(bssid->mac, match_bssid))
> > +			if (!ether_addr_equal(bssid->mac, match_bssid))
> 
> While reviewing this, I noticed that the original code above is wrong. It
> should be !compare_ether_addr. So do I push a patch fixing this through
> wireless-testing, although it will later cause a conflict with this patch?
> 
> -Jussi
> 
> >  				*matched = true;
> >  		}
> >

Up to John.

Here's the patch I would send against net-next
updating the test and the style a little.

diff --git a/drivers/net/wireless/rndis_wlan.c b/drivers/net/wireless/rndis_wlan.c
index dcf0e7e..29cccc5 100644
--- a/drivers/net/wireless/rndis_wlan.c
+++ b/drivers/net/wireless/rndis_wlan.c
@@ -2137,11 +2137,10 @@ resize_buf:
 	 * received 'num_items' and walking through full bssid buffer instead.
 	 */
 	while (check_bssid_list_item(bssid, bssid_len, buf, len)) {
-		if (rndis_bss_info_update(usbdev, bssid) && match_bssid &&
-		    matched) {
-			if (!ether_addr_equal(bssid->mac, match_bssid))
-				*matched = true;
-		}
+		if (rndis_bss_info_update(usbdev, bssid) &&
+		    match_bssid && matched &&
+		    ether_addr_equal(bssid->mac, match_bssid))
+			*matched = true;
 
 		real_count++;
 		bssid = next_bssid_list_item(bssid, &bssid_len, buf, len);
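
To make the inverted test concrete, here is a small userspace sketch of the
two return conventions. Both functions are stand-ins with a _sketch suffix
and are built on memcmp() for simplicity; they are not the kernel
implementations.

```c
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* memcmp-like convention: zero means equal, non-zero means different. */
unsigned int compare_ether_addr_sketch(const unsigned char *a,
				       const unsigned char *b)
{
	return memcmp(a, b, ETH_ALEN) != 0;
}

/* Boolean convention: true means equal. */
bool ether_addr_equal_sketch(const unsigned char *a, const unsigned char *b)
{
	return !compare_ether_addr_sketch(a, b);
}
```

With these conventions, "if (compare_ether_addr(x, y))" takes the branch when
the addresses differ, which is why the original rndis_wlan test set *matched
on a mismatch.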

^ permalink raw reply related

* Re: [PATCH 07/14] batman-adv: split neigh_new function into generic and batman iv specific parts
From: David Miller @ 2012-05-10 16:07 UTC (permalink / raw)
  To: sven-KaDOiPu9UxWEi8DpZVb4nw
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	lindner_marek-LWAfsSFWpa4
In-Reply-To: <9033916.UuGU18BIPo@bentobox>

From: Sven Eckelmann <sven-KaDOiPu9UxWEi8DpZVb4nw@public.gmane.org>
Date: Thu, 10 May 2012 09:34:25 +0200

> On Wednesday, May 09, 2012 08:41:11 PM David Miller wrote:
> [...]
>> The namespace pollution of the batman-adv code needs to improve,
>> and I'm putting my foot down starting with this change.
>> 
>> If you have a static function which is therefore private to a
>> source file, name it whatever you want.
>> 
>> But once it gets exported out of that file, you have to give it
>> an appropriate name.  Probably with a "batman_adv_" prefix or
>> similar.
> 
> I agree, but would like to have a shorter prefix, batadv_. I know that
> you said "or similar" but there are still some developers that fear your
> response to a patch that only adds the prefix batadv_ instead of the longer
> version.

batadv_ is fine.

^ permalink raw reply

* Re: Information leakage from RDS protocol
From: Venkat Venkatsubra @ 2012-05-10 15:38 UTC (permalink / raw)
  To: Jay Fenlason
  Cc: Linus Torvalds, security, eugene, pmatouse, Netdev, David Miller
In-Reply-To: <20120509155709.GA29413@redhat.com>

On 5/9/2012 10:57 AM, Jay Fenlason wrote:
> On Wed, May 09, 2012 at 10:17:57AM -0500, Venkat Venkatsubra wrote:
>> On 5/8/2012 1:22 PM, Jay Fenlason wrote:
>>>> On Tue, May 8, 2012 at 9:10 AM, Jay Fenlason <fenlason@redhat.com> wrote:
>>>>> recvfrom() on an RDS socket can return the contents of random(?)
>>>>> kernel memory to userspace if it was called with an address
>>>>> length larger than sizeof(struct sockaddr_in). rds_recvmsg() also
>>>>> fails to set the addr_len parameter properly before returning, but
>>>>> that's just a bug.
>>>>>
>>>>> There are also a number of cases where recvfrom() can return an entirely
>>>>> bogus address. Anything in rds_recvmsg() that returns a
>>>>> non-negative value but does not go through the
>>>>>   "sin = (struct sockaddr_in *)msg->msg_name;"
>>>>> code path at the end of the while(1) loop will return up to 128
>>>>> bytes of kernel memory to userspace.
>>>>>
>>>>> Also, on a receive race, the message that was copied to userspace but
>>>>> received by someone else is not zeroed, meaning that if the next
>>>>> message it receives is smaller, the tail of the raced message is
>>>>> leaked. I'm not sure how serious this is, but unexpectedly scribbling
>>>>> on userspace memory (even if it is part of a buffer that userspace
>>>>> asked us to write to) should be avoided.
>>>>>
>>> On Tue, May 08, 2012 at 11:04:01AM -0700, Linus Torvalds wrote:
>>>> Please cc David Miller too on these things, and make sure he knows
>>>> there's no embargo or anything (he won't touch it if there is). Maybe
>>>> you don't want public mailing lists, but in general, the more open we
>>>> can be, the better.
>>> Added.  Nobody has said anything about any embargo to me, either
>>> that they want one or that there shouldn't be one.  Personally, I
>>> don't see any reason to embargo this, but I'm not on any
>>> security-response teams.
>>>
>>>> This seems unfortunate, but at least the address thing is limited to
>>>> sizeof(sockaddr_storage) and is kernel stack - which in turn means
>>>> that while it potentially leaks kernel addresses (bad!), it almost
>>>> certainly won't leak anything fundamentally interesting (ie you can't
>>>> read arbitrary kernel memory and find plaintext passwords etc).
>>>>
>>>> I assume the fix is a trivial
>>>>
>>>>    msg->msg_namelen = sizeof(*sin);
>>>>
>>>> in rds_recvmsg() where it sets up the address?
>>> That fixes the case where it actually sets up the address, but won't
>>> fix the cases where it doesn't even do that.  I don't think anyone
>>> ever thought about what the source address should be for a message
>>> that was generated internally by the kernel.  I think the obvious
>>> possibilities are msg_namelen = 0 (no address) and 127.0.0.1
>>>
>>>> I do wonder if maybe recvmsg() should initialize msg_namelen to 0
>>>> instead of the size of the buffer before calling the low-level recvmsg
>>>> function - so that protocols would have to explicitly set the size to
>>>> the right value. But that would need much more validation.
>>> That would require checking/fixing all of the low-level functions,
>>> which will then have to know that the buffer pointed to by msg is at
>>> most sizeof(struct sockaddr_storage) bytes.  I think it's better to
>>> keep the size of the address buffer there, so the low-level functions
>>> can confirm that the address data they're about to stuff in there
>>> won't overflow the buffer.  (That way if we ever change the size of
>>> the buffer, only one place has to change.)
>>>
>>> And the whole rds receive subsystem needs a bit of a rewrite to close
>>> the information-leaking receive race.  Keeping the semantics correct
>>> in regards to MSG_PEEK and multiple threads reading the socket at the
>>> same time may be tricky.
>>>
>>> 		-- JF
>>>
>> How about adding the suggested "msg->msg_namelen = sizeof(*sin);"
>> line at the top of rds_recvmsg ?
>> And "msg->msg_namelen = 0;" in the below "break;" cases ? I am
>> assuming the apps wouldn't need to look at msg_name in these cases.
>>                  if (!list_empty(&rs->rs_notify_queue)) {
>>                          ret = rds_notify_queue_get(rs, msg);
>>                          break;
>>                  }
>>
>>                  if (rs->rs_cong_notify) {
>>                          ret = rds_notify_cong(rs, msg);
>>                          break;
>>                  }
> Wouldn't it be better to set msg->msg_namelen = 0 at the top of the
> function, and only set it to sizeof(*sin) after msg->msg_name is
> filled in?  That'll prevent accidental disclosure of kernel memory via
> unanticipated code paths.
>
>> And, shouldn't an error be returned for the case below ? Currently
>> zero is returned.
>>
>>          if (msg_flags & MSG_OOB)
>>                  goto out;
>> An error such as EOPNOTSUPP ?
> I don't know.  I'm not a networking expert.  From what I've found
> googling, EINVAL would be more correct than ENOTSUPP.
>
> This only leaves the datagram contents leak to userspace when multiple
> threads race on receiving a datagram and the subsequent datagram is
> smaller.  That one will be hard to fix, most notably because the
> obvious fixes I've looked at involve losing a datagram if either of
>    inc->i_conn->c_trans->inc_copy_to_user()
> or
>    rds_cmsg_recv()
> fail.  I don't know how likely either of those are, but losing
> datagrams seems like an inappropriate behavior for a reliable datagram
> subsystem.
>
Moving the discussion to netdev.

Venkat
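
For readers joining on netdev, the ordering Jay suggests (start with
msg_namelen = 0 and set the real length only after msg_name has been filled
in) can be sketched in userspace as below. struct addr_sketch,
struct msg_sketch and recv_sketch() are hypothetical stand-ins for
struct sockaddr_in, struct msghdr and rds_recvmsg(). The sketch assumes the
name buffer is always at least sizeof(struct addr_sketch) bytes, in the same
way the kernel passes a sockaddr_storage-sized buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct sockaddr_in and struct msghdr. */
struct addr_sketch {
	unsigned short family;
	unsigned short port;
	unsigned int addr;
};

struct msg_sketch {
	void *name;
	size_t namelen;
};

/* 'have_source' is false for messages with no peer address, such as
 * notifications generated internally. */
int recv_sketch(struct msg_sketch *msg, bool have_source,
		unsigned int src_addr)
{
	struct addr_sketch sin;

	/* Start from "no address" so that every early exit reports
	 * zero valid bytes in the name buffer. */
	msg->namelen = 0;

	if (!have_source)
		return 0;

	memset(&sin, 0, sizeof(sin));
	sin.family = 2;		/* AF_INET */
	sin.addr = src_addr;

	if (msg->name) {
		memcpy(msg->name, &sin, sizeof(sin));
		/* Only now is there something to report. */
		msg->namelen = sizeof(sin);
	}
	return 0;
}
```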

^ permalink raw reply

* [PATCH net-next] net_sched: update bstats in dequeue()
From: Eric Dumazet @ 2012-05-10 15:36 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

Class bytes/packets stats can be misleading because they are updated in
enqueue() while the packet might be dropped later.

We already fixed all qdiscs but sch_atm.

This patch makes the final cleanup.

Class rate estimators can now match qdisc ones.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_atm.c  |    4 ++--
 net/sched/sch_drr.c  |    4 ++--
 net/sched/sch_hfsc.c |    2 +-
 net/sched/sch_htb.c  |    2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index a77a4fb..8522a47 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -423,8 +423,6 @@ drop: __maybe_unused
 		}
 		return ret;
 	}
-	qdisc_bstats_update(sch, skb);
-	bstats_update(&flow->bstats, skb);
 	/*
 	 * Okay, this may seem weird. We pretend we've dropped the packet if
 	 * it goes via ATM. The reason for this is that the outer qdisc
@@ -472,6 +470,8 @@ static void sch_atm_dequeue(unsigned long data)
 			if (unlikely(!skb))
 				break;
 
+			qdisc_bstats_update(sch, skb);
+			bstats_update(&flow->bstats, skb);
 			pr_debug("atm_tc_dequeue: sending on class %p\n", flow);
 			/* remove any LL header somebody else has attached */
 			skb_pull(skb, skb_network_offset(skb));
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index c218987..9ce0b4f 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -376,8 +376,6 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		cl->deficit = cl->quantum;
 	}
 
-	bstats_update(&cl->bstats, skb);
-
 	sch->q.qlen++;
 	return err;
 }
@@ -403,6 +401,8 @@ static struct sk_buff *drr_dequeue(struct Qdisc *sch)
 			skb = qdisc_dequeue_peeked(cl->qdisc);
 			if (cl->qdisc->q.qlen == 0)
 				list_del(&cl->alist);
+
+			bstats_update(&cl->bstats, skb);
 			qdisc_bstats_update(sch, skb);
 			sch->q.qlen--;
 			return skb;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 8db3e2c..6c2ec45 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1609,7 +1609,6 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	if (cl->qdisc->q.qlen == 1)
 		set_active(cl, qdisc_pkt_len(skb));
 
-	bstats_update(&cl->bstats, skb);
 	sch->q.qlen++;
 
 	return NET_XMIT_SUCCESS;
@@ -1657,6 +1656,7 @@ hfsc_dequeue(struct Qdisc *sch)
 		return NULL;
 	}
 
+	bstats_update(&cl->bstats, skb);
 	update_vf(cl, qdisc_pkt_len(skb), cur_time);
 	if (realtime)
 		cl->cl_cumul += qdisc_pkt_len(skb);
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index acae5b0..992ab3d 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -574,7 +574,6 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		}
 		return ret;
 	} else {
-		bstats_update(&cl->bstats, skb);
 		htb_activate(q, cl);
 	}
 
@@ -835,6 +834,7 @@ next:
 	} while (cl != start);
 
 	if (likely(skb != NULL)) {
+		bstats_update(&cl->bstats, skb);
 		cl->un.leaf.deficit[level] -= qdisc_pkt_len(skb);
 		if (cl->un.leaf.deficit[level] < 0) {
 			cl->un.leaf.deficit[level] += cl->quantum;
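
The effect of moving the accounting can be seen in a toy userspace queue. This
is only a sketch and has nothing to do with the real qdisc structures;
struct fifo, fifo_enqueue(), fifo_drop_head() and fifo_dequeue() are
hypothetical names. Bytes and packets are counted in the dequeue path only, so
a packet that is dropped while still queued never shows up in bstats.

```c
#include <stddef.h>

#define QLIMIT 4

struct pkt {
	unsigned int len;
};

struct stats {
	unsigned long bytes;
	unsigned long packets;
};

struct fifo {
	const struct pkt *slot[QLIMIT];
	size_t head, count;
	struct stats bstats;
	unsigned long drops;
};

/* Queue a packet; no byte/packet accounting happens here. */
int fifo_enqueue(struct fifo *q, const struct pkt *p)
{
	if (q->count == QLIMIT) {
		q->drops++;
		return -1;
	}
	q->slot[(q->head + q->count) % QLIMIT] = p;
	q->count++;
	return 0;
}

/* Drop the oldest queued packet; it is counted as a drop only. */
const struct pkt *fifo_drop_head(struct fifo *q)
{
	const struct pkt *p;

	if (q->count == 0)
		return NULL;
	p = q->slot[q->head];
	q->head = (q->head + 1) % QLIMIT;
	q->count--;
	q->drops++;
	return p;
}

/* Hand a packet to the lower layer and account for it now. */
const struct pkt *fifo_dequeue(struct fifo *q)
{
	const struct pkt *p;

	if (q->count == 0)
		return NULL;
	p = q->slot[q->head];
	q->head = (q->head + 1) % QLIMIT;
	q->count--;
	q->bstats.bytes += p->len;
	q->bstats.packets++;
	return p;
}
```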

^ permalink raw reply related

* Re: [PATCH] net: device - added support of clearing device statistics
From: Eric Dumazet @ 2012-05-10 15:18 UTC (permalink / raw)
  To: Sasikantha babu
  Cc: David S. Miller, Michał Mirosław, Jiri Pirko,
	Ben Hutchings, netdev, linux-kernel
In-Reply-To: <1336662961-15033-1-git-send-email-sasikanth.v19@gmail.com>

On Thu, 2012-05-10 at 20:46 +0530, Sasikantha babu wrote:
> This patch adds support for clearing device statistics. It adds a new
> entry, ndo_clear_stats, to net_device_ops so that device drivers can provide
> their own method to clear stats; otherwise the internal statistics structure
> is cleared.
> 
> Signed-off-by: Sasikantha babu <sasikanth.v19@gmail.com>

This is forbidden and racy.

SNMP counters must be increasing.
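
One common way to get a "since reset" view without touching the kernel
counters is to keep a baseline in user space and subtract. The sketch below
assumes 64-bit counters; struct stats_view, stats_view_reset() and
stats_view_read() are hypothetical names and not part of any existing tool.

```c
#include <stdint.h>

/* Hypothetical userspace view of one monotonically increasing counter. */
struct stats_view {
	uint64_t baseline;
};

/* "Clear" the view by remembering the current raw counter value. */
void stats_view_reset(struct stats_view *v, uint64_t raw)
{
	v->baseline = raw;
}

/* Value since the last reset. Unsigned subtraction is defined modulo
 * 2^64, so the result stays correct across a single counter wrap. */
uint64_t stats_view_read(const struct stats_view *v, uint64_t raw)
{
	return raw - v->baseline;
}
```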

^ permalink raw reply

* [PATCH] net: device - added support of clearing device statistics
From: Sasikantha babu @ 2012-05-10 15:16 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Michał Mirosław,
	Jiri Pirko, Ben Hutchings
  Cc: netdev, linux-kernel, Sasikantha babu

This patch adds support for clearing device statistics. It adds a new
entry, ndo_clear_stats, to net_device_ops so that device drivers can provide
their own method to clear stats; otherwise the internal statistics structure
is cleared.

Signed-off-by: Sasikantha babu <sasikanth.v19@gmail.com>
---
 include/linux/netdevice.h |    3 +++
 net/core/dev.c            |   23 +++++++++++++++++++++++
 2 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5cbaa20..3366bd6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -935,6 +935,8 @@ struct net_device_ops {
 						     struct rtnl_link_stats64 *storage);
 	struct net_device_stats* (*ndo_get_stats)(struct net_device *dev);
 
+	void			(*ndo_clear_stats) (struct net_device *dev);
+
 	int			(*ndo_vlan_rx_add_vid)(struct net_device *dev,
 						       unsigned short vid);
 	int			(*ndo_vlan_rx_kill_vid)(struct net_device *dev,
@@ -2576,6 +2578,7 @@ extern void		dev_load(struct net *net, const char *name);
 extern void		dev_mcast_init(void);
 extern struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
 					       struct rtnl_link_stats64 *storage);
+extern void dev_clear_stats(struct net_device *dev);
 extern void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
 				    const struct net_device_stats *netdev_stats);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 9bb8f87..fc29ea4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5870,6 +5870,29 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
 }
 EXPORT_SYMBOL(dev_get_stats);
 
+/**
+ *	dev_clear_stats	- Clear network device statistics
+ *	@dev: device to clear statistics from
+ *
+ *	Clears network statistics of device.
+ *	The device driver may provide its own method by setting
+ *	dev->netdev_ops->ndo_clear_stats;
+ *	otherwise the internal statistics structure is used.
+ */
+void dev_clear_stats(struct net_device *dev)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (ops->ndo_clear_stats)
+		ops->ndo_clear_stats(dev);
+	else
+		memset(&dev->stats, 0, sizeof(dev->stats));
+
+	atomic_long_set(&dev->rx_dropped, 0);
+	return;
+}
+EXPORT_SYMBOL(dev_clear_stats);
+
 struct netdev_queue *dev_ingress_queue_create(struct net_device *dev)
 {
 	struct netdev_queue *queue = dev_ingress_queue(dev);
-- 
1.7.3.4

^ permalink raw reply related

* Re: [PATCH 1/3] drivers/net: Convert compare_ether_addr to ether_addr_equal
From: Jussi Kivilinna @ 2012-05-10 14:32 UTC (permalink / raw)
  To: Joe Perches
  Cc: ath5k-devel-xDcbHBWguxEUs3QNXV6qNA, Stanislaw Gruszka,
	Roopa Prabhu, Nick-juf53994utBLZpfksSYvnA, Wey-Yi Guy,
	Christian Lamparter, Vasanthakumar Thiagarajan, Andy Gospodarek,
	Jiri Slaby, Christian-juf53994utBLZpfksSYvnA,
	Intel-juf53994utBLZpfksSYvnA, Jay Vosburgh, Jouni Malinen,
	Wireless, Nishank Trivedi, Stanislav Yakovlev, Grant Grundler,
	Johannes Berg, Sony Chacko, John W.  Linville, Chris Metcalf,
	Ben Hutchings, Balasubramanian, Sol
In-Reply-To: <7c9881a67c52c2f218480b6742155b6d6928122d.1336618708.git.joe-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>

Quoting Joe Perches <joe-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>:

> Use the new bool function ether_addr_equal to add
> some clarity and reduce the likelihood for misuse
> of compare_ether_addr for sorting.
>
> Done via cocci script:
>
> $ cat compare_ether_addr.cocci
> @@
> expression a,b;
> @@
> -	!compare_ether_addr(a, b)
> +	ether_addr_equal(a, b)
>
> @@
> expression a,b;
> @@
> -	compare_ether_addr(a, b)
> +	!ether_addr_equal(a, b)
>
> @@
> expression a,b;
> @@
> -	!ether_addr_equal(a, b) == 0
> +	ether_addr_equal(a, b)
>
> @@
> expression a,b;
> @@
> -	!ether_addr_equal(a, b) != 0
> +	!ether_addr_equal(a, b)
>
> @@
> expression a,b;
> @@
> -	ether_addr_equal(a, b) == 0
> +	!ether_addr_equal(a, b)
>
> @@
> expression a,b;
> @@
> -	ether_addr_equal(a, b) != 0
> +	ether_addr_equal(a, b)
>
> @@
> expression a,b;
> @@
> -	!!ether_addr_equal(a, b)
> +	ether_addr_equal(a, b)
>
> Signed-off-by: Joe Perches <joe-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>
> ---
>  drivers/net/bonding/bond_main.c                   |    2 +-
>  drivers/net/ethernet/amd/depca.c                  |    3 ++-
>  drivers/net/ethernet/cisco/enic/enic_main.c       |   12 ++++--------
>  drivers/net/ethernet/dec/ewrk3.c                  |    3 ++-
>  drivers/net/ethernet/dec/tulip/de4x5.c            |    2 +-
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c  |    5 ++---
>  drivers/net/ethernet/sfc/ethtool.c                |    2 +-
>  drivers/net/ethernet/sun/sunvnet.c                |    2 +-
>  drivers/net/ethernet/tile/tilepro.c               |    2 +-
>  drivers/net/ethernet/toshiba/ps3_gelic_wireless.c |    8 ++++----
>  drivers/net/tun.c                                 |    2 +-
>  drivers/net/wireless/at76c50x-usb.c               |    2 +-
>  drivers/net/wireless/ath/ath5k/base.c             |    6 +++---
>  drivers/net/wireless/ath/ath9k/recv.c             |    2 +-
>  drivers/net/wireless/ath/carl9170/rx.c            |    2 +-
>  drivers/net/wireless/ipw2x00/libipw_rx.c          |   16 ++++++++--------
>  drivers/net/wireless/iwlegacy/3945.c              |    4 ++--
>  drivers/net/wireless/iwlegacy/4965-mac.c          |    2 +-
>  drivers/net/wireless/iwlegacy/common.c            |   14 +++++++-------
>  drivers/net/wireless/iwlwifi/iwl-agn-rx.c         |    4 ++--
>  drivers/net/wireless/iwlwifi/iwl-agn-rxon.c       |    8 ++++----
>  drivers/net/wireless/iwlwifi/iwl-agn-sta.c        |    6 +++---
>  drivers/net/wireless/mwl8k.c                      |    2 +-
>  drivers/net/wireless/p54/txrx.c                   |    2 +-
>  drivers/net/wireless/rndis_wlan.c                 |   12 ++++++------
>  drivers/net/wireless/rtlwifi/base.c               |    2 +-
>  drivers/net/wireless/rtlwifi/ps.c                 |    2 +-
>  drivers/net/wireless/rtlwifi/rtl8192ce/trx.c      |   10 +++++-----
>  drivers/net/wireless/rtlwifi/rtl8192cu/mac.c      |   10 +++++-----
>  drivers/net/wireless/rtlwifi/rtl8192de/trx.c      |   11 ++++++-----
>  drivers/net/wireless/rtlwifi/rtl8192se/trx.c      |   11 ++++++-----
>  31 files changed, 85 insertions(+), 86 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 16dbf53..bbb0043 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1961,7 +1961,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
>  	write_lock_bh(&bond->lock);
>
>  	if (!bond->params.fail_over_mac) {
> -		if (!compare_ether_addr(bond_dev->dev_addr, slave->perm_hwaddr) &&
> +		if (ether_addr_equal(bond_dev->dev_addr, slave->perm_hwaddr) &&
>  		    bond->slave_cnt > 1)
> 			pr_warning("%s: Warning: the permanent HWaddr of %s - %pM - is still in use by %s. Set the HWaddr of %s to a different address to avoid conflicts.\n",
>  				   bond_dev->name, slave_dev->name,
> diff --git a/drivers/net/ethernet/amd/depca.c b/drivers/net/ethernet/amd/depca.c
> index 86dd957..7f7b99a 100644
> --- a/drivers/net/ethernet/amd/depca.c
> +++ b/drivers/net/ethernet/amd/depca.c
> @@ -1079,7 +1079,8 @@ static int depca_rx(struct net_device *dev)
>  						} else {
>  							lp->pktStats.multicast++;
>  						}
> -					} else if (compare_ether_addr(buf, dev->dev_addr) == 0) {
> +					} else if (ether_addr_equal(buf,
> +								    dev->dev_addr)) {
>  						lp->pktStats.unicast++;
>  					}
>
> diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
> index d7ac6c1..8132c78 100644
> --- a/drivers/net/ethernet/cisco/enic/enic_main.c
> +++ b/drivers/net/ethernet/cisco/enic/enic_main.c
> @@ -944,8 +944,7 @@ static void enic_update_multicast_addr_list(struct enic *enic)
>
>  	for (i = 0; i < enic->mc_count; i++) {
>  		for (j = 0; j < mc_count; j++)
> -			if (compare_ether_addr(enic->mc_addr[i],
> -				mc_addr[j]) == 0)
> +			if (ether_addr_equal(enic->mc_addr[i], mc_addr[j]))
>  				break;
>  		if (j == mc_count)
>  			enic_dev_del_addr(enic, enic->mc_addr[i]);
> @@ -953,8 +952,7 @@ static void enic_update_multicast_addr_list(struct enic *enic)
>
>  	for (i = 0; i < mc_count; i++) {
>  		for (j = 0; j < enic->mc_count; j++)
> -			if (compare_ether_addr(mc_addr[i],
> -				enic->mc_addr[j]) == 0)
> +			if (ether_addr_equal(mc_addr[i], enic->mc_addr[j]))
>  				break;
>  		if (j == enic->mc_count)
>  			enic_dev_add_addr(enic, mc_addr[i]);
> @@ -999,8 +997,7 @@ static void enic_update_unicast_addr_list(struct enic *enic)
>
>  	for (i = 0; i < enic->uc_count; i++) {
>  		for (j = 0; j < uc_count; j++)
> -			if (compare_ether_addr(enic->uc_addr[i],
> -				uc_addr[j]) == 0)
> +			if (ether_addr_equal(enic->uc_addr[i], uc_addr[j]))
>  				break;
>  		if (j == uc_count)
>  			enic_dev_del_addr(enic, enic->uc_addr[i]);
> @@ -1008,8 +1005,7 @@ static void enic_update_unicast_addr_list(struct enic *enic)
>
>  	for (i = 0; i < uc_count; i++) {
>  		for (j = 0; j < enic->uc_count; j++)
> -			if (compare_ether_addr(uc_addr[i],
> -				enic->uc_addr[j]) == 0)
> +			if (ether_addr_equal(uc_addr[i], enic->uc_addr[j]))
>  				break;
>  		if (j == enic->uc_count)
>  			enic_dev_add_addr(enic, uc_addr[i]);
> diff --git a/drivers/net/ethernet/dec/ewrk3.c b/drivers/net/ethernet/dec/ewrk3.c
> index 1879f84..17ae8c6 100644
> --- a/drivers/net/ethernet/dec/ewrk3.c
> +++ b/drivers/net/ethernet/dec/ewrk3.c
> @@ -1016,7 +1016,8 @@ static int ewrk3_rx(struct net_device *dev)
>  							} else {
>  								lp->pktStats.multicast++;
>  							}
> -						} else if (compare_ether_addr(p, dev->dev_addr) == 0) {
> +						} else if (ether_addr_equal(p,
> +									    dev->dev_addr)) {
>  							lp->pktStats.unicast++;
>  						}
>  						lp->pktStats.bins[0]++;		/* Duplicates stats.rx_packets */
> diff --git a/drivers/net/ethernet/dec/tulip/de4x5.c b/drivers/net/ethernet/dec/tulip/de4x5.c
> index 18b106c..d3cd489 100644
> --- a/drivers/net/ethernet/dec/tulip/de4x5.c
> +++ b/drivers/net/ethernet/dec/tulip/de4x5.c
> @@ -1874,7 +1874,7 @@ de4x5_local_stats(struct net_device *dev, char *buf, int pkt_len)
>  	} else {
>  	    lp->pktStats.multicast++;
>  	}
> -    } else if (compare_ether_addr(buf, dev->dev_addr) == 0) {
> +    } else if (ether_addr_equal(buf, dev->dev_addr)) {
>          lp->pktStats.unicast++;
>      }
>
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> index 5c47135..46e77a2 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> @@ -1965,7 +1965,7 @@ qlcnic_send_filter(struct qlcnic_adapter *adapter,
>  	__le16 vlan_id = 0;
>  	u8 hindex;
>
> -	if (!compare_ether_addr(phdr->h_source, adapter->mac_addr))
> +	if (ether_addr_equal(phdr->h_source, adapter->mac_addr))
>  		return;
>
>  	if (adapter->fhash.fnum >= adapter->fhash.fmax)
> @@ -2235,8 +2235,7 @@ qlcnic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
>
>  	if (adapter->flags & QLCNIC_MACSPOOF) {
>  		phdr = (struct ethhdr *)skb->data;
> -		if (compare_ether_addr(phdr->h_source,
> -					adapter->mac_addr))
> +		if (!ether_addr_equal(phdr->h_source, adapter->mac_addr))
>  			goto drop_packet;
>  	}
>
> diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c
> index f22f45f..62d4b81 100644
> --- a/drivers/net/ethernet/sfc/ethtool.c
> +++ b/drivers/net/ethernet/sfc/ethtool.c
> @@ -1023,7 +1023,7 @@ static int efx_ethtool_set_class_rule(struct efx_nic *efx,
>  			return -EINVAL;
>
>  		/* Is it a default UC or MC filter? */
> -		if (!compare_ether_addr(mac_mask->h_dest, mac_addr_mc_mask) &&
> +		if (ether_addr_equal(mac_mask->h_dest, mac_addr_mc_mask) &&
>  		    vlan_tag_mask == 0) {
>  			if (is_multicast_ether_addr(mac_entry->h_dest))
>  				rc = efx_filter_set_mc_def(&spec);
> diff --git a/drivers/net/ethernet/sun/sunvnet.c  
> b/drivers/net/ethernet/sun/sunvnet.c
> index 38e3ae9..a108db3 100644
> --- a/drivers/net/ethernet/sun/sunvnet.c
> +++ b/drivers/net/ethernet/sun/sunvnet.c
> @@ -618,7 +618,7 @@ struct vnet_port *__tx_port_find(struct vnet  
> *vp, struct sk_buff *skb)
>  	struct vnet_port *port;
>
>  	hlist_for_each_entry(port, n, hp, hash) {
> -		if (!compare_ether_addr(port->raddr, skb->data))
> +		if (ether_addr_equal(port->raddr, skb->data))
>  			return port;
>  	}
>  	port = NULL;
> diff --git a/drivers/net/ethernet/tile/tilepro.c  
> b/drivers/net/ethernet/tile/tilepro.c
> index 3d501ec..96070e9 100644
> --- a/drivers/net/ethernet/tile/tilepro.c
> +++ b/drivers/net/ethernet/tile/tilepro.c
> @@ -843,7 +843,7 @@ static bool tile_net_poll_aux(struct  
> tile_net_cpu *info, int index)
>  		if (!is_multicast_ether_addr(buf)) {
>  			/* Filter packets not for our address. */
>  			const u8 *mine = dev->dev_addr;
> -			filter = compare_ether_addr(mine, buf);
> +			filter = !ether_addr_equal(mine, buf);
>  		}
>  	}
>
> diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c  
> b/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c
> index 5c14f82..961c832 100644
> --- a/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c
> +++ b/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c
> @@ -1590,8 +1590,8 @@ static void  
> gelic_wl_scan_complete_event(struct gelic_wl_info *wl)
>  		found = 0;
>  		oldest = NULL;
>  		list_for_each_entry(target, &wl->network_list, list) {
> -			if (!compare_ether_addr(&target->hwinfo->bssid[2],
> -						&scan_info->bssid[2])) {
> +			if (ether_addr_equal(&target->hwinfo->bssid[2],
> +					     &scan_info->bssid[2])) {
>  				found = 1;
>  				pr_debug("%s: same BBS found scanned list\n",
>  					 __func__);
> @@ -1691,8 +1691,8 @@ struct gelic_wl_scan_info  
> *gelic_wl_find_best_bss(struct gelic_wl_info *wl)
>
>  		/* If bss specified, check it only */
>  		if (test_bit(GELIC_WL_STAT_BSSID_SET, &wl->stat)) {
> -			if (!compare_ether_addr(&scan_info->hwinfo->bssid[2],
> -						wl->bssid)) {
> +			if (ether_addr_equal(&scan_info->hwinfo->bssid[2],
> +					     wl->bssid)) {
>  				best_bss = scan_info;
>  				pr_debug("%s: bssid matched\n", __func__);
>  				break;
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index bb8c72c..987aeef 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -313,7 +313,7 @@ static int run_filter(struct tap_filter *filter,  
> const struct sk_buff *skb)
>
>  	/* Exact match */
>  	for (i = 0; i < filter->count; i++)
> -		if (!compare_ether_addr(eh->h_dest, filter->addr[i]))
> +		if (ether_addr_equal(eh->h_dest, filter->addr[i]))
>  			return 1;
>
>  	/* Inexact match (multicast only) */
> diff --git a/drivers/net/wireless/at76c50x-usb.c  
> b/drivers/net/wireless/at76c50x-usb.c
> index faa8bcb..5ad74c8 100644
> --- a/drivers/net/wireless/at76c50x-usb.c
> +++ b/drivers/net/wireless/at76c50x-usb.c
> @@ -1751,7 +1751,7 @@ static void at76_mac80211_tx(struct  
> ieee80211_hw *hw, struct sk_buff *skb)
>  	 * following workaround is necessary. If the TX frame is an
>  	 * authentication frame extract the bssid and send the CMD_JOIN. */
>  	if (mgmt->frame_control & cpu_to_le16(IEEE80211_STYPE_AUTH)) {
> -		if (compare_ether_addr(priv->bssid, mgmt->bssid)) {
> +		if (!ether_addr_equal(priv->bssid, mgmt->bssid)) {
>  			memcpy(priv->bssid, mgmt->bssid, ETH_ALEN);
>  			ieee80211_queue_work(hw, &priv->work_join_bssid);
>  			dev_kfree_skb_any(skb);
> diff --git a/drivers/net/wireless/ath/ath5k/base.c  
> b/drivers/net/wireless/ath/ath5k/base.c
> index 49e3b19..0ba81a6 100644
> --- a/drivers/net/wireless/ath/ath5k/base.c
> +++ b/drivers/net/wireless/ath/ath5k/base.c
> @@ -462,7 +462,7 @@ void ath5k_vif_iter(void *data, u8 *mac, struct  
> ieee80211_vif *vif)
>  	}
>
>  	if (iter_data->need_set_hw_addr && iter_data->hw_macaddr)
> -		if (compare_ether_addr(iter_data->hw_macaddr, mac) == 0)
> +		if (ether_addr_equal(iter_data->hw_macaddr, mac))
>  			iter_data->need_set_hw_addr = false;
>
>  	if (!iter_data->any_assoc) {
> @@ -1170,7 +1170,7 @@ ath5k_check_ibss_tsf(struct ath5k_hw *ah,  
> struct sk_buff *skb,
>
>  	if (ieee80211_is_beacon(mgmt->frame_control) &&
>  	    le16_to_cpu(mgmt->u.beacon.capab_info) & WLAN_CAPABILITY_IBSS &&
> -	    compare_ether_addr(mgmt->bssid, common->curbssid) == 0) {
> +	    ether_addr_equal(mgmt->bssid, common->curbssid)) {
>  		/*
>  		 * Received an IBSS beacon with the same BSSID. Hardware *must*
>  		 * have updated the local TSF. We have to work around various
> @@ -1234,7 +1234,7 @@ ath5k_update_beacon_rssi(struct ath5k_hw *ah,  
> struct sk_buff *skb, int rssi)
>
>  	/* only beacons from our BSSID */
>  	if (!ieee80211_is_beacon(mgmt->frame_control) ||
> -	    compare_ether_addr(mgmt->bssid, common->curbssid) != 0)
> +	    !ether_addr_equal(mgmt->bssid, common->curbssid))
>  		return;
>
>  	ewma_add(&ah->ah_beacon_rssi_avg, rssi);
> diff --git a/drivers/net/wireless/ath/ath9k/recv.c  
> b/drivers/net/wireless/ath/ath9k/recv.c
> index 544e549..e1fcc68 100644
> --- a/drivers/net/wireless/ath/ath9k/recv.c
> +++ b/drivers/net/wireless/ath/ath9k/recv.c
> @@ -1833,7 +1833,7 @@ int ath_rx_tasklet(struct ath_softc *sc, int  
> flush, bool hp)
>  		if (ieee80211_is_beacon(hdr->frame_control)) {
>  			RX_STAT_INC(rx_beacons);
>  			if (!is_zero_ether_addr(common->curbssid) &&
> -			    !compare_ether_addr(hdr->addr3, common->curbssid))
> +			    ether_addr_equal(hdr->addr3, common->curbssid))
>  				rs.is_mybeacon = true;
>  			else
>  				rs.is_mybeacon = false;
> diff --git a/drivers/net/wireless/ath/carl9170/rx.c  
> b/drivers/net/wireless/ath/carl9170/rx.c
> index dc99030..84b22ee 100644
> --- a/drivers/net/wireless/ath/carl9170/rx.c
> +++ b/drivers/net/wireless/ath/carl9170/rx.c
> @@ -538,7 +538,7 @@ static void carl9170_ps_beacon(struct ar9170  
> *ar, void *data, unsigned int len)
>  		return;
>
>  	/* and only beacons from the associated BSSID, please */
> -	if (compare_ether_addr(hdr->addr3, ar->common.curbssid) ||
> +	if (!ether_addr_equal(hdr->addr3, ar->common.curbssid) ||
>  	    !ar->common.curaid)
>  		return;
>
> diff --git a/drivers/net/wireless/ipw2x00/libipw_rx.c  
> b/drivers/net/wireless/ipw2x00/libipw_rx.c
> index c4955d2..02e0579 100644
> --- a/drivers/net/wireless/ipw2x00/libipw_rx.c
> +++ b/drivers/net/wireless/ipw2x00/libipw_rx.c
> @@ -77,8 +77,8 @@ static struct libipw_frag_entry  
> *libipw_frag_cache_find(struct
>
>  		if (entry->skb != NULL && entry->seq == seq &&
>  		    (entry->last_frag + 1 == frag || frag == -1) &&
> -		    !compare_ether_addr(entry->src_addr, src) &&
> -		    !compare_ether_addr(entry->dst_addr, dst))
> +		    ether_addr_equal(entry->src_addr, src) &&
> +		    ether_addr_equal(entry->dst_addr, dst))
>  			return entry;
>  	}
>
> @@ -245,12 +245,12 @@ static int libipw_is_eapol_frame(struct  
> libipw_device *ieee,
>  	/* check that the frame is unicast frame to us */
>  	if ((fc & (IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS)) ==
>  	    IEEE80211_FCTL_TODS &&
> -	    !compare_ether_addr(hdr->addr1, dev->dev_addr) &&
> -	    !compare_ether_addr(hdr->addr3, dev->dev_addr)) {
> +	    ether_addr_equal(hdr->addr1, dev->dev_addr) &&
> +	    ether_addr_equal(hdr->addr3, dev->dev_addr)) {
>  		/* ToDS frame with own addr BSSID and DA */
>  	} else if ((fc & (IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS)) ==
>  		   IEEE80211_FCTL_FROMDS &&
> -		   !compare_ether_addr(hdr->addr1, dev->dev_addr)) {
> +		   ether_addr_equal(hdr->addr1, dev->dev_addr)) {
>  		/* FromDS frame with own addr as DA */
>  	} else
>  		return 0;
> @@ -523,8 +523,8 @@ int libipw_rx(struct libipw_device *ieee, struct  
> sk_buff *skb,
>
>  	if (ieee->iw_mode == IW_MODE_MASTER && !wds &&
>  	    (fc & (IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS)) ==
> -	    IEEE80211_FCTL_FROMDS && ieee->stadev
> -	    && !compare_ether_addr(hdr->addr2, ieee->assoc_ap_addr)) {
> +	    IEEE80211_FCTL_FROMDS && ieee->stadev &&
> +	    ether_addr_equal(hdr->addr2, ieee->assoc_ap_addr)) {
>  		/* Frame from BSSID of the AP for which we are a client */
>  		skb->dev = dev = ieee->stadev;
>  		stats = hostap_get_stats(dev);
> @@ -1468,7 +1468,7 @@ static inline int is_same_network(struct  
> libipw_network *src,
>  	 * as one network */
>  	return ((src->ssid_len == dst->ssid_len) &&
>  		(src->channel == dst->channel) &&
> -		!compare_ether_addr(src->bssid, dst->bssid) &&
> +		ether_addr_equal(src->bssid, dst->bssid) &&
>  		!memcmp(src->ssid, dst->ssid, src->ssid_len));
>  }
>
> diff --git a/drivers/net/wireless/iwlegacy/3945.c  
> b/drivers/net/wireless/iwlegacy/3945.c
> index b25c01b..87e5398 100644
> --- a/drivers/net/wireless/iwlegacy/3945.c
> +++ b/drivers/net/wireless/iwlegacy/3945.c
> @@ -453,10 +453,10 @@ il3945_is_network_packet(struct il_priv *il,  
> struct ieee80211_hdr *header)
>  	switch (il->iw_mode) {
>  	case NL80211_IFTYPE_ADHOC:	/* Header: Dest. | Source    | BSSID */
>  		/* packets to our IBSS update information */
> -		return !compare_ether_addr(header->addr3, il->bssid);
> +		return ether_addr_equal(header->addr3, il->bssid);
>  	case NL80211_IFTYPE_STATION:	/* Header: Dest. | AP{BSSID} | Source */
>  		/* packets to our IBSS update information */
> -		return !compare_ether_addr(header->addr2, il->bssid);
> +		return ether_addr_equal(header->addr2, il->bssid);
>  	default:
>  		return 1;
>  	}
> diff --git a/drivers/net/wireless/iwlegacy/4965-mac.c  
> b/drivers/net/wireless/iwlegacy/4965-mac.c
> index f2baf94..509301a 100644
> --- a/drivers/net/wireless/iwlegacy/4965-mac.c
> +++ b/drivers/net/wireless/iwlegacy/4965-mac.c
> @@ -2565,7 +2565,7 @@ il4965_find_station(struct il_priv *il, const u8 *addr)
>  	spin_lock_irqsave(&il->sta_lock, flags);
>  	for (i = start; i < il->hw_params.max_stations; i++)
>  		if (il->stations[i].used &&
> -		    (!compare_ether_addr(il->stations[i].sta.sta.addr, addr))) {
> +		    ether_addr_equal(il->stations[i].sta.sta.addr, addr)) {
>  			ret = i;
>  			goto out;
>  		}
> diff --git a/drivers/net/wireless/iwlegacy/common.c  
> b/drivers/net/wireless/iwlegacy/common.c
> index eaf24945..cbf2dc1 100644
> --- a/drivers/net/wireless/iwlegacy/common.c
> +++ b/drivers/net/wireless/iwlegacy/common.c
> @@ -1896,8 +1896,8 @@ il_prep_station(struct il_priv *il, const u8  
> *addr, bool is_ap,
>  		sta_id = il->hw_params.bcast_id;
>  	else
>  		for (i = IL_STA_ID; i < il->hw_params.max_stations; i++) {
> -			if (!compare_ether_addr
> -			    (il->stations[i].sta.sta.addr, addr)) {
> +			if (ether_addr_equal(il->stations[i].sta.sta.addr,
> +					     addr)) {
>  				sta_id = i;
>  				break;
>  			}
> @@ -1926,7 +1926,7 @@ il_prep_station(struct il_priv *il, const u8  
> *addr, bool is_ap,
>
>  	if ((il->stations[sta_id].used & IL_STA_DRIVER_ACTIVE) &&
>  	    (il->stations[sta_id].used & IL_STA_UCODE_ACTIVE) &&
> -	    !compare_ether_addr(il->stations[sta_id].sta.sta.addr, addr)) {
> +	    ether_addr_equal(il->stations[sta_id].sta.sta.addr, addr)) {
>  		D_ASSOC("STA %d (%pM) already added, not adding again.\n",
>  			sta_id, addr);
>  		return sta_id;
> @@ -3744,10 +3744,10 @@ il_full_rxon_required(struct il_priv *il)
>
>  	/* These items are only settable from the full RXON command */
>  	CHK(!il_is_associated(il));
> -	CHK(compare_ether_addr(staging->bssid_addr, active->bssid_addr));
> -	CHK(compare_ether_addr(staging->node_addr, active->node_addr));
> -	CHK(compare_ether_addr
> -	    (staging->wlap_bssid_addr, active->wlap_bssid_addr));
> +	CHK(!ether_addr_equal(staging->bssid_addr, active->bssid_addr));
> +	CHK(!ether_addr_equal(staging->node_addr, active->node_addr));
> +	CHK(!ether_addr_equal(staging->wlap_bssid_addr,
> +			      active->wlap_bssid_addr));
>  	CHK_NEQ(staging->dev_type, active->dev_type);
>  	CHK_NEQ(staging->channel, active->channel);
>  	CHK_NEQ(staging->air_propagation, active->air_propagation);
> diff --git a/drivers/net/wireless/iwlwifi/iwl-agn-rx.c  
> b/drivers/net/wireless/iwlwifi/iwl-agn-rx.c
> index 0c252c5..779f819 100644
> --- a/drivers/net/wireless/iwlwifi/iwl-agn-rx.c
> +++ b/drivers/net/wireless/iwlwifi/iwl-agn-rx.c
> @@ -779,8 +779,8 @@ static void  
> iwlagn_pass_packet_to_mac80211(struct iwl_priv *priv,
>  	*/
>  	if (unlikely(ieee80211_is_beacon(fc) && priv->passive_no_rx)) {
>  		for_each_context(priv, ctx) {
> -			if (compare_ether_addr(hdr->addr3,
> -					       ctx->active.bssid_addr))
> +			if (!ether_addr_equal(hdr->addr3,
> +					      ctx->active.bssid_addr))
>  				continue;
>  			iwlagn_lift_passive_no_rx(priv);
>  		}
> diff --git a/drivers/net/wireless/iwlwifi/iwl-agn-rxon.c  
> b/drivers/net/wireless/iwlwifi/iwl-agn-rxon.c
> index 0f7c444..74fbee6 100644
> --- a/drivers/net/wireless/iwlwifi/iwl-agn-rxon.c
> +++ b/drivers/net/wireless/iwlwifi/iwl-agn-rxon.c
> @@ -881,10 +881,10 @@ int iwl_full_rxon_required(struct iwl_priv *priv,
>
>  	/* These items are only settable from the full RXON command */
>  	CHK(!iwl_is_associated_ctx(ctx));
> -	CHK(compare_ether_addr(staging->bssid_addr, active->bssid_addr));
> -	CHK(compare_ether_addr(staging->node_addr, active->node_addr));
> -	CHK(compare_ether_addr(staging->wlap_bssid_addr,
> -				active->wlap_bssid_addr));
> +	CHK(!ether_addr_equal(staging->bssid_addr, active->bssid_addr));
> +	CHK(!ether_addr_equal(staging->node_addr, active->node_addr));
> +	CHK(!ether_addr_equal(staging->wlap_bssid_addr,
> +			      active->wlap_bssid_addr));
>  	CHK_NEQ(staging->dev_type, active->dev_type);
>  	CHK_NEQ(staging->channel, active->channel);
>  	CHK_NEQ(staging->air_propagation, active->air_propagation);
> diff --git a/drivers/net/wireless/iwlwifi/iwl-agn-sta.c  
> b/drivers/net/wireless/iwlwifi/iwl-agn-sta.c
> index 67e6f1d..b31584e 100644
> --- a/drivers/net/wireless/iwlwifi/iwl-agn-sta.c
> +++ b/drivers/net/wireless/iwlwifi/iwl-agn-sta.c
> @@ -322,8 +322,8 @@ u8 iwl_prep_station(struct iwl_priv *priv,  
> struct iwl_rxon_context *ctx,
>  		sta_id = ctx->bcast_sta_id;
>  	else
>  		for (i = IWL_STA_ID; i < IWLAGN_STATION_COUNT; i++) {
> -			if (!compare_ether_addr(priv->stations[i].sta.sta.addr,
> -						addr)) {
> +			if (ether_addr_equal(priv->stations[i].sta.sta.addr,
> +					     addr)) {
>  				sta_id = i;
>  				break;
>  			}
> @@ -353,7 +353,7 @@ u8 iwl_prep_station(struct iwl_priv *priv,  
> struct iwl_rxon_context *ctx,
>
>  	if ((priv->stations[sta_id].used & IWL_STA_DRIVER_ACTIVE) &&
>  	    (priv->stations[sta_id].used & IWL_STA_UCODE_ACTIVE) &&
> -	    !compare_ether_addr(priv->stations[sta_id].sta.sta.addr, addr)) {
> +	    ether_addr_equal(priv->stations[sta_id].sta.sta.addr, addr)) {
>  		IWL_DEBUG_ASSOC(priv, "STA %d (%pM) already added, not "
>  				"adding again.\n", sta_id, addr);
>  		return sta_id;
> diff --git a/drivers/net/wireless/mwl8k.c b/drivers/net/wireless/mwl8k.c
> index e30cc32..cf7bdc6 100644
> --- a/drivers/net/wireless/mwl8k.c
> +++ b/drivers/net/wireless/mwl8k.c
> @@ -1235,7 +1235,7 @@ mwl8k_capture_bssid(struct mwl8k_priv *priv,  
> struct ieee80211_hdr *wh)
>  {
>  	return priv->capture_beacon &&
>  		ieee80211_is_beacon(wh->frame_control) &&
> -		!compare_ether_addr(wh->addr3, priv->capture_bssid);
> +		ether_addr_equal(wh->addr3, priv->capture_bssid);
>  }
>
>  static inline void mwl8k_save_beacon(struct ieee80211_hw *hw,
> diff --git a/drivers/net/wireless/p54/txrx.c  
> b/drivers/net/wireless/p54/txrx.c
> index 7c8f118..82a1cac 100644
> --- a/drivers/net/wireless/p54/txrx.c
> +++ b/drivers/net/wireless/p54/txrx.c
> @@ -308,7 +308,7 @@ static void p54_pspoll_workaround(struct  
> p54_common *priv, struct sk_buff *skb)
>  		return;
>
>  	/* only consider beacons from the associated BSSID */
> -	if (compare_ether_addr(hdr->addr3, priv->bssid))
> +	if (!ether_addr_equal(hdr->addr3, priv->bssid))
>  		return;
>
>  	tim = p54_find_ie(skb, WLAN_EID_TIM);
> diff --git a/drivers/net/wireless/rndis_wlan.c  
> b/drivers/net/wireless/rndis_wlan.c
> index d66e298..dcf0e7e 100644
> --- a/drivers/net/wireless/rndis_wlan.c
> +++ b/drivers/net/wireless/rndis_wlan.c
> @@ -1801,8 +1801,8 @@ static struct ndis_80211_pmkid  
> *remove_pmkid(struct usbnet *usbdev,
>  		count = max_pmkids;
>
>  	for (i = 0; i < count; i++)
> -		if (!compare_ether_addr(pmkids->bssid_info[i].bssid,
> -							pmksa->bssid))
> +		if (ether_addr_equal(pmkids->bssid_info[i].bssid,
> +				     pmksa->bssid))
>  			break;
>
>  	/* pmkid not found */
> @@ -1843,8 +1843,8 @@ static struct ndis_80211_pmkid  
> *update_pmkid(struct usbnet *usbdev,
>
>  	/* update with new pmkid */
>  	for (i = 0; i < count; i++) {
> -		if (compare_ether_addr(pmkids->bssid_info[i].bssid,
> -							pmksa->bssid))
> +		if (!ether_addr_equal(pmkids->bssid_info[i].bssid,
> +				      pmksa->bssid))
>  			continue;
>
>  		memcpy(pmkids->bssid_info[i].pmkid, pmksa->pmkid,
> @@ -2139,7 +2139,7 @@ resize_buf:
>  	while (check_bssid_list_item(bssid, bssid_len, buf, len)) {
>  		if (rndis_bss_info_update(usbdev, bssid) && match_bssid &&
>  		    matched) {
> -			if (compare_ether_addr(bssid->mac, match_bssid))
> +			if (!ether_addr_equal(bssid->mac, match_bssid))

While reviewing this, I noticed that the original code above is wrong.
It should be !compare_ether_addr. Should I push a patch fixing this
through wireless-testing, although it will later conflict with this patch?

-Jussi
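
To make the point above concrete, here is a minimal user-space sketch of
the two return conventions. It is not the kernel code: the *_sketch
helpers are stand-ins written for illustration, and the three matched_*
functions are hypothetical names for the original test, the mechanical
conversion in this patch, and the test Jussi suggests was intended.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* Stand-in for the old helper: memcmp-like, 0 means "equal". */
static unsigned int compare_ether_addr_sketch(const unsigned char *a,
					      const unsigned char *b)
{
	assert(a && b);
	return memcmp(a, b, ETH_ALEN) != 0;
}

/* Stand-in for the new helper: true means "equal". */
static bool ether_addr_equal_sketch(const unsigned char *a,
				    const unsigned char *b)
{
	return !compare_ether_addr_sketch(a, b);
}

/* Original test: reports a match when the addresses DIFFER. */
static bool matched_as_in_original(const unsigned char *mac,
				   const unsigned char *want)
{
	return compare_ether_addr_sketch(mac, want) != 0;
}

/* Mechanical conversion: same behaviour, so the bug is preserved. */
static bool matched_after_conversion(const unsigned char *mac,
				     const unsigned char *want)
{
	return !ether_addr_equal_sketch(mac, want);
}

/* Presumably intended test: reports a match when the addresses are equal. */
static bool matched_intended(const unsigned char *mac,
			     const unsigned char *want)
{
	return ether_addr_equal_sketch(mac, want);
}
```

The conversion keeps the behaviour of the original line, so a separate
fix would only need to drop the negation.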

>  				*matched = true;
>  		}
>
> @@ -2531,7 +2531,7 @@ static int rndis_get_station(struct wiphy  
> *wiphy, struct net_device *dev,
>  	struct rndis_wlan_private *priv = wiphy_priv(wiphy);
>  	struct usbnet *usbdev = priv->usbdev;
>
> -	if (compare_ether_addr(priv->bssid, mac))
> +	if (!ether_addr_equal(priv->bssid, mac))
>  		return -ENOENT;
>
>  	rndis_fill_station_info(usbdev, sinfo);
> diff --git a/drivers/net/wireless/rtlwifi/base.c  
> b/drivers/net/wireless/rtlwifi/base.c
> index e54488d..f4c852c 100644
> --- a/drivers/net/wireless/rtlwifi/base.c
> +++ b/drivers/net/wireless/rtlwifi/base.c
> @@ -1460,7 +1460,7 @@ void rtl_recognize_peer(struct ieee80211_hw  
> *hw, u8 *data, unsigned int len)
>  		return;
>
>  	/* and only beacons from the associated BSSID, please */
> -	if (compare_ether_addr(hdr->addr3, rtlpriv->mac80211.bssid))
> +	if (!ether_addr_equal(hdr->addr3, rtlpriv->mac80211.bssid))
>  		return;
>
>  	if (rtl_find_221_ie(hw, data, len))
> diff --git a/drivers/net/wireless/rtlwifi/ps.c  
> b/drivers/net/wireless/rtlwifi/ps.c
> index 5b9c3b5..5ae2664 100644
> --- a/drivers/net/wireless/rtlwifi/ps.c
> +++ b/drivers/net/wireless/rtlwifi/ps.c
> @@ -480,7 +480,7 @@ void rtl_swlps_beacon(struct ieee80211_hw *hw,  
> void *data, unsigned int len)
>  		return;
>
>  	/* and only beacons from the associated BSSID, please */
> -	if (compare_ether_addr(hdr->addr3, rtlpriv->mac80211.bssid))
> +	if (!ether_addr_equal(hdr->addr3, rtlpriv->mac80211.bssid))
>  		return;
>
>  	rtlpriv->psc.last_beacon = jiffies;
> diff --git a/drivers/net/wireless/rtlwifi/rtl8192ce/trx.c  
> b/drivers/net/wireless/rtlwifi/rtl8192ce/trx.c
> index 37b1363..3af874e 100644
> --- a/drivers/net/wireless/rtlwifi/rtl8192ce/trx.c
> +++ b/drivers/net/wireless/rtlwifi/rtl8192ce/trx.c
> @@ -508,14 +508,14 @@ static void  
> _rtl92ce_translate_rx_signal_stuff(struct ieee80211_hw *hw,
>
>  	packet_matchbssid =
>  	    ((IEEE80211_FTYPE_CTL != type) &&
> -	     (!compare_ether_addr(mac->bssid,
> -				  (c_fc & IEEE80211_FCTL_TODS) ?
> -				  hdr->addr1 : (c_fc & IEEE80211_FCTL_FROMDS) ?
> -				  hdr->addr2 : hdr->addr3)) &&
> +	     ether_addr_equal(mac->bssid,
> +			      (c_fc & IEEE80211_FCTL_TODS) ? hdr->addr1 :
> +			      (c_fc & IEEE80211_FCTL_FROMDS) ? hdr->addr2 :
> +			      hdr->addr3) &&
>  	     (!pstats->hwerror) && (!pstats->crc) && (!pstats->icv));
>
>  	packet_toself = packet_matchbssid &&
> -	    (!compare_ether_addr(praddr, rtlefuse->dev_addr));
> +	     ether_addr_equal(praddr, rtlefuse->dev_addr);
>
>  	if (ieee80211_is_beacon(fc))
>  		packet_beacon = true;
> diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/mac.c  
> b/drivers/net/wireless/rtlwifi/rtl8192cu/mac.c
> index 025bdc2..7e91c76 100644
> --- a/drivers/net/wireless/rtlwifi/rtl8192cu/mac.c
> +++ b/drivers/net/wireless/rtlwifi/rtl8192cu/mac.c
> @@ -1099,14 +1099,14 @@ void rtl92c_translate_rx_signal_stuff(struct  
> ieee80211_hw *hw,
>  	praddr = hdr->addr1;
>  	packet_matchbssid =
>  	    ((IEEE80211_FTYPE_CTL != type) &&
> -	     (!compare_ether_addr(mac->bssid,
> -			  (cpu_fc & IEEE80211_FCTL_TODS) ?
> -			  hdr->addr1 : (cpu_fc & IEEE80211_FCTL_FROMDS) ?
> -			  hdr->addr2 : hdr->addr3)) &&
> +	     ether_addr_equal(mac->bssid,
> +			      (cpu_fc & IEEE80211_FCTL_TODS) ? hdr->addr1 :
> +			      (cpu_fc & IEEE80211_FCTL_FROMDS) ? hdr->addr2 :
> +			      hdr->addr3) &&
>  	     (!pstats->hwerror) && (!pstats->crc) && (!pstats->icv));
>
>  	packet_toself = packet_matchbssid &&
> -	    (!compare_ether_addr(praddr, rtlefuse->dev_addr));
> +	    ether_addr_equal(praddr, rtlefuse->dev_addr);
>  	if (ieee80211_is_beacon(fc))
>  		packet_beacon = true;
>  	_rtl92c_query_rxphystatus(hw, pstats, pdesc, p_drvinfo,
> diff --git a/drivers/net/wireless/rtlwifi/rtl8192de/trx.c  
> b/drivers/net/wireless/rtlwifi/rtl8192de/trx.c
> index a7f6126..1666ef7 100644
> --- a/drivers/net/wireless/rtlwifi/rtl8192de/trx.c
> +++ b/drivers/net/wireless/rtlwifi/rtl8192de/trx.c
> @@ -466,12 +466,13 @@ static void  
> _rtl92de_translate_rx_signal_stuff(struct ieee80211_hw *hw,
>  	type = WLAN_FC_GET_TYPE(fc);
>  	praddr = hdr->addr1;
>  	packet_matchbssid = ((IEEE80211_FTYPE_CTL != type) &&
> -	     (!compare_ether_addr(mac->bssid, (cfc & IEEE80211_FCTL_TODS) ?
> -		  hdr->addr1 : (cfc & IEEE80211_FCTL_FROMDS) ?
> -		  hdr->addr2 : hdr->addr3)) && (!pstats->hwerror) &&
> -		  (!pstats->crc) && (!pstats->icv));
> +	     ether_addr_equal(mac->bssid,
> +			      (cfc & IEEE80211_FCTL_TODS) ? hdr->addr1 :
> +			      (cfc & IEEE80211_FCTL_FROMDS) ? hdr->addr2 :
> +			      hdr->addr3) &&
> +	     (!pstats->hwerror) && (!pstats->crc) && (!pstats->icv));
>  	packet_toself = packet_matchbssid &&
> -			(!compare_ether_addr(praddr, rtlefuse->dev_addr));
> +			ether_addr_equal(praddr, rtlefuse->dev_addr);
>  	if (ieee80211_is_beacon(fc))
>  		packet_beacon = true;
>  	_rtl92de_query_rxphystatus(hw, pstats, pdesc, p_drvinfo,
> diff --git a/drivers/net/wireless/rtlwifi/rtl8192se/trx.c  
> b/drivers/net/wireless/rtlwifi/rtl8192se/trx.c
> index 2fd3d13..812b585 100644
> --- a/drivers/net/wireless/rtlwifi/rtl8192se/trx.c
> +++ b/drivers/net/wireless/rtlwifi/rtl8192se/trx.c
> @@ -492,13 +492,14 @@ static void  
> _rtl92se_translate_rx_signal_stuff(struct ieee80211_hw *hw,
>  	praddr = hdr->addr1;
>
>  	packet_matchbssid = ((IEEE80211_FTYPE_CTL != type) &&
> -	     (!compare_ether_addr(mac->bssid, (cfc & IEEE80211_FCTL_TODS) ?
> -			hdr->addr1 : (cfc & IEEE80211_FCTL_FROMDS) ?
> -			hdr->addr2 : hdr->addr3)) && (!pstats->hwerror) &&
> -			(!pstats->crc) && (!pstats->icv));
> +	     ether_addr_equal(mac->bssid,
> +			      (cfc & IEEE80211_FCTL_TODS) ? hdr->addr1 :
> +			      (cfc & IEEE80211_FCTL_FROMDS) ? hdr->addr2 :
> +			      hdr->addr3) &&
> +	     (!pstats->hwerror) && (!pstats->crc) && (!pstats->icv));
>
>  	packet_toself = packet_matchbssid &&
> -	    (!compare_ether_addr(praddr, rtlefuse->dev_addr));
> +	    ether_addr_equal(praddr, rtlefuse->dev_addr);
>
>  	if (ieee80211_is_beacon(fc))
>  		packet_beacon = true;
> --
> 1.7.8.111.gad25c.dirty
>
>
>

^ permalink raw reply

* Re: [PATCH 02/12] selinux: tag avc cache alloc as non-critical
From: Eric Paris @ 2012-05-10 14:25 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	David Miller, Trond Myklebust, Neil Brown, Christoph Hellwig,
	Peter Zijlstra, Mike Christie, Eric B Munson
In-Reply-To: <1336658065-24851-3-git-send-email-mgorman@suse.de>

On Thu, May 10, 2012 at 9:54 AM, Mel Gorman <mgorman@suse.de> wrote:
> Failing to allocate a cache entry will only harm performance not
> correctness.  Do not consume valuable reserve pages for something
> like that.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Acked-by: Eric Paris <eparis@redhat.com>

> ---
>  security/selinux/avc.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/security/selinux/avc.c b/security/selinux/avc.c
> index 8ee42b2..75c2977 100644
> --- a/security/selinux/avc.c
> +++ b/security/selinux/avc.c
> @@ -280,7 +280,7 @@ static struct avc_node *avc_alloc_node(void)
>  {
>        struct avc_node *node;
>
> -       node = kmem_cache_zalloc(avc_node_cachep, GFP_ATOMIC);
> +       node = kmem_cache_zalloc(avc_node_cachep, GFP_ATOMIC|__GFP_NOMEMALLOC);
>        if (!node)
>                goto out;
>
> --
> 1.7.9.2
>


^ permalink raw reply

* Re: [net-next 07/12] ixgbe: Enable timesync clock-out feature for PPS support on X540
From: Richard Cochran @ 2012-05-10 14:17 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, Jacob E Keller, netdev, gospo, sassmann
In-Reply-To: <1336632413-19135-8-git-send-email-jeffrey.t.kirsher@intel.com>

On Wed, May 09, 2012 at 11:46:48PM -0700, Jeff Kirsher wrote:
> From: Jacob E Keller <jacob.e.keller@intel.com>
> 
> This patch enables the PPS system in the PHC framework, by enabling
> the clock-out feature on the X540 device. Causes the SDP0 to be set as
> a 1Hz clock. Also configures the timesync interrupt cause in order to
> report each pulse to the PPS via the PHC framework, which can be used
> for general system clock synchronization. (This allows a stable method
> for tuning the general system time via the on-board SYSTIM register
> based clock.)

Glad to see the PPS output and internal PPS support.

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 9a83c40..1ad6e2a 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c

...

>  /**
> + * ixgbe_ptp_check_pps_event
> + * @adapter - the private adapter structure
> + * @eicr - the interrupt cause register value
> + *
> + * This function is called by the interrupt routine when checking for
> + * interrupts. It will check and handle a pps event.
> + */
> +void ixgbe_ptp_check_pps_event(struct ixgbe_adapter *adapter, u32 eicr)
> +{
> +	struct ixgbe_hw *hw = &adapter->hw;
> +	struct ptp_clock_event event;
> +
> +	event.type = PTP_CLOCK_PPS;
> +
> +	/* Make sure ptp clock is valid, and PPS event enabled */
> +	if (!adapter->ptp_clock ||
> +	    !(adapter->flags2 & IXGBE_FLAG2_PTP_PPS_ENABLED))
> +		return;
> +
> +	switch (hw->mac.type) {
> +	case ixgbe_mac_X540:
> +		if (eicr & IXGBE_EICR_TIMESYNC)

Since this function is called in every interrupt, I would check this
flag first thing.

> +			ptp_clock_event(adapter->ptp_clock, &event);
> +		break;
> +	default:
> +		break;
> +	}
> +}
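
A rough sketch of that reordering, in user-space C so it can be compiled
on its own. Everything with a _SKETCH/_sketch suffix is a hypothetical
stand-in (including the bit positions); only the order of the tests is
the point: the interrupt cause bit is checked before anything else, so
the common "not a timesync interrupt" case returns immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define EICR_TIMESYNC_SKETCH		(1u << 24)
#define FLAG2_PTP_PPS_ENABLED_SKETCH	(1u << 0)

enum mac_type_sketch { MAC_OTHER_SKETCH, MAC_X540_SKETCH };

/* Hypothetical, much reduced stand-in for the adapter structure. */
struct adapter_sketch {
	bool has_ptp_clock;
	uint32_t flags2;
	enum mac_type_sketch mac_type;
	unsigned int pps_events;	/* counts events we would report */
};

/* Returns true if a PPS event was (notionally) reported. */
static bool check_pps_event_sketch(struct adapter_sketch *adapter,
				   uint32_t eicr)
{
	assert(adapter);

	/* Cheapest and most selective test first: runs on every interrupt. */
	if (!(eicr & EICR_TIMESYNC_SKETCH))
		return false;

	/* Make sure the clock exists and PPS events are enabled. */
	if (!adapter->has_ptp_clock ||
	    !(adapter->flags2 & FLAG2_PTP_PPS_ENABLED_SKETCH))
		return false;

	/* Only the X540 is handled in the quoted patch. */
	if (adapter->mac_type != MAC_X540_SKETCH)
		return false;

	adapter->pps_events++;	/* stands in for ptp_clock_event() */
	return true;
}
```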

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH 02/12] selinux: tag avc cache alloc as non-critical
From: Casey Schaufler @ 2012-05-10 14:14 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	David Miller, Trond Myklebust, Neil Brown, Christoph Hellwig,
	Peter Zijlstra, Mike Christie, Eric B Munson, LSM, SE Linux
In-Reply-To: <1336658065-24851-3-git-send-email-mgorman@suse.de>

On 5/10/2012 6:54 AM, Mel Gorman wrote:
> Failing to allocate a cache entry will only harm performance not
> correctness.  Do not consume valuable reserve pages for something
> like that.

Copying to the LSM and SELinux lists.
 
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  security/selinux/avc.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/security/selinux/avc.c b/security/selinux/avc.c
> index 8ee42b2..75c2977 100644
> --- a/security/selinux/avc.c
> +++ b/security/selinux/avc.c
> @@ -280,7 +280,7 @@ static struct avc_node *avc_alloc_node(void)
>  {
>  	struct avc_node *node;
>  
> -	node = kmem_cache_zalloc(avc_node_cachep, GFP_ATOMIC);
> +	node = kmem_cache_zalloc(avc_node_cachep, GFP_ATOMIC|__GFP_NOMEMALLOC);
>  	if (!node)
>  		goto out;
>  


^ permalink raw reply

* Re: [net-next 06/12] ixgbe: Hardware Timestamping + PTP Hardware Clock (PHC)
From: Richard Cochran @ 2012-05-10 14:11 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, Jacob Keller, netdev, gospo, sassmann
In-Reply-To: <1336632413-19135-7-git-send-email-jeffrey.t.kirsher@intel.com>

Mostly, this looks very good. I do have one concern and a nit, though.

On Wed, May 09, 2012 at 11:46:47PM -0700, Jeff Kirsher wrote:
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 1693ec3..9a83c40 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -789,6 +789,13 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
>  		total_bytes += tx_buffer->bytecount;
>  		total_packets += tx_buffer->gso_segs;
>  
> +#ifdef CONFIG_IXGBE_PTP
> +		if (unlikely(tx_buffer->tx_flags &
> +			     IXGBE_TX_FLAGS_TSTAMP))
> +			ixgbe_ptp_tx_hwtstamp(q_vector,
> +					      tx_buffer->skb);

This looks strangely wrapped.

> +
> +#endif
>  		/* free the skb */
>  		dev_kfree_skb_any(tx_buffer->skb);
>  

...

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
> new file mode 100644
> index 0000000..0b6553e
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c

...

> +/**
> + * ixgbe_ptp_rx_hwtstamp - utility function which checks for RX time stamp
> + * @q_vector: structure containing interrupt and ring information
> + * @skb: particular skb to send timestamp with
> + *
> + * if the timestamp is valid, we convert it into the timecounter ns
> + * value, then store that result into the shhwtstamps structure which
> + * is passed up the network stack
> + */
> +void ixgbe_ptp_rx_hwtstamp(struct ixgbe_q_vector *q_vector,
> +			   struct sk_buff *skb)
> +{
> +	struct ixgbe_adapter *adapter;
> +	struct ixgbe_hw *hw;
> +	struct skb_shared_hwtstamps *shhwtstamps;
> +	u64 regval = 0, ns;
> +	u32 tsyncrxctl;
> +	unsigned long flags;
> +
> +	/* we cannot process timestamps on a ring without a q_vector */
> +	if (!q_vector || !q_vector->adapter)
> +		return;
> +
> +	adapter = q_vector->adapter;
> +	hw = &adapter->hw;
> +
> +	tsyncrxctl = IXGBE_READ_REG(hw, IXGBE_TSYNCRXCTL);
> +	regval |= (u64)IXGBE_READ_REG(hw, IXGBE_RXSTMPL);
> +	regval |= (u64)IXGBE_READ_REG(hw, IXGBE_RXSTMPH) << 32;
> +
> +	/*
> +	 * If this bit is set, then the RX registers contain the time stamp. No
> +	 * other packet will be time stamped until we read these registers, so
> +	 * read the registers to make them available again. Because only one
> +	 * packet can be time stamped at a time, we know that the register
> +	 * values must belong to this one here and therefore we don't need to
> +	 * compare any of the additional attributes stored for it.

I suspect that this assumption is wrong. What happens if the time
stamping logic locks a value but the packet is lost because the ring
is full?

BTW, the IGB driver also has this defect.

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH 1/2 net] 6lowpan: add missing pskb_may_pull() check
From: Alexander Smirnov @ 2012-05-10 14:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1336656659.12504.153.camel@edumazet-glaptop>

Hi Eric,

2012/5/10 Eric Dumazet <eric.dumazet@gmail.com>:
> On Thu, 2012-05-10 at 17:22 +0400, alex.bluesman.smirnov@gmail.com
> wrote:
>> From: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
>>
>> Add pskb_may_pull() call when fetching u8 from skb.
>>
>> Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
>> ---
>>  net/ieee802154/6lowpan.c |    2 ++
>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
>> index 32eb417..0ab3efe 100644
>> --- a/net/ieee802154/6lowpan.c
>> +++ b/net/ieee802154/6lowpan.c
>> @@ -295,6 +295,8 @@ static u8 lowpan_fetch_skb_u8(struct sk_buff *skb)
>>  {
>>       u8 ret;
>>
>> +     BUG_ON(!pskb_may_pull(skb, 1));
>> +
>>       ret = skb->data[0];
>>       skb_pull(skb, 1);
>>
>
> No, you can't do that.
>
> pskb_may_pull() can fail, and then you crash the machine instead of
> reporting the error gracefully.
>

Thanks for the comment!

With the BUG_ON() macro I just wanted to indicate that something lower
in the stack went terribly wrong and that the code must be checked for
bugs.
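
For illustration, the graceful error reporting Eric asks for could look
roughly like the userspace sketch below. It is only a sketch under
stated assumptions: struct buf, buf_may_pull() and fetch_u8() are
hypothetical stand-ins for struct sk_buff, pskb_may_pull() and
lowpan_fetch_skb_u8(), not kernel API. The fetch helper returns an
error code and hands the byte back through a pointer, so the caller can
drop a short packet instead of crashing the machine.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sk_buff: a cursor over linear data. */
struct buf {
	const uint8_t *data;
	size_t len;
};

/* Hypothetical analogue of pskb_may_pull(): non-zero if n bytes are available. */
static int buf_may_pull(const struct buf *b, size_t n)
{
	return b->len >= n;
}

/*
 * Fetch one byte and advance the cursor.
 * Returns 0 on success or -EINVAL if the buffer is too short; *val is
 * left untouched on failure.
 */
static int fetch_u8(struct buf *b, uint8_t *val)
{
	if (!buf_may_pull(b, 1))
		return -EINVAL;

	*val = b->data[0];
	b->data++;
	b->len--;
	return 0;
}
```

A caller would check the return value of fetch_u8() and free the packet
on failure, which is the cost of this design: every call site needs an
error path.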


^ permalink raw reply

* [PATCH] ehea: fix losing of NEQ events when one event occurred early
From: Thadeu Lima de Souza Cascardo @ 2012-05-10 14:00 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Thadeu Lima de Souza Cascardo
In-Reply-To: <20120509.202913.886545227635873372.davem@davemloft.net>

The NEQ interrupt is only triggered when there was no previous pending
interrupt. If we request irq handling after an interrupt has occurred,
we will never get an interrupt until we call H_RESET_EVENTS.

Events seem to be cleared when we first register the NEQ. So, when we
requested irq handling right after registering it, a possible race with
an interrupt was much less likely. Now, there is a chance we may lose
this race and never get any events.

The fix here is to poll and acknowledge any events that might have
happened right after registering the irq handler.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
---
 drivers/net/ethernet/ibm/ehea/ehea_main.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c b/drivers/net/ethernet/ibm/ehea/ehea_main.c
index c9069a2..f4d2da0 100644
--- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
+++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
@@ -3335,6 +3335,8 @@ static int __devinit ehea_probe_adapter(struct platform_device *dev,
 		goto out_shutdown_ports;
 	}

+	/* Handle any events that might be pending. */
+	tasklet_hi_schedule(&adapter->neq_tasklet);

 	ret = 0;
 	goto out;
-- 
1.7.4.4
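
The race described in the changelog above can be modelled in userspace
as follows. This is a toy model, not ehea code: struct evq and the
evq_*() helpers are hypothetical names, and the only property taken
from the patch description is that an interrupt fires solely when no
event was previously pending. An event that arrives before the handler
is installed then blocks all later interrupts unless the driver polls
once after registering.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event queue that interrupts only on the idle -> pending transition. */
struct evq {
	unsigned int pending;    /* events not yet acknowledged */
	bool handler_installed;  /* has the driver requested the irq? */
	unsigned int handled;    /* events seen by the handler */
};

/* Drain and acknowledge everything pending (the role of the NEQ tasklet). */
static void evq_poll(struct evq *q)
{
	q->handled += q->pending;
	q->pending = 0;
}

/* Hardware side: raise an event. An interrupt fires only if nothing was pending. */
static void evq_raise(struct evq *q)
{
	bool was_idle = (q->pending == 0);

	q->pending++;
	if (was_idle && q->handler_installed)
		evq_poll(q);
}

/* Driver side: install the handler and optionally poll once afterwards. */
static void evq_register(struct evq *q, bool poll_after)
{
	q->handler_installed = true;
	if (poll_after)
		evq_poll(q);
}
```

With poll_after set to false, an event raised before evq_register()
stays pending forever and every later evq_raise() is silently queued
behind it. With poll_after set to true, the early event is acknowledged
and interrupts resume.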

^ permalink raw reply related

* [PATCH 12/12] Avoid dereferencing bd_disk during swap_entry_free for network storage
From: Mel Gorman @ 2012-05-10 13:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
	Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
	Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1336658065-24851-1-git-send-email-mgorman@suse.de>

Commit [b3a27d: swap: Add swap slot free callback to
block_device_operations] dereferences p->bdev->bd_disk but this is a
NULL dereference if using swap-over-NFS. This patch checks SWP_BLKDEV
on the swap_info_struct before dereferencing.

With reference to this callback, Christoph Hellwig stated "Please
just remove the callback entirely.  It has no user outside the staging
tree and was added clearly against the rules for that staging tree".
This would also be my preference but there was not an obvious way of
keeping zram in staging/ happy.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/swapfile.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 80b3415..d85d842 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -547,7 +547,6 @@ static unsigned char swap_entry_free(struct swap_info_struct *p,
 
 	/* free if no reference */
 	if (!usage) {
-		struct gendisk *disk = p->bdev->bd_disk;
 		if (offset < p->lowest_bit)
 			p->lowest_bit = offset;
 		if (offset > p->highest_bit)
@@ -557,9 +556,11 @@ static unsigned char swap_entry_free(struct swap_info_struct *p,
 			swap_list.next = p->type;
 		nr_swap_pages++;
 		p->inuse_pages--;
-		if ((p->flags & SWP_BLKDEV) &&
-				disk->fops->swap_slot_free_notify)
-			disk->fops->swap_slot_free_notify(p->bdev, offset);
+		if (p->flags & SWP_BLKDEV) {
+			struct gendisk *disk = p->bdev->bd_disk;
+			if (disk->fops->swap_slot_free_notify)
+				disk->fops->swap_slot_free_notify(p->bdev, offset);
+		}
 	}
 
 	return usage;
-- 
1.7.9.2
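
The ordering fix in the patch above (test the flag, then dereference)
can be sketched in userspace like this. All x_* names are hypothetical
stand-ins for the kernel structures, and the flag value is illustrative.
The sketch assumes, as the changelog states, that the block device
pointer is NULL when swap is not backed by a block device.

```c
#include <assert.h>
#include <stddef.h>

#define X_SWP_BLKDEV 0x1u  /* illustrative flag: swap area is a block device */

/* Hypothetical stand-ins for block_device_operations, gendisk and block_device. */
struct x_ops  { void (*slot_free_notify)(unsigned long offset); };
struct x_disk { const struct x_ops *fops; };
struct x_bdev { struct x_disk *disk; };

struct x_swap_info {
	unsigned int flags;
	struct x_bdev *bdev;  /* NULL when swap is not backed by a block device */
};

/* Example callback that records the last offset it was told about. */
static unsigned long x_last_freed;
static void x_record_free(unsigned long offset)
{
	x_last_freed = offset;
}

/* Returns 1 if the driver was notified, 0 otherwise. */
static int notify_slot_free(const struct x_swap_info *p, unsigned long offset)
{
	/* Check the flag first; only then is p->bdev safe to dereference. */
	if (p->flags & X_SWP_BLKDEV) {
		const struct x_disk *disk = p->bdev->disk;

		if (disk->fops->slot_free_notify) {
			disk->fops->slot_free_notify(offset);
			return 1;
		}
	}
	return 0;
}
```

Calling notify_slot_free() on an entry with flags == 0 and bdev == NULL
simply returns 0, whereas loading the disk pointer before the flag test
would fault.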

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* [PATCH 11/12] nfs: Prevent page allocator recursions with swap over NFS.
From: Mel Gorman @ 2012-05-10 13:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
	Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
	Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1336658065-24851-1-git-send-email-mgorman@suse.de>

GFP_NOFS is _more_ permissive than GFP_NOIO in that it will initiate
IO, just not of any filesystem data.

The problem is that previously NOFS was correct because that avoids
recursion into the NFS code. With swap-over-NFS, it is no longer
correct as swap IO can lead to this recursion.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/nfs/pagelist.c |    2 +-
 fs/nfs/write.c    |    7 ++++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 77aa83e..8e80d71 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -29,7 +29,7 @@ static struct kmem_cache *nfs_page_cachep;
 static inline struct nfs_page *
 nfs_page_alloc(void)
 {
-	struct nfs_page	*p = kmem_cache_zalloc(nfs_page_cachep, GFP_KERNEL);
+	struct nfs_page	*p = kmem_cache_zalloc(nfs_page_cachep, GFP_NOIO);
 	if (p)
 		INIT_LIST_HEAD(&p->wb_list);
 	return p;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 21cfe71..abdbe61 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -52,7 +52,7 @@ static mempool_t *nfs_commit_mempool;
 
 struct nfs_write_data *nfs_commitdata_alloc(void)
 {
-	struct nfs_write_data *p = mempool_alloc(nfs_commit_mempool, GFP_NOFS);
+	struct nfs_write_data *p = mempool_alloc(nfs_commit_mempool, GFP_NOIO);
 
 	if (p) {
 		memset(p, 0, sizeof(*p));
@@ -72,7 +72,7 @@ EXPORT_SYMBOL_GPL(nfs_commit_free);
 
 struct nfs_write_data *nfs_writedata_alloc(unsigned int pagecount)
 {
-	struct nfs_write_data *p = mempool_alloc(nfs_wdata_mempool, GFP_NOFS);
+	struct nfs_write_data *p = mempool_alloc(nfs_wdata_mempool, GFP_NOIO);
 
 	if (p) {
 		memset(p, 0, sizeof(*p));
@@ -81,7 +81,8 @@ struct nfs_write_data *nfs_writedata_alloc(unsigned int pagecount)
 		if (pagecount <= ARRAY_SIZE(p->page_array))
 			p->pagevec = p->page_array;
 		else {
-			p->pagevec = kcalloc(pagecount, sizeof(struct page *), GFP_NOFS);
+			p->pagevec = kcalloc(pagecount, sizeof(struct page *),
+					GFP_NOIO);
 			if (!p->pagevec) {
 				mempool_free(p, nfs_wdata_mempool);
 				p = NULL;
-- 
1.7.9.2
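
To make the NOFS/NOIO relationship from the changelog concrete, here is
a small userspace sketch. The X_GFP_* names and bit values are
illustrative only; the composition (NOIO may sleep, NOFS may also start
I/O, KERNEL may also enter filesystem code) is assumed to mirror how the
masks were defined in include/linux/gfp.h at the time.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; only the composition matters here. */
#define X_GFP_WAIT 0x1u  /* may sleep */
#define X_GFP_IO   0x2u  /* may start physical I/O */
#define X_GFP_FS   0x4u  /* may call into filesystem code */

#define X_GFP_NOIO   (X_GFP_WAIT)
#define X_GFP_NOFS   (X_GFP_WAIT | X_GFP_IO)
#define X_GFP_KERNEL (X_GFP_WAIT | X_GFP_IO | X_GFP_FS)

/* True if an allocation with these flags may start I/O (e.g. swap-out) during reclaim. */
static bool may_start_io(unsigned int gfp)
{
	return (gfp & X_GFP_IO) != 0;
}

/* True if every permission in 'a' is also granted by 'b'. */
static bool is_subset(unsigned int a, unsigned int b)
{
	return (a & ~b) == 0;
}
```

Under these definitions may_start_io(X_GFP_NOFS) is true, so an NFS
allocation made with NOFS can trigger swap I/O, which with swap-over-NFS
re-enters the NFS code. X_GFP_NOIO is a strict subset and closes that
path.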


^ permalink raw reply related

* [PATCH 10/12] nfs: enable swap on NFS
From: Mel Gorman @ 2012-05-10 13:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
	Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
	Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1336658065-24851-1-git-send-email-mgorman@suse.de>

Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This
will set the NFS socket to SOCK_MEMALLOC and run socket reconnect
under PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the
protocol ->connect() method.

PF_MEMALLOC should allow the allocation of struct socket and related
objects and the early (re)setting of SOCK_MEMALLOC should allow us
to receive the packets required for the TCP connection buildup.

[dfeng@redhat.com: Fix handling of multiple swap files]
[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/nfs/Kconfig              |    8 ++++
 fs/nfs/direct.c             |   94 +++++++++++++++++++++++++++++--------------
 fs/nfs/file.c               |   22 +++++++++-
 include/linux/nfs_fs.h      |    4 +-
 include/linux/sunrpc/xprt.h |    3 ++
 net/sunrpc/Kconfig          |    5 +++
 net/sunrpc/clnt.c           |    2 +
 net/sunrpc/sched.c          |    7 +++-
 net/sunrpc/xprtsock.c       |   53 ++++++++++++++++++++++++
 9 files changed, 161 insertions(+), 37 deletions(-)

diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 2a0e6c5..ff93d0c 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -75,6 +75,14 @@ config NFS_V4
 
 	  If unsure, say Y.
 
+config NFS_SWAP
+	bool "Provide swap over NFS support"
+	default n
+	depends on NFS_FS
+	select SUNRPC_SWAP
+	help
+	  This option enables swapon to work on files located on NFS mounts.
+
 config NFS_V4_1
 	bool "NFS client support for NFSv4.1 (EXPERIMENTAL)"
 	depends on NFS_FS && NFS_V4 && EXPERIMENTAL
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 481be7f..2ef7395 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -111,17 +111,28 @@ static inline int put_dreq(struct nfs_direct_req *dreq)
  * @nr_segs: size of iovec array
  *
  * The presence of this routine in the address space ops vector means
- * the NFS client supports direct I/O.  However, we shunt off direct
- * read and write requests before the VFS gets them, so this method
- * should never be called.
+ * the NFS client supports direct I/O. However, for most direct IO, we
+ * shunt off direct read and write requests before the VFS gets them,
+ * so this method is only ever called for swap.
  */
 ssize_t nfs_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, loff_t pos, unsigned long nr_segs)
 {
+#ifndef CONFIG_NFS_SWAP
 	dprintk("NFS: nfs_direct_IO (%s) off/no(%Ld/%lu) EINVAL\n",
 			iocb->ki_filp->f_path.dentry->d_name.name,
 			(long long) pos, nr_segs);
 
 	return -EINVAL;
+#else
+	VM_BUG_ON(iocb->ki_left != PAGE_SIZE);
+	VM_BUG_ON(iocb->ki_nbytes != PAGE_SIZE);
+
+	if (rw == READ || rw == KERNEL_READ)
+		return nfs_file_direct_read(iocb, iov, nr_segs, pos,
+				rw == READ ? true : false);
+	return nfs_file_direct_write(iocb, iov, nr_segs, pos,
+				rw == WRITE ? true : false);
+#endif /* CONFIG_NFS_SWAP */
 }
 
 static void nfs_direct_dirty_pages(struct page **pages, unsigned int pgbase, size_t count)
@@ -278,7 +289,7 @@ static const struct rpc_call_ops nfs_read_direct_ops = {
  */
 static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,
 						const struct iovec *iov,
-						loff_t pos)
+						loff_t pos, bool uio)
 {
 	struct nfs_open_context *ctx = dreq->ctx;
 	struct inode *inode = ctx->dentry->d_inode;
@@ -312,13 +323,22 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,
 		if (unlikely(!data))
 			break;
 
-		down_read(&current->mm->mmap_sem);
-		result = get_user_pages(current, current->mm, user_addr,
-					data->npages, 1, 0, data->pagevec, NULL);
-		up_read(&current->mm->mmap_sem);
-		if (result < 0) {
-			nfs_readdata_free(data);
-			break;
+		if (uio) {
+			down_read(&current->mm->mmap_sem);
+			result = get_user_pages(current, current->mm, user_addr,
+				data->npages, 1, 0, data->pagevec, NULL);
+			up_read(&current->mm->mmap_sem);
+			if (result < 0) {
+				nfs_readdata_free(data);
+				break;
+			}
+		} else {
+			WARN_ON(data->npages != 1);
+			result = get_kernel_page(user_addr, 1, data->pagevec);
+			if (WARN_ON(result != 1)) {
+				nfs_readdata_free(data);
+				break;
+			}
 		}
 		if ((unsigned)result < data->npages) {
 			bytes = result * PAGE_SIZE;
@@ -386,7 +406,7 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,
 static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 					      const struct iovec *iov,
 					      unsigned long nr_segs,
-					      loff_t pos)
+					      loff_t pos, bool uio)
 {
 	ssize_t result = -EINVAL;
 	size_t requested_bytes = 0;
@@ -396,7 +416,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 
 	for (seg = 0; seg < nr_segs; seg++) {
 		const struct iovec *vec = &iov[seg];
-		result = nfs_direct_read_schedule_segment(dreq, vec, pos);
+		result = nfs_direct_read_schedule_segment(dreq, vec, pos, uio);
 		if (result < 0)
 			break;
 		requested_bytes += result;
@@ -420,7 +440,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 }
 
 static ssize_t nfs_direct_read(struct kiocb *iocb, const struct iovec *iov,
-			       unsigned long nr_segs, loff_t pos)
+			       unsigned long nr_segs, loff_t pos, bool uio)
 {
 	ssize_t result = -ENOMEM;
 	struct inode *inode = iocb->ki_filp->f_mapping->host;
@@ -438,7 +458,7 @@ static ssize_t nfs_direct_read(struct kiocb *iocb, const struct iovec *iov,
 	if (!is_sync_kiocb(iocb))
 		dreq->iocb = iocb;
 
-	result = nfs_direct_read_schedule_iovec(dreq, iov, nr_segs, pos);
+	result = nfs_direct_read_schedule_iovec(dreq, iov, nr_segs, pos, uio);
 	if (!result)
 		result = nfs_direct_wait(dreq);
 out_release:
@@ -705,7 +725,8 @@ static const struct rpc_call_ops nfs_write_direct_ops = {
  */
 static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
 						 const struct iovec *iov,
-						 loff_t pos, int sync)
+						 loff_t pos, int sync,
+						 bool uio)
 {
 	struct nfs_open_context *ctx = dreq->ctx;
 	struct inode *inode = ctx->dentry->d_inode;
@@ -739,13 +760,22 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
 		if (unlikely(!data))
 			break;
 
-		down_read(&current->mm->mmap_sem);
-		result = get_user_pages(current, current->mm, user_addr,
-					data->npages, 0, 0, data->pagevec, NULL);
-		up_read(&current->mm->mmap_sem);
-		if (result < 0) {
-			nfs_writedata_free(data);
-			break;
+		if (uio) {
+			down_read(&current->mm->mmap_sem);
+			result = get_user_pages(current, current->mm, user_addr,
+				data->npages, 0, 0, data->pagevec, NULL);
+			up_read(&current->mm->mmap_sem);
+			if (result < 0) {
+				nfs_writedata_free(data);
+				break;
+			}
+		} else {
+			WARN_ON(data->npages != 1);
+			result = get_kernel_page(user_addr, 0, data->pagevec);
+			if (WARN_ON(result != 1)) {
+				nfs_writedata_free(data);
+				break;
+			}
 		}
 		if ((unsigned)result < data->npages) {
 			bytes = result * PAGE_SIZE;
@@ -817,7 +847,8 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
 static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 					       const struct iovec *iov,
 					       unsigned long nr_segs,
-					       loff_t pos, int sync)
+					       loff_t pos, int sync,
+					       bool uio)
 {
 	ssize_t result = 0;
 	size_t requested_bytes = 0;
@@ -828,7 +859,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 	for (seg = 0; seg < nr_segs; seg++) {
 		const struct iovec *vec = &iov[seg];
 		result = nfs_direct_write_schedule_segment(dreq, vec,
-							   pos, sync);
+							   pos, sync, uio);
 		if (result < 0)
 			break;
 		requested_bytes += result;
@@ -853,7 +884,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 
 static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov,
 				unsigned long nr_segs, loff_t pos,
-				size_t count)
+				size_t count, bool uio)
 {
 	ssize_t result = -ENOMEM;
 	struct inode *inode = iocb->ki_filp->f_mapping->host;
@@ -877,7 +908,8 @@ static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov,
 	if (!is_sync_kiocb(iocb))
 		dreq->iocb = iocb;
 
-	result = nfs_direct_write_schedule_iovec(dreq, iov, nr_segs, pos, sync);
+	result = nfs_direct_write_schedule_iovec(dreq, iov, nr_segs, pos,
+								sync, uio);
 	if (!result)
 		result = nfs_direct_wait(dreq);
 out_release:
@@ -908,7 +940,7 @@ out:
  * cache.
  */
 ssize_t nfs_file_direct_read(struct kiocb *iocb, const struct iovec *iov,
-				unsigned long nr_segs, loff_t pos)
+				unsigned long nr_segs, loff_t pos, bool uio)
 {
 	ssize_t retval = -EINVAL;
 	struct file *file = iocb->ki_filp;
@@ -933,7 +965,7 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, const struct iovec *iov,
 
 	task_io_account_read(count);
 
-	retval = nfs_direct_read(iocb, iov, nr_segs, pos);
+	retval = nfs_direct_read(iocb, iov, nr_segs, pos, uio);
 	if (retval > 0)
 		iocb->ki_pos = pos + retval;
 
@@ -964,7 +996,7 @@ out:
  * is no atomic O_APPEND write facility in the NFS protocol.
  */
 ssize_t nfs_file_direct_write(struct kiocb *iocb, const struct iovec *iov,
-				unsigned long nr_segs, loff_t pos)
+				unsigned long nr_segs, loff_t pos, bool uio)
 {
 	ssize_t retval = -EINVAL;
 	struct file *file = iocb->ki_filp;
@@ -996,7 +1028,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, const struct iovec *iov,
 
 	task_io_account_write(count);
 
-	retval = nfs_direct_write(iocb, iov, nr_segs, pos, count);
+	retval = nfs_direct_write(iocb, iov, nr_segs, pos, count, uio);
 
 	if (retval > 0)
 		iocb->ki_pos = pos + retval;
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 6ead5e3..0e330f2 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -187,7 +187,7 @@ nfs_file_read(struct kiocb *iocb, const struct iovec *iov,
 	ssize_t result;
 
 	if (iocb->ki_filp->f_flags & O_DIRECT)
-		return nfs_file_direct_read(iocb, iov, nr_segs, pos);
+		return nfs_file_direct_read(iocb, iov, nr_segs, pos, true);
 
 	dprintk("NFS: read(%s/%s, %lu@%lu)\n",
 		dentry->d_parent->d_name.name, dentry->d_name.name,
@@ -486,6 +486,20 @@ static int nfs_launder_page(struct page *page)
 	return nfs_wb_page(inode, page);
 }
 
+#ifdef CONFIG_NFS_SWAP
+static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
+						sector_t *span)
+{
+	*span = sis->pages;
+	return xs_swapper(NFS_CLIENT(file->f_mapping->host)->cl_xprt, 1);
+}
+
+static void nfs_swap_deactivate(struct file *file)
+{
+	xs_swapper(NFS_CLIENT(file->f_mapping->host)->cl_xprt, 0);
+}
+#endif
+
 const struct address_space_operations nfs_file_aops = {
 	.readpage = nfs_readpage,
 	.readpages = nfs_readpages,
@@ -500,6 +514,10 @@ const struct address_space_operations nfs_file_aops = {
 	.migratepage = nfs_migrate_page,
 	.launder_page = nfs_launder_page,
 	.error_remove_page = generic_error_remove_page,
+#ifdef CONFIG_NFS_SWAP
+	.swap_activate = nfs_swap_activate,
+	.swap_deactivate = nfs_swap_deactivate,
+#endif
 };
 
 /*
@@ -574,7 +592,7 @@ static ssize_t nfs_file_write(struct kiocb *iocb, const struct iovec *iov,
 	size_t count = iov_length(iov, nr_segs);
 
 	if (iocb->ki_filp->f_flags & O_DIRECT)
-		return nfs_file_direct_write(iocb, iov, nr_segs, pos);
+		return nfs_file_direct_write(iocb, iov, nr_segs, pos, true);
 
 	dprintk("NFS: write(%s/%s, %lu@%Ld)\n",
 		dentry->d_parent->d_name.name, dentry->d_name.name,
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 52a1bdb..1f4efab 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -481,10 +481,10 @@ extern ssize_t nfs_direct_IO(int, struct kiocb *, const struct iovec *, loff_t,
 			unsigned long);
 extern ssize_t nfs_file_direct_read(struct kiocb *iocb,
 			const struct iovec *iov, unsigned long nr_segs,
-			loff_t pos);
+			loff_t pos, bool uio);
 extern ssize_t nfs_file_direct_write(struct kiocb *iocb,
 			const struct iovec *iov, unsigned long nr_segs,
-			loff_t pos);
+			loff_t pos, bool uio);
 
 /*
  * linux/fs/nfs/dir.c
diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 77d278d..cff40aa 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -174,6 +174,8 @@ struct rpc_xprt {
 	unsigned long		state;		/* transport state */
 	unsigned char		shutdown   : 1,	/* being shut down */
 				resvport   : 1; /* use a reserved port */
+	unsigned int		swapper;	/* we're swapping over this
+						   transport */
 	unsigned int		bind_index;	/* bind function index */
 
 	/*
@@ -316,6 +318,7 @@ void			xprt_release_rqst_cong(struct rpc_task *task);
 void			xprt_disconnect_done(struct rpc_xprt *xprt);
 void			xprt_force_disconnect(struct rpc_xprt *xprt);
 void			xprt_conditional_disconnect(struct rpc_xprt *xprt, unsigned int cookie);
+int			xs_swapper(struct rpc_xprt *xprt, int enable);
 
 /*
  * Reserved bit positions in xprt->state
diff --git a/net/sunrpc/Kconfig b/net/sunrpc/Kconfig
index 9fe8857..03d03e3 100644
--- a/net/sunrpc/Kconfig
+++ b/net/sunrpc/Kconfig
@@ -21,6 +21,11 @@ config SUNRPC_XPRT_RDMA
 
 	  If unsure, say N.
 
+config SUNRPC_SWAP
+	bool
+	depends on SUNRPC
+	select NETVM
+
 config RPCSEC_GSS_KRB5
 	tristate "Secure RPC: Kerberos V mechanism"
 	depends on SUNRPC && CRYPTO
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 6797246..c01d35f 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -691,6 +691,8 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
 		atomic_inc(&clnt->cl_count);
 		if (clnt->cl_softrtry)
 			task->tk_flags |= RPC_TASK_SOFT;
+		if (task->tk_client->cl_xprt->swapper)
+			task->tk_flags |= RPC_TASK_SWAPPER;
 		/* Add to the client's list of all tasks */
 		spin_lock(&clnt->cl_lock);
 		list_add_tail(&task->tk_task, &clnt->cl_tasks);
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 994cfea..83a4c43 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -812,7 +812,10 @@ static void rpc_async_schedule(struct work_struct *work)
 void *rpc_malloc(struct rpc_task *task, size_t size)
 {
 	struct rpc_buffer *buf;
-	gfp_t gfp = RPC_IS_SWAPPER(task) ? GFP_ATOMIC : GFP_NOWAIT;
+	gfp_t gfp = GFP_NOWAIT;
+
+	if (RPC_IS_SWAPPER(task))
+		gfp |= __GFP_MEMALLOC;
 
 	size += sizeof(struct rpc_buffer);
 	if (size <= RPC_BUFFER_MAXSIZE)
@@ -886,7 +889,7 @@ static void rpc_init_task(struct rpc_task *task, const struct rpc_task_setup *ta
 static struct rpc_task *
 rpc_alloc_task(void)
 {
-	return (struct rpc_task *)mempool_alloc(rpc_task_mempool, GFP_NOFS);
+	return (struct rpc_task *)mempool_alloc(rpc_task_mempool, GFP_NOIO);
 }
 
 /*
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 890b03f..b84df34 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1930,6 +1930,45 @@ out:
 	xprt_wake_pending_tasks(xprt, status);
 }
 
+#ifdef CONFIG_SUNRPC_SWAP
+static void xs_set_memalloc(struct rpc_xprt *xprt)
+{
+	struct sock_xprt *transport = container_of(xprt, struct sock_xprt,
+			xprt);
+
+	if (xprt->swapper)
+		sk_set_memalloc(transport->inet);
+}
+
+/**
+ * xs_swapper - Tag this transport as being used for swap.
+ * @xprt: transport to tag
+ * @enable: enable/disable
+ *
+ */
+int xs_swapper(struct rpc_xprt *xprt, int enable)
+{
+	struct sock_xprt *transport = container_of(xprt, struct sock_xprt,
+			xprt);
+	int err = 0;
+
+	if (enable) {
+		xprt->swapper++;
+		xs_set_memalloc(xprt);
+	} else if (xprt->swapper) {
+		xprt->swapper--;
+		sk_clear_memalloc(transport->inet);
+	}
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(xs_swapper);
+#else
+static void xs_set_memalloc(struct rpc_xprt *xprt)
+{
+}
+#endif
+
 static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 {
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
@@ -1954,6 +1993,8 @@ static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 		transport->sock = sock;
 		transport->inet = sk;
 
+		xs_set_memalloc(xprt);
+
 		write_unlock_bh(&sk->sk_callback_lock);
 	}
 	xs_udp_do_set_buffer_size(xprt);
@@ -1965,11 +2006,15 @@ static void xs_udp_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct socket *sock = transport->sock;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	if (xprt->swapper)
+		current->flags |= PF_MEMALLOC;
+
 	/* Start by resetting any existing state */
 	xs_reset_transport(transport);
 	sock = xs_create_sock(xprt, transport,
@@ -1988,6 +2033,7 @@ static void xs_udp_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_MEMALLOC);
 }
 
 /*
@@ -2078,6 +2124,8 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 	if (!xprt_bound(xprt))
 		goto out;
 
+	xs_set_memalloc(xprt);
+
 	/* Tell the socket layer to start connecting... */
 	xprt->stat.connect_count++;
 	xprt->stat.connect_start = jiffies;
@@ -2108,11 +2156,15 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct socket *sock = transport->sock;
 	struct rpc_xprt *xprt = &transport->xprt;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	if (xprt->swapper)
+		current->flags |= PF_MEMALLOC;
+
 	if (!sock) {
 		clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 		sock = xs_create_sock(xprt, transport,
@@ -2174,6 +2226,7 @@ out_eagain:
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_MEMALLOC);
 }
 
 /**
-- 
1.7.9.2


^ permalink raw reply related

* [PATCH 09/12] nfs: disable data cache revalidation for swapfiles
From: Mel Gorman @ 2012-05-10 13:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
	Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
	Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1336658065-24851-1-git-send-email-mgorman@suse.de>

The VM does not like PG_private set on PG_swapcache pages. As suggested
by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables
NFS data cache revalidation on swap files, as it does not make
sense to have other clients change the file while it is being used as
swap. This avoids setting PG_private on swap pages, since there ought
to be no further races with invalidate_inode_pages2() to deal with.

Since we cannot set PG_private, we cannot use page->private (which
PG_swapcache pages already use) to store the nfs_page. Thus augment
the nfs_page_find_request logic to search the commit list instead.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/nfs/inode.c |    6 ++++++
 fs/nfs/write.c |   49 +++++++++++++++++++++++++++++++++++--------------
 2 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index e8bbfa5..af43ef6 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -880,6 +880,12 @@ int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping)
 	struct nfs_inode *nfsi = NFS_I(inode);
 	int ret = 0;
 
+	/*
+	 * swapfiles are not supposed to be shared.
+	 */
+	if (IS_SWAPFILE(inode))
+		goto out;
+
 	if ((nfsi->cache_validity & NFS_INO_REVAL_PAGECACHE)
 			|| nfs_attribute_cache_expired(inode)
 			|| NFS_STALE(inode)) {
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 8223b2c..21cfe71 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -111,15 +111,28 @@ static void nfs_context_set_write_error(struct nfs_open_context *ctx, int error)
 	set_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags);
 }
 
-static struct nfs_page *nfs_page_find_request_locked(struct page *page)
+static struct nfs_page *
+nfs_page_find_request_locked(struct nfs_inode *nfsi, struct page *page)
 {
 	struct nfs_page *req = NULL;
 
-	if (PagePrivate(page)) {
+	if (PagePrivate(page))
 		req = (struct nfs_page *)page_private(page);
-		if (req != NULL)
-			kref_get(&req->wb_kref);
+	else if (unlikely(PageSwapCache(page))) {
+		struct nfs_page *freq, *t;
+
+		/* Linearly search the commit list for the correct req */
+		list_for_each_entry_safe(freq, t, &nfsi->commit_list, wb_list) {
+			if (freq->wb_page == page) {
+				req = freq;
+				break;
+			}
+		}
 	}
+
+	if (req)
+		kref_get(&req->wb_kref);
+
 	return req;
 }
 
@@ -129,7 +142,7 @@ static struct nfs_page *nfs_page_find_request(struct page *page)
 	struct nfs_page *req = NULL;
 
 	spin_lock(&inode->i_lock);
-	req = nfs_page_find_request_locked(page);
+	req = nfs_page_find_request_locked(NFS_I(inode), page);
 	spin_unlock(&inode->i_lock);
 	return req;
 }
@@ -232,7 +245,7 @@ static struct nfs_page *nfs_find_and_lock_request(struct page *page, bool nonblo
 
 	spin_lock(&inode->i_lock);
 	for (;;) {
-		req = nfs_page_find_request_locked(page);
+		req = nfs_page_find_request_locked(NFS_I(inode), page);
 		if (req == NULL)
 			break;
 		if (nfs_lock_request_dontget(req))
@@ -385,9 +398,15 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
 	spin_lock(&inode->i_lock);
 	if (!nfsi->npages && nfs_have_delegation(inode, FMODE_WRITE))
 		inode->i_version++;
-	set_bit(PG_MAPPED, &req->wb_flags);
-	SetPagePrivate(req->wb_page);
-	set_page_private(req->wb_page, (unsigned long)req);
+	/*
+	 * Swap-space should not get truncated. Hence no need to plug the race
+	 * with invalidate/truncate.
+	 */
+	if (likely(!PageSwapCache(req->wb_page))) {
+		set_bit(PG_MAPPED, &req->wb_flags);
+		SetPagePrivate(req->wb_page);
+		set_page_private(req->wb_page, (unsigned long)req);
+	}
 	nfsi->npages++;
 	kref_get(&req->wb_kref);
 	spin_unlock(&inode->i_lock);
@@ -404,9 +423,11 @@ static void nfs_inode_remove_request(struct nfs_page *req)
 	BUG_ON (!NFS_WBACK_BUSY(req));
 
 	spin_lock(&inode->i_lock);
-	set_page_private(req->wb_page, 0);
-	ClearPagePrivate(req->wb_page);
-	clear_bit(PG_MAPPED, &req->wb_flags);
+	if (likely(!PageSwapCache(req->wb_page))) {
+		set_page_private(req->wb_page, 0);
+		ClearPagePrivate(req->wb_page);
+		clear_bit(PG_MAPPED, &req->wb_flags);
+	}
 	nfsi->npages--;
 	spin_unlock(&inode->i_lock);
 	nfs_release_request(req);
@@ -646,7 +667,7 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
 	spin_lock(&inode->i_lock);
 
 	for (;;) {
-		req = nfs_page_find_request_locked(page);
+		req = nfs_page_find_request_locked(NFS_I(inode), page);
 		if (req == NULL)
 			goto out_unlock;
 
@@ -1691,7 +1712,7 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
  */
 int nfs_wb_page(struct inode *inode, struct page *page)
 {
-	loff_t range_start = page_offset(page);
+	loff_t range_start = page_file_offset(page);
 	loff_t range_end = range_start + (loff_t)(PAGE_CACHE_SIZE - 1);
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_ALL,
-- 
1.7.9.2
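
The fallback lookup added to nfs_page_find_request_locked() amounts to
a linear search keyed on the page pointer. A minimal userspace sketch,
assuming a plain singly linked list instead of the kernel's list_head
and a bare counter instead of a kref (struct req and find_request() are
hypothetical names):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nfs_page. */
struct req {
	const void *page;   /* page this request covers */
	unsigned int refs;  /* reference count */
	struct req *next;   /* next request on the commit list */
};

/* Walk the list for the request covering 'page'; take a reference if found. */
static struct req *find_request(struct req *head, const void *page)
{
	struct req *r;

	for (r = head; r != NULL; r = r->next) {
		if (r->page == page) {
			r->refs++;
			return r;
		}
	}
	return NULL;
}
```

The search is O(n) in the number of requests on the list, which is why
the patch only takes this path for swap cache pages and keeps the
page_private() lookup for everything else.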


^ permalink raw reply related

* [PATCH 08/12] nfs: teach the NFS client how to treat PG_swapcache pages
From: Mel Gorman @ 2012-05-10 13:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
	Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
	Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1336658065-24851-1-git-send-email-mgorman@suse.de>

Replace all relevant occurrences of page->index and page->mapping in
the NFS client with the new page_file_index() and page_file_mapping()
functions.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
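
Note for readers: the helpers themselves are not defined in this patch. The
sketch below is a user-space model of what they are assumed to do, namely
return the backing swap file's mapping and slot offset when the page is in
the swap cache, and fall back to page->mapping and page->index otherwise.
Every name ending in _model or starting with model_ is hypothetical and
exists only for illustration; it is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct inode_model { int id; };
struct mapping_model { struct inode_model *host; };

struct page_model {
	bool swapcache;				/* models PageSwapCache(page) */
	struct mapping_model *mapping;		/* models page->mapping */
	unsigned long index;			/* models page->index */
	struct mapping_model *swap_file_mapping;/* mapping of the swap file */
	unsigned long swap_offset;		/* slot offset in the swap file */
};

/* Pick the mapping of the file that really backs the page. */
static struct mapping_model *model_page_file_mapping(const struct page_model *page)
{
	assert(page != NULL);
	return page->swapcache ? page->swap_file_mapping : page->mapping;
}

/* Pick the page's index within that backing file. */
static unsigned long model_page_file_index(const struct page_model *page)
{
	assert(page != NULL);
	return page->swapcache ? page->swap_offset : page->index;
}

/* Byte offset within the backing file, derived from the index above. */
static long long model_page_file_offset(const struct page_model *page)
{
	return (long long)model_page_file_index(page) << MODEL_PAGE_SHIFT;
}
```

With such helpers, a call like model_page_file_mapping(page)->host yields the
swap file's inode for a swap cache page, which is what the NFS paths touched
below need in place of page->mapping->host.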
 fs/nfs/file.c     |    6 +++---
 fs/nfs/internal.h |    7 ++++---
 fs/nfs/pagelist.c |    4 ++--
 fs/nfs/read.c     |    6 +++---
 fs/nfs/write.c    |   40 +++++++++++++++++++++-------------------
 5 files changed, 33 insertions(+), 30 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index aa9b709..6ead5e3 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -434,7 +434,7 @@ static void nfs_invalidate_page(struct page *page, unsigned long offset)
 	if (offset != 0)
 		return;
 	/* Cancel any unstarted writes on this page */
-	nfs_wb_page_cancel(page->mapping->host, page);
+	nfs_wb_page_cancel(page_file_mapping(page)->host, page);
 
 	nfs_fscache_invalidate_page(page, page->mapping->host);
 }
@@ -476,7 +476,7 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
  */
 static int nfs_launder_page(struct page *page)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_inode *nfsi = NFS_I(inode);
 
 	dfprintk(PAGECACHE, "NFS: launder_page(%ld, %llu)\n",
@@ -525,7 +525,7 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	nfs_fscache_wait_on_page_write(NFS_I(dentry->d_inode), page);
 
 	lock_page(page);
-	mapping = page->mapping;
+	mapping = page_file_mapping(page);
 	if (mapping != dentry->d_inode->i_mapping)
 		goto out_unlock;
 
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 2476dc69..a7c16f3 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -434,13 +434,14 @@ void nfs_super_set_maxbytes(struct super_block *sb, __u64 maxfilesize)
 static inline
 unsigned int nfs_page_length(struct page *page)
 {
-	loff_t i_size = i_size_read(page->mapping->host);
+	loff_t i_size = i_size_read(page_file_mapping(page)->host);
 
 	if (i_size > 0) {
+		pgoff_t page_index = page_file_index(page);
 		pgoff_t end_index = (i_size - 1) >> PAGE_CACHE_SHIFT;
-		if (page->index < end_index)
+		if (page_index < end_index)
 			return PAGE_CACHE_SIZE;
-		if (page->index == end_index)
+		if (page_index == end_index)
 			return ((i_size - 1) & ~PAGE_CACHE_MASK) + 1;
 	}
 	return 0;
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index d21fcea..77aa83e 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -77,11 +77,11 @@ nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
 	 * update_nfs_request below if the region is not locked. */
 	req->wb_page    = page;
 	atomic_set(&req->wb_complete, 0);
-	req->wb_index	= page->index;
+	req->wb_index	= page_file_index(page);
 	page_cache_get(page);
 	BUG_ON(PagePrivate(page));
 	BUG_ON(!PageLocked(page));
-	BUG_ON(page->mapping->host != inode);
+	BUG_ON(page_file_mapping(page)->host != inode);
 	req->wb_offset  = offset;
 	req->wb_pgbase	= offset;
 	req->wb_bytes   = count;
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 0a4be28..fb69784 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -548,11 +548,11 @@ static const struct rpc_call_ops nfs_read_full_ops = {
 int nfs_readpage(struct file *file, struct page *page)
 {
 	struct nfs_open_context *ctx;
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	int		error;
 
 	dprintk("NFS: nfs_readpage (%p %ld@%lu)\n",
-		page, PAGE_CACHE_SIZE, page->index);
+		page, PAGE_CACHE_SIZE, page_file_index(page));
 	nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
 	nfs_add_stats(inode, NFSIOS_READPAGES, 1);
 
@@ -606,7 +606,7 @@ static int
 readpage_async_filler(void *data, struct page *page)
 {
 	struct nfs_readdesc *desc = (struct nfs_readdesc *)data;
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_page *new;
 	unsigned int len;
 	int error;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index c074623..8223b2c 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -125,7 +125,7 @@ static struct nfs_page *nfs_page_find_request_locked(struct page *page)
 
 static struct nfs_page *nfs_page_find_request(struct page *page)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_page *req = NULL;
 
 	spin_lock(&inode->i_lock);
@@ -137,16 +137,16 @@ static struct nfs_page *nfs_page_find_request(struct page *page)
 /* Adjust the file length if we're writing beyond the end */
 static void nfs_grow_file(struct page *page, unsigned int offset, unsigned int count)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	loff_t end, i_size;
 	pgoff_t end_index;
 
 	spin_lock(&inode->i_lock);
 	i_size = i_size_read(inode);
 	end_index = (i_size - 1) >> PAGE_CACHE_SHIFT;
-	if (i_size > 0 && page->index < end_index)
+	if (i_size > 0 && page_file_index(page) < end_index)
 		goto out;
-	end = ((loff_t)page->index << PAGE_CACHE_SHIFT) + ((loff_t)offset+count);
+	end = page_file_offset(page) + ((loff_t)offset+count);
 	if (i_size >= end)
 		goto out;
 	i_size_write(inode, end);
@@ -159,7 +159,7 @@ out:
 static void nfs_set_pageerror(struct page *page)
 {
 	SetPageError(page);
-	nfs_zap_mapping(page->mapping->host, page->mapping);
+	nfs_zap_mapping(page_file_mapping(page)->host, page_file_mapping(page));
 }
 
 /* We can set the PG_uptodate flag if we see that a write request
@@ -200,7 +200,7 @@ static int nfs_set_page_writeback(struct page *page)
 	int ret = test_set_page_writeback(page);
 
 	if (!ret) {
-		struct inode *inode = page->mapping->host;
+		struct inode *inode = page_file_mapping(page)->host;
 		struct nfs_server *nfss = NFS_SERVER(inode);
 
 		page_cache_get(page);
@@ -215,7 +215,7 @@ static int nfs_set_page_writeback(struct page *page)
 
 static void nfs_end_page_writeback(struct page *page)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_server *nfss = NFS_SERVER(inode);
 
 	end_page_writeback(page);
@@ -226,7 +226,7 @@ static void nfs_end_page_writeback(struct page *page)
 
 static struct nfs_page *nfs_find_and_lock_request(struct page *page, bool nonblock)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_page *req;
 	int ret;
 
@@ -287,13 +287,13 @@ out:
 
 static int nfs_do_writepage(struct page *page, struct writeback_control *wbc, struct nfs_pageio_descriptor *pgio)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	int ret;
 
 	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
 	nfs_add_stats(inode, NFSIOS_WRITEPAGES, 1);
 
-	nfs_pageio_cond_complete(pgio, page->index);
+	nfs_pageio_cond_complete(pgio, page_file_index(page));
 	ret = nfs_page_async_flush(pgio, page, wbc->sync_mode == WB_SYNC_NONE);
 	if (ret == -EAGAIN) {
 		redirty_page_for_writepage(wbc, page);
@@ -310,7 +310,8 @@ static int nfs_writepage_locked(struct page *page, struct writeback_control *wbc
 	struct nfs_pageio_descriptor pgio;
 	int err;
 
-	nfs_pageio_init_write(&pgio, page->mapping->host, wb_priority(wbc));
+	nfs_pageio_init_write(&pgio, page_file_mapping(page)->host,
+			wb_priority(wbc));
 	err = nfs_do_writepage(page, wbc, &pgio);
 	nfs_pageio_complete(&pgio);
 	if (err < 0)
@@ -441,7 +442,8 @@ nfs_request_add_commit_list(struct nfs_page *req, struct list_head *head)
 	NFS_I(inode)->ncommit++;
 	spin_unlock(&inode->i_lock);
 	inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
-	inc_bdi_stat(req->wb_page->mapping->backing_dev_info, BDI_RECLAIMABLE);
+	inc_bdi_stat(page_file_mapping(req->wb_page)->backing_dev_info,
+			BDI_RECLAIMABLE);
 	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
 }
 EXPORT_SYMBOL_GPL(nfs_request_add_commit_list);
@@ -486,7 +488,7 @@ static void
 nfs_clear_page_commit(struct page *page)
 {
 	dec_zone_page_state(page, NR_UNSTABLE_NFS);
-	dec_bdi_stat(page->mapping->backing_dev_info, BDI_RECLAIMABLE);
+	dec_bdi_stat(page_file_mapping(page)->backing_dev_info, BDI_RECLAIMABLE);
 }
 
 static void
@@ -703,7 +705,7 @@ out_err:
 static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
 		struct page *page, unsigned int offset, unsigned int bytes)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_page	*req;
 
 	req = nfs_try_to_update_request(inode, page, offset, bytes);
@@ -756,7 +758,7 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
 		nfs_release_request(req);
 		if (!do_flush)
 			return 0;
-		status = nfs_wb_page(page->mapping->host, page);
+		status = nfs_wb_page(page_file_mapping(page)->host, page);
 	} while (status == 0);
 	return status;
 }
@@ -782,7 +784,7 @@ int nfs_updatepage(struct file *file, struct page *page,
 		unsigned int offset, unsigned int count)
 {
 	struct nfs_open_context *ctx = nfs_file_open_context(file);
-	struct inode	*inode = page->mapping->host;
+	struct inode	*inode = page_file_mapping(page)->host;
 	int		status = 0;
 
 	nfs_inc_stats(inode, NFSIOS_VFSUPDATEPAGE);
@@ -790,7 +792,7 @@ int nfs_updatepage(struct file *file, struct page *page,
 	dprintk("NFS:       nfs_updatepage(%s/%s %d@%lld)\n",
 		file->f_path.dentry->d_parent->d_name.name,
 		file->f_path.dentry->d_name.name, count,
-		(long long)(page_offset(page) + offset));
+		(long long)(page_file_offset(page) + offset));
 
 	/* If we're not using byte range locks, and we know the page
 	 * is up to date, it may be more efficient to extend the write
@@ -1150,7 +1152,7 @@ static void nfs_writeback_release_partial(void *calldata)
 	}
 
 	if (nfs_write_need_commit(data)) {
-		struct inode *inode = page->mapping->host;
+		struct inode *inode = page_file_mapping(page)->host;
 
 		spin_lock(&inode->i_lock);
 		if (test_bit(PG_NEED_RESCHED, &req->wb_flags)) {
@@ -1442,7 +1444,7 @@ void nfs_retry_commit(struct list_head *page_list,
 		nfs_list_remove_request(req);
 		nfs_mark_request_commit(req, lseg);
 		dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
-		dec_bdi_stat(req->wb_page->mapping->backing_dev_info,
+		dec_bdi_stat(page_file_mapping(req->wb_page)->backing_dev_info,
 			     BDI_RECLAIMABLE);
 		nfs_unlock_request(req);
 	}
-- 
1.7.9.2


* [PATCH 07/12] mm: Add support for direct_IO to highmem pages
From: Mel Gorman @ 2012-05-10 13:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
	Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
	Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1336658065-24851-1-git-send-email-mgorman@suse.de>

The patch "mm: Add support for a filesystem to activate swap files and
use direct_IO for writing swap pages" added support for using direct_IO
to write swap pages but it is insufficient for highmem pages.

To support highmem pages, this patch kmap()s the page before calling the
direct_IO() handler. As direct_IO deals with virtual addresses, an
additional helper is necessary for get_kernel_pages() to look up the
struct page for a kmap virtual address.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
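
Note for readers: the lookup described above can be modelled in user space as
a range check followed by a table index. This is only a sketch. The window
base, window size and all model_/MODEL_ names are made-up values for
illustration, and the model uses a half-open range so the slot index always
stays inside the table, whereas the patch as posted compares with "<=".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12				/* assumed 4 KiB pages */
#define MODEL_LAST_PKMAP 8				/* hypothetical window size */
#define MODEL_PKMAP_BASE ((uintptr_t)0x10000000u)	/* hypothetical window start */
#define MODEL_PKMAP_ADDR(nr) \
	(MODEL_PKMAP_BASE + ((uintptr_t)(nr) << MODEL_PAGE_SHIFT))

struct page_model { int id; };

/* Models pkmap_page_table: slot i records the page mapped at that slot. */
static struct page_model *model_pkmap_table[MODEL_LAST_PKMAP];

/* Models the result of virt_to_page() for a directly mapped address. */
static struct page_model model_lowmem_page = { -1 };

static struct page_model *model_kmap_to_page(uintptr_t addr)
{
	/* Inside the kmap window: translate the address to a slot number. */
	if (addr >= MODEL_PKMAP_ADDR(0) &&
	    addr < MODEL_PKMAP_ADDR(MODEL_LAST_PKMAP)) {
		size_t slot = (addr - MODEL_PKMAP_ADDR(0)) >> MODEL_PAGE_SHIFT;

		assert(slot < MODEL_LAST_PKMAP);
		return model_pkmap_table[slot];
	}

	/* Outside the window: treat it as a directly mapped address. */
	return &model_lowmem_page;
}
```

Any address within a mapped slot, not just its first byte, resolves to the
same page, because the low MODEL_PAGE_SHIFT bits are shifted away.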
 include/linux/highmem.h |    7 +++++++
 mm/highmem.c            |   12 ++++++++++++
 mm/memory.c             |    3 +--
 mm/page_io.c            |    3 ++-
 4 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index d3999b4..e186e3c 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -39,10 +39,17 @@ extern unsigned long totalhigh_pages;
 
 void kmap_flush_unused(void);
 
+struct page *kmap_to_page(void *addr);
+
 #else /* CONFIG_HIGHMEM */
 
 static inline unsigned int nr_free_highpages(void) { return 0; }
 
+static inline struct page *kmap_to_page(void *addr)
+{
+	return virt_to_page(addr);
+}
+
 #define totalhigh_pages 0UL
 
 #ifndef ARCH_HAS_KMAP
diff --git a/mm/highmem.c b/mm/highmem.c
index 57d82c6..d517cd1 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -94,6 +94,18 @@ static DECLARE_WAIT_QUEUE_HEAD(pkmap_map_wait);
 		do { spin_unlock(&kmap_lock); (void)(flags); } while (0)
 #endif
 
+struct page *kmap_to_page(void *vaddr)
+{
+	unsigned long addr = (unsigned long)vaddr;
+
+	if (addr >= PKMAP_ADDR(0) && addr <= PKMAP_ADDR(LAST_PKMAP)) {
+		int i = (addr - PKMAP_ADDR(0)) >> PAGE_SHIFT;
+		return pte_page(pkmap_page_table[i]);
+	}
+
+	return virt_to_page(addr);
+}
+
 static void flush_all_zero_pkmaps(void)
 {
 	int i;
diff --git a/mm/memory.c b/mm/memory.c
index 0bc990e7..fd32a1a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1858,8 +1858,7 @@ int get_kernel_pages(const struct kvec *kiov, int nr_segs, int write,
 		if (WARN_ON(kiov[seg].iov_len != PAGE_SIZE))
 			return seg;
 
-		/* virt_to_page sanity checks the PFN */
-		pages[seg] = virt_to_page(kiov[seg].iov_base);
+		pages[seg] = kmap_to_page(kiov[seg].iov_base);
 		page_cache_get(pages[seg]);
 	}
 
diff --git a/mm/page_io.c b/mm/page_io.c
index f363261..1e39e88 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -198,7 +198,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 		struct file *swap_file = sis->swap_file;
 		struct address_space *mapping = swap_file->f_mapping;
 		struct iovec iov = {
-			.iov_base = page_address(page),
+			.iov_base = kmap(page),
 			.iov_len  = PAGE_SIZE,
 		};
 
@@ -211,6 +211,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 		ret = mapping->a_ops->direct_IO(KERNEL_WRITE,
 						&kiocb, &iov,
 						kiocb.ki_pos, 1);
+		kunmap(page);
 		if (ret == PAGE_SIZE) {
 			count_vm_event(PSWPOUT);
 			ret = 0;
-- 
1.7.9.2


* [PATCH 06/12] mm: Add get_kernel_page[s] for pinning of kernel addresses for I/O
From: Mel Gorman @ 2012-05-10 13:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
	Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
	Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1336658065-24851-1-git-send-email-mgorman@suse.de>

This patch adds two new APIs, get_kernel_pages() and get_kernel_page(),
that may be used to pin a vector of kernel addresses for IO. The initial
user is expected to be NFS, to allow pages to be written to swap
using aops->direct_IO(). Strictly speaking, swap-over-NFS only needs
to pin one page for IO, but it makes sense to express the API in terms
of a vector and add a helper for pinning single pages.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
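
Note for readers: a user-space model of the pinning loop may help when
reading the hunk in mm/memory.c. It is a sketch under stated assumptions: the
address-to-page lookup is faked by embedding a page descriptor in each
buffer, a plain counter stands in for the page reference count, and all
*_model and model_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size */

struct kvec_model { void *iov_base; size_t iov_len; };
struct page_model { int refcount; };

/* Hypothetical buffer that carries its own page descriptor. */
struct buf_model {
	char data[MODEL_PAGE_SIZE];
	struct page_model page;
};

/* Stand-in for the address-to-page lookup; addr must point at buf->data. */
static struct page_model *model_addr_to_page(void *addr)
{
	return &((struct buf_model *)addr)->page;
}

/* Pin up to nr_segs page-sized segments; returns how many were pinned. */
static int model_get_kernel_pages(const struct kvec_model *kiov, int nr_segs,
				  struct page_model **pages)
{
	int seg;

	assert(kiov != NULL && pages != NULL);
	for (seg = 0; seg < nr_segs; seg++) {
		/* Stop at the first odd-sized segment; earlier pins remain. */
		if (kiov[seg].iov_len != MODEL_PAGE_SIZE)
			return seg;
		pages[seg] = model_addr_to_page(kiov[seg].iov_base);
		pages[seg]->refcount++;		/* models page_cache_get() */
	}
	return seg;
}

/* Every pinned page must be released again by the caller. */
static void model_put_page(struct page_model *page)
{
	page->refcount--;
}
```

A caller is expected to check the return value against the number of
segments it asked for and to drop each reference it did obtain.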
 include/linux/blk_types.h |    2 ++
 include/linux/fs.h        |    2 ++
 include/linux/mm.h        |    4 ++++
 mm/memory.c               |   53 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 61 insertions(+)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 4053cbd..1e62642 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -150,6 +150,7 @@ enum rq_flag_bits {
 	__REQ_FLUSH_SEQ,	/* request for flush sequence */
 	__REQ_IO_STAT,		/* account I/O stat */
 	__REQ_MIXED_MERGE,	/* merge of different types, fail separately */
+	__REQ_KERNEL, 		/* direct IO to kernel pages */
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -191,5 +192,6 @@ enum rq_flag_bits {
 #define REQ_IO_STAT		(1 << __REQ_IO_STAT)
 #define REQ_MIXED_MERGE		(1 << __REQ_MIXED_MERGE)
 #define REQ_SECURE		(1 << __REQ_SECURE)
+#define REQ_KERNEL		(1 << __REQ_KERNEL)
 
 #endif /* __LINUX_BLK_TYPES_H */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d48e8b8..150bc85 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -165,6 +165,8 @@ struct inodes_stat_t {
 #define READ			0
 #define WRITE			RW_MASK
 #define READA			RWA_MASK
+#define KERNEL_READ		(READ|REQ_KERNEL)
+#define KERNEL_WRITE		(WRITE|REQ_KERNEL)
 
 #define READ_SYNC		(READ | REQ_SYNC)
 #define WRITE_SYNC		(WRITE | REQ_SYNC | REQ_NOIDLE)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 58cc925..cf4a730 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1023,6 +1023,10 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 			struct page **pages, struct vm_area_struct **vmas);
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			struct page **pages);
+struct kvec;
+int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
+			struct page **pages);
+int get_kernel_page(unsigned long start, int write, struct page **pages);
 struct page *get_dump_page(unsigned long addr);
 
 extern int try_to_release_page(struct page * page, gfp_t gfp_mask);
diff --git a/mm/memory.c b/mm/memory.c
index 6105f47..0bc990e7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1837,6 +1837,59 @@ next_page:
 EXPORT_SYMBOL(__get_user_pages);
 
 /*
+ * get_kernel_pages() - pin kernel pages in memory
+ * @kiov:	An array of struct kvec structures
+ * @nr_segs:	number of segments to pin
+ * @write:	pinning for read/write, currently ignored
+ * @pages:	array that receives pointers to the pages pinned.
+ *		Should be at least nr_segs long.
+ *
+ * Returns number of pages pinned. This may be fewer than the number
+ * requested. If nr_segs is 0 or negative, returns 0. If no pages
+ * were pinned, returns -errno. Each page returned must be released
+ * with a put_page() call when it is finished with.
+ */
+int get_kernel_pages(const struct kvec *kiov, int nr_segs, int write,
+		struct page **pages)
+{
+	int seg;
+
+	for (seg = 0; seg < nr_segs; seg++) {
+		if (WARN_ON(kiov[seg].iov_len != PAGE_SIZE))
+			return seg;
+
+		/* virt_to_page sanity checks the PFN */
+		pages[seg] = virt_to_page(kiov[seg].iov_base);
+		page_cache_get(pages[seg]);
+	}
+
+	return seg;
+}
+EXPORT_SYMBOL_GPL(get_kernel_pages);
+
+/*
+ * get_kernel_page() - pin a kernel page in memory
+ * @start:	starting kernel address
+ * @write:	pinning for read/write, currently ignored
+ * @pages:	array that receives pointer to the page pinned.
+ *		Must have room for at least one entry.
+ *
+ * Returns 1 if page is pinned. If the page was not pinned, returns
+ * -errno. The page returned must be released with a put_page() call
+ * when it is finished with.
+ */
+int get_kernel_page(unsigned long start, int write, struct page **pages)
+{
+	const struct kvec kiov = {
+		.iov_base = (void *)start,
+		.iov_len = PAGE_SIZE
+	};
+
+	return get_kernel_pages(&kiov, 1, write, pages);
+}
+EXPORT_SYMBOL_GPL(get_kernel_page);
+
+/*
  * fixup_user_fault() - manually resolve a user page fault
  * @tsk:	the task_struct to use for page fault accounting, or
  *		NULL if faults are not to be recorded.
-- 
1.7.9.2


* [PATCH 05/12] mm: swap: Implement generic handler for swap_activate
From: Mel Gorman @ 2012-05-10 13:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
	Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
	Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1336658065-24851-1-git-send-email-mgorman@suse.de>

The swap_activate hook as introduced is sufficient for swap-over-NFS
but does not carry enough information to implement a generic handler.
This patch shuffles things slightly so that aops->swap_activate()
receives the same information that is available to the core.

No functional change.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
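
Note for readers: the probing loop that this patch moves into
generic_swapfile_activate() can be modelled in user space as below. This is a
reduced sketch, not the kernel code: the block map is a plain array standing
in for bmap(), extent bookkeeping and the sis->max limit are left out, and
model_count_swap_pages is a hypothetical name. It assumes blocks_per_page is
a power of two, as the alignment mask requires.

```c
#include <assert.h>
#include <stddef.h>

/*
 * block_map[i] is the on-disk block backing file block i; 0 marks a hole.
 * Returns the number of page-sized, page-aligned, contiguous runs found,
 * or -1 if a hole is encountered.
 */
static long model_count_swap_pages(const unsigned long *block_map,
				   unsigned long nr_blocks,
				   unsigned int blocks_per_page)
{
	unsigned long probe = 0;
	long pages = 0;

	assert(block_map != NULL && blocks_per_page > 0);
	while (probe + blocks_per_page <= nr_blocks) {
		unsigned long first = block_map[probe];
		unsigned int i;
		int usable = 1;

		if (first == 0)
			return -1;		/* swapfile has holes */
		if (first & (blocks_per_page - 1)) {
			probe++;		/* not page-aligned on disk */
			continue;
		}
		for (i = 1; i < blocks_per_page; i++) {
			unsigned long block = block_map[probe + i];

			if (block == 0)
				return -1;	/* hole inside the page */
			if (block != first + i) {
				usable = 0;	/* discontiguity */
				break;
			}
		}
		if (!usable) {
			probe++;		/* retry one block further on */
			continue;
		}
		pages++;			/* one usable swap page */
		probe += blocks_per_page;
	}
	return pages;
}
```

The model advances by a single block after a misaligned or discontiguous
probe and by a whole page after a usable run, which mirrors the reprobe
handling in the loop being moved.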
 include/linux/fs.h   |    6 ++--
 include/linux/swap.h |    5 +++
 mm/page_io.c         |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/swapfile.c        |   91 +++----------------------------------------------
 4 files changed, 106 insertions(+), 88 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0dcd1e8..d48e8b8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -417,6 +417,7 @@ struct kstatfs;
 struct vm_area_struct;
 struct vfsmount;
 struct cred;
+struct swap_info_struct;
 
 extern void __init inode_init(void);
 extern void __init inode_init_early(void);
@@ -628,8 +629,9 @@ struct address_space_operations {
 	int (*error_remove_page)(struct address_space *, struct page *);
 
 	/* swapfile support */
-	int (*swap_activate)(struct file *file);
-	int (*swap_deactivate)(struct file *file);
+	int (*swap_activate)(struct swap_info_struct *sis, struct file *file,
+				sector_t *span);
+	void (*swap_deactivate)(struct file *file);
 };
 
 extern const struct address_space_operations empty_aops;
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 6b40350..4ab2276 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -320,6 +320,11 @@ extern int swap_writepage(struct page *page, struct writeback_control *wbc);
 extern int swap_set_page_dirty(struct page *page);
 extern void end_swap_bio_read(struct bio *bio, int err);
 
+int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
+		unsigned long nr_pages, sector_t start_block);
+int generic_swapfile_activate(struct swap_info_struct *, struct file *,
+		sector_t *);
+
 /* linux/mm/swap_state.c */
 extern struct address_space swapper_space;
 #define total_swapcache_pages  swapper_space.nrpages
diff --git a/mm/page_io.c b/mm/page_io.c
index 68d8357..f363261 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -86,6 +86,98 @@ void end_swap_bio_read(struct bio *bio, int err)
 	bio_put(bio);
 }
 
+int generic_swapfile_activate(struct swap_info_struct *sis,
+				struct file *swap_file,
+				sector_t *span)
+{
+	struct address_space *mapping = swap_file->f_mapping;
+	struct inode *inode = mapping->host;
+	unsigned blocks_per_page;
+	unsigned long page_no;
+	unsigned blkbits;
+	sector_t probe_block;
+	sector_t last_block;
+	sector_t lowest_block = -1;
+	sector_t highest_block = 0;
+	int nr_extents = 0;
+	int ret;
+
+	blkbits = inode->i_blkbits;
+	blocks_per_page = PAGE_SIZE >> blkbits;
+
+	/*
+	 * Map all the blocks into the extent list.  This code doesn't try
+	 * to be very smart.
+	 */
+	probe_block = 0;
+	page_no = 0;
+	last_block = i_size_read(inode) >> blkbits;
+	while ((probe_block + blocks_per_page) <= last_block &&
+			page_no < sis->max) {
+		unsigned block_in_page;
+		sector_t first_block;
+
+		first_block = bmap(inode, probe_block);
+		if (first_block == 0)
+			goto bad_bmap;
+
+		/*
+		 * It must be PAGE_SIZE aligned on-disk
+		 */
+		if (first_block & (blocks_per_page - 1)) {
+			probe_block++;
+			goto reprobe;
+		}
+
+		for (block_in_page = 1; block_in_page < blocks_per_page;
+					block_in_page++) {
+			sector_t block;
+
+			block = bmap(inode, probe_block + block_in_page);
+			if (block == 0)
+				goto bad_bmap;
+			if (block != first_block + block_in_page) {
+				/* Discontiguity */
+				probe_block++;
+				goto reprobe;
+			}
+		}
+
+		first_block >>= (PAGE_SHIFT - blkbits);
+		if (page_no) {	/* exclude the header page */
+			if (first_block < lowest_block)
+				lowest_block = first_block;
+			if (first_block > highest_block)
+				highest_block = first_block;
+		}
+
+		/*
+		 * We found a PAGE_SIZE-length, PAGE_SIZE-aligned run of blocks
+		 */
+		ret = add_swap_extent(sis, page_no, 1, first_block);
+		if (ret < 0)
+			goto out;
+		nr_extents += ret;
+		page_no++;
+		probe_block += blocks_per_page;
+reprobe:
+		continue;
+	}
+	ret = nr_extents;
+	*span = 1 + highest_block - lowest_block;
+	if (page_no == 0)
+		page_no = 1;	/* force Empty message */
+	sis->max = page_no;
+	sis->pages = page_no - 1;
+	sis->highest_bit = page_no - 1;
+out:
+	return ret;
+bad_bmap:
+	printk(KERN_ERR "swapon: swapfile has holes\n");
+	ret = -EINVAL;
+	goto out;
+}
+
 /*
  * We may have stale swap cache pages in memory: notice
  * them here and get rid of the unnecessary final write.
diff --git a/mm/swapfile.c b/mm/swapfile.c
index fe2ed44..80b3415 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1358,7 +1358,7 @@ static void destroy_swap_extents(struct swap_info_struct *sis)
  *
  * This function rather assumes that it is called in ascending page order.
  */
-static int
+int
 add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
 		unsigned long nr_pages, sector_t start_block)
 {
@@ -1434,106 +1434,25 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)
 	struct file *swap_file = sis->swap_file;
 	struct address_space *mapping = swap_file->f_mapping;
 	struct inode *inode = mapping->host;
-	unsigned blocks_per_page;
-	unsigned long page_no;
-	unsigned blkbits;
-	sector_t probe_block;
-	sector_t last_block;
-	sector_t lowest_block = -1;
-	sector_t highest_block = 0;
-	int nr_extents = 0;
 	int ret;
 
 	if (S_ISBLK(inode->i_mode)) {
 		ret = add_swap_extent(sis, 0, sis->max, 0);
 		*span = sis->pages;
-		goto out;
+		return ret;
 	}
 
 	if (mapping->a_ops->swap_activate) {
-		ret = mapping->a_ops->swap_activate(swap_file);
+		ret = mapping->a_ops->swap_activate(sis, swap_file, span);
 		if (!ret) {
 			sis->flags |= SWP_FILE;
 			ret = add_swap_extent(sis, 0, sis->max, 0);
 			*span = sis->pages;
 		}
-		goto out;
+		return ret;
 	}
 
-	blkbits = inode->i_blkbits;
-	blocks_per_page = PAGE_SIZE >> blkbits;
-
-	/*
-	 * Map all the blocks into the extent list.  This code doesn't try
-	 * to be very smart.
-	 */
-	probe_block = 0;
-	page_no = 0;
-	last_block = i_size_read(inode) >> blkbits;
-	while ((probe_block + blocks_per_page) <= last_block &&
-			page_no < sis->max) {
-		unsigned block_in_page;
-		sector_t first_block;
-
-		first_block = bmap(inode, probe_block);
-		if (first_block == 0)
-			goto bad_bmap;
-
-		/*
-		 * It must be PAGE_SIZE aligned on-disk
-		 */
-		if (first_block & (blocks_per_page - 1)) {
-			probe_block++;
-			goto reprobe;
-		}
-
-		for (block_in_page = 1; block_in_page < blocks_per_page;
-					block_in_page++) {
-			sector_t block;
-
-			block = bmap(inode, probe_block + block_in_page);
-			if (block == 0)
-				goto bad_bmap;
-			if (block != first_block + block_in_page) {
-				/* Discontiguity */
-				probe_block++;
-				goto reprobe;
-			}
-		}
-
-		first_block >>= (PAGE_SHIFT - blkbits);
-		if (page_no) {	/* exclude the header page */
-			if (first_block < lowest_block)
-				lowest_block = first_block;
-			if (first_block > highest_block)
-				highest_block = first_block;
-		}
-
-		/*
-		 * We found a PAGE_SIZE-length, PAGE_SIZE-aligned run of blocks
-		 */
-		ret = add_swap_extent(sis, page_no, 1, first_block);
-		if (ret < 0)
-			goto out;
-		nr_extents += ret;
-		page_no++;
-		probe_block += blocks_per_page;
-reprobe:
-		continue;
-	}
-	ret = nr_extents;
-	*span = 1 + highest_block - lowest_block;
-	if (page_no == 0)
-		page_no = 1;	/* force Empty message */
-	sis->max = page_no;
-	sis->pages = page_no - 1;
-	sis->highest_bit = page_no - 1;
-out:
-	return ret;
-bad_bmap:
-	printk(KERN_ERR "swapon: swapfile has holes\n");
-	ret = -EINVAL;
-	goto out;
+	return generic_swapfile_activate(sis, swap_file, span);
 }
 
 static void enable_swap_info(struct swap_info_struct *p, int prio,
-- 
1.7.9.2
