Netdev List

* Re: Kexec Reboot Network Issue
From: Eric W. Biederman @ 2010-07-18  4:25 UTC
  To: Simon Horman; +Cc: Richard Genthner, Michael Chan, Matt Carlson, netdev, kexec
In-Reply-To: <20100716071547.GB20761@verge.net.au>

Simon Horman <horms@verge.net.au> writes:

> [ CCed Michael Chan, Matt Carlson and Netdev ]
>
> On Thu, Jul 15, 2010 at 11:35:43AM -0400, Richard Genthner wrote:
>> On 07/15/2010 10:20 AM, Simon Horman wrote:
>> >On Thu, Jul 15, 2010 at 07:59:07AM -0400, Richard Genthner wrote:
>> >>I'm currently using the following string:
>> >>
>> >>kexec --type=elf-x86_64 --args-linux -l /boot/vmlinuz-2.6.18-12.el5
>> >>--initrd=/boot/initrd-2.6.18-128.el6.img --append="`cat
>> >>/proc/cmdline`"
>> >>kexec -e
>> >>
>> >>Sometimes we can reach the box from any subnet; other times we can
>> >>only reach it from the same subnet. Our workaround is to down the
>> >>iface and then restart networking. Has anyone else run into this
>> >>issue?
>> >Hi Richard,
>> >
>> >could you be more specific about which NIC you are using?
>> >And is it at all possible to test a newer kernel version?
>> >
>> >What I suspect is happening is that the NIC is getting into an unknown
>> >state. And what I'm hoping is that it is a problem that's already been
>> >addressed.
>>
>> I would try a different kernel but our cluster fs has us locked to
>> this kernel until we finish the upgrade to the new cluster fs
>> version. Here's ethtool on the iface
>> 
>> from lshw
>> *-network
>>                 description: Ethernet interface
>>                 product: NetXtreme BCM5721 Gigabit Ethernet PCI Express
>>                 vendor: Broadcom Corporation
>>                 physical id: 0
>>                 bus info: pci@0000:03:00.0
>>                 logical name: eth0
>>                 version: 21
>>                 serial: 00:25:64:3b:9c:ae
>>                 size: 1GB/s
>>                 capacity: 1GB/s
>>                 width: 64 bits
>>                 clock: 33MHz
>>                 capabilities: pm vpd msi pciexpress bus_master
>> cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt
>> 1000bt-fd autonegotiation
>>                 configuration: autonegotiation=on broadcast=yes
>> driver=tg3 driverversion=3.93 duplex=full firmware=5721-v3.65,
>> ASFIPMI v6.25 ip=172.16.1.123 latency=0 link=yes module=tg3
>> multicast=yes port=twisted pair speed=1GB/s
>
> Hi Richard,
>
> first, please don't top-post, it's not the done thing in these parts.
>
> I had a quick hunt through the git change log and the only change that
> jumped out was "[TG3]: Fix msi issue with kexec/kdump"[1], but this
> seems to have been back-ported to the initrd-2.6.18-128.el5 (I assume
> you meant 5 not 6) kernel.
>
> [1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ee6a99b539a50b4e9398938a0a6d37f8bf911550
>
> In any case I strongly suspect that the problem is a kernel problem and as
> such can't be solved without modifying the kernel or at least the tg3.ko
> module (which is probably in the initrd). I suggest logging a bug report
> with Red Hat.

The classic workaround is to rmmod the driver before calling kexec.
Frequently driver remove methods used by rmmod are much more tested
than the shutdown methods used by kexec and reboot.


Eric

* Re: [PATCH 2.6.35-rc1] net-next: vmxnet3 fixes [4/5] Do not reset when the device is not opened
From: David Miller @ 2010-07-17 23:35 UTC
  To: sbhatewara; +Cc: netdev, linux-kernel, pv-drivers, ronghua, matthieu
In-Reply-To: <alpine.LRH.2.00.1007160108240.12503@localhost.localdomain>

From: Shreyas Bhatewara <sbhatewara@vmware.com>
Date: Fri, 16 Jul 2010 01:17:29 -0700 (PDT)

> 
> 
> On Thu, 15 Jul 2010, David Miller wrote:
> 
>> From: Shreyas Bhatewara <sbhatewara@vmware.com>
>> Date: Thu, 15 Jul 2010 18:20:52 -0700 (PDT)
>> 
>> > Is this what you suggest :
>> > 
>> > ---
>> > 
>> > Hold rtnl_lock to get the right link state.
>> 
>> It ought to work, but make sure that it is legal to take the
>> RTNL semaphore in all contexts in which this code block
>> might be called.
>> 
> 
> This code block is called only from the workqueue handler, which runs in
> process context, so it is legal to take rtnl semaphore.
> Tested this code by simulating event interrupts (which schedule this 
> code) at considerable frequency while the interface was brought up and
> down in a loop. Similar stress testing had revealed the bug originally. 

Awesome, please submit this formally.  The copy you sent lacked a commit
message and signoff.

* Re: [PATCH] LSM: Add post recvmsg() hook.
From: Paul Moore @ 2010-07-17 20:34 UTC
  To: Tetsuo Handa
  Cc: davem, kuznet, pekkas, jmorris, yoshfuji, kaber, netdev,
	linux-security-module
In-Reply-To: <201007171017.DFC73498.SFFFOMLVJOHOtQ@I-love.SAKURA.ne.jp>

On Friday, July 16, 2010 09:17:10 pm Tetsuo Handa wrote:
> David Miller wrote:
> > From: Tetsuo Handa
> > Date: Sat, 17 Jul 2010 01:14:38 +0900
> > 
> > > Below is a patch for post recvmsg() operation. I modified the patch to
> > > call skb_recv_datagram() again (for udp_recvmsg(), raw_recvmsg(),
> > > udpv6_recvmsg()) if LSM decided to drop the message. (Regarding
> > > rawv6_recvmsg(), I didn't do so in accordance with the comment at
> > > "csum_copy_err:".)
> > > What do you think about this version?
> > 
> > This looks fine, but regardless of that comment I think the IPV6 raw
> > recvmsg() should loop just as the IPV4 one does in your patch.
> 
> Thank you, David.
> I updated to call skb_recv_datagram() for rawv6_recvmsg() case too.
> 
> NETWORKING [IPv4/IPv6] maintainers and Paul, is below patch fine for you?

Comments below ...

> From b43154a90bc7494ec1ee301e692d2bbf29c8f2f8 Mon Sep 17 00:00:00 2001
> 
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Sat, 17 Jul 2010 09:52:38 +0900
> Subject: [PATCH] LSM: Add post recvmsg() hook.
> 
> Current pre recvmsg hook (i.e. security_socket_recvmsg()) has two problems.
> 
> One is that it will cause eating 100% of CPU time if the caller does not
> close() the socket when recvmsg() failed due to security_socket_recvmsg(),
> for subsequent select() notifies the caller of readiness for recvmsg()
> since the datagram which would have been already picked up if
> security_socket_recvmsg() did not return error is remaining in the queue.
> 
> The other is that it is racy if LSM module wants to do filtering based on
> "which process can pick up datagrams from which source" because the process
> which picks up the datagram is not known until skb_recv_datagram() and lock
> is not held between security_socket_recvmsg() and skb_recv_datagram().
> 
> This patch introduces post recvmsg hook (i.e.
> security_socket_post_recvmsg()) in order to solve above problems at the
> cost of ability to pick up the datagram which would have been picked up if
> preceding security_socket_post_recvmsg() did not return error.

We've had discussions before about the merits of queuing inbound packets to 
the socket buffer only to later reject them when the application reads from 
the socket.  I'd be much happier to see you drop the packets before queuing 
them to the socket, e.g. security_sock_rcv_skb(), but I understand that isn't 
possible with TOMOYO's approach to security.

At least we're not talking about TCP sockets :)

I'll go ahead and add my ACK to this patch, but I wonder if it makes more 
sense in the UDP path to add the LSM hook after the decision to calculate the 
checksum prior to the copy?  If we're going to reject the packet due to a bad 
checksum we might as well do that before we waste our time with the LSM 
processing - right?  Although, if we end up doing checksum verification with 
the copy in the majority of the cases it may not be worth it.

Acked-by: Paul Moore <paul.moore@hp.com>

> ---
>  include/linux/security.h |   14 ++++++++++++++
>  net/ipv4/raw.c           |   12 +++++++++---
>  net/ipv4/udp.c           |    9 ++++++++-
>  net/ipv6/raw.c           |   12 +++++++++---
>  net/ipv6/udp.c           |    9 ++++++++-
>  security/capability.c    |    6 ++++++
>  security/security.c      |    6 ++++++
>  7 files changed, 60 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 723a93d..409c44d 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -879,6 +879,12 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
>   *	@size contains the size of message structure.
>   *	@flags contains the operational flags.
>   *	Return 0 if permission is granted.
> + * @socket_post_recvmsg:
> + *	Check permission after receiving a message from a socket.
> + *	The message is discarded if permission is not granted.
> + *	@sk contains the sock structure.
> + *	@skb contains the sk_buff structure.
> + *	Return 0 if permission is granted.
>   * @socket_getsockname:
>   *	Check permission before the local address (name) of the socket object
>   *	@sock is retrieved.
> @@ -1575,6 +1581,7 @@ struct security_operations {
>  			       struct msghdr *msg, int size);
>  	int (*socket_recvmsg) (struct socket *sock,
>  			       struct msghdr *msg, int size, int flags);
> +	int (*socket_post_recvmsg) (struct sock *sk, struct sk_buff *skb);
>  	int (*socket_getsockname) (struct socket *sock);
>  	int (*socket_getpeername) (struct socket *sock);
>  	int (*socket_getsockopt) (struct socket *sock, int level, int optname);
> @@ -2526,6 +2533,7 @@ int security_socket_accept(struct socket *sock, struct socket *newsock);
>  int security_socket_sendmsg(struct socket *sock, struct msghdr *msg, int size);
>  int security_socket_recvmsg(struct socket *sock, struct msghdr *msg,
>  			    int size, int flags);
> +int security_socket_post_recvmsg(struct sock *sk, struct sk_buff *skb);
>  int security_socket_getsockname(struct socket *sock);
>  int security_socket_getpeername(struct socket *sock);
>  int security_socket_getsockopt(struct socket *sock, int level, int optname);
> @@ -2617,6 +2625,12 @@ static inline int security_socket_recvmsg(struct socket *sock,
>  	return 0;
>  }
> 
> +static inline int security_socket_post_recvmsg(struct sock *sk,
> +					       struct sk_buff *skb)
> +{
> +	return 0;
> +}
> +
>  static inline int security_socket_getsockname(struct socket *sock)
>  {
>  	return 0;
> diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
> index 2c7a163..69652d4 100644
> --- a/net/ipv4/raw.c
> +++ b/net/ipv4/raw.c
> @@ -676,9 +676,15 @@ static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
>  		goto out;
>  	}
> 
> -	skb = skb_recv_datagram(sk, flags, noblock, &err);
> -	if (!skb)
> -		goto out;
> +	for (;;) {
> +		skb = skb_recv_datagram(sk, flags, noblock, &err);
> +		if (!skb)
> +			goto out;
> +		err = security_socket_post_recvmsg(sk, skb);
> +		if (likely(!err))
> +			break;
> +		skb_kill_datagram(sk, skb, flags);
> +	}
> 
>  	copied = skb->len;
>  	if (len < copied) {
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 5858574..9145685 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1125,6 +1125,7 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
>  	int err;
>  	int is_udplite = IS_UDPLITE(sk);
>  	bool slow;
> +	bool update_stat;
> 
>  	/*
>  	 *	Check any passed addresses
> @@ -1140,6 +1141,12 @@ try_again:
>  				  &peeked, &err);
>  	if (!skb)
>  		goto out;
> +	err = security_socket_post_recvmsg(sk, skb);
> +	if (err) {
> +		update_stat = false;
> +		goto csum_copy_err;
> +	}
> +	update_stat = true;
> 
>  	ulen = skb->len - sizeof(struct udphdr);
>  	if (len > ulen)
> @@ -1200,7 +1207,7 @@ out:
> 
>  csum_copy_err:
>  	slow = lock_sock_fast(sk);
> -	if (!skb_kill_datagram(sk, skb, flags))
> +	if (!skb_kill_datagram(sk, skb, flags) && update_stat)
>  		UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
>  	unlock_sock_fast(sk, slow);
> 
> diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
> index 4a4dcbe..6915b01 100644
> --- a/net/ipv6/raw.c
> +++ b/net/ipv6/raw.c
> @@ -464,9 +464,15 @@ static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
>  	if (np->rxpmtu && np->rxopt.bits.rxpmtu)
>  		return ipv6_recv_rxpmtu(sk, msg, len);
> 
> -	skb = skb_recv_datagram(sk, flags, noblock, &err);
> -	if (!skb)
> -		goto out;
> +	for (;;) {
> +		skb = skb_recv_datagram(sk, flags, noblock, &err);
> +		if (!skb)
> +			goto out;
> +		err = security_socket_post_recvmsg(sk, skb);
> +		if (likely(!err))
> +			break;
> +		skb_kill_datagram(sk, skb, flags);
> +	}
> 
>  	copied = skb->len;
>  	if (copied > len) {
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 87be586..6cae276 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -329,6 +329,7 @@ int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
>  	int is_udplite = IS_UDPLITE(sk);
>  	int is_udp4;
>  	bool slow;
> +	bool update_stat;
> 
>  	if (addr_len)
>  		*addr_len=sizeof(struct sockaddr_in6);
> @@ -344,6 +345,12 @@ try_again:
>  				  &peeked, &err);
>  	if (!skb)
>  		goto out;
> +	err = security_socket_post_recvmsg(sk, skb);
> +	if (err) {
> +		update_stat = false;
> +		goto csum_copy_err;
> +	}
> +	update_stat = true;
> 
>  	ulen = skb->len - sizeof(struct udphdr);
>  	if (len > ulen)
> @@ -426,7 +433,7 @@ out:
> 
>  csum_copy_err:
>  	slow = lock_sock_fast(sk);
> -	if (!skb_kill_datagram(sk, skb, flags)) {
> +	if (!skb_kill_datagram(sk, skb, flags) && update_stat) {
>  		if (is_udp4)
>  			UDP_INC_STATS_USER(sock_net(sk),
>  					UDP_MIB_INERRORS, is_udplite);
> diff --git a/security/capability.c b/security/capability.c
> index 4aeb699..709aea3 100644
> --- a/security/capability.c
> +++ b/security/capability.c
> @@ -597,6 +597,11 @@ static int cap_socket_recvmsg(struct socket *sock, struct msghdr *msg,
>  	return 0;
>  }
> 
> +static int cap_socket_post_recvmsg(struct sock *sk, struct sk_buff *skb)
> +{
> +	return 0;
> +}
> +
>  static int cap_socket_getsockname(struct socket *sock)
>  {
>  	return 0;
> @@ -1001,6 +1006,7 @@ void __init security_fixup_ops(struct security_operations *ops)
>  	set_to_cap_if_null(ops, socket_accept);
>  	set_to_cap_if_null(ops, socket_sendmsg);
>  	set_to_cap_if_null(ops, socket_recvmsg);
> +	set_to_cap_if_null(ops, socket_post_recvmsg);
>  	set_to_cap_if_null(ops, socket_getsockname);
>  	set_to_cap_if_null(ops, socket_getpeername);
>  	set_to_cap_if_null(ops, socket_setsockopt);
> diff --git a/security/security.c b/security/security.c
> index e8c87b8..4291bd7 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -1037,6 +1037,12 @@ int security_socket_recvmsg(struct socket *sock, struct msghdr *msg,
>  	return security_ops->socket_recvmsg(sock, msg, size, flags);
>  }
> 
> +int security_socket_post_recvmsg(struct sock *sk, struct sk_buff *skb)
> +{
> +	return security_ops->socket_post_recvmsg(sk, skb);
> +}
> +EXPORT_SYMBOL(security_socket_post_recvmsg);
> +
>  int security_socket_getsockname(struct socket *sock)
>  {
>  	return security_ops->socket_getsockname(sock);

-- 
paul moore
linux @ hp
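
For readers following the thread, the dequeue/check/drop loop that the quoted patch adds to raw_recvmsg() and rawv6_recvmsg() can be sketched in plain userspace C. This is only an analogy of the control flow, not kernel code: struct datagram, struct dgram_queue, queue_pop(), recv_filtered() and deny_odd_sources() are all hypothetical names invented for this sketch, and the array-backed queue stands in for the socket receive queue.

```c
#include <stddef.h>

/* Hypothetical stand-in for an sk_buff: only the source id matters here. */
struct datagram {
	int src;
};

/* Hypothetical stand-in for the socket receive queue. */
struct dgram_queue {
	struct datagram *items;
	size_t head;
	size_t count;
};

/* Post-receive hook: returns 0 to allow delivery, nonzero to reject. */
typedef int (*post_recv_hook)(const struct datagram *d);

/* Take the next datagram off the queue, or NULL if it is empty. */
static struct datagram *queue_pop(struct dgram_queue *q)
{
	if (q->head == q->count)
		return NULL;
	return &q->items[q->head++];
}

/*
 * Keep dequeuing until the hook accepts a datagram or the queue runs dry.
 * A rejected datagram is discarded and never handed to the caller, so a
 * later readiness check would not report it again.
 */
static struct datagram *recv_filtered(struct dgram_queue *q,
				      post_recv_hook hook, size_t *dropped)
{
	struct datagram *d;

	for (;;) {
		d = queue_pop(q);
		if (!d)
			return NULL;	/* nothing left to deliver */
		if (hook(d) == 0)
			return d;	/* permitted: deliver this one */
		(*dropped)++;		/* rejected: drop and try the next */
	}
}

/* Example policy for the sketch: reject odd-numbered sources. */
static int deny_odd_sources(const struct datagram *d)
{
	return (d->src % 2) ? -1 : 0;
}
```

Because the check runs after the datagram has been taken off the queue, the decision is made about the exact datagram the caller would have received, which is the property the commit message is after.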

* [PATCH 4/4] net: support time stamping in phy devices.
From: Richard Cochran @ 2010-07-17 18:49 UTC
  To: netdev
In-Reply-To: <cover.1279391885.git.richard.cochran@omicron.at>

This patch adds a new networking option to allow hardware time stamps
from PHY devices. When enabled, likely candidates among incoming and
outgoing network packets are offered to the PHY driver for possible
time stamping. When accepted by the PHY driver, incoming packets are
deferred for later delivery by the driver.

The patch also adds phylib driver methods for the SIOCSHWTSTAMP ioctl
and callbacks for transmit and receive time stamping. Drivers may
optionally implement these functions.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
---
 drivers/net/phy/phy.c        |    5 ++
 drivers/net/phy/phy_device.c |    2 +
 include/linux/netdevice.h    |    4 +
 include/linux/phy.h          |   22 +++++++
 include/linux/skbuff.h       |   31 ++++++++++
 net/Kconfig                  |   10 +++
 net/core/Makefile            |    2 +-
 net/core/dev.c               |    3 +
 net/core/timestamping.c      |  126 ++++++++++++++++++++++++++++++++++++++++++
 net/socket.c                 |    4 +
 10 files changed, 208 insertions(+), 1 deletions(-)
 create mode 100644 net/core/timestamping.c

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index bd88d81..5130db8 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -361,6 +361,11 @@ int phy_mii_ioctl(struct phy_device *phydev,
 		}
 		break;
 
+	case SIOCSHWTSTAMP:
+		if (phydev->drv->hwtstamp)
+			return phydev->drv->hwtstamp(phydev, ifr);
+		/* fall through */
+
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 1a99bb2..c076119 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -460,6 +460,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 	}
 
 	phydev->attached_dev = dev;
+	dev->phydev = phydev;
 
 	phydev->dev_flags = flags;
 
@@ -513,6 +514,7 @@ EXPORT_SYMBOL(phy_attach);
  */
 void phy_detach(struct phy_device *phydev)
 {
+	phydev->attached_dev->phydev = NULL;
 	phydev->attached_dev = NULL;
 
 	/* If the device had no specific driver before (i.e. - it
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8fa5e5a..131e9c8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -54,6 +54,7 @@
 
 struct vlan_group;
 struct netpoll_info;
+struct phy_device;
 /* 802.11 specific */
 struct wireless_dev;
 					/* source back-compat hooks */
@@ -1077,6 +1078,9 @@ struct net_device {
 #endif
 	/* n-tuple filter list attached to this device */
 	struct ethtool_rx_ntuple_list ethtool_ntuple_list;
+
+	/* phy device may attach itself for hardware timestamping */
+	struct phy_device *phydev;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
diff --git a/include/linux/phy.h b/include/linux/phy.h
index d63736a..6b0a782 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -234,6 +234,8 @@ enum phy_state {
 	PHY_RESUMING
 };
 
+struct sk_buff;
+
 /* phy_device: An instance of a PHY
  *
  * drv: Pointer to the driver for this PHY instance
@@ -402,6 +404,26 @@ struct phy_driver {
 	/* Clears up any memory if needed */
 	void (*remove)(struct phy_device *phydev);
 
+	/* Handles SIOCSHWTSTAMP ioctl for hardware time stamping. */
+	int  (*hwtstamp)(struct phy_device *phydev, struct ifreq *ifr);
+
+	/*
+	 * Requests a Rx timestamp for 'skb'. If the skb is accepted,
+	 * the phy driver promises to deliver it using netif_rx() as
+	 * soon as a timestamp becomes available. One of the
+	 * PTP_CLASS_ values is passed in 'type'. The function must
+	 * return true if the skb is accepted for delivery.
+	 */
+	bool (*rxtstamp)(struct phy_device *dev, struct sk_buff *skb, int type);
+
+	/*
+	 * Requests a Tx timestamp for 'skb'. The phy driver promises
+	 * to deliver it to the socket's error queue as soon as a
+	 * timestamp becomes available. One of the PTP_CLASS_ values
+	 * is passed in 'type'.
+	 */
+	void (*txtstamp)(struct phy_device *dev, struct sk_buff *skb, int type);
+
 	struct device_driver driver;
 };
 #define to_phy_driver(d) container_of(d, struct phy_driver, driver)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a1b0400..f5aa87e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1933,6 +1933,36 @@ static inline ktime_t net_invalid_timestamp(void)
 	return ktime_set(0, 0);
 }
 
+extern void skb_timestamping_init(void);
+
+#ifdef CONFIG_NETWORK_PHY_TIMESTAMPING
+
+extern void skb_clone_tx_timestamp(struct sk_buff *skb);
+extern bool skb_defer_rx_timestamp(struct sk_buff *skb);
+
+#else /* CONFIG_NETWORK_PHY_TIMESTAMPING */
+
+static inline void skb_clone_tx_timestamp(struct sk_buff *skb)
+{
+}
+
+static inline bool skb_defer_rx_timestamp(struct sk_buff *skb)
+{
+	return false;
+}
+
+#endif /* !CONFIG_NETWORK_PHY_TIMESTAMPING */
+
+/**
+ * skb_complete_tx_timestamp() - deliver cloned skb with tx timestamps
+ *
+ * @skb: clone of the original outgoing packet
+ * @hwtstamps: hardware time stamps
+ *
+ */
+void skb_complete_tx_timestamp(struct sk_buff *skb,
+			       struct skb_shared_hwtstamps *hwtstamps);
+
 /**
  * skb_tstamp_tx - queue clone of skb with send time stamps
  * @orig_skb:	the original outgoing packet
@@ -1965,6 +1995,7 @@ static inline void sw_tx_timestamp(struct sk_buff *skb)
  */
 static inline void skb_tx_timestamp(struct sk_buff *skb)
 {
+	skb_clone_tx_timestamp(skb);
 	sw_tx_timestamp(skb);
 }
 
diff --git a/net/Kconfig b/net/Kconfig
index 0d68b40..b325094 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -86,6 +86,16 @@ config NETWORK_SECMARK
 	  to nfmark, but designated for security purposes.
 	  If you are unsure how to answer this question, answer N.
 
+config NETWORK_PHY_TIMESTAMPING
+	bool "Timestamping in PHY devices"
+	depends on EXPERIMENTAL
+	help
+	  This allows timestamping of network packets by PHYs with
+	  hardware timestamping capabilities. This option adds some
+	  overhead in the transmit and receive paths.
+
+	  If you are unsure how to answer this question, answer N.
+
 menuconfig NETFILTER
 	bool "Network packet filtering framework (Netfilter)"
 	---help---
diff --git a/net/core/Makefile b/net/core/Makefile
index 51c3eec..8a04dd2 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -18,4 +18,4 @@ obj-$(CONFIG_NET_DMA) += user_dma.o
 obj-$(CONFIG_FIB_RULES) += fib_rules.o
 obj-$(CONFIG_TRACEPOINTS) += net-traces.o
 obj-$(CONFIG_NET_DROP_MONITOR) += drop_monitor.o
-
+obj-$(CONFIG_NETWORK_PHY_TIMESTAMPING) += timestamping.o
diff --git a/net/core/dev.c b/net/core/dev.c
index e85cc5f..0804c79 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2939,6 +2939,9 @@ int netif_receive_skb(struct sk_buff *skb)
 	if (netdev_tstamp_prequeue)
 		net_timestamp_check(skb);
 
+	if (skb_defer_rx_timestamp(skb))
+		return NET_RX_SUCCESS;
+
 #ifdef CONFIG_RPS
 	{
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
diff --git a/net/core/timestamping.c b/net/core/timestamping.c
new file mode 100644
index 0000000..0ae6c22
--- /dev/null
+++ b/net/core/timestamping.c
@@ -0,0 +1,126 @@
+/*
+ * PTP 1588 clock support - support for timestamping in PHY devices
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+#include <linux/errqueue.h>
+#include <linux/phy.h>
+#include <linux/ptp_classify.h>
+#include <linux/skbuff.h>
+
+static struct sock_filter ptp_filter[] = {
+	PTP_FILTER
+};
+
+static unsigned int classify(struct sk_buff *skb)
+{
+	if (likely(skb->dev &&
+		   skb->dev->phydev &&
+		   skb->dev->phydev->drv))
+		return sk_run_filter(skb, ptp_filter, ARRAY_SIZE(ptp_filter));
+	else
+		return PTP_CLASS_NONE;
+}
+
+void skb_clone_tx_timestamp(struct sk_buff *skb)
+{
+	struct phy_device *phydev;
+	struct sk_buff *clone;
+	struct sock *sk = skb->sk;
+	unsigned int type;
+
+	if (!sk)
+		return;
+
+	type = classify(skb);
+
+	switch (type) {
+	case PTP_CLASS_V1_IPV4:
+	case PTP_CLASS_V1_IPV6:
+	case PTP_CLASS_V2_IPV4:
+	case PTP_CLASS_V2_IPV6:
+	case PTP_CLASS_V2_L2:
+	case PTP_CLASS_V2_VLAN:
+		phydev = skb->dev->phydev;
+		if (likely(phydev->drv->txtstamp)) {
+			clone = skb_clone(skb, GFP_ATOMIC);
+			if (!clone)
+				return;
+			clone->sk = sk;
+			phydev->drv->txtstamp(phydev, clone, type);
+		}
+		break;
+	default:
+		break;
+	}
+}
+
+void skb_complete_tx_timestamp(struct sk_buff *skb,
+			       struct skb_shared_hwtstamps *hwtstamps)
+{
+	struct sock *sk = skb->sk;
+	struct sock_exterr_skb *serr;
+	int err;
+
+	if (!hwtstamps)
+		return;
+
+	*skb_hwtstamps(skb) = *hwtstamps;
+	serr = SKB_EXT_ERR(skb);
+	memset(serr, 0, sizeof(*serr));
+	serr->ee.ee_errno = ENOMSG;
+	serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
+	skb->sk = NULL;
+	err = sock_queue_err_skb(sk, skb);
+	if (err)
+		kfree_skb(skb);
+}
+EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
+
+bool skb_defer_rx_timestamp(struct sk_buff *skb)
+{
+	struct phy_device *phydev;
+	unsigned int type;
+
+	skb_push(skb, ETH_HLEN);
+
+	type = classify(skb);
+
+	skb_pull(skb, ETH_HLEN);
+
+	switch (type) {
+	case PTP_CLASS_V1_IPV4:
+	case PTP_CLASS_V1_IPV6:
+	case PTP_CLASS_V2_IPV4:
+	case PTP_CLASS_V2_IPV6:
+	case PTP_CLASS_V2_L2:
+	case PTP_CLASS_V2_VLAN:
+		phydev = skb->dev->phydev;
+		if (likely(phydev->drv->rxtstamp))
+			return phydev->drv->rxtstamp(phydev, skb, type);
+		break;
+	default:
+		break;
+	}
+
+	return false;
+}
+
+void __init skb_timestamping_init(void)
+{
+	BUG_ON(sk_chk_filter(ptp_filter, ARRAY_SIZE(ptp_filter)));
+}
diff --git a/net/socket.c b/net/socket.c
index acfa173..b8a03b8 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2403,6 +2403,10 @@ static int __init sock_init(void)
 	netfilter_init();
 #endif
 
+#ifdef CONFIG_NETWORK_PHY_TIMESTAMPING
+	skb_timestamping_init();
+#endif
+
 	return 0;
 }
 
-- 
1.7.0.4
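
The receive-side hand-off in skb_defer_rx_timestamp() above relies on an optional driver callback: the packet is deferred only if it classifies as PTP and the PHY driver both provides rxtstamp and accepts the packet. A minimal userspace sketch of that dispatch pattern follows. The names struct packet, struct sketch_phy_driver, struct sketch_phy, sketch_defer_rx() and accept_all() are hypothetical, and the classifier result is assumed to be precomputed rather than produced by a BPF filter.

```c
#include <stdbool.h>
#include <stddef.h>

#define PTP_CLASS_NONE 0x00	/* same value as in the patch series */

/* Hypothetical stand-in for an sk_buff with its classifier result attached. */
struct packet {
	unsigned int ptp_class;
};

/* Hypothetical driver ops table; the callback is optional and may be NULL. */
struct sketch_phy_driver {
	bool (*rxtstamp)(struct packet *pkt, unsigned int type);
};

struct sketch_phy {
	const struct sketch_phy_driver *drv;
};

/*
 * Returns true when the driver has taken ownership of the packet and
 * promises to deliver it later; false means normal delivery continues.
 */
static bool sketch_defer_rx(const struct sketch_phy *phy, struct packet *pkt)
{
	if (!phy || !phy->drv)
		return false;		/* no PHY or no driver bound */
	if (pkt->ptp_class == PTP_CLASS_NONE)
		return false;		/* not a PTP event message */
	if (!phy->drv->rxtstamp)
		return false;		/* driver does not time stamp Rx */
	return phy->drv->rxtstamp(pkt, pkt->ptp_class);
}

/* Example callback for the sketch: accept every offered packet. */
static bool accept_all(struct packet *pkt, unsigned int type)
{
	(void)pkt;
	(void)type;
	return true;
}
```

The three early returns correspond to the cases in which the patch lets the packet continue down the ordinary receive path.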


* [PATCH 3/4] net: added a BPF to help drivers detect PTP packets.
From: Richard Cochran @ 2010-07-17 18:49 UTC
  To: netdev
In-Reply-To: <cover.1279391885.git.richard.cochran@omicron.at>

Certain kinds of hardware time stamping units in both MACs and PHYs have
the limitation that they can only time stamp PTP packets. Drivers for such
hardware are left with the task of correctly matching skbs to time stamps.
This patch adds a BPF that drivers can use to classify PTP packets when
needed.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
---
 include/linux/ptp_classify.h |  126 ++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 126 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/ptp_classify.h

diff --git a/include/linux/ptp_classify.h b/include/linux/ptp_classify.h
new file mode 100644
index 0000000..943a85a
--- /dev/null
+++ b/include/linux/ptp_classify.h
@@ -0,0 +1,126 @@
+/*
+ * PTP 1588 support
+ *
+ * This file implements a BPF that recognizes PTP event messages.
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef _PTP_CLASSIFY_H_
+#define _PTP_CLASSIFY_H_
+
+#include <linux/if_ether.h>
+#include <linux/if_vlan.h>
+#include <linux/filter.h>
+#ifdef __KERNEL__
+#include <linux/in.h>
+#else
+#include <netinet/in.h>
+#endif
+
+#define PTP_CLASS_NONE  0x00 /* not a PTP event message */
+#define PTP_CLASS_V1    0x01 /* protocol version 1 */
+#define PTP_CLASS_V2    0x02 /* protocol version 2 */
+#define PTP_CLASS_VMASK 0x0f /* max protocol version is 15 */
+#define PTP_CLASS_IPV4  0x10 /* event in an IPV4 UDP packet */
+#define PTP_CLASS_IPV6  0x20 /* event in an IPV6 UDP packet */
+#define PTP_CLASS_L2    0x30 /* event in a L2 packet */
+#define PTP_CLASS_VLAN  0x40 /* event in a VLAN tagged L2 packet */
+#define PTP_CLASS_PMASK 0xf0 /* mask for the packet type field */
+
+#define PTP_CLASS_V1_IPV4 (PTP_CLASS_V1 | PTP_CLASS_IPV4)
+#define PTP_CLASS_V1_IPV6 (PTP_CLASS_V1 | PTP_CLASS_IPV6) /*probably DNE*/
+#define PTP_CLASS_V2_IPV4 (PTP_CLASS_V2 | PTP_CLASS_IPV4)
+#define PTP_CLASS_V2_IPV6 (PTP_CLASS_V2 | PTP_CLASS_IPV6)
+#define PTP_CLASS_V2_L2   (PTP_CLASS_V2 | PTP_CLASS_L2)
+#define PTP_CLASS_V2_VLAN (PTP_CLASS_V2 | PTP_CLASS_VLAN)
+
+#define PTP_EV_PORT 319
+
+#define OFF_ETYPE	12
+#define OFF_IHL		14
+#define OFF_FRAG	20
+#define OFF_PROTO4	23
+#define OFF_NEXT	6
+#define OFF_UDP_DST	2
+
+#define IP6_HLEN	40
+#define UDP_HLEN	8
+
+#define RELOFF_DST4	(ETH_HLEN + OFF_UDP_DST)
+#define OFF_DST6	(ETH_HLEN + IP6_HLEN + OFF_UDP_DST)
+#define OFF_PTP6	(ETH_HLEN + IP6_HLEN + UDP_HLEN)
+
+#define OP_AND	(BPF_ALU | BPF_AND  | BPF_K)
+#define OP_JEQ	(BPF_JMP | BPF_JEQ  | BPF_K)
+#define OP_JSET	(BPF_JMP | BPF_JSET | BPF_K)
+#define OP_LDB	(BPF_LD  | BPF_B    | BPF_ABS)
+#define OP_LDH	(BPF_LD  | BPF_H    | BPF_ABS)
+#define OP_LDHI	(BPF_LD  | BPF_H    | BPF_IND)
+#define OP_LDX	(BPF_LDX | BPF_B    | BPF_MSH)
+#define OP_OR	(BPF_ALU | BPF_OR   | BPF_K)
+#define OP_RETA	(BPF_RET | BPF_A)
+#define OP_RETK	(BPF_RET | BPF_K)
+
+static inline int ptp_filter_init(struct sock_filter *f, int len)
+{
+	if (OP_LDH == f[0].code)
+		return sk_chk_filter(f, len);
+	else
+		return 0;
+}
+
+#define PTP_FILTER \
+	{OP_LDH,	0,   0, OFF_ETYPE		}, /*              */ \
+	{OP_JEQ,	0,  12, ETH_P_IP		}, /* f goto L20   */ \
+	{OP_LDB,	0,   0, OFF_PROTO4		}, /*              */ \
+	{OP_JEQ,	0,   9, IPPROTO_UDP		}, /* f goto L10   */ \
+	{OP_LDH,	0,   0, OFF_FRAG		}, /*              */ \
+	{OP_JSET,	7,   0, 0x1fff			}, /* t goto L11   */ \
+	{OP_LDX,	0,   0, OFF_IHL			}, /*              */ \
+	{OP_LDHI,	0,   0, RELOFF_DST4		}, /*              */ \
+	{OP_JEQ,	0,   4, PTP_EV_PORT		}, /* f goto L12   */ \
+	{OP_LDHI,	0,   0, ETH_HLEN + UDP_HLEN	}, /*              */ \
+	{OP_AND,	0,   0, PTP_CLASS_VMASK		}, /*              */ \
+	{OP_OR,		0,   0, PTP_CLASS_IPV4		}, /*              */ \
+	{OP_RETA,	0,   0, 0			}, /*              */ \
+/*L1x*/	{OP_RETK,	0,   0, PTP_CLASS_NONE		}, /*              */ \
+/*L20*/	{OP_JEQ,	0,   9, ETH_P_IPV6		}, /* f goto L40   */ \
+	{OP_LDB,	0,   0, ETH_HLEN + OFF_NEXT	}, /*              */ \
+	{OP_JEQ,	0,   6, IPPROTO_UDP		}, /* f goto L30   */ \
+	{OP_LDH,	0,   0, OFF_DST6		}, /*              */ \
+	{OP_JEQ,	0,   4, PTP_EV_PORT		}, /* f goto L31   */ \
+	{OP_LDH,	0,   0, OFF_PTP6		}, /*              */ \
+	{OP_AND,	0,   0, PTP_CLASS_VMASK		}, /*              */ \
+	{OP_OR,		0,   0, PTP_CLASS_IPV6		}, /*              */ \
+	{OP_RETA,	0,   0, 0			}, /*              */ \
+/*L3x*/	{OP_RETK,	0,   0, PTP_CLASS_NONE		}, /*              */ \
+/*L40*/	{OP_JEQ,	0,   6, ETH_P_8021Q		}, /* f goto L50   */ \
+	{OP_LDH,	0,   0, OFF_ETYPE + 4		}, /*              */ \
+	{OP_JEQ,	0,   9, ETH_P_1588		}, /* f goto L60   */ \
+	{OP_LDH,	0,   0, ETH_HLEN + VLAN_HLEN	}, /*              */ \
+	{OP_AND,	0,   0, PTP_CLASS_VMASK		}, /*              */ \
+	{OP_OR,		0,   0, PTP_CLASS_VLAN		}, /*              */ \
+	{OP_RETA,	0,   0, 0			}, /*              */ \
+/*L50*/	{OP_JEQ,	0,   4, ETH_P_1588		}, /* f goto L61   */ \
+	{OP_LDH,	0,   0, ETH_HLEN		}, /*              */ \
+	{OP_AND,	0,   0, PTP_CLASS_VMASK		}, /*              */ \
+	{OP_OR,		0,   0, PTP_CLASS_L2		}, /*              */ \
+	{OP_RETA,	0,   0, 0			}, /*              */ \
+/*L6x*/	{OP_RETK,	0,   0, PTP_CLASS_NONE		},
+
+#endif
-- 
1.7.0.4
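
To make the filter program above easier to follow, here is a plain C walk-through of two of its branches (IPv4/UDP and untagged L2), written as ordinary byte-offset checks instead of BPF instructions. It is a sketch under assumptions, not the kernel's code path: sketch_ptp_classify() and be16_at() are hypothetical helpers, the SK_-prefixed constants are local copies of well-known values, and the explicit length checks approximate the way a BPF load past the end of the packet makes the filter return 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the patch's ptp_classify.h. */
#define PTP_CLASS_NONE  0x00
#define PTP_CLASS_VMASK 0x0f
#define PTP_CLASS_IPV4  0x10
#define PTP_CLASS_L2    0x30
#define PTP_CLASS_PMASK 0xf0
#define PTP_EV_PORT     319

/* Well-known constants, redefined locally so the sketch stands alone. */
#define SK_ETH_HLEN     14
#define SK_ETH_P_IP     0x0800
#define SK_ETH_P_1588   0x88F7
#define SK_IPPROTO_UDP  17
#define SK_UDP_HLEN     8

/* Read a big-endian 16-bit value; the caller guarantees the bounds. */
static unsigned int be16_at(const uint8_t *p, size_t off)
{
	return ((unsigned int)p[off] << 8) | p[off + 1];
}

/* Classify a raw Ethernet frame the way the IPv4 and plain L2 branches do. */
static unsigned int sketch_ptp_classify(const uint8_t *f, size_t len)
{
	unsigned int etype;
	size_t ihl, udp;

	if (len < SK_ETH_HLEN + 2)
		return PTP_CLASS_NONE;
	etype = be16_at(f, 12);			/* OFF_ETYPE */

	if (etype == SK_ETH_P_1588)		/* PTP directly over Ethernet */
		return (be16_at(f, SK_ETH_HLEN) & PTP_CLASS_VMASK) |
		       PTP_CLASS_L2;

	if (etype != SK_ETH_P_IP || len < SK_ETH_HLEN + 20)
		return PTP_CLASS_NONE;
	if (f[23] != SK_IPPROTO_UDP)		/* OFF_PROTO4 */
		return PTP_CLASS_NONE;
	if (be16_at(f, 20) & 0x1fff)		/* OFF_FRAG: not first fragment */
		return PTP_CLASS_NONE;

	ihl = (size_t)(f[14] & 0x0f) * 4;	/* OFF_IHL, as BPF_MSH computes */
	udp = SK_ETH_HLEN + ihl;		/* start of the UDP header */
	if (len < udp + SK_UDP_HLEN + 2)
		return PTP_CLASS_NONE;
	if (be16_at(f, udp + 2) != PTP_EV_PORT)	/* UDP destination port */
		return PTP_CLASS_NONE;

	/* Low nibble of the first 16 bits of the PTP header is the version. */
	return (be16_at(f, udp + SK_UDP_HLEN) & PTP_CLASS_VMASK) |
	       PTP_CLASS_IPV4;
}
```

A caller can split the result with `r & PTP_CLASS_VMASK` for the protocol version and `r & PTP_CLASS_PMASK` for the transport, which is how the combined PTP_CLASS_V2_IPV4 style values in the header are built.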


* [PATCH 2/4] net: preserve ifreq parameter when calling generic phy_mii_ioctl().
From: Richard Cochran @ 2010-07-17 18:48 UTC (permalink / raw)
  To: netdev
In-Reply-To: <cover.1279391885.git.richard.cochran@omicron.at>

The phy_mii_ioctl() function unnecessarily throws away the original ifreq.
We need access to the ifreq in order to support PHYs that can perform
hardware time stamping.

Two maverick drivers filter the ioctl commands passed to phy_mii_ioctl().
This is unnecessary since phylib will check the command in any case.
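The interface change can be sketched in userspace with mock types. Everything
prefixed mock_ or MOCK_ below is a hypothetical stand-in (for struct ifreq,
struct mii_ioctl_data, if_mii() and phy_mii_ioctl()), and the command numbers
are invented. The sketch only illustrates that the callee can derive the MII
view from the full request and reject unknown commands itself, so a filter in
the driver adds nothing.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Invented command numbers; the real SIOC* values live in kernel headers. */
enum { MOCK_GMIIPHY = 1, MOCK_GMIIREG = 2, MOCK_SMIIREG = 3, MOCK_OTHER = 99 };

/* Stand-in for struct mii_ioctl_data. */
struct mock_mii_data {
	uint16_t phy_id;
	uint16_t reg_num;
	uint16_t val_in;
	uint16_t val_out;
};

/* Stand-in for struct ifreq: a name plus a union of request payloads. */
struct mock_ifreq {
	char name[16];
	union {
		struct mock_mii_data mii;
		char raw[24];
	} u;
};

/* Stand-in for if_mii(): view the request payload as MII data. */
static struct mock_mii_data *mock_if_mii(struct mock_ifreq *ifr)
{
	return &ifr->u.mii;
}

/* Takes the whole request, so other payload views stay reachable later. */
static int mock_phy_mii_ioctl(int phy_addr, struct mock_ifreq *ifr, int cmd)
{
	struct mock_mii_data *mii = mock_if_mii(ifr);

	switch (cmd) {
	case MOCK_GMIIPHY:
		mii->phy_id = (uint16_t)phy_addr;
		return 0;
	case MOCK_GMIIREG:
		mii->val_out = 0;	/* a real PHY read would go here */
		return 0;
	case MOCK_SMIIREG:
		return 0;		/* a real PHY write would go here */
	default:
		return -EOPNOTSUPP;	/* unknown commands are rejected here */
	}
}

/* A driver ioctl handler reduced to a pass-through, with no command filter. */
static int mock_driver_ioctl(struct mock_ifreq *ifr, int cmd)
{
	return mock_phy_mii_ioctl(5, ifr, cmd);
}

static void mock_ioctl_selfcheck(void)
{
	struct mock_ifreq ifr = { .name = "eth0" };

	assert(mock_driver_ioctl(&ifr, MOCK_GMIIPHY) == 0);
	assert(ifr.u.mii.phy_id == 5);
	assert(mock_driver_ioctl(&ifr, MOCK_OTHER) == -EOPNOTSUPP);
}
```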

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
---
 drivers/net/arm/ixp4xx_eth.c           |    3 ++-
 drivers/net/au1000_eth.c               |    2 +-
 drivers/net/bcm63xx_enet.c             |    2 +-
 drivers/net/cpmac.c                    |    5 +----
 drivers/net/dnet.c                     |    2 +-
 drivers/net/ethoc.c                    |    2 +-
 drivers/net/fec.c                      |    2 +-
 drivers/net/fec_mpc52xx.c              |    2 +-
 drivers/net/fs_enet/fs_enet-main.c     |    3 +--
 drivers/net/gianfar.c                  |    2 +-
 drivers/net/macb.c                     |    2 +-
 drivers/net/mv643xx_eth.c              |    2 +-
 drivers/net/octeon/octeon_mgmt.c       |    2 +-
 drivers/net/phy/phy.c                  |    3 ++-
 drivers/net/sb1250-mac.c               |    2 +-
 drivers/net/sh_eth.c                   |    2 +-
 drivers/net/smsc911x.c                 |    2 +-
 drivers/net/smsc9420.c                 |    2 +-
 drivers/net/stmmac/stmmac_main.c       |   22 ++++++++--------------
 drivers/net/tc35815.c                  |    2 +-
 drivers/net/tg3.c                      |    2 +-
 drivers/net/ucc_geth.c                 |    2 +-
 drivers/staging/octeon/ethernet-mdio.c |    2 +-
 include/linux/phy.h                    |    2 +-
 net/dsa/slave.c                        |    3 +--
 25 files changed, 34 insertions(+), 43 deletions(-)

diff --git a/drivers/net/arm/ixp4xx_eth.c b/drivers/net/arm/ixp4xx_eth.c
index ee2f842..4f1cc71 100644
--- a/drivers/net/arm/ixp4xx_eth.c
+++ b/drivers/net/arm/ixp4xx_eth.c
@@ -782,7 +782,8 @@ static int eth_ioctl(struct net_device *dev, struct ifreq *req, int cmd)
 
 	if (!netif_running(dev))
 		return -EINVAL;
-	return phy_mii_ioctl(port->phydev, if_mii(req), cmd);
+
+	return phy_mii_ioctl(port->phydev, req, cmd);
 }
 
 /* ethtool support */
diff --git a/drivers/net/au1000_eth.c b/drivers/net/au1000_eth.c
index ece6128..386d4fe 100644
--- a/drivers/net/au1000_eth.c
+++ b/drivers/net/au1000_eth.c
@@ -978,7 +978,7 @@ static int au1000_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	if (!aup->phy_dev)
 		return -EINVAL; /* PHY not controllable */
 
-	return phy_mii_ioctl(aup->phy_dev, if_mii(rq), cmd);
+	return phy_mii_ioctl(aup->phy_dev, rq, cmd);
 }
 
 static const struct net_device_ops au1000_netdev_ops = {
diff --git a/drivers/net/bcm63xx_enet.c b/drivers/net/bcm63xx_enet.c
index faf5add..0d2c5da 100644
--- a/drivers/net/bcm63xx_enet.c
+++ b/drivers/net/bcm63xx_enet.c
@@ -1496,7 +1496,7 @@ static int bcm_enet_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	if (priv->has_phy) {
 		if (!priv->phydev)
 			return -ENODEV;
-		return phy_mii_ioctl(priv->phydev, if_mii(rq), cmd);
+		return phy_mii_ioctl(priv->phydev, rq, cmd);
 	} else {
 		struct mii_if_info mii;
 
diff --git a/drivers/net/cpmac.c b/drivers/net/cpmac.c
index 1756d28..cdb05bb 100644
--- a/drivers/net/cpmac.c
+++ b/drivers/net/cpmac.c
@@ -846,11 +846,8 @@ static int cpmac_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		return -EINVAL;
 	if (!priv->phy)
 		return -EINVAL;
-	if ((cmd == SIOCGMIIPHY) || (cmd == SIOCGMIIREG) ||
-	    (cmd == SIOCSMIIREG))
-		return phy_mii_ioctl(priv->phy, if_mii(ifr), cmd);
 
-	return -EOPNOTSUPP;
+	return phy_mii_ioctl(priv->phy, ifr, cmd);
 }
 
 static int cpmac_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
diff --git a/drivers/net/dnet.c b/drivers/net/dnet.c
index 8b0f50b..4ea7141 100644
--- a/drivers/net/dnet.c
+++ b/drivers/net/dnet.c
@@ -797,7 +797,7 @@ static int dnet_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	if (!phydev)
 		return -ENODEV;
 
-	return phy_mii_ioctl(phydev, if_mii(rq), cmd);
+	return phy_mii_ioctl(phydev, rq, cmd);
 }
 
 static void dnet_get_drvinfo(struct net_device *dev,
diff --git a/drivers/net/ethoc.c b/drivers/net/ethoc.c
index 37ce8ac..d9f3106 100644
--- a/drivers/net/ethoc.c
+++ b/drivers/net/ethoc.c
@@ -732,7 +732,7 @@ static int ethoc_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		phy = priv->phy;
 	}
 
-	return phy_mii_ioctl(phy, mdio, cmd);
+	return phy_mii_ioctl(phy, ifr, cmd);
 }
 
 static int ethoc_config(struct net_device *dev, struct ifmap *map)
diff --git a/drivers/net/fec.c b/drivers/net/fec.c
index b4afd7a..1670866 100644
--- a/drivers/net/fec.c
+++ b/drivers/net/fec.c
@@ -828,7 +828,7 @@ static int fec_enet_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	if (!phydev)
 		return -ENODEV;
 
-	return phy_mii_ioctl(phydev, if_mii(rq), cmd);
+	return phy_mii_ioctl(phydev, rq, cmd);
 }
 
 static void fec_enet_free_buffers(struct net_device *dev)
diff --git a/drivers/net/fec_mpc52xx.c b/drivers/net/fec_mpc52xx.c
index 25e6cc6..fdbf148 100644
--- a/drivers/net/fec_mpc52xx.c
+++ b/drivers/net/fec_mpc52xx.c
@@ -826,7 +826,7 @@ static int mpc52xx_fec_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	if (!priv->phydev)
 		return -ENOTSUPP;
 
-	return phy_mii_ioctl(priv->phydev, if_mii(rq), cmd);
+	return phy_mii_ioctl(priv->phydev, rq, cmd);
 }
 
 static const struct net_device_ops mpc52xx_fec_netdev_ops = {
diff --git a/drivers/net/fs_enet/fs_enet-main.c b/drivers/net/fs_enet/fs_enet-main.c
index 309a0ea..f08cff9 100644
--- a/drivers/net/fs_enet/fs_enet-main.c
+++ b/drivers/net/fs_enet/fs_enet-main.c
@@ -963,12 +963,11 @@ static const struct ethtool_ops fs_ethtool_ops = {
 static int fs_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 {
 	struct fs_enet_private *fep = netdev_priv(dev);
-	struct mii_ioctl_data *mii = (struct mii_ioctl_data *)&rq->ifr_data;
 
 	if (!netif_running(dev))
 		return -EINVAL;
 
-	return phy_mii_ioctl(fep->phydev, mii, cmd);
+	return phy_mii_ioctl(fep->phydev, rq, cmd);
 }
 
 extern int fs_mii_connect(struct net_device *dev);
diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index fccb7a3..1d16d51 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -847,7 +847,7 @@ static int gfar_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	if (!priv->phydev)
 		return -ENODEV;
 
-	return phy_mii_ioctl(priv->phydev, if_mii(rq), cmd);
+	return phy_mii_ioctl(priv->phydev, rq, cmd);
 }
 
 static unsigned int reverse_bitmap(unsigned int bit_map, unsigned int max_qs)
diff --git a/drivers/net/macb.c b/drivers/net/macb.c
index 40797fb..ff2f158 100644
--- a/drivers/net/macb.c
+++ b/drivers/net/macb.c
@@ -1082,7 +1082,7 @@ static int macb_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	if (!phydev)
 		return -ENODEV;
 
-	return phy_mii_ioctl(phydev, if_mii(rq), cmd);
+	return phy_mii_ioctl(phydev, rq, cmd);
 }
 
 static const struct net_device_ops macb_netdev_ops = {
diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 82b720f..0561425 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -2457,7 +2457,7 @@ static int mv643xx_eth_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	struct mv643xx_eth_private *mp = netdev_priv(dev);
 
 	if (mp->phy != NULL)
-		return phy_mii_ioctl(mp->phy, if_mii(ifr), cmd);
+		return phy_mii_ioctl(mp->phy, ifr, cmd);
 
 	return -EOPNOTSUPP;
 }
diff --git a/drivers/net/octeon/octeon_mgmt.c b/drivers/net/octeon/octeon_mgmt.c
index f4a0f08..b264f0f 100644
--- a/drivers/net/octeon/octeon_mgmt.c
+++ b/drivers/net/octeon/octeon_mgmt.c
@@ -620,7 +620,7 @@ static int octeon_mgmt_ioctl(struct net_device *netdev,
 	if (!p->phydev)
 		return -EINVAL;
 
-	return phy_mii_ioctl(p->phydev, if_mii(rq), cmd);
+	return phy_mii_ioctl(p->phydev, rq, cmd);
 }
 
 static void octeon_mgmt_adjust_link(struct net_device *netdev)
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 64be466..bd88d81 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -309,8 +309,9 @@ EXPORT_SYMBOL(phy_ethtool_gset);
  * current state.  Use at own risk.
  */
 int phy_mii_ioctl(struct phy_device *phydev,
-		struct mii_ioctl_data *mii_data, int cmd)
+		struct ifreq *ifr, int cmd)
 {
+	struct mii_ioctl_data *mii_data = if_mii(ifr);
 	u16 val = mii_data->val_in;
 
 	switch (cmd) {
diff --git a/drivers/net/sb1250-mac.c b/drivers/net/sb1250-mac.c
index 1f3acc3..e585c3f 100644
--- a/drivers/net/sb1250-mac.c
+++ b/drivers/net/sb1250-mac.c
@@ -2532,7 +2532,7 @@ static int sbmac_mii_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	if (!netif_running(dev) || !sc->phy_dev)
 		return -EINVAL;
 
-	return phy_mii_ioctl(sc->phy_dev, if_mii(rq), cmd);
+	return phy_mii_ioctl(sc->phy_dev, rq, cmd);
 }
 
 static int sbmac_close(struct net_device *dev)
diff --git a/drivers/net/sh_eth.c b/drivers/net/sh_eth.c
index 501a55f..8279f8e 100644
--- a/drivers/net/sh_eth.c
+++ b/drivers/net/sh_eth.c
@@ -1233,7 +1233,7 @@ static int sh_eth_do_ioctl(struct net_device *ndev, struct ifreq *rq,
 	if (!phydev)
 		return -ENODEV;
 
-	return phy_mii_ioctl(phydev, if_mii(rq), cmd);
+	return phy_mii_ioctl(phydev, rq, cmd);
 }
 
 #if defined(SH_ETH_HAS_TSU)
diff --git a/drivers/net/smsc911x.c b/drivers/net/smsc911x.c
index cc55974..56dc2ff 100644
--- a/drivers/net/smsc911x.c
+++ b/drivers/net/smsc911x.c
@@ -1538,7 +1538,7 @@ static int smsc911x_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	if (!netif_running(dev) || !pdata->phy_dev)
 		return -EINVAL;
 
-	return phy_mii_ioctl(pdata->phy_dev, if_mii(ifr), cmd);
+	return phy_mii_ioctl(pdata->phy_dev, ifr, cmd);
 }
 
 static int
diff --git a/drivers/net/smsc9420.c b/drivers/net/smsc9420.c
index 6cdee6a..b09ee1c 100644
--- a/drivers/net/smsc9420.c
+++ b/drivers/net/smsc9420.c
@@ -245,7 +245,7 @@ static int smsc9420_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	if (!netif_running(dev) || !pd->phy_dev)
 		return -EINVAL;
 
-	return phy_mii_ioctl(pd->phy_dev, if_mii(ifr), cmd);
+	return phy_mii_ioctl(pd->phy_dev, ifr, cmd);
 }
 
 static int smsc9420_ethtool_get_settings(struct net_device *dev,
diff --git a/drivers/net/stmmac/stmmac_main.c b/drivers/net/stmmac/stmmac_main.c
index a31d580..acf0616 100644
--- a/drivers/net/stmmac/stmmac_main.c
+++ b/drivers/net/stmmac/stmmac_main.c
@@ -1437,24 +1437,18 @@ static void stmmac_poll_controller(struct net_device *dev)
 static int stmmac_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 {
 	struct stmmac_priv *priv = netdev_priv(dev);
-	int ret = -EOPNOTSUPP;
+	int ret;
 
 	if (!netif_running(dev))
 		return -EINVAL;
 
-	switch (cmd) {
-	case SIOCGMIIPHY:
-	case SIOCGMIIREG:
-	case SIOCSMIIREG:
-		if (!priv->phydev)
-			return -EINVAL;
-
-		spin_lock(&priv->lock);
-		ret = phy_mii_ioctl(priv->phydev, if_mii(rq), cmd);
-		spin_unlock(&priv->lock);
-	default:
-		break;
-	}
+	if (!priv->phydev)
+		return -EINVAL;
+
+	spin_lock(&priv->lock);
+	ret = phy_mii_ioctl(priv->phydev, rq, cmd);
+	spin_unlock(&priv->lock);
+
 	return ret;
 }
 
diff --git a/drivers/net/tc35815.c b/drivers/net/tc35815.c
index be08b75..99e423a 100644
--- a/drivers/net/tc35815.c
+++ b/drivers/net/tc35815.c
@@ -2066,7 +2066,7 @@ static int tc35815_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 		return -EINVAL;
 	if (!lp->phy_dev)
 		return -ENODEV;
-	return phy_mii_ioctl(lp->phy_dev, if_mii(rq), cmd);
+	return phy_mii_ioctl(lp->phy_dev, rq, cmd);
 }
 
 static void tc35815_chip_reset(struct net_device *dev)
diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 289cdc5..d4163f2 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -10954,7 +10954,7 @@ static int tg3_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		if (!(tp->tg3_flags3 & TG3_FLG3_PHY_CONNECTED))
 			return -EAGAIN;
 		phydev = tp->mdio_bus->phy_map[TG3_PHY_MII_ADDR];
-		return phy_mii_ioctl(phydev, data, cmd);
+		return phy_mii_ioctl(phydev, ifr, cmd);
 	}
 
 	switch (cmd) {
diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index dc32a62..e17dd74 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -3714,7 +3714,7 @@ static int ucc_geth_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	if (!ugeth->phydev)
 		return -ENODEV;
 
-	return phy_mii_ioctl(ugeth->phydev, if_mii(rq), cmd);
+	return phy_mii_ioctl(ugeth->phydev, rq, cmd);
 }
 
 static const struct net_device_ops ucc_geth_netdev_ops = {
diff --git a/drivers/staging/octeon/ethernet-mdio.c b/drivers/staging/octeon/ethernet-mdio.c
index 7e0be8d..10a82ef 100644
--- a/drivers/staging/octeon/ethernet-mdio.c
+++ b/drivers/staging/octeon/ethernet-mdio.c
@@ -113,7 +113,7 @@ int cvm_oct_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	if (!priv->phydev)
 		return -EINVAL;
 
-	return phy_mii_ioctl(priv->phydev, if_mii(rq), cmd);
+	return phy_mii_ioctl(priv->phydev, rq, cmd);
 }
 
 static void cvm_oct_adjust_link(struct net_device *dev)
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 987e111..d63736a 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -498,7 +498,7 @@ void phy_stop_machine(struct phy_device *phydev);
 int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd);
 int phy_ethtool_gset(struct phy_device *phydev, struct ethtool_cmd *cmd);
 int phy_mii_ioctl(struct phy_device *phydev,
-		struct mii_ioctl_data *mii_data, int cmd);
+		struct ifreq *ifr, int cmd);
 int phy_start_interrupts(struct phy_device *phydev);
 void phy_print_status(struct phy_device *phydev);
 struct phy_device* phy_device_create(struct mii_bus *bus, int addr, int phy_id);
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 8fdca56..64ca2a6 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -164,10 +164,9 @@ out:
 static int dsa_slave_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct mii_ioctl_data *mii_data = if_mii(ifr);
 
 	if (p->phy != NULL)
-		return phy_mii_ioctl(p->phy, mii_data, cmd);
+		return phy_mii_ioctl(p->phy, ifr, cmd);
 
 	return -EOPNOTSUPP;
 }
-- 
1.7.0.4



* [PATCH 1/4] net: add driver hook for tx time stamping.
From: Richard Cochran @ 2010-07-17 18:48 UTC (permalink / raw)
  To: netdev
In-Reply-To: <cover.1279391885.git.richard.cochran@omicron.at>

This patch adds a hook for transmit time stamps. The transmit hook
allows a software fallback for transmit time stamps, for MACs
lacking time stamping hardware. Using the hook will still require
adding an inline function call to each MAC driver.
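As a rough userspace illustration of where the call belongs in a driver, here
is a sketch with mock types. struct mock_skb, mock_hard_xmit() and the other
mock_ names are hypothetical; only the condition (software stamp requested and
no hardware stamp in progress) is taken from the patch below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-packet transmit flags. */
struct mock_skb {
	bool software;		/* software time stamp requested */
	bool in_progress;	/* hardware time stamp already under way */
	int sw_stamps;		/* number of software stamps taken */
};

/* Stand-in for skb_tstamp_tx(): just count the stamp. */
static void mock_skb_tstamp_tx(struct mock_skb *skb)
{
	skb->sw_stamps++;
}

/* Same condition as sw_tx_timestamp() in the patch. */
static void mock_skb_tx_timestamp(struct mock_skb *skb)
{
	if (skb->software && !skb->in_progress)
		mock_skb_tstamp_tx(skb);
}

/* Hypothetical MAC driver transmit routine. */
static int mock_hard_xmit(struct mock_skb *skb)
{
	/* ... hand the buffer to the (imaginary) MAC hardware here ... */
	mock_skb_tx_timestamp(skb);	/* after handing over, before freeing */
	/* ... free or recycle the buffer here ... */
	return 0;
}

static void mock_xmit_selfcheck(void)
{
	struct mock_skb want_sw = { .software = true };
	struct mock_skb hw_busy = { .software = true, .in_progress = true };
	struct mock_skb no_req = { 0 };

	mock_hard_xmit(&want_sw);
	mock_hard_xmit(&hw_busy);
	mock_hard_xmit(&no_req);

	assert(want_sw.sw_stamps == 1);	/* software fallback taken */
	assert(hw_busy.sw_stamps == 0);	/* hardware will deliver the stamp */
	assert(no_req.sw_stamps == 0);	/* nothing was requested */
}
```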

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
---
 include/linux/skbuff.h |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ac74ee0..a1b0400 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1947,6 +1947,27 @@ static inline ktime_t net_invalid_timestamp(void)
 extern void skb_tstamp_tx(struct sk_buff *orig_skb,
 			struct skb_shared_hwtstamps *hwtstamps);
 
+static inline void sw_tx_timestamp(struct sk_buff *skb)
+{
+	union skb_shared_tx *shtx = skb_tx(skb);
+	if (shtx->software && !shtx->in_progress)
+		skb_tstamp_tx(skb, NULL);
+}
+
+/**
+ * skb_tx_timestamp() - Driver hook for transmit timestamping
+ *
+ * Ethernet MAC Drivers should call this function in their hard_xmit()
+ * function as soon as possible after giving the sk_buff to the MAC
+ * hardware, but before freeing the sk_buff.
+ *
+ * @skb: A socket buffer.
+ */
+static inline void skb_tx_timestamp(struct sk_buff *skb)
+{
+	sw_tx_timestamp(skb);
+}
+
 extern __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len);
 extern __sum16 __skb_checksum_complete(struct sk_buff *skb);
 
-- 
1.7.0.4



* [PATCH v3 0/4] Extend Time Stamping
From: Richard Cochran @ 2010-07-17 18:48 UTC (permalink / raw)
  To: netdev

This patch set extends the packet time stamping capabilities of the
network stack in two ways.

1. The first patch presents a work-around for the TX software time
   stamping fallback problem cited in cd4d8fdad1f1. The idea is to add
   one inline function into each MAC driver. This function will act
   as a hook for current (and possible future) time stamping needs,
   once it is placed correctly within each MAC driver.

2. The other patches prepare the way for PHY drivers to offer time
   stamping.

I am preparing a new round of patches for PTP support, but it will
require the changes in this patch set in order to function. Thus I
would like to have this patch set reviewed (and hopefully merged) in
order to go forward.

Thanks,
Richard

* Patch ChangeLog
** v3
   After having received plenty of criticism on the idea of reading
   from the MDIO bus during the critical paths, this version presents
   a new approach. Now, when the CONFIG option for PHY time stamping
   is enabled, likely packets for time stamping are identified by the
   stack and presented to the PHY driver. The driver may accept the
   packet and defer its delivery until the time stamp becomes
   available at a later time. This approach brings four main
   advantages.

   1. Now only one MAC driver hook is necessary.

   2. It leaves the option of how to get the time stamps open,
      allowing, for example, a driver to get them either via MDIO or
      PHY status frames.

   3. If reading the MDIO bus is required to get the time stamp, this
      can be done as a work queue task.

   4. It moves the identification of PTP packets into the stack, so
      that each new PHY driver that comes along will not have to
      duplicate this logic again.
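
   To make point 4 concrete, here is a plain C sketch of the kind of
   check the stack would perform. It is not the BPF program from patch
   3: the sketch_ names and SK_ result codes are hypothetical, IPv6 is
   left out for brevity, and only well-known constants are assumed
   (EtherType 0x88F7 for PTP, 0x8100 for VLAN, UDP event port 319).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_IPV4	0x0800
#define ETHERTYPE_VLAN	0x8100
#define ETHERTYPE_PTP	0x88F7
#define PTP_EVENT_PORT	319
#define IP_PROTO_UDP	17

/* Hypothetical result codes, not the PTP_CLASS_* values of the patch. */
enum sketch_class { SK_NONE, SK_L2, SK_VLAN, SK_IPV4 };

/* Read a big-endian 16-bit field. */
static uint16_t be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Classify a raw Ethernet frame as PTP over L2, VLAN or IPv4/UDP. */
static enum sketch_class sketch_classify(const uint8_t *f, size_t len)
{
	uint16_t type;
	size_t ihl;

	if (len < 14)
		return SK_NONE;
	type = be16(f + 12);
	if (type == ETHERTYPE_PTP)
		return SK_L2;
	if (type == ETHERTYPE_VLAN)
		return (len >= 18 && be16(f + 16) == ETHERTYPE_PTP) ?
			SK_VLAN : SK_NONE;
	if (type != ETHERTYPE_IPV4 || len < 14 + 20)
		return SK_NONE;
	if (f[14 + 9] != IP_PROTO_UDP)
		return SK_NONE;
	if (be16(f + 14 + 6) & 0x1fff)	/* not the first fragment */
		return SK_NONE;
	ihl = (size_t)(f[14] & 0x0f) * 4;
	if (ihl < 20 || len < 14 + ihl + 4)
		return SK_NONE;
	return be16(f + 14 + ihl + 2) == PTP_EVENT_PORT ? SK_IPV4 : SK_NONE;
}

static void sketch_selfcheck(void)
{
	uint8_t l2[14] = { 0 };
	uint8_t udp[42] = { 0 };

	l2[12] = 0x88; l2[13] = 0xF7;
	assert(sketch_classify(l2, sizeof(l2)) == SK_L2);

	udp[12] = 0x08; udp[13] = 0x00;	/* IPv4 */
	udp[14] = 0x45;			/* version 4, IHL 5 */
	udp[23] = IP_PROTO_UDP;
	udp[36] = 0x01; udp[37] = 0x3F;	/* destination port 319 */
	assert(sketch_classify(udp, sizeof(udp)) == SK_IPV4);

	udp[21] = 0x01;			/* nonzero fragment offset */
	assert(sketch_classify(udp, sizeof(udp)) == SK_NONE);
}
```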

** v2
   Removed the CONFIG option for the driver hooks.

Richard Cochran (4):
  net: add driver hook for tx time stamping.
  net: preserve ifreq parameter when calling generic phy_mii_ioctl().
  net: added a BPF to help drivers detect PTP packets.
  net: support time stamping in phy devices.

 drivers/net/arm/ixp4xx_eth.c           |    3 +-
 drivers/net/au1000_eth.c               |    2 +-
 drivers/net/bcm63xx_enet.c             |    2 +-
 drivers/net/cpmac.c                    |    5 +-
 drivers/net/dnet.c                     |    2 +-
 drivers/net/ethoc.c                    |    2 +-
 drivers/net/fec.c                      |    2 +-
 drivers/net/fec_mpc52xx.c              |    2 +-
 drivers/net/fs_enet/fs_enet-main.c     |    3 +-
 drivers/net/gianfar.c                  |    2 +-
 drivers/net/macb.c                     |    2 +-
 drivers/net/mv643xx_eth.c              |    2 +-
 drivers/net/octeon/octeon_mgmt.c       |    2 +-
 drivers/net/phy/phy.c                  |    8 ++-
 drivers/net/phy/phy_device.c           |    2 +
 drivers/net/sb1250-mac.c               |    2 +-
 drivers/net/sh_eth.c                   |    2 +-
 drivers/net/smsc911x.c                 |    2 +-
 drivers/net/smsc9420.c                 |    2 +-
 drivers/net/stmmac/stmmac_main.c       |   22 ++----
 drivers/net/tc35815.c                  |    2 +-
 drivers/net/tg3.c                      |    2 +-
 drivers/net/ucc_geth.c                 |    2 +-
 drivers/staging/octeon/ethernet-mdio.c |    2 +-
 include/linux/netdevice.h              |    4 +
 include/linux/phy.h                    |   24 ++++++-
 include/linux/ptp_classify.h           |  126 ++++++++++++++++++++++++++++++++
 include/linux/skbuff.h                 |   52 +++++++++++++
 net/Kconfig                            |   10 +++
 net/core/Makefile                      |    2 +-
 net/core/dev.c                         |    3 +
 net/core/timestamping.c                |  126 ++++++++++++++++++++++++++++++++
 net/dsa/slave.c                        |    3 +-
 net/socket.c                           |    4 +
 34 files changed, 389 insertions(+), 44 deletions(-)
 create mode 100644 include/linux/ptp_classify.h
 create mode 100644 net/core/timestamping.c


* [patch] arcnet: fix signed bug in probe function
From: Dan Carpenter @ 2010-07-17 17:21 UTC (permalink / raw)
  To: netdev; +Cc: kernel-janitors

probe_irq_off() returns the first irq found or, if two irqs are found,
the negative of the first irq found.  dev->irq is unsigned, so cast it
to an int so that the test for negative values works.
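
The effect can be reproduced in userspace. This is a minimal sketch with a
hypothetical struct mock_dev and a hypothetical mock_probe_irq_off() that just
returns the simulated result; it assumes the usual two's complement behaviour
when the unsigned value is cast back to int.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device: the irq field is unsigned. */
struct mock_dev {
	unsigned int irq;
};

/* Hypothetical probe: returns whatever result the caller wants to simulate. */
static int mock_probe_irq_off(int simulated)
{
	return simulated;
}

/* Old test: an unsigned value is never negative, so only 0 is caught. */
static bool failed_without_cast(const struct mock_dev *d)
{
	return d->irq <= 0;
}

/* Fixed test: the cast makes negative probe results visible again. */
static bool failed_with_cast(const struct mock_dev *d)
{
	return (int)d->irq <= 0;
}

static void irq_selfcheck(void)
{
	struct mock_dev d;

	d.irq = mock_probe_irq_off(-5);	/* two IRQs seen, first was 5 */
	assert(!failed_without_cast(&d));	/* failure goes unnoticed */
	assert(failed_with_cast(&d));		/* failure is detected */

	d.irq = mock_probe_irq_off(0);		/* no IRQ seen */
	assert(failed_without_cast(&d));
	assert(failed_with_cast(&d));
}
```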

Signed-off-by: Dan Carpenter <error27@gmail.com>

diff --git a/drivers/net/arcnet/com20020-isa.c b/drivers/net/arcnet/com20020-isa.c
index 0402da3..3727282 100644
--- a/drivers/net/arcnet/com20020-isa.c
+++ b/drivers/net/arcnet/com20020-isa.c
@@ -90,14 +90,14 @@ static int __init com20020isa_probe(struct net_device *dev)
 		outb(0, _INTMASK);
 		dev->irq = probe_irq_off(airqmask);
 
-		if (dev->irq <= 0) {
+		if ((int)dev->irq <= 0) {
 			BUGMSG(D_INIT_REASONS, "Autoprobe IRQ failed first time\n");
 			airqmask = probe_irq_on();
 			outb(NORXflag, _INTMASK);
 			udelay(5);
 			outb(0, _INTMASK);
 			dev->irq = probe_irq_off(airqmask);
-			if (dev->irq <= 0) {
+			if ((int)dev->irq <= 0) {
 				BUGMSG(D_NORMAL, "Autoprobe IRQ failed.\n");
 				err = -ENODEV;
 				goto out;
diff --git a/drivers/net/arcnet/com90io.c b/drivers/net/arcnet/com90io.c
index 4cb4018..eb27976 100644
--- a/drivers/net/arcnet/com90io.c
+++ b/drivers/net/arcnet/com90io.c
@@ -213,7 +213,7 @@ static int __init com90io_probe(struct net_device *dev)
 		outb(0, _INTMASK);
 		dev->irq = probe_irq_off(airqmask);
 
-		if (dev->irq <= 0) {
+		if ((int)dev->irq <= 0) {
 			BUGMSG(D_INIT_REASONS, "Autoprobe IRQ failed\n");
 			goto err_out;
 		}


* [PATCH 5/5] net: dccp: fix sign bug
From: Kulikov Vasiliy @ 2010-07-17 15:21 UTC (permalink / raw)
  To: kernel-janitors
  Cc: Arnaldo Carvalho de Melo, David S. Miller, Gerrit Renker, dccp,
	netdev

'gap' is unsigned, so this code is wrong:

    gap = -new_head;
    ...
    if (gap > 0) { ... }

Make 'gap' signed.
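
A small standalone sketch of the difference (the two helper names are
hypothetical, and the assignment is reduced to the bare pattern above): with
an unsigned 'gap', the test 'gap > 0' is true for every nonzero value, so a
negated positive new_head still passes it.

```c
#include <assert.h>

/* Old behaviour: the negated value wraps around to a large positive number. */
static int gap_positive_unsigned(long new_head)
{
	unsigned int gap = (unsigned int)-new_head;

	return gap > 0;
}

/* Fixed behaviour: the sign of -new_head is preserved. */
static int gap_positive_signed(long new_head)
{
	long gap = -new_head;

	return gap > 0;
}

static void gap_selfcheck(void)
{
	/* new_head = 3 gives gap = -3, which must not count as positive. */
	assert(gap_positive_unsigned(3) == 1);	/* wrong answer */
	assert(gap_positive_signed(3) == 0);	/* right answer */

	/* new_head = -2 gives gap = 2, positive either way. */
	assert(gap_positive_unsigned(-2) == 1);
	assert(gap_positive_signed(-2) == 1);
}
```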


The semantic patch that finds this problem (many false-positive results):
(http://coccinelle.lip6.fr/)

// <smpl>
@ r1 @
identifier f;
@@
int f(...) { ... }

@@
identifier r1.f;
type T;
unsigned T x;
@@

*x = f(...)
 ...
*x > 0

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
---
 net/dccp/ackvec.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/dccp/ackvec.c b/net/dccp/ackvec.c
index 2abddee..92a6fcb 100644
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -201,7 +201,7 @@ static inline int dccp_ackvec_set_buf_head_state(struct dccp_ackvec *av,
 						 const unsigned int packets,
 						 const unsigned char state)
 {
-	unsigned int gap;
+	long gap;
 	long new_head;
 
 	if (av->av_vec_len + packets > DCCP_MAX_ACKVEC_LEN)
-- 
1.7.0.4



* RE: [REGRESSION] e1000e stopped working [MANUALLY BISECTED]
From: Maxim Levitsky @ 2010-07-17 13:54 UTC (permalink / raw)
  To: Tantilov, Emil S
  Cc: Kirsher, Jeffrey T, netdev@vger.kernel.org, Allan, Bruce W,
	Pieper, Jeffrey E
In-Reply-To: <EA929A9653AAE14F841771FB1DE5A1365FFE1B19CD@rrsmsx501.amr.corp.intel.com>

On Fri, 2010-07-16 at 17:23 -0600, Tantilov, Emil S wrote:
> Maxim Levitsky wrote:
> > On Thu, 2010-07-15 at 22:09 +0300, Maxim Levitsky wrote:
> >> On Thu, 2010-07-15 at 13:02 -0600, Tantilov, Emil S wrote:
> >>> Maxim Levitsky wrote:
> >>>> On Thu, 2010-07-15 at 02:33 +0300, Maxim Levitsky wrote:
> >>>>> On Wed, 2010-07-14 at 16:56 -0600, Tantilov, Emil S wrote:
> >>>>>> Maxim Levitsky wrote:
> >>>>>>> On Mon, 2010-07-12 at 15:23 -0600, Tantilov, Emil S wrote:
> >>>>>>>> Maxim Levitsky wrote:
> >>>>>>>>> On Mon, 2010-07-05 at 12:58 +0300, Maxim Levitsky wrote:
> >>>>>>>>>> On Mon, 2010-07-05 at 01:13 -0700, Jeff Kirsher wrote:
> >>>>>>>>>>> On Sun, Jul 4, 2010 at 15:48, Maxim Levitsky
> >>>>>>>>>>> <maximlevitsky@gmail.com> wrote:
> >>>>>>>>>>>> Did few guesses, and now I see that reverting the below
> >>>>>>>>>>>> commit fixes the problem. 
> >>>>>>>>>>>> 
> >>>>>>>>>>>> "e1000e: Fix/cleanup PHY reset code for ICHx/PCHx"
> >>>>>>>>>>>> e98cac447cc1cc418dff1d610a5c79c4f2bdec7f.
> >>>>>>>>>>>> 
> >>>>>>>>>>>> 
> >>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>        Maxim Levitsky
> >>>>>>>>>>>> 
> >>>>>>>>>>>> --
> >>>>>>>>>>> 
> >>>>>>>>>>> Can you give us till Tuesday to respond?  I know that there
> >>>>>>>>>>> are some additional e1000e patches in my queue, which may
> >>>>>>>>>>> resolve the issue, but this weekend the power is down to do
> >>>>>>>>>>> some infrastructure upgrades which prevents us from doing
> >>>>>>>>>>> any investigation.debugging until Tuesday.
> >>>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>>> Sure.
> >>>>>>>>>> 
> >>>>>>>>>> Best regards,
> >>>>>>>>>> 	Maxim Levitsky
> >>>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> Updates?
> >>>>>>>> 
> >>>>>>>> We are working on reproducing the issue. So far we have not
> >>>>>>>> seen the problem when testing with net-next.
> >>>>>>>> 
> >>>>>>>> I asked in previous email about some additional info from
> >>>>>>>> ethtool (-d, -e, -S) and kernel config. That would help us to
> >>>>>>>> narrow it down. 
> >>>>>>>> 
> >>>>>>>> Thanks,
> >>>>>>>> Emil
> >>>>>>> I did send -e and -d output.
> >>>>>> 
> >>>>>> Sorry, looks like I lost the email with the attachements.
> >>>>>> 
> >>>>>> Could you provide the output of dmesg after the failure occurs?
> >>>>>> 
> >>>>>>> Since you probably want -S output during failure, I need to
> >>>>>>> recompile kernel for that. I will do that soon.
> >>>>>>> 
> >>>>>>> 
> >>>>>>> One question, in two weeks I hope 2.6.35 won't be released?
> >>>>>>> If so, I will have enough free time then to narrow down this
> >>>>>>> issue. 
> >>>>>>> 
> >>>>>>> Other solution, is to revert this commit.
> >>>>>>> (I have never seen this problem with it reverted).
> >>>>>> 
> >>>>>> We have been running reboot tests on 2 separate systems with
> >>>>>> recent net-next kernels using your config and so far no luck in
> >>>>>> reproducing this issue. 
> >>>>>> 
> >>>>>> What is the make model of your system (or MB)?
> >>>>> 
> >>>>> the motherboard is Intel DG965RY.
> >>>>> 
> >>>>> However, I am using vanilla kernel.
> >>>>> net-next might contain further fixes.
> >>>>> 
> >>>>> I see if net-next works here.
> >>>> 
> >>>> Yep, net-next works here.
> >>>> 
> >>>> 
> >>>> I have the problem on vanilla kernel.
> >>>> Last revision of it, I tested is 2.6.35-rc4 exactly
> >>>> (815c4163b6c8ebf8152f42b0a5fd015cfdcedc78)
> >>>> 
> >>>> 
> >>>> Maybe vanilla git master works, I test it too soon.
> >>> 
> >>> Thanks for the information! Good to know that this issue does not
> >>> exist in the latest branch. 
> >>> 
> >>> Have you by any chance tested a stable branch (2.6.34.x)?
> >> 
> >> I only did test plain 2.6.34 (v2.6.34)
> > And forgot to add, that it did work.
> > 
> >> 
> >> Also I repeat that revert of e98cac447cc1cc418dff1d610a5c79c4f2bdec7f
> >> (e1000e: Fix/cleanup PHY reset code for ICHx/PCHx) fixes the bug on
> >> vanilla kernel. 
> >> 
> >> Also I just pulled latest vanilla git, and I according to diffstat I
> >> see no changes in e1000e, so its likely that bug remains there.
> >> I will test that soon.
> > Tested, broken as expected.
> 
> That makes sense. Unfortunately we are still not able to reproduce even on recent pull from Linus tree.
> 
> If you want - you can look at the patches for e1000e in net-next and start applying those to your tree until the issue is resolved.
> 
That's exactly what I will do soon.


Also I can narrow down the problem by reverting the commit partially.

After one week, I will have enough free time to do all the things
above. Right now I have none.


> I will keep trying it here, but none of the systems we have exhibit the issue you described, so the bug could be exposed by something in your system/config.
I also think so. Otherwise, we would see more bug-reports.

For that reason, you probably don't need to keep trying to reproduce
the issue.


Best regards,
	Maxim Levitsky



* [PATCH] Remove MAX_SOCK_ADDR constant
From: Tetsuo Handa @ 2010-07-17 12:38 UTC (permalink / raw)
  To: yoshfuji, davem; +Cc: netdev

From b976a4d6c4d2a76e3926193eba366781adcb533c Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sat, 17 Jul 2010 21:29:53 +0900
Subject: [PATCH] Remove MAX_SOCK_ADDR constant

MAX_SOCK_ADDR is no longer used because commit 230b1839 "net: Use standard
structures for generic socket address structures." replaced
"char address[MAX_SOCK_ADDR];" with "struct sockaddr_storage address;".
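
For illustration only, the userspace view of the same idea: struct
sockaddr_storage is meant to be large enough for any supported address
family, so no hand-maintained size constant is needed. The helper name
fits_in_storage() is hypothetical, and this sketch checks userspace headers,
not the kernel structures.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/un.h>

/* True if an address of the given size fits in the generic storage type. */
static int fits_in_storage(size_t addr_size)
{
	return addr_size <= sizeof(struct sockaddr_storage);
}

static void storage_selfcheck(void)
{
	assert(fits_in_storage(sizeof(struct sockaddr_un)));
	assert(fits_in_storage(sizeof(struct sockaddr_in)));
	assert(fits_in_storage(sizeof(struct sockaddr_in6)));
}
```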

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 net/socket.c |    9 ---------
 1 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 367d547..2336ac5 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -170,15 +170,6 @@ static DEFINE_PER_CPU(int, sockets_in_use) = 0;
  * divide and look after the messy bits.
  */
 
-#define MAX_SOCK_ADDR	128		/* 108 for Unix domain -
-					   16 for IP, 16 for IPX,
-					   24 for IPv6,
-					   about 80 for AX.25
-					   must be at least one bigger than
-					   the AF_UNIX size (see net/unix/af_unix.c
-					   :unix_mkname()).
-					 */
-
 /**
  *	move_addr_to_kernel	-	copy a socket address into kernel space
  *	@uaddr: Address in user space
-- 
1.6.1


* Re: [PATCH] rt2x00: Fix lockdep warning in rt2x00lib_probe_dev()
From: Ivo Van Doorn @ 2010-07-17 10:08 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: users, John W. Linville, linux-wireless, netdev, linux-kernel
In-Reply-To: <1279299010-4723-1-git-send-email-bebarino@gmail.com>

On Fri, Jul 16, 2010 at 6:50 PM, Stephen Boyd <bebarino@gmail.com> wrote:
> The rt2x00dev->intf_work workqueue is never initialized when a driver is
> probed for a non-existent device (in this case rt2500usb). On such a
> path we call rt2x00lib_remove_dev() to free any resources initialized
> during the probe before we use INIT_WORK to initialize the workqueue.
> This causes lockdep to get confused since the lock used in the workqueue
> hasn't been initialized yet but is now being acquired during
> cancel_work_sync() called by rt2x00lib_remove_dev().
>
> Fix this by initializing the workqueue first before we attempt to probe
> the device. This should make lockdep happy and avoid breaking any
> assumptions about how the library cleans up after a probe fails.
>
> phy0 -> rt2x00lib_probe_dev: Error - Failed to allocate device.
> INFO: trying to register non-static key.
> the code is fine but needs lockdep annotation.
> turning off the locking correctness validator.
> Pid: 2027, comm: modprobe Not tainted 2.6.35-rc5+ #60
> Call Trace:
>  [<ffffffff8105fe59>] register_lock_class+0x152/0x31f
>  [<ffffffff81344a00>] ? usb_control_msg+0xd5/0x111
>  [<ffffffff81061bde>] __lock_acquire+0xce/0xcf4
>  [<ffffffff8105f6fd>] ? trace_hardirqs_off+0xd/0xf
>  [<ffffffff81492aef>] ?  _raw_spin_unlock_irqrestore+0x33/0x41
>  [<ffffffff810628d5>] lock_acquire+0xd1/0xf7
>  [<ffffffff8104f037>] ? __cancel_work_timer+0x99/0x17e
>  [<ffffffff8104f06e>] __cancel_work_timer+0xd0/0x17e
>  [<ffffffff8104f037>] ? __cancel_work_timer+0x99/0x17e
>  [<ffffffff8104f136>] cancel_work_sync+0xb/0xd
>  [<ffffffffa0096675>] rt2x00lib_remove_dev+0x25/0xb0 [rt2x00lib]
>  [<ffffffffa0096bf7>] rt2x00lib_probe_dev+0x380/0x3ed [rt2x00lib]
>  [<ffffffff811d78a7>] ? __raw_spin_lock_init+0x31/0x52
>  [<ffffffffa00bbd2c>] ? T.676+0xe/0x10 [rt2x00usb]
>  [<ffffffffa00bbe4f>] rt2x00usb_probe+0x121/0x15e [rt2x00usb]
>  [<ffffffff813468bd>] usb_probe_interface+0x151/0x19e
>  [<ffffffff812ea08e>] driver_probe_device+0xa7/0x136
>  [<ffffffff812ea167>] __driver_attach+0x4a/0x66
>  [<ffffffff812ea11d>] ? __driver_attach+0x0/0x66
>  [<ffffffff812e96ca>] bus_for_each_dev+0x54/0x89
>  [<ffffffff812e9efd>] driver_attach+0x19/0x1b
>  [<ffffffff812e9b64>] bus_add_driver+0xb4/0x204
>  [<ffffffff812ea41b>] driver_register+0x98/0x109
>  [<ffffffff813465dd>] usb_register_driver+0xb2/0x173
>  [<ffffffffa00ca000>] ? rt2500usb_init+0x0/0x20 [rt2500usb]
>  [<ffffffffa00ca01e>] rt2500usb_init+0x1e/0x20 [rt2500usb]
>  [<ffffffff81000203>] do_one_initcall+0x6d/0x17a
>  [<ffffffff8106cae8>] sys_init_module+0x9c/0x1e0
>  [<ffffffff8100296b>] system_call_fastpath+0x16/0x1b
>
> Signed-off-by: Stephen Boyd <bebarino@gmail.com>

Acked-by: Ivo van Doorn <IvDoorn@gmail.com>

^ permalink raw reply

* [PATCH NEXT 1/1] qlcnic: fix pci resource leak
From: amit.salecha @ 2010-07-17  7:39 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, Amit Kumar Salecha

From: Amit Kumar Salecha <amit.salecha@qlogic.com>

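pci_get_domain_bus_and_slot() takes a reference on the device it
returns; the caller must drop that reference by calling pci_dev_put().
qlcnic_is_first_func() never did so and leaked the reference.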

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic_main.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index 8d2d62f..f1f7acf 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -2695,9 +2695,14 @@ static int qlcnic_is_first_func(struct pci_dev *pdev)
 		oth_pdev = pci_get_domain_bus_and_slot(pci_domain_nr
 			(pdev->bus), pdev->bus->number,
 			PCI_DEVFN(PCI_SLOT(pdev->devfn), val));
+		if (!oth_pdev)
+			continue;
 
-		if (oth_pdev && (oth_pdev->current_state != PCI_D3cold))
+		if (oth_pdev->current_state != PCI_D3cold) {
+			pci_dev_put(oth_pdev);
 			return 0;
+		}
+		pci_dev_put(oth_pdev);
 	}
 	return 1;
 }
-- 
1.6.0.2
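
The get/put pairing that the patch above restores can be illustrated with a small userspace model. This is only a sketch in Python, not kernel code: FakeDev, FakeBus and outstanding_refs() are hypothetical stand-ins for struct pci_dev, the bus lookup and a leak check, and the list of function numbers is an assumption about how the surrounding loop iterates.

```python
from dataclasses import dataclass

D3COLD = "D3cold"  # stand-in for PCI_D3cold


@dataclass
class FakeDev:
    """Hypothetical stand-in for struct pci_dev with a reference count."""
    current_state: str
    refcount: int = 0


class FakeBus:
    """Hypothetical lookup table keyed by PCI function number."""

    def __init__(self, devs):
        self.devs = devs

    def get(self, fn):
        # Like pci_get_domain_bus_and_slot(): a successful lookup
        # takes a reference that the caller must drop.
        dev = self.devs.get(fn)
        if dev is not None:
            dev.refcount += 1
        return dev

    @staticmethod
    def put(dev):
        # Like pci_dev_put(): drop one reference.
        dev.refcount -= 1


def is_first_func_leaky(bus, fns):
    # Shape of the code before the patch: no put() on any path.
    for fn in fns:
        oth = bus.get(fn)
        if oth is not None and oth.current_state != D3COLD:
            return 0
    return 1


def is_first_func_fixed(bus, fns):
    # Shape of the code after the patch: every successful get() is
    # paired with exactly one put(), on both exit paths.
    for fn in fns:
        oth = bus.get(fn)
        if oth is None:
            continue
        if oth.current_state != D3COLD:
            bus.put(oth)
            return 0
        bus.put(oth)
    return 1


def outstanding_refs(bus):
    # Total references still held after the call returns.
    return sum(dev.refcount for dev in bus.devs.values())
```

In this model the leaky variant leaves one reference behind for every device it found, while the fixed variant returns with no references outstanding whichever path it exits through.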


^ permalink raw reply related

* netfilter/iptables stopped logging 2.6.35-rc
From: auto401300 @ 2010-07-17  7:20 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

Hi!

Has something broken with netfilter/iptables logging in 2.6.35-rc,
or is there something new I should set in .config since .34?


I just verified that if I boot .34 and ping the PC, it does log:

Jul 17 09:42:49 xxxxx kernel: Linux version 2.6.34-ab (root@xxxxx) (gcc version 4.4.4 (Debian 4.4.4-1) ) #1 SMP PREEMPT Mon May 17 09:15:15 EEST 2010
....
Jul 17 09:44:52 xxxxx kernel: DENY  in: IN=eth0 OUT= MAC=xxxxx SRC=xxxxx DST=xxxxx LEN=60 TOS=0x00 PREC=0x00 TTL=127 ID=38945 PROTO=ICMP TYPE=8 CODE=0 ID=512 SEQ=256


but if I boot .35-rc4 and ping:

Jul 17 09:48:08 xxxxx kernel: Linux version 2.6.35-rc4-aa (root@xxxxx) (gcc version 4.4.4 (Debian 4.4.4-6) ) #1 SMP PREEMPT Mon Jul 5 15:22:02 EEST 2010
....
nothing from iptables in log


Userspace is the same; I only booted different kernel versions.


thanks.


^ permalink raw reply

* Re: Raise initial congestion window size / speedup slow start?
From: H.K. Jerry Chu @ 2010-07-17  1:23 UTC (permalink / raw)
  To: Ed W; +Cc: Patrick McManus, David Miller, davidsen, linux-kernel, netdev
In-Reply-To: <4C4099D6.6020305@wildgooses.com>

On Fri, Jul 16, 2010 at 10:41 AM, Ed W <lists@wildgooses.com> wrote:
>
>> and while I'm asking for info, can you expand on the conclusion
>> regarding poor cache hit rates for reusing learned cwnds? (ok, I admit I
>> only read the slides.. maybe the paper has more info?)
>>
>
> My guess is that this result is specific to google and their servers?
>
> I guess we can probably stereotype the world into two pools of devices:
>
> 1) Devices in a pool of fast networking, but connected to the rest of the
> world through a relatively slow router
> 2) Devices connected via a high speed network and largely the bottleneck
> device is many hops down the line and well away from us
>
> I'm thinking here 1) client users behind broadband routers, wireless, 3G,
> dialup, etc and 2) public servers that have obviously been deliberately
> placed in locations with high levels of interconnectivity.
>
> I think history information could be more useful for clients in category 1)
> because there is a much higher probability that their most restrictive
> device is one hop away and hence affects all connections and relatively
> occasionally the bottleneck is multiple hops away.  For devices in category
> 2) it's much harder because the restriction will usually be lots of hops
> away and effectively you are trying to figure out and cache the speed of
> every ADSL router out there...  For sure you can probably figure out how to
> cluster this stuff and say that pool there is 56K dialup, that pool there is
> "broadband", that pool is cell phone, etc, but probably it's hard to do
> better than that?
>
> So my guess is this is why google have had poor results investigating cwnd
> caching?

Actually we have investigated two types of caches: a short-history,
limited-size internal cache that is subject to some LRU replacement policy,
which greatly limits the cache hit rate, and a long-history external cache,
which provides much more accurate cwnd history per subnet but with high
complexity and deployment headaches.

Also we have set out for a much more ambitious goal: not just to speed up
our own services, but also to provide a solution that could benefit the
whole web (see http://code.google.com/speed/index.html). The latter pretty
much precludes the complex external cache scheme mentioned above.

Jerry

>
> However, I would suggest that whilst it's of little value for the server
> side, it still remains a very interesting idea for the client side and the
> cache hit ratio would seem to be dramatically higher here?
>
>
> I haven't studied the code, but given there is a userspace ability to change
> init cwnd through the IP utility, it would seem likely that relatively
> little coding would now be required to implement some kind of limited cwnd
> caching and experiment with whether this is a valuable addition?  I would
> have thought if you are only fiddling with devices behind a broadband router
> then there is little chance of you "crashing the internet" with these kind
> of experiments?
>
> Good luck
>
> Ed W
>
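
The short-history cache described in the reply above (limited size, LRU replacement) can be sketched as follows. This is a minimal illustration in Python, not the code that was actually evaluated: the CwndCache class, its capacity and prefix_len parameters, and the hit/miss counters are all assumptions made for the example.

```python
from collections import OrderedDict
import ipaddress


class CwndCache:
    """Hypothetical size-limited cwnd cache with LRU replacement."""

    def __init__(self, capacity, prefix_len=32):
        self.capacity = capacity
        self.prefix_len = prefix_len  # 32 = per host, 24 = per /24 subnet
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _key(self, addr):
        # Collapse the address to its covering prefix.
        net = ipaddress.ip_network(f"{addr}/{self.prefix_len}", strict=False)
        return net.network_address

    def lookup(self, addr):
        key = self._key(addr)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        return None

    def store(self, addr, cwnd):
        key = self._key(addr)
        self.entries[key] = cwnd
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```

With many distinct remote clients and a small capacity, entries are evicted before the same client returns, which is one way a low hit rate can arise. Keying by prefix rather than by host trades accuracy for a smaller key space.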

^ permalink raw reply

* [PATCH] LSM: Add post recvmsg() hook.
From: Tetsuo Handa @ 2010-07-17  1:17 UTC (permalink / raw)
  To: davem, kuznet, pekkas, jmorris, yoshfuji, kaber, paul.moore
  Cc: netdev, linux-security-module
In-Reply-To: <20100716.123558.71592004.davem@davemloft.net>

David Miller wrote:
> From: Tetsuo Handa
> Date: Sat, 17 Jul 2010 01:14:38 +0900
> 
> > Below is a patch for post recvmsg() operation. I modified the patch to call
> > skb_recv_datagram() again (for udp_recvmsg(), raw_recvmsg(), udpv6_recvmsg())
> > if LSM decided to drop the message. (Regarding rawv6_recvmsg(), I didn't do so
> > in accordance with the comment at "csum_copy_err:".)
> > What do you think about this version?
> 
> This looks fine, but regardless of that comment I think the IPV6 raw recvmsg()
> should loop just as the IPV4 one does in your patch.
> 
Thank you, David.
I updated the patch to call skb_recv_datagram() again in the rawv6_recvmsg()
case too.

NETWORKING [IPv4/IPv6] maintainers and Paul, is the patch below fine with you?

Regards.
----------------------------------------
From b43154a90bc7494ec1ee301e692d2bbf29c8f2f8 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sat, 17 Jul 2010 09:52:38 +0900
Subject: [PATCH] LSM: Add post recvmsg() hook.

The current pre recvmsg hook (i.e. security_socket_recvmsg()) has two problems.

One is that the caller can consume 100% of CPU time if it does not close() the
socket after recvmsg() fails due to security_socket_recvmsg(): the datagram
that would have been picked up had security_socket_recvmsg() not returned an
error remains in the queue, so each subsequent select() reports the socket as
ready for recvmsg().

The other is that it is racy if an LSM module wants to filter based on
"which process can pick up datagrams from which source", because the process
which picks up the datagram is not known until skb_recv_datagram() and no lock
is held between security_socket_recvmsg() and skb_recv_datagram().

This patch introduces a post recvmsg hook (i.e. security_socket_post_recvmsg())
to solve the above problems, at the cost of losing the datagram which would
have been picked up if the preceding security_socket_post_recvmsg() had not
returned an error.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 include/linux/security.h |   14 ++++++++++++++
 net/ipv4/raw.c           |   12 +++++++++---
 net/ipv4/udp.c           |    9 ++++++++-
 net/ipv6/raw.c           |   12 +++++++++---
 net/ipv6/udp.c           |    9 ++++++++-
 security/capability.c    |    6 ++++++
 security/security.c      |    6 ++++++
 7 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 723a93d..409c44d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -879,6 +879,12 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
  *	@size contains the size of message structure.
  *	@flags contains the operational flags.
  *	Return 0 if permission is granted.
+ * @socket_post_recvmsg:
+ *	Check permission after receiving a message from a socket.
+ *	The message is discarded if permission is not granted.
+ *	@sk contains the sock structure.
+ *	@skb contains the sk_buff structure.
+ *	Return 0 if permission is granted.
  * @socket_getsockname:
  *	Check permission before the local address (name) of the socket object
  *	@sock is retrieved.
@@ -1575,6 +1581,7 @@ struct security_operations {
 			       struct msghdr *msg, int size);
 	int (*socket_recvmsg) (struct socket *sock,
 			       struct msghdr *msg, int size, int flags);
+	int (*socket_post_recvmsg) (struct sock *sk, struct sk_buff *skb);
 	int (*socket_getsockname) (struct socket *sock);
 	int (*socket_getpeername) (struct socket *sock);
 	int (*socket_getsockopt) (struct socket *sock, int level, int optname);
@@ -2526,6 +2533,7 @@ int security_socket_accept(struct socket *sock, struct socket *newsock);
 int security_socket_sendmsg(struct socket *sock, struct msghdr *msg, int size);
 int security_socket_recvmsg(struct socket *sock, struct msghdr *msg,
 			    int size, int flags);
+int security_socket_post_recvmsg(struct sock *sk, struct sk_buff *skb);
 int security_socket_getsockname(struct socket *sock);
 int security_socket_getpeername(struct socket *sock);
 int security_socket_getsockopt(struct socket *sock, int level, int optname);
@@ -2617,6 +2625,12 @@ static inline int security_socket_recvmsg(struct socket *sock,
 	return 0;
 }
 
+static inline int security_socket_post_recvmsg(struct sock *sk,
+					       struct sk_buff *skb)
+{
+	return 0;
+}
+
 static inline int security_socket_getsockname(struct socket *sock)
 {
 	return 0;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 2c7a163..69652d4 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -676,9 +676,15 @@ static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		goto out;
 	}
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
-	if (!skb)
-		goto out;
+	for (;;) {
+		skb = skb_recv_datagram(sk, flags, noblock, &err);
+		if (!skb)
+			goto out;
+		err = security_socket_post_recvmsg(sk, skb);
+		if (likely(!err))
+			break;
+		skb_kill_datagram(sk, skb, flags);
+	}
 
 	copied = skb->len;
 	if (len < copied) {
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 5858574..9145685 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1125,6 +1125,7 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	int err;
 	int is_udplite = IS_UDPLITE(sk);
 	bool slow;
+	bool update_stat;
 
 	/*
 	 *	Check any passed addresses
@@ -1140,6 +1141,12 @@ try_again:
 				  &peeked, &err);
 	if (!skb)
 		goto out;
+	err = security_socket_post_recvmsg(sk, skb);
+	if (err) {
+		update_stat = false;
+		goto csum_copy_err;
+	}
+	update_stat = true;
 
 	ulen = skb->len - sizeof(struct udphdr);
 	if (len > ulen)
@@ -1200,7 +1207,7 @@ out:
 
 csum_copy_err:
 	slow = lock_sock_fast(sk);
-	if (!skb_kill_datagram(sk, skb, flags))
+	if (!skb_kill_datagram(sk, skb, flags) && update_stat)
 		UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
 	unlock_sock_fast(sk, slow);
 
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 4a4dcbe..6915b01 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -464,9 +464,15 @@ static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 	if (np->rxpmtu && np->rxopt.bits.rxpmtu)
 		return ipv6_recv_rxpmtu(sk, msg, len);
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
-	if (!skb)
-		goto out;
+	for (;;) {
+		skb = skb_recv_datagram(sk, flags, noblock, &err);
+		if (!skb)
+			goto out;
+		err = security_socket_post_recvmsg(sk, skb);
+		if (likely(!err))
+			break;
+		skb_kill_datagram(sk, skb, flags);
+	}
 
 	copied = skb->len;
 	if (copied > len) {
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 87be586..6cae276 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -329,6 +329,7 @@ int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 	int is_udplite = IS_UDPLITE(sk);
 	int is_udp4;
 	bool slow;
+	bool update_stat;
 
 	if (addr_len)
 		*addr_len=sizeof(struct sockaddr_in6);
@@ -344,6 +345,12 @@ try_again:
 				  &peeked, &err);
 	if (!skb)
 		goto out;
+	err = security_socket_post_recvmsg(sk, skb);
+	if (err) {
+		update_stat = false;
+		goto csum_copy_err;
+	}
+	update_stat = true;
 
 	ulen = skb->len - sizeof(struct udphdr);
 	if (len > ulen)
@@ -426,7 +433,7 @@ out:
 
 csum_copy_err:
 	slow = lock_sock_fast(sk);
-	if (!skb_kill_datagram(sk, skb, flags)) {
+	if (!skb_kill_datagram(sk, skb, flags) && update_stat) {
 		if (is_udp4)
 			UDP_INC_STATS_USER(sock_net(sk),
 					UDP_MIB_INERRORS, is_udplite);
diff --git a/security/capability.c b/security/capability.c
index 4aeb699..709aea3 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -597,6 +597,11 @@ static int cap_socket_recvmsg(struct socket *sock, struct msghdr *msg,
 	return 0;
 }
 
+static int cap_socket_post_recvmsg(struct sock *sk, struct sk_buff *skb)
+{
+	return 0;
+}
+
 static int cap_socket_getsockname(struct socket *sock)
 {
 	return 0;
@@ -1001,6 +1006,7 @@ void __init security_fixup_ops(struct security_operations *ops)
 	set_to_cap_if_null(ops, socket_accept);
 	set_to_cap_if_null(ops, socket_sendmsg);
 	set_to_cap_if_null(ops, socket_recvmsg);
+	set_to_cap_if_null(ops, socket_post_recvmsg);
 	set_to_cap_if_null(ops, socket_getsockname);
 	set_to_cap_if_null(ops, socket_getpeername);
 	set_to_cap_if_null(ops, socket_setsockopt);
diff --git a/security/security.c b/security/security.c
index e8c87b8..4291bd7 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1037,6 +1037,12 @@ int security_socket_recvmsg(struct socket *sock, struct msghdr *msg,
 	return security_ops->socket_recvmsg(sock, msg, size, flags);
 }
 
+int security_socket_post_recvmsg(struct sock *sk, struct sk_buff *skb)
+{
+	return security_ops->socket_post_recvmsg(sk, skb);
+}
+EXPORT_SYMBOL(security_socket_post_recvmsg);
+
 int security_socket_getsockname(struct socket *sock)
 {
 	return security_ops->socket_getsockname(sock);
-- 
1.6.1
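
The receive loop this patch adds to raw_recvmsg() and rawv6_recvmsg() can be modelled in userspace as below. This is a sketch in Python under stated assumptions, not the kernel implementation: recv_with_post_hook() is a hypothetical name, the deque stands in for the socket receive queue, and an empty queue returns None where the kernel would block or return an error.

```python
from collections import deque


def recv_with_post_hook(queue, post_hook):
    """Dequeue until the hook accepts a datagram; drop rejected ones.

    Returns the accepted datagram, or None when the queue is empty.
    """
    while True:
        if not queue:
            return None
        skb = queue.popleft()        # role of skb_recv_datagram()
        if post_hook(skb) == 0:      # role of security_socket_post_recvmsg()
            return skb
        # Rejected: the datagram is already off the queue and is simply
        # dropped (role of skb_kill_datagram()), so it cannot keep
        # select() reporting the socket as readable.
```

Because the check runs after the datagram has been dequeued, a rejected datagram is consumed rather than left at the head of the queue. That consumption is what addresses the busy-loop problem in the commit message, and it is also the stated cost of the approach.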

^ permalink raw reply related

* Re: Raise initial congestion window size / speedup slow start?
From: H.K. Jerry Chu @ 2010-07-17  0:36 UTC (permalink / raw)
  To: Patrick McManus; +Cc: David Miller, davidsen, lists, linux-kernel, netdev
In-Reply-To: <1279299709.2156.5814.camel@tng>

On Fri, Jul 16, 2010 at 10:01 AM, Patrick McManus <mcmanus@ducksong.com> wrote:
> On Wed, 2010-07-14 at 21:51 -0700, H.K. Jerry Chu wrote:
>>  except there are indeed bugs in the code today in that the
>> code in various places assumes initcwnd as per RFC3390. So when
>> initcwnd is raised, that actual value may be limited unnecessarily by
>> the initial wmem/sk_sndbuf.
>
> Thanks for the discussion!
>
> can you tell us more about the impl concerns of initcwnd stored on the
> route?

We have found two issues when altering initcwnd through the ip route cmd:
1. initcwnd is actually capped by sndbuf (i.e., tcp_wmem[1], which defaults
to a small value of 16KB). This problem has been obscured by the TSO code,
which fudges the flow control limit (and could be a bug in itself).

2. the congestion backoff code is supposed to take inflight, rather than cwnd,
but initcwnd presents a special case. I don't yet understand the code well
enough to propose a fix.

>
> and while I'm asking for info, can you expand on the conclusion
> regarding poor cache hit rates for reusing learned cwnds? (ok, I admit I
> only read the slides.. maybe the paper has more info?)

This is partly due to our load balancer policy resulting in a poor cache hit
rate, partly due to the sheer volume of remote clients. Some of my colleagues
tried to change the host cache to a /24 subnet cache but the result wasn't
that good either (sorry I don't remember all the details.)

>
> article and slides much appreciated and very interesting. I've long been
> of the opinion that the downsides of being too aggressive once in a
> while aren't all that serious anymore.. as someone else said in a
> non-reservation world you are always trying to predict the future anyhow
> and therefore overflowing a queue is always possible no matter how
> conservative.

Please voice your support to TCPM then :)

Jerry

>
>
>
>
>

^ permalink raw reply

* RE: [REGRESSION] e1000e stopped working [MANUALLY BISECTED]
From: Tantilov, Emil S @ 2010-07-16 23:23 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kirsher, Jeffrey T, netdev@vger.kernel.org, Allan, Bruce W,
	Pieper, Jeffrey E
In-Reply-To: <1279308358.3979.0.camel@localhost.localdomain>

Maxim Levitsky wrote:
> On Thu, 2010-07-15 at 22:09 +0300, Maxim Levitsky wrote:
>> On Thu, 2010-07-15 at 13:02 -0600, Tantilov, Emil S wrote:
>>> Maxim Levitsky wrote:
>>>> On Thu, 2010-07-15 at 02:33 +0300, Maxim Levitsky wrote:
>>>>> On Wed, 2010-07-14 at 16:56 -0600, Tantilov, Emil S wrote:
>>>>>> Maxim Levitsky wrote:
>>>>>>> On Mon, 2010-07-12 at 15:23 -0600, Tantilov, Emil S wrote:
>>>>>>>> Maxim Levitsky wrote:
>>>>>>>>> On Mon, 2010-07-05 at 12:58 +0300, Maxim Levitsky wrote:
>>>>>>>>>> On Mon, 2010-07-05 at 01:13 -0700, Jeff Kirsher wrote:
>>>>>>>>>>> On Sun, Jul 4, 2010 at 15:48, Maxim Levitsky
>>>>>>>>>>> <maximlevitsky@gmail.com> wrote:
>>>>>>>>>>>> Did few guesses, and now I see that reverting the below
>>>>>>>>>>>> commit fixes the problem. 
>>>>>>>>>>>> 
>>>>>>>>>>>> "e1000e: Fix/cleanup PHY reset code for ICHx/PCHx"
>>>>>>>>>>>> e98cac447cc1cc418dff1d610a5c79c4f2bdec7f.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>        Maxim Levitsky
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>> 
>>>>>>>>>>> Can you give us till Tuesday to respond?  I know that there
>>>>>>>>>>> are some additional e1000e patches in my queue, which may
>>>>>>>>>>> resolve the issue, but this weekend the power is down to do
>>>>>>>>>>> some infrastructure upgrades which prevents us from doing
>>>>>>>>>>> any investigation/debugging until Tuesday.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Sure.
>>>>>>>>>> 
>>>>>>>>>> Best regards,
>>>>>>>>>> 	Maxim Levitsky
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Updates?
>>>>>>>> 
>>>>>>>> We are working on reproducing the issue. So far we have not
>>>>>>>> seen the problem when testing with net-next.
>>>>>>>> 
>>>>>>>> I asked in previous email about some additional info from
>>>>>>>> ethtool (-d, -e, -S) and kernel config. That would help us to
>>>>>>>> narrow it down. 
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Emil
>>>>>>> I did send -e and -d output.
>>>>>> 
>>>>>> Sorry, looks like I lost the email with the attachments.
>>>>>> 
>>>>>> Could you provide the output of dmesg after the failure occurs?
>>>>>> 
>>>>>>> Since you probably want -S output during failure, I need to
>>>>>>> recompile kernel for that. I will do that soon.
>>>>>>> 
>>>>>>> 
>>>>>>> One question, in two weeks I hope 2.6.35 won't be released?
>>>>>>> If so, I will have enough free time then to narrow down this
>>>>>>> issue. 
>>>>>>> 
>>>>>>> Other solution, is to revert this commit.
>>>>>>> (I have never seen this problem with it reverted).
>>>>>> 
>>>>>> We have been running reboot tests on 2 separate systems with
>>>>>> recent net-next kernels using your config and so far no luck in
>>>>>> reproducing this issue. 
>>>>>> 
>>>>>> What is the make model of your system (or MB)?
>>>>> 
>>>>> the motherboard is Intel DG965RY.
>>>>> 
>>>>> However, I am using vanilla kernel.
>>>>> net-next might contain further fixes.
>>>>> 
>>>>> I see if net-next works here.
>>>> 
>>>> Yep, net-next works here.
>>>> 
>>>> 
>>>> I have the problem on vanilla kernel.
>>>> Last revision of it, I tested is 2.6.35-rc4 exactly
>>>> (815c4163b6c8ebf8152f42b0a5fd015cfdcedc78)
>>>> 
>>>> 
>>>> Maybe vanilla git master works, I test it too soon.
>>> 
>>> Thanks for the information! Good to know that this issue does not
>>> exist in the latest branch. 
>>> 
>>> Have you by any chance tested a stable branch (2.6.34.x)?
>> 
>> I only did test plain 2.6.34 (v2.6.34)
> And forgot to add, that it did work.
> 
>> 
>> Also I repeat that revert of e98cac447cc1cc418dff1d610a5c79c4f2bdec7f
>> (e1000e: Fix/cleanup PHY reset code for ICHx/PCHx) fixes the bug on
>> vanilla kernel. 
>> 
>> Also I just pulled latest vanilla git, and I according to diffstat I
>> see no changes in e1000e, so its likely that bug remains there.
>> I will test that soon.
> Tested, broken as expected.

That makes sense. Unfortunately we are still not able to reproduce it, even
on a recent pull from Linus' tree.

If you want, you can look at the patches for e1000e in net-next and start
applying those to your tree until the issue is resolved.

I will keep trying it here, but none of the systems we have exhibit the
issue you described, so the bug could be exposed by something in your
system/config.

Thanks,
Emil

^ permalink raw reply

* RE:  2.6.34.1: kernel warning at igb_main.c:2080
From: Wyborny, Carolyn @ 2010-07-16 22:45 UTC (permalink / raw)
  To: Ben Greear, NetDev
In-Reply-To: <4C3FA03E.9090802@candelatech.com>

 

>-----Original Message-----
>From: netdev-owner@vger.kernel.org 
>[mailto:netdev-owner@vger.kernel.org] On Behalf Of Ben Greear
>Sent: Thursday, July 15, 2010 4:57 PM
>To: NetDev
>Subject: igb: 2.6.34.1: kernel warning at igb_main.c:2080
>
>We just saw this kernel warning on 2.6.34.1 + a few patches 
>from the pending stable queue,
>plus our own hacks (though none to igb).
>
>We were running a modified version of pktgen traffic and at 
>the same time
>bounced the port.
>
>This warning didn't seem to cause any real problems.
>
>Please let us know if you would like any additional information.
>
>]# lspci|grep Ethern
>01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit 
>Network Connection
>02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit 
>Network Connection
>05:00.0 Ethernet controller: Intel Corporation 82575EB Gigabit 
>Network Connection (rev 02)
>05:00.1 Ethernet controller: Intel Corporation 82575EB Gigabit 
>Network Connection (rev 02)
>06:00.0 Ethernet controller: Intel Corporation 82575EB Gigabit 
>Network Connection (rev 02)
>06:00.1 Ethernet controller: Intel Corporation 82575EB Gigabit 
>Network Connection (rev 02)
>
>
>Jul 15 16:36:01 localhost kernel: ------------[ cut here ]------------
>Jul 15 16:36:01 localhost kernel: WARNING: at 
>/home/greearb/git/linux-2.6.dev.34.y/drivers/net/igb/igb_main.c
>:2080 igb_close+0x28/0x9f [igb]()
>Jul 15 16:36:01 localhost kernel: Hardware name: X8STi
>Jul 15 16:36:01 localhost kernel: Modules linked in: bridge 
>arc4 michael_mic wanlink(P) 8021q garp xt_CT iptable_raw 
>ipt_addrtype xt_DSCP xt_dscp xt_string 
>xt_owner xt_NFQUEUE xt_multiport xt_mark xt_iprange 
>xt_hashlimit xt_CONNMARK xt_connmark stp llc veth fuse macvlan 
>bpctl_mod pktgen iscsi_tcp libiscsi_tcp 
>libiscsi scsi_transport_iscsi nfs lockd fscache nfs_acl 
>auth_rpcgss sunrpc ipv6 dm_multipath uinput i2c_i801 iTCO_wdt 
>i2c_core ioatdma e1000e igb pcspkr 
>iTCO_vendor_support dca ata_generic pata_acpi [last unloaded: nf_nat]
>Jul 15 16:36:01 localhost kernel: Pid: 17516, comm: ip 
>Tainted: P           2.6.34.1 #2
>Jul 15 16:36:01 localhost kernel: Call Trace:
>Jul 15 16:36:01 localhost kernel: [<ffffffffa002a37f>] ? 
>igb_close+0x28/0x9f [igb]
>Jul 15 16:36:01 localhost kernel: [<ffffffff81041bb6>] 
>warn_slowpath_common+0x77/0x8f
>Jul 15 16:36:01 localhost kernel: [<ffffffff81041bdd>] 
>warn_slowpath_null+0xf/0x11
>Jul 15 16:36:01 localhost kernel: [<ffffffffa002a37f>] 
>igb_close+0x28/0x9f [igb]
>Jul 15 16:36:01 localhost kernel: [<ffffffff8133c626>] 
>__dev_close+0x73/0x86
>Jul 15 16:36:01 localhost kernel: [<ffffffff8133a719>] 
>__dev_change_flags+0xa8/0x12b
>Jul 15 16:36:01 localhost kernel: [<ffffffff8133c2aa>] 
>dev_change_flags+0x1c/0x51
>Jul 15 16:36:01 localhost kernel: [<ffffffff813466bc>] 
>do_setlink+0x273/0x482
>Jul 15 16:36:01 localhost kernel: [<ffffffff810ab1bb>] ? 
>zone_statistics+0x5e/0x63
>Jul 15 16:36:01 localhost kernel: [<ffffffff8134755e>] 
>rtnl_newlink+0x26c/0x422
>Jul 15 16:36:01 localhost kernel: [<ffffffff81345a06>] ? 
>nla_nest_start+0x1d/0x31
>Jul 15 16:36:01 localhost kernel: [<ffffffff810ca997>] ? 
>virt_to_head_page+0x9/0x2a
>Jul 15 16:36:01 localhost kernel: [<ffffffff813dd1f5>] ? 
>__mutex_lock_common+0x38e/0x3ac
>Jul 15 16:36:01 localhost kernel: [<ffffffff81346ffb>] 
>rtnetlink_rcv_msg+0x1d9/0x1f7
>Jul 15 16:36:01 localhost kernel: [<ffffffff81346e22>] ? 
>rtnetlink_rcv_msg+0x0/0x1f7
>Jul 15 16:36:01 localhost kernel: [<ffffffff81356939>] 
>netlink_rcv_skb+0x3e/0x8e
>Jul 15 16:36:01 localhost kernel: [<ffffffff81346cc9>] 
>rtnetlink_rcv+0x20/0x29
>Jul 15 16:36:01 localhost kernel: [<ffffffff813567b2>] 
>netlink_unicast+0xea/0x151
>Jul 15 16:36:01 localhost kernel: [<ffffffff8133545c>] ? 
>memcpy_fromiovec+0x42/0x73
>Jul 15 16:36:01 localhost kernel: [<ffffffff81357bc8>] 
>netlink_sendmsg+0x242/0x255
>Jul 15 16:36:01 localhost kernel: [<ffffffff8132a771>] ? 
>__sock_recvmsg_nosec+0x29/0x2b
>Jul 15 16:36:01 localhost kernel: [<ffffffff8132bd02>] 
>__sock_sendmsg+0x56/0x5f
>Jul 15 16:36:01 localhost kernel: [<ffffffff8132c127>] 
>sock_sendmsg+0xa3/0xbc
>Jul 15 16:36:01 localhost kernel: [<ffffffff813351e5>] ? 
>copy_from_user+0x28/0x30
>Jul 15 16:36:01 localhost kernel: [<ffffffff8133555e>] ? 
>verify_iovec+0x52/0x95
>Jul 15 16:36:01 localhost kernel: [<ffffffff8132c368>] 
>sys_sendmsg+0x1c6/0x22a
>Jul 15 16:36:01 localhost kernel: [<ffffffff810a277f>] ? 
>lru_cache_add_lru+0x38/0x3d
>Jul 15 16:36:01 localhost kernel: [<ffffffff813de23e>] ? 
>_raw_spin_unlock+0x2d/0x38
>Jul 15 16:36:01 localhost kernel: [<ffffffff810ad802>] ? 
>spin_unlock+0x9/0xb
>Jul 15 16:36:01 localhost kernel: [<ffffffff810b0138>] ? 
>handle_mm_fault+0x6d3/0x6f3
>Jul 15 16:36:01 localhost kernel: [<ffffffff810b3a0a>] ? 
>__vma_link_rb+0x2b/0x2d
>Jul 15 16:36:01 localhost kernel: [<ffffffff810b45f7>] ? 
>vma_link+0xcd/0xcf
>Jul 15 16:36:01 localhost kernel: [<ffffffff810d9369>] ? 
>fget_light+0x39/0x87
>Jul 15 16:36:01 localhost kernel: [<ffffffff81083b91>] ? 
>audit_syscall_entry+0xfe/0x12a
>Jul 15 16:36:01 localhost kernel: [<ffffffff81009ac2>] 
>system_call_fastpath+0x16/0x1b
>Jul 15 16:36:01 localhost kernel: ---[ end trace 75242fae6dbfdf6d ]---
>
>
>Thanks,
>Ben
>
>-- 
>Ben Greear <greearb@candelatech.com>
>Candela Technologies Inc  http://www.candelatech.com
>
>
Thanks, I'll take a look at it.

Carolyn Wyborny
Linux Development
LAN Access Division
Intel Corporation


^ permalink raw reply

* Re: [PATCH] Drivers: net: 8139cp: Improved conformance to the Linux coding style guidelines.
From: Jeff Garzik @ 2010-07-16 21:40 UTC (permalink / raw)
  To: Joseph Kogut; +Cc: davem, netdev, linux-kernel
In-Reply-To: <1279286391-31464-1-git-send-email-joseph.kogut@gmail.com>

On 07/16/2010 09:19 AM, Joseph Kogut wrote:
> Fixed several issues that made the 8139C+ driver nonconformant to the Linux coding style guidelines.
>
> Signed-off-by: Joseph Kogut <joseph.kogut@gmail.com>
> ---
>   drivers/net/8139cp.c |  304 +++++++++++++++++++++++++-------------------------
>   1 files changed, 153 insertions(+), 151 deletions(-)

This patch still fails in places to recognize the obvious intent of the
code writer.



>   /*
>   	Copyright 2001-2004 Jeff Garzik<jgarzik@pobox.com>
>
> -	Copyright (C) 2001, 2002 David S. Miller (davem@redhat.com) [tg3.c]
> -	Copyright (C) 2000, 2001 David S. Miller (davem@redhat.com) [sungem.c]
> -	Copyright 2001 Manfred Spraul				    [natsemi.c]
> -	Copyright 1999-2001 by Donald Becker.			    [natsemi.c]
> -       	Written 1997-2001 by Donald Becker.			    [8139too.c]
> -	Copyright 1998-2001 by Jes Sorensen,<jes@trained-monkey.org>. [acenic.c]
> +	Copyright (C) 2001, 2002 David S. Miller (davem@redhat.com)	[tg3.c]
> +	Copyright (C) 2000, 2001 David S. Miller (davem@redhat.com)	[sungem.c]
> +	Copyright 2001 Manfred Spraul					[natsemi.c]
> +	Copyright 1999-2001 by Donald Becker.				[natsemi.c]
> +	Written 1997-2001 by Donald Becker.				[8139too.c]
> +	Copyright 1998-2001 by Jes Sorensen,<jes@trained-monkey.org>.	[acenic.c]

what is the point of this?  I would leave copyright messages as-is.


> @@ -1295,32 +1295,32 @@ static void mdio_write(struct net_device *dev, int phy_id, int location,
>   }
>
>   /* Set the ethtool Wake-on-LAN settings */
> -static int netdev_set_wol (struct cp_private *cp,
> +static int netdev_set_wol(struct cp_private *cp,
>   			   const struct ethtool_wolinfo *wol)
>   {
>   	u8 options;
>
> -	options = cpr8 (Config3)&  ~(LinkUp | MagicPacket);
> +	options = cpr8(Config3)&  ~(LinkUp | MagicPacket);
>   	/* If WOL is being disabled, no need for complexity */
>   	if (wol->wolopts) {
> -		if (wol->wolopts&  WAKE_PHY)	options |= LinkUp;
> -		if (wol->wolopts&  WAKE_MAGIC)	options |= MagicPacket;
> +		if (wol->wolopts&  WAKE_PHY) options |= LinkUp;
> +		if (wol->wolopts&  WAKE_MAGIC) options |= MagicPacket;
>   	}
>
> -	cpw8 (Cfg9346, Cfg9346_Unlock);
> -	cpw8 (Config3, options);
> -	cpw8 (Cfg9346, Cfg9346_Lock);
> +	cpw8(Cfg9346, Cfg9346_Unlock);
> +	cpw8(Config3, options);
> +	cpw8(Cfg9346, Cfg9346_Lock);
>
>   	options = 0; /* Paranoia setting */
> -	options = cpr8 (Config5)&  ~(UWF | MWF | BWF);
> +	options = cpr8(Config5)&  ~(UWF | MWF | BWF);
>   	/* If WOL is being disabled, no need for complexity */
>   	if (wol->wolopts) {
> -		if (wol->wolopts&  WAKE_UCAST)  options |= UWF;
> -		if (wol->wolopts&  WAKE_BCAST)	options |= BWF;
> -		if (wol->wolopts&  WAKE_MCAST)	options |= MWF;
> +		if (wol->wolopts&  WAKE_UCAST) options |= UWF;
> +		if (wol->wolopts&  WAKE_BCAST) options |= BWF;
> +		if (wol->wolopts&  WAKE_MCAST) options |= MWF;
>   	}

you just un-aligned things that were nicely tab-aligned


> @@ -1328,35 +1328,36 @@ static int netdev_set_wol (struct cp_private *cp,
>  }
>
>  /* Get the ethtool Wake-on-LAN settings */
> -static void netdev_get_wol (struct cp_private *cp,
> -	             struct ethtool_wolinfo *wol)
> +static void netdev_get_wol(struct cp_private *cp,
> +			   struct ethtool_wolinfo *wol)
>  {
>  	u8 options;
>
>  	wol->wolopts   = 0; /* Start from scratch */
> -	wol->supported = WAKE_PHY   | WAKE_BCAST | WAKE_MAGIC |
> -		         WAKE_MCAST | WAKE_UCAST;
> +	wol->supported = WAKE_PHY   | WAKE_BCAST | WAKE_MAGIC | WAKE_MCAST | WAKE_UCAST;
> +
>  	/* We don't need to go on if WOL is disabled */
> -	if (!cp->wol_enabled) return;
> +	if (!cp->wol_enabled)
> +		return;
>
> -	options        = cpr8 (Config3);
> -	if (options & LinkUp)        wol->wolopts |= WAKE_PHY;
> -	if (options & MagicPacket)   wol->wolopts |= WAKE_MAGIC;
> +	options        = cpr8(Config3);
> +	if (options & LinkUp) wol->wolopts |= WAKE_PHY;
> +	if (options & MagicPacket) wol->wolopts |= WAKE_MAGIC;
>
>  	options        = 0; /* Paranoia setting */
> -	options        = cpr8 (Config5);
> -	if (options & UWF)           wol->wolopts |= WAKE_UCAST;
> -	if (options & BWF)           wol->wolopts |= WAKE_BCAST;
> -	if (options & MWF)           wol->wolopts |= WAKE_MCAST;
> +	options        = cpr8(Config5);
> +	if (options & UWF) wol->wolopts |= WAKE_UCAST;
> +	if (options & BWF) wol->wolopts |= WAKE_BCAST;
> +	if (options & MWF) wol->wolopts |= WAKE_MCAST;
>  }
>
> -static void cp_get_drvinfo (struct net_device *dev, struct ethtool_drvinfo *info)

ditto


Also, it's disappointing to see so much diff noise created by
	s/func ()/func()/
but I don't suppose I'll win that battle.

	Jeff


* Re: [PATCH 2/2] bnx2: use device model DMA API
From: Michael Chan @ 2010-07-16 21:29 UTC (permalink / raw)
  To: Stanislaw Gruszka; +Cc: netdev@vger.kernel.org
In-Reply-To: <20100715142544.12504.283.send-patch@dhcp-lab-109.englab.brq.redhat.com>


On Thu, 2010-07-15 at 07:25 -0700, Stanislaw Gruszka wrote:
> Use the DMA API, as the PCI equivalents will be deprecated. This change also
> allows allocating with GFP_KERNEL in some places.
> 
> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>

Acked-by: Michael Chan <mchan@broadcom.com>

> ---
>  drivers/net/bnx2.c |  111 +++++++++++++++++++++++++++-------------------------
>  1 files changed, 58 insertions(+), 53 deletions(-)
> 
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index 6de4cb7..98aed05 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -692,9 +692,9 @@ bnx2_free_tx_mem(struct bnx2 *bp)
>  		struct bnx2_tx_ring_info *txr = &bnapi->tx_ring;
>  
>  		if (txr->tx_desc_ring) {
> -			pci_free_consistent(bp->pdev, TXBD_RING_SIZE,
> -					    txr->tx_desc_ring,
> -					    txr->tx_desc_mapping);
> +			dma_free_coherent(&bp->pdev->dev, TXBD_RING_SIZE,
> +					  txr->tx_desc_ring,
> +					  txr->tx_desc_mapping);
>  			txr->tx_desc_ring = NULL;
>  		}
>  		kfree(txr->tx_buf_ring);
> @@ -714,9 +714,9 @@ bnx2_free_rx_mem(struct bnx2 *bp)
>  
>  		for (j = 0; j < bp->rx_max_ring; j++) {
>  			if (rxr->rx_desc_ring[j])
> -				pci_free_consistent(bp->pdev, RXBD_RING_SIZE,
> -						    rxr->rx_desc_ring[j],
> -						    rxr->rx_desc_mapping[j]);
> +				dma_free_coherent(&bp->pdev->dev, RXBD_RING_SIZE,
> +						  rxr->rx_desc_ring[j],
> +						  rxr->rx_desc_mapping[j]);
>  			rxr->rx_desc_ring[j] = NULL;
>  		}
>  		vfree(rxr->rx_buf_ring);
> @@ -724,9 +724,9 @@ bnx2_free_rx_mem(struct bnx2 *bp)
>  
>  		for (j = 0; j < bp->rx_max_pg_ring; j++) {
>  			if (rxr->rx_pg_desc_ring[j])
> -				pci_free_consistent(bp->pdev, RXBD_RING_SIZE,
> -						    rxr->rx_pg_desc_ring[j],
> -						    rxr->rx_pg_desc_mapping[j]);
> +				dma_free_coherent(&bp->pdev->dev, RXBD_RING_SIZE,
> +						  rxr->rx_pg_desc_ring[j],
> +						  rxr->rx_pg_desc_mapping[j]);
>  			rxr->rx_pg_desc_ring[j] = NULL;
>  		}
>  		vfree(rxr->rx_pg_ring);
> @@ -748,8 +748,8 @@ bnx2_alloc_tx_mem(struct bnx2 *bp)
>  			return -ENOMEM;
>  
>  		txr->tx_desc_ring =
> -			pci_alloc_consistent(bp->pdev, TXBD_RING_SIZE,
> -					     &txr->tx_desc_mapping);
> +			dma_alloc_coherent(&bp->pdev->dev, TXBD_RING_SIZE,
> +					   &txr->tx_desc_mapping, GFP_KERNEL);
>  		if (txr->tx_desc_ring == NULL)
>  			return -ENOMEM;
>  	}
> @@ -776,8 +776,10 @@ bnx2_alloc_rx_mem(struct bnx2 *bp)
>  
>  		for (j = 0; j < bp->rx_max_ring; j++) {
>  			rxr->rx_desc_ring[j] =
> -				pci_alloc_consistent(bp->pdev, RXBD_RING_SIZE,
> -						     &rxr->rx_desc_mapping[j]);
> +				dma_alloc_coherent(&bp->pdev->dev,
> +						   RXBD_RING_SIZE,
> +						   &rxr->rx_desc_mapping[j],
> +						   GFP_KERNEL);
>  			if (rxr->rx_desc_ring[j] == NULL)
>  				return -ENOMEM;
>  
> @@ -795,8 +797,10 @@ bnx2_alloc_rx_mem(struct bnx2 *bp)
>  
>  		for (j = 0; j < bp->rx_max_pg_ring; j++) {
>  			rxr->rx_pg_desc_ring[j] =
> -				pci_alloc_consistent(bp->pdev, RXBD_RING_SIZE,
> -						&rxr->rx_pg_desc_mapping[j]);
> +				dma_alloc_coherent(&bp->pdev->dev,
> +						   RXBD_RING_SIZE,
> +						   &rxr->rx_pg_desc_mapping[j],
> +						   GFP_KERNEL);
>  			if (rxr->rx_pg_desc_ring[j] == NULL)
>  				return -ENOMEM;
>  
> @@ -816,16 +820,16 @@ bnx2_free_mem(struct bnx2 *bp)
>  
>  	for (i = 0; i < bp->ctx_pages; i++) {
>  		if (bp->ctx_blk[i]) {
> -			pci_free_consistent(bp->pdev, BCM_PAGE_SIZE,
> -					    bp->ctx_blk[i],
> -					    bp->ctx_blk_mapping[i]);
> +			dma_free_coherent(&bp->pdev->dev, BCM_PAGE_SIZE,
> +					  bp->ctx_blk[i],
> +					  bp->ctx_blk_mapping[i]);
>  			bp->ctx_blk[i] = NULL;
>  		}
>  	}
>  	if (bnapi->status_blk.msi) {
> -		pci_free_consistent(bp->pdev, bp->status_stats_size,
> -				    bnapi->status_blk.msi,
> -				    bp->status_blk_mapping);
> +		dma_free_coherent(&bp->pdev->dev, bp->status_stats_size,
> +				  bnapi->status_blk.msi,
> +				  bp->status_blk_mapping);
>  		bnapi->status_blk.msi = NULL;
>  		bp->stats_blk = NULL;
>  	}
> @@ -846,8 +850,8 @@ bnx2_alloc_mem(struct bnx2 *bp)
>  	bp->status_stats_size = status_blk_size +
>  				sizeof(struct statistics_block);
>  
> -	status_blk = pci_alloc_consistent(bp->pdev, bp->status_stats_size,
> -					  &bp->status_blk_mapping);
> +	status_blk = dma_alloc_coherent(&bp->pdev->dev, bp->status_stats_size,
> +					&bp->status_blk_mapping, GFP_KERNEL);
>  	if (status_blk == NULL)
>  		goto alloc_mem_err;
>  
> @@ -885,9 +889,10 @@ bnx2_alloc_mem(struct bnx2 *bp)
>  		if (bp->ctx_pages == 0)
>  			bp->ctx_pages = 1;
>  		for (i = 0; i < bp->ctx_pages; i++) {
> -			bp->ctx_blk[i] = pci_alloc_consistent(bp->pdev,
> +			bp->ctx_blk[i] = dma_alloc_coherent(&bp->pdev->dev,
>  						BCM_PAGE_SIZE,
> -						&bp->ctx_blk_mapping[i]);
> +						&bp->ctx_blk_mapping[i],
> +						GFP_KERNEL);
>  			if (bp->ctx_blk[i] == NULL)
>  				goto alloc_mem_err;
>  		}
> @@ -2674,9 +2679,9 @@ bnx2_alloc_rx_page(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index, gf
>  
>  	if (!page)
>  		return -ENOMEM;
> -	mapping = pci_map_page(bp->pdev, page, 0, PAGE_SIZE,
> +	mapping = dma_map_page(&bp->pdev->dev, page, 0, PAGE_SIZE,
>  			       PCI_DMA_FROMDEVICE);
> -	if (pci_dma_mapping_error(bp->pdev, mapping)) {
> +	if (dma_mapping_error(&bp->pdev->dev, mapping)) {
>  		__free_page(page);
>  		return -EIO;
>  	}
> @@ -2697,8 +2702,8 @@ bnx2_free_rx_page(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
>  	if (!page)
>  		return;
>  
> -	pci_unmap_page(bp->pdev, dma_unmap_addr(rx_pg, mapping), PAGE_SIZE,
> -		       PCI_DMA_FROMDEVICE);
> +	dma_unmap_page(&bp->pdev->dev, dma_unmap_addr(rx_pg, mapping),
> +		       PAGE_SIZE, PCI_DMA_FROMDEVICE);
>  
>  	__free_page(page);
>  	rx_pg->page = NULL;
> @@ -2721,9 +2726,9 @@ bnx2_alloc_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index, gfp
>  	if (unlikely((align = (unsigned long) skb->data & (BNX2_RX_ALIGN - 1))))
>  		skb_reserve(skb, BNX2_RX_ALIGN - align);
>  
> -	mapping = pci_map_single(bp->pdev, skb->data, bp->rx_buf_use_size,
> -		PCI_DMA_FROMDEVICE);
> -	if (pci_dma_mapping_error(bp->pdev, mapping)) {
> +	mapping = dma_map_single(&bp->pdev->dev, skb->data, bp->rx_buf_use_size,
> +				 PCI_DMA_FROMDEVICE);
> +	if (dma_mapping_error(&bp->pdev->dev, mapping)) {
>  		dev_kfree_skb(skb);
>  		return -EIO;
>  	}
> @@ -2829,7 +2834,7 @@ bnx2_tx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
>  			}
>  		}
>  
> -		pci_unmap_single(bp->pdev, dma_unmap_addr(tx_buf, mapping),
> +		dma_unmap_single(&bp->pdev->dev, dma_unmap_addr(tx_buf, mapping),
>  			skb_headlen(skb), PCI_DMA_TODEVICE);
>  
>  		tx_buf->skb = NULL;
> @@ -2838,7 +2843,7 @@ bnx2_tx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
>  		for (i = 0; i < last; i++) {
>  			sw_cons = NEXT_TX_BD(sw_cons);
>  
> -			pci_unmap_page(bp->pdev,
> +			dma_unmap_page(&bp->pdev->dev,
>  				dma_unmap_addr(
>  					&txr->tx_buf_ring[TX_RING_IDX(sw_cons)],
>  					mapping),
> @@ -2945,7 +2950,7 @@ bnx2_reuse_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr,
>  	cons_rx_buf = &rxr->rx_buf_ring[cons];
>  	prod_rx_buf = &rxr->rx_buf_ring[prod];
>  
> -	pci_dma_sync_single_for_device(bp->pdev,
> +	dma_sync_single_for_device(&bp->pdev->dev,
>  		dma_unmap_addr(cons_rx_buf, mapping),
>  		BNX2_RX_OFFSET + BNX2_RX_COPY_THRESH, PCI_DMA_FROMDEVICE);
>  
> @@ -2987,7 +2992,7 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
>  	}
>  
>  	skb_reserve(skb, BNX2_RX_OFFSET);
> -	pci_unmap_single(bp->pdev, dma_addr, bp->rx_buf_use_size,
> +	dma_unmap_single(&bp->pdev->dev, dma_addr, bp->rx_buf_use_size,
>  			 PCI_DMA_FROMDEVICE);
>  
>  	if (hdr_len == 0) {
> @@ -3049,7 +3054,7 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
>  				return err;
>  			}
>  
> -			pci_unmap_page(bp->pdev, mapping_old,
> +			dma_unmap_page(&bp->pdev->dev, mapping_old,
>  				       PAGE_SIZE, PCI_DMA_FROMDEVICE);
>  
>  			frag_size -= frag_len;
> @@ -3120,7 +3125,7 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
>  
>  		dma_addr = dma_unmap_addr(rx_buf, mapping);
>  
> -		pci_dma_sync_single_for_cpu(bp->pdev, dma_addr,
> +		dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr,
>  			BNX2_RX_OFFSET + BNX2_RX_COPY_THRESH,
>  			PCI_DMA_FROMDEVICE);
>  
> @@ -5338,7 +5343,7 @@ bnx2_free_tx_skbs(struct bnx2 *bp)
>  				continue;
>  			}
>  
> -			pci_unmap_single(bp->pdev,
> +			dma_unmap_single(&bp->pdev->dev,
>  					 dma_unmap_addr(tx_buf, mapping),
>  					 skb_headlen(skb),
>  					 PCI_DMA_TODEVICE);
> @@ -5349,7 +5354,7 @@ bnx2_free_tx_skbs(struct bnx2 *bp)
>  			j++;
>  			for (k = 0; k < last; k++, j++) {
>  				tx_buf = &txr->tx_buf_ring[TX_RING_IDX(j)];
> -				pci_unmap_page(bp->pdev,
> +				dma_unmap_page(&bp->pdev->dev,
>  					dma_unmap_addr(tx_buf, mapping),
>  					skb_shinfo(skb)->frags[k].size,
>  					PCI_DMA_TODEVICE);
> @@ -5379,7 +5384,7 @@ bnx2_free_rx_skbs(struct bnx2 *bp)
>  			if (skb == NULL)
>  				continue;
>  
> -			pci_unmap_single(bp->pdev,
> +			dma_unmap_single(&bp->pdev->dev,
>  					 dma_unmap_addr(rx_buf, mapping),
>  					 bp->rx_buf_use_size,
>  					 PCI_DMA_FROMDEVICE);
> @@ -5732,9 +5737,9 @@ bnx2_run_loopback(struct bnx2 *bp, int loopback_mode)
>  	for (i = 14; i < pkt_size; i++)
>  		packet[i] = (unsigned char) (i & 0xff);
>  
> -	map = pci_map_single(bp->pdev, skb->data, pkt_size,
> -		PCI_DMA_TODEVICE);
> -	if (pci_dma_mapping_error(bp->pdev, map)) {
> +	map = dma_map_single(&bp->pdev->dev, skb->data, pkt_size,
> +			     PCI_DMA_TODEVICE);
> +	if (dma_mapping_error(&bp->pdev->dev, map)) {
>  		dev_kfree_skb(skb);
>  		return -EIO;
>  	}
> @@ -5772,7 +5777,7 @@ bnx2_run_loopback(struct bnx2 *bp, int loopback_mode)
>  
>  	udelay(5);
>  
> -	pci_unmap_single(bp->pdev, map, pkt_size, PCI_DMA_TODEVICE);
> +	dma_unmap_single(&bp->pdev->dev, map, pkt_size, PCI_DMA_TODEVICE);
>  	dev_kfree_skb(skb);
>  
>  	if (bnx2_get_hw_tx_cons(tx_napi) != txr->tx_prod)
> @@ -5789,7 +5794,7 @@ bnx2_run_loopback(struct bnx2 *bp, int loopback_mode)
>  	rx_hdr = rx_buf->desc;
>  	skb_reserve(rx_skb, BNX2_RX_OFFSET);
>  
> -	pci_dma_sync_single_for_cpu(bp->pdev,
> +	dma_sync_single_for_cpu(&bp->pdev->dev,
>  		dma_unmap_addr(rx_buf, mapping),
>  		bp->rx_buf_size, PCI_DMA_FROMDEVICE);
>  
> @@ -6457,8 +6462,8 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	} else
>  		mss = 0;
>  
> -	mapping = pci_map_single(bp->pdev, skb->data, len, PCI_DMA_TODEVICE);
> -	if (pci_dma_mapping_error(bp->pdev, mapping)) {
> +	mapping = dma_map_single(&bp->pdev->dev, skb->data, len, PCI_DMA_TODEVICE);
> +	if (dma_mapping_error(&bp->pdev->dev, mapping)) {
>  		dev_kfree_skb(skb);
>  		return NETDEV_TX_OK;
>  	}
> @@ -6486,9 +6491,9 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  		txbd = &txr->tx_desc_ring[ring_prod];
>  
>  		len = frag->size;
> -		mapping = pci_map_page(bp->pdev, frag->page, frag->page_offset,
> -			len, PCI_DMA_TODEVICE);
> -		if (pci_dma_mapping_error(bp->pdev, mapping))
> +		mapping = dma_map_page(&bp->pdev->dev, frag->page, frag->page_offset,
> +				       len, PCI_DMA_TODEVICE);
> +		if (dma_mapping_error(&bp->pdev->dev, mapping))
>  			goto dma_error;
>  		dma_unmap_addr_set(&txr->tx_buf_ring[ring_prod], mapping,
>  				   mapping);
> @@ -6527,7 +6532,7 @@ dma_error:
>  	ring_prod = TX_RING_IDX(prod);
>  	tx_buf = &txr->tx_buf_ring[ring_prod];
>  	tx_buf->skb = NULL;
> -	pci_unmap_single(bp->pdev, dma_unmap_addr(tx_buf, mapping),
> +	dma_unmap_single(&bp->pdev->dev, dma_unmap_addr(tx_buf, mapping),
>  			 skb_headlen(skb), PCI_DMA_TODEVICE);
>  
>  	/* unmap remaining mapped pages */
> @@ -6535,7 +6540,7 @@ dma_error:
>  		prod = NEXT_TX_BD(prod);
>  		ring_prod = TX_RING_IDX(prod);
>  		tx_buf = &txr->tx_buf_ring[ring_prod];
> -		pci_unmap_page(bp->pdev, dma_unmap_addr(tx_buf, mapping),
> +		dma_unmap_page(&bp->pdev->dev, dma_unmap_addr(tx_buf, mapping),
>  			       skb_shinfo(skb)->frags[i].size,
>  			       PCI_DMA_TODEVICE);
>  	}
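
[Illustrative note, not part of the original patch or reply.] The remark
about GFP_KERNEL in the commit message follows from how the old PCI helpers
were built. To my understanding, in kernels of this era pci_alloc_consistent()
was a thin compatibility wrapper that called dma_alloc_coherent() with
GFP_ATOMIC hardcoded, so callers in process context could not ask for a
sleeping allocation. The sketch below is a userspace mock of that
relationship only. The function names mirror the kernel's, but every type,
flag value, and function body here is a stand-in invented for illustration,
and the "bus address" is fake.

```c
/* Userspace mock: shows the API relationship only; this is not kernel code. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef unsigned int gfp_t;
typedef uint64_t dma_addr_t;

#define GFP_ATOMIC 0x1u	/* mock values, not the kernel's */
#define GFP_KERNEL 0x2u

struct device { gfp_t last_gfp; };	/* mock: remembers the last gfp used */
struct pci_dev { struct device dev; };

/* Mock of the generic allocator: the caller chooses the gfp flags. */
static void *dma_alloc_coherent(struct device *dev, size_t size,
				dma_addr_t *handle, gfp_t gfp)
{
	void *cpu_addr = calloc(1, size);

	dev->last_gfp = gfp;
	if (cpu_addr)
		*handle = (dma_addr_t)(uintptr_t)cpu_addr; /* fake bus address */
	return cpu_addr;
}

/* Mock of the matching free routine. */
static void dma_free_coherent(struct device *dev, size_t size,
			      void *cpu_addr, dma_addr_t handle)
{
	(void)dev;
	(void)size;
	(void)handle;
	free(cpu_addr);
}

/* Mock of the PCI wrapper: no gfp argument, so GFP_ATOMIC is baked in. */
static void *pci_alloc_consistent(struct pci_dev *pdev, size_t size,
				  dma_addr_t *handle)
{
	return dma_alloc_coherent(&pdev->dev, size, handle, GFP_ATOMIC);
}
```

Calling pci_alloc_consistent(&pdev, ...) in this mock always records
GFP_ATOMIC, while calling dma_alloc_coherent(&pdev->dev, ..., GFP_KERNEL)
records the flag the caller picked, which is the freedom the patch uses in
the ring and status-block allocation paths.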




* Re: [PATCH v2 1/2] bnx2: allocate with GFP_KERNEL flag on RX path init
From: Michael Chan @ 2010-07-16 21:24 UTC (permalink / raw)
  To: Stanislaw Gruszka; +Cc: netdev@vger.kernel.org
In-Reply-To: <20100716105540.613d40cc@dhcp-lab-109.englab.brq.redhat.com>


On Fri, 2010-07-16 at 01:55 -0700, Stanislaw Gruszka wrote:
> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>

Acked-by: Michael Chan <mchan@broadcom.com>

> ---
> v1->v2: use GFP_ATOMIC in bnx2_rx_skb
> 
>  drivers/net/bnx2.c |   17 +++++++++--------
>  1 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index a203f39..a7df539 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -2664,13 +2664,13 @@ bnx2_set_mac_addr(struct bnx2 *bp, u8 *mac_addr, u32 pos)
>  }
>  
>  static inline int
> -bnx2_alloc_rx_page(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
> +bnx2_alloc_rx_page(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index, gfp_t gfp)
>  {
>  	dma_addr_t mapping;
>  	struct sw_pg *rx_pg = &rxr->rx_pg_ring[index];
>  	struct rx_bd *rxbd =
>  		&rxr->rx_pg_desc_ring[RX_RING(index)][RX_IDX(index)];
> -	struct page *page = alloc_page(GFP_ATOMIC);
> +	struct page *page = alloc_page(gfp);
>  
>  	if (!page)
>  		return -ENOMEM;
> @@ -2705,7 +2705,7 @@ bnx2_free_rx_page(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
>  }
>  
>  static inline int
> -bnx2_alloc_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
> +bnx2_alloc_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index, gfp_t gfp)
>  {
>  	struct sk_buff *skb;
>  	struct sw_bd *rx_buf = &rxr->rx_buf_ring[index];
> @@ -2713,7 +2713,7 @@ bnx2_alloc_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
>  	struct rx_bd *rxbd = &rxr->rx_desc_ring[RX_RING(index)][RX_IDX(index)];
>  	unsigned long align;
>  
> -	skb = netdev_alloc_skb(bp->dev, bp->rx_buf_size);
> +	skb = __netdev_alloc_skb(bp->dev, bp->rx_buf_size, gfp);
>  	if (skb == NULL) {
>  		return -ENOMEM;
>  	}
> @@ -2974,7 +2974,7 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
>  	int err;
>  	u16 prod = ring_idx & 0xffff;
>  
> -	err = bnx2_alloc_rx_skb(bp, rxr, prod);
> +	err = bnx2_alloc_rx_skb(bp, rxr, prod, GFP_ATOMIC);
>  	if (unlikely(err)) {
>  		bnx2_reuse_rx_skb(bp, rxr, skb, (u16) (ring_idx >> 16), prod);
>  		if (hdr_len) {
> @@ -3039,7 +3039,8 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
>  			rx_pg->page = NULL;
>  
>  			err = bnx2_alloc_rx_page(bp, rxr,
> -						 RX_PG_RING_IDX(pg_prod));
> +						 RX_PG_RING_IDX(pg_prod),
> +						 GFP_ATOMIC);
>  			if (unlikely(err)) {
>  				rxr->rx_pg_cons = pg_cons;
>  				rxr->rx_pg_prod = pg_prod;
> @@ -5179,7 +5180,7 @@ bnx2_init_rx_ring(struct bnx2 *bp, int ring_num)
>  
>  	ring_prod = prod = rxr->rx_pg_prod;
>  	for (i = 0; i < bp->rx_pg_ring_size; i++) {
> -		if (bnx2_alloc_rx_page(bp, rxr, ring_prod) < 0) {
> +		if (bnx2_alloc_rx_page(bp, rxr, ring_prod, GFP_KERNEL) < 0) {
>  			netdev_warn(bp->dev, "init'ed rx page ring %d with %d/%d pages only\n",
>  				    ring_num, i, bp->rx_pg_ring_size);
>  			break;
> @@ -5191,7 +5192,7 @@ bnx2_init_rx_ring(struct bnx2 *bp, int ring_num)
>  
>  	ring_prod = prod = rxr->rx_prod;
>  	for (i = 0; i < bp->rx_ring_size; i++) {
> -		if (bnx2_alloc_rx_skb(bp, rxr, ring_prod) < 0) {
> +		if (bnx2_alloc_rx_skb(bp, rxr, ring_prod, GFP_KERNEL) < 0) {
>  			netdev_warn(bp->dev, "init'ed rx ring %d with %d/%d skbs only\n",
>  				    ring_num, i, bp->rx_ring_size);
>  			break;
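
[Illustrative note, not part of the original patch or reply.] The pattern in
this patch is to thread a gfp_t argument through the allocation helper so
that each caller states its own context: ring initialisation runs in process
context and can pass GFP_KERNEL, while the RX completion path must not sleep
and keeps GFP_ATOMIC (the v1->v2 change). A minimal userspace sketch of that
pattern follows. All names (rx_ring, alloc_rx_buf, init_rx_ring,
refill_rx_buf), the ring size, and the flag values are hypothetical
stand-ins, and malloc() takes the place of the real skb and page allocators.

```c
/* Userspace mock of "let the caller choose the gfp flags"; not kernel code. */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef unsigned int gfp_t;

#define GFP_ATOMIC 0x1u	/* mock values, not the kernel's */
#define GFP_KERNEL 0x2u
#define RING_SIZE  4u	/* hypothetical ring size */

struct rx_ring {
	void *buf[RING_SIZE];
	gfp_t last_gfp;	/* mock: remembers the last gfp requested */
};

/* Stand-in for bnx2_alloc_rx_skb(): gfp comes from the caller. */
static int alloc_rx_buf(struct rx_ring *r, unsigned int idx, gfp_t gfp)
{
	void *b = malloc(64);

	r->last_gfp = gfp;
	if (!b)
		return -ENOMEM;
	r->buf[idx] = b;
	return 0;
}

/* Init path: process context, so a sleeping allocation is acceptable. */
static unsigned int init_rx_ring(struct rx_ring *r)
{
	unsigned int i;

	for (i = 0; i < RING_SIZE; i++)
		if (alloc_rx_buf(r, i, GFP_KERNEL) < 0)
			break;
	return i;	/* number of buffers actually allocated */
}

/* Hot path: models a refill from softirq context, which must not sleep. */
static int refill_rx_buf(struct rx_ring *r, unsigned int idx)
{
	free(r->buf[idx]);
	r->buf[idx] = NULL;
	return alloc_rx_buf(r, idx, GFP_ATOMIC);
}
```

In this mock, init_rx_ring() leaves last_gfp at GFP_KERNEL and a later
refill_rx_buf() sets it to GFP_ATOMIC, mirroring the split between
bnx2_init_rx_ring() and bnx2_rx_skb() in the quoted diff.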



