Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next] sysfs: add entry to indicate network interfaces with random MAC address
From: Ben Hutchings @ 2010-07-20 12:07 UTC (permalink / raw)
  To: Stefan Assmann
  Cc: netdev, Linux Kernel Mailing List, davem, Andy Gospodarek,
	Rose, Gregory V, Duyck, Alexander H, Casey Leedom, Harald Hoyer
In-Reply-To: <4C458CB7.3030508@redhat.com>

On Tue, 2010-07-20 at 13:47 +0200, Stefan Assmann wrote:
> On 20.07.2010 13:20, Ben Hutchings wrote:
> > On Tue, 2010-07-20 at 12:50 +0200, Stefan Assmann wrote:
> >> From: Stefan Assmann <sassmann@redhat.com>
> >>
> >> Reserve a bit in struct net_device to indicate whether an interface
> >> generates its MAC address randomly, and expose the information via
> >> sysfs.
> >> May look like this:
> >> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/net/eth0/ifrndmac
> >>
> >> By default the value of ifrndmac is 0. Any driver that generates the MAC
> >> address randomly should return a value to 1.
> > 
> > The name should incorporate 'address', not 'mac', for consistency with
> > the generic 'address' attribute.
> 
> We can call it "ifrndhwaddr" if that's more consistent.
> 
> > 
> > What about devices that 'steal' MAC addresses from slave devices?
> > Currently I believe udev has special cases for them but ideally these
> > drivers would indicate explicitly that their addresses are not stable
> > identifiers (even though they aren't random).
> 
> It's really up to the driver to decide whether it makes sense to set the
> flag or not. The question is what should udev do with these MAC address
> stealing devices apart from ignoring them? Sorry I have no higher
> insight into it.
> This flag has the purpose to allow udev to explicitly handle devices
> that generate their MAC address randomly and generate a persistent
> rule based on the device path instead of the MAC address.
> I'm open for suggestions but I'm not sure we can handle both cases with
> a single flag.

OK, then call it something like 'address_temporary'.

> JFYI this is an alternative approach to changing the kernel name of VFs
> to vfeth. The advantage of this way should be that we don't break any
> user-space applications that rely on network interfaces being names
> "ethX". Actually this goes in the direction of "fixing udev" which was
> what you asked for in your comment on my first patch concering vfeth. :)
> 
> > 
> > [...]
> >> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >> index b626289..2ea0298 100644
> >> --- a/include/linux/netdevice.h
> >> +++ b/include/linux/netdevice.h
> >> @@ -845,6 +845,7 @@ struct net_device {
> >>  #define NETIF_F_FCOE_MTU	(1 << 26) /* Supports max FCoE MTU, 2158 bytes*/
> >>  #define NETIF_F_NTUPLE		(1 << 27) /* N-tuple filters supported */
> >>  #define NETIF_F_RXHASH		(1 << 28) /* Receive hashing offload */
> >> +#define NETIF_F_RNDMAC		(1 << 29) /* Interface with random MAC address */
> > [...]
> > 
> > This is not really a feature, and we are running out of real feature
> > bits.  Can you find somewhere else to put this flag?
> 
> Actually Dave Miller suggested to put it there. What other place is
> there to put it?

If Dave said that then I'm sure it's OK.

However, if you define this as an interface flag (net_device::flags;
<linux/if.h>) and add it to the set of changeable flags in
__dev_change_flags(), user-space will be able to clear the flag if it
later sets a stable address.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH net-next] sysfs: add entry to indicate network interfaces with random MAC address
From: Stefan Assmann @ 2010-07-20 12:17 UTC (permalink / raw)
  To: Alex Badea
  Cc: Ben Hutchings, netdev, Linux Kernel Mailing List, davem,
	Andy Gospodarek, Rose, Gregory V, Duyck, Alexander H,
	Casey Leedom, Harald Hoyer
In-Reply-To: <4C458F50.4070200@ixiacom.com>

On 20.07.2010 13:58, Alex Badea wrote:
> Hi,
> 
> On 07/20/2010 02:47 PM, Stefan Assmann wrote:
>>> What about devices that 'steal' MAC addresses from slave devices?
> 
> Might I suggest an attribute such as "address_type", which could report
> "permanent", "random", "stolen", or something else in the future?

In case there's demand for such a multi-purpose attribute I see no
reason to object. More thoughts on this?

Thanks for your suggestion Alex.

  Stefan
--
Stefan Assmann         | Red Hat GmbH
Software Engineer      | Otto-Hahn-Strasse 20, 85609 Dornach
                       | HR: Amtsgericht Muenchen HRB 153243
                       | GF: Brendan Lane, Charlie Peters,
sassmann at redhat.com |     Michael Cunningham, Charles Cachera

^ permalink raw reply

* Re: [PATCH net-next] sysfs: add entry to indicate network interfaces with random MAC address
From: Stefan Assmann @ 2010-07-20 12:41 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev, Linux Kernel Mailing List, davem, Andy Gospodarek,
	Rose, Gregory V, Duyck, Alexander H, Casey Leedom, Harald Hoyer
In-Reply-To: <1279627656.2110.13.camel@achroite.uk.solarflarecom.com>

On 20.07.2010 14:07, Ben Hutchings wrote:
> On Tue, 2010-07-20 at 13:47 +0200, Stefan Assmann wrote:
>> On 20.07.2010 13:20, Ben Hutchings wrote:
>>> On Tue, 2010-07-20 at 12:50 +0200, Stefan Assmann wrote:
>>>> From: Stefan Assmann <sassmann@redhat.com>

[snip]

>> Actually Dave Miller suggested to put it there. What other place is
>> there to put it?
> 
> If Dave said that then I'm sure it's OK.
> 
> However, if you define this as an interface flag (net_device::flags;
> <linux/if.h>) and add it to the set of changeable flags in
> __dev_change_flags(), user-space will be able to clear the flag if it
> later sets a stable address.

As I said I'm not that knowledgeable about this MAC address stealing
thing and I'm assuming that's what you're aiming at. Would you really
want/need it to be user-space writable? Currently all I can think of is
the scenario where you set a "stable" address that outlasts a reboot so
udev might be able to assign it a permanent name after it gets stable.

So it might make sense to have it writable, but I'd like to avoid to add
unnecessary complexity that may cause errors if it's not necessary.
Read-only is simple, just read the flag and deal with it.

Btw, the driver itself could also alter the flag. Then we'd have a well
defined way of setting a stable address.

  Stefan
--
Stefan Assmann         | Red Hat GmbH
Software Engineer      | Otto-Hahn-Strasse 20, 85609 Dornach
                       | HR: Amtsgericht Muenchen HRB 153243
                       | GF: Brendan Lane, Charlie Peters,
sassmann at redhat.com |     Michael Cunningham, Charles Cachera

> 
> Ben.
> 

^ permalink raw reply

* Re: [patch v2.2 1/4] [PATCH v2.1 1/4] netfilter: xt_ipvs (netfilter matcher for IPVS)
From: Hannes Eder @ 2010-07-20 12:44 UTC (permalink / raw)
  To: Simon Horman
  Cc: Patrick McHardy, lvs-devel, netdev, linux-kernel, netfilter,
	Wensong Zhang, Julius Volz, David S. Miller,
	Netfilter Development Mailinglist
In-Reply-To: <20100622071326.GB9928@verge.net.au>

Hi Simon,

On Tue, Jun 22, 2010 at 09:13, Simon Horman <horms@verge.net.au> wrote:
> On Mon, May 03, 2010 at 01:29:46PM +0200, Hannes Eder wrote:
>> Thank you for picking this series of patches up again and thanks for
>> the feedback.
>>
>> I'll send an updated version in the next days.
>
> Hi Hanes,
>
> more than a few days seems to have passed.
> Do you have time to fix the patches up?
> If not, I'll take a stab at it.

/me working through the backlog of emails after vacation, however this
email was buried in my inbox before my vacation, my bad.  I've been
extremely busy lately and I did not have the time to work on the
patches.  I saw your updated versions, I appreciate very much that you
are taking it from there.

Cheers,
-Hannes

^ permalink raw reply

* Re: netfilter/iptables stopped logging 2.6.35-rc
From: Maciej Rutecki @ 2010-07-20 12:51 UTC (permalink / raw)
  To: auto401300; +Cc: netdev, linux-kernel
In-Reply-To: <20100717072036.1BBE52804B@smtp.hushmail.com>

On sobota, 17 lipca 2010 o 09:20:36 auto401300@hushmail.com wrote:
> Hi!
> 
> Has something broken with netfilter/iptables logging in 2.6.35-rc,
> or is there something new I should set in .config since .34?
> 
> 
> I just verified that if I boot .34 and ping the pc it does logging:
> 
> Jul 17 09:42:49 xxxxx kernel: Linux version 2.6.34-ab (root@xxxxx)
> (gcc version 4.4.4 (Debian 4.4.4-1) ) #1 SMP PREEMPT Mon May 17
> 09:15
> 
> :15 EEST 2010
> 
> ....
> Jul 17 09:44:52 xxxxx kernel: DENY  in: IN=eth0 OUT= MAC=xxxxx
> SRC=xxxxx DST=xxxxx LEN=60 TOS=0x00 PREC=0x00 TTL=127 ID=38945
> PROTO=ICMP TYPE=8 CODE=0 ID=512 SEQ=256
> 
> 
> but if I boot .35-rc4 and ping:
> 
> Jul 17 09:48:08 xxxxx kernel: Linux version 2.6.35-rc4-aa
> (root@xxxxx) (gcc version 4.4.4 (Debian 4.4.4-6) ) #1 SMP PREEMPT
> Mon Jul 5 15:22:02 EEST 2010
> ....
> nothing from iptables in log
> 
> 
> userspace is same, only booted different kernel versions

I created a Bugzilla entry at 
https://bugzilla.kernel.org/show_bug.cgi?id=16423
for your bug report, please add your address to the CC list in there, thanks!

-- 
Maciej Rutecki
http://www.maciek.unixy.pl

^ permalink raw reply

* Re: Very low latency TCP for clusters
From: Brian Bloniarz @ 2010-07-20 12:57 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Eric Dumazet, netdev
In-Reply-To: <AANLkTillCVWMHHBImGEeRYy2MJYqtGIvSPvacLKJnpP2@mail.gmail.com>

Tom Herbert wrote:
> On Mon, Jul 19, 2010 at 3:03 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Le lundi 19 juillet 2010 à 11:44 -0700, Tom Herbert a écrit :
>>
>>> I see about 7 usecs as best number on loopback, so I believe this is
>>> in the ballpark.  As I mentioned above, this about "best case" latency
>>> of a single thread, so we assume any amount of pinning or other
>>> customized configuration to that purpose.
>> Well, given I get 29 us on a ping between two machines (Gb link, no
>> process involved on receiver, only softirq), I really doubt we can reach
>> 5 us on a tcp test involving a user process on both side ;)
>>
> That's pretty pokey ;-) I see numbers around 25 usecs between to
> machines, this is with TCP_NBRR.  With TCP_RR it's more like 35 usecs,
> so eliminating the scheduler is already a big reduction.  That leaves
> 18 usecs in device time, interrupt processing, network, and cache
> misses; 7 usecs in TCP processing, user space.  While 5 usecs is an
> aggressive goal, I am not ready to concede that there's an
> architectural limit in either NICs, TCP, or sockets that can't be
> overcome.

Have you toyed with the NIC's interrupt coalescing yet?
I'm wondering if any part of the 25usecs is that.

^ permalink raw reply

* [PATCH net-next-2.6] netlink: netlink_recvmsg() fix
From: Eric Dumazet @ 2010-07-20 13:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Johannes Berg

Please note following potential bug was discovered by code review, and
my patch not even tested, please double check !

Thanks

[PATCH net-next-2.6] netlink: netlink_recvmsg() fix

commit 1dacc76d0014 
(net/compat/wext: send different messages to compat tasks)
introduced a race condition on netlink, in case MSG_PEEK is used.

An skb given by skb_recv_datagram() might be shared, we must clone it
before any modification, or risk fatal corruption.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/netlink/af_netlink.c |   19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 7aeaa83..dad5e81 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1405,7 +1405,7 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 	struct netlink_sock *nlk = nlk_sk(sk);
 	int noblock = flags&MSG_DONTWAIT;
 	size_t copied;
-	struct sk_buff *skb, *frag __maybe_unused = NULL;
+	struct sk_buff *skb;
 	int err;
 
 	if (flags&MSG_OOB)
@@ -1440,8 +1440,17 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 			kfree_skb(skb);
 			skb = compskb;
 		} else {
-			frag = skb_shinfo(skb)->frag_list;
-			skb_shinfo(skb)->frag_list = NULL;
+			struct sk_buff *nskb = skb_clone(skb, GFP_KERNEL);
+
+			if (!nskb) {
+				skb_free_datagram(sk, skb);
+				err = -ENOMEM;
+				goto out;
+			}
+			kfree_skb(skb);
+			kfree_skb(skb_shinfo(nskb)->frag_list);
+			skb_shinfo(nskb)->frag_list = NULL;
+			skb = nskb;
 		}
 	}
 #endif
@@ -1477,10 +1486,6 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 	if (flags & MSG_TRUNC)
 		copied = skb->len;
 
-#ifdef CONFIG_COMPAT_NETLINK_MESSAGES
-	skb_shinfo(skb)->frag_list = frag;
-#endif
-
 	skb_free_datagram(sk, skb);
 
 	if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2)



^ permalink raw reply related

* Re: net/dsa
From: Lennert Buytenhek @ 2010-07-20 13:59 UTC (permalink / raw)
  To: Karl Beldan; +Cc: netdev
In-Reply-To: <4C3B8231.6020706@gmail.com>

On Mon, Jul 12, 2010 at 10:59:29PM +0200, Karl Beldan wrote:

> Hi,

Hi Karl,

Sorry, I didn't see your mail initially -- please CC me in the future.


> I found the dsa code very handy to help manage a switch.

Ah.  What particular part are you using?


> Yet I was surprised I had to tweak the code to simply use the phy
> layer state machine.

You mean that net/dsa uses phy_attach() but not phy_start_machine() ?
Have you seen problems arising from this?


> And I don't see much activity in the code nor any discussion, e.g no
> follow up to http://patchwork.ozlabs.org/patch/16578.
> 
> So I was wondering if there was anybody playing with this code, or
> having ideas about features to add (vlan/stp callbacks) ?

As far as I know, the code currently in the kernel works well for what
it intends to do (which is to just expose the switch ports), and I'm
not aware of any bugs in it.

That said, you're right in that there are several more features that
the hardware supports that the software could be extended to handle.

For one, I don't have access to any Marvell switch chip hardware
anymore, so that limits my ability to play with this.  Also, the
relevant documentation is under a rather restrictive license, so the
only way I can see net/dsa support for Marvell parts improving is if
there's pressure from a large enough customer to make this happen.

If this is about non-Marvell parts, I'd welcome adding support for
those into net/dsa.  For one, I would really like to see Broadcom
switch chip support added -- the documentation for those chips is
under similarly restrictive licensing, though.


thanks,
Lennert

^ permalink raw reply

* Re: [PATCH net-next] sysfs: add entry to indicate network interfaces with random MAC address
From: Ben Hutchings @ 2010-07-20 14:29 UTC (permalink / raw)
  To: Stefan Assmann
  Cc: netdev, Linux Kernel Mailing List, davem, Andy Gospodarek,
	Rose, Gregory V, Duyck, Alexander H, Casey Leedom, Harald Hoyer
In-Reply-To: <4C45996B.3000003@redhat.com>

On Tue, 2010-07-20 at 14:41 +0200, Stefan Assmann wrote:
> On 20.07.2010 14:07, Ben Hutchings wrote:
> > On Tue, 2010-07-20 at 13:47 +0200, Stefan Assmann wrote:
> >> On 20.07.2010 13:20, Ben Hutchings wrote:
> >>> On Tue, 2010-07-20 at 12:50 +0200, Stefan Assmann wrote:
> >>>> From: Stefan Assmann <sassmann@redhat.com>
> 
> [snip]
> 
> >> Actually Dave Miller suggested to put it there. What other place is
> >> there to put it?
> > 
> > If Dave said that then I'm sure it's OK.
> > 
> > However, if you define this as an interface flag (net_device::flags;
> > <linux/if.h>) and add it to the set of changeable flags in
> > __dev_change_flags(), user-space will be able to clear the flag if it
> > later sets a stable address.
> 
> As I said I'm not that knowledgeable about this MAC address stealing
> thing and I'm assuming that's what you're aiming at. Would you really
> want/need it to be user-space writable? Currently all I can think of is
> the scenario where you set a "stable" address that outlasts a reboot so
> udev might be able to assign it a permanent name after it gets stable.
>
> So it might make sense to have it writable, but I'd like to avoid to add
> unnecessary complexity that may cause errors if it's not necessary.
> Read-only is simple, just read the flag and deal with it.

Once this flag has been added, it may be used by any tool and not just
udev, so it ought to indicate the status of the currently assigned
address.  That requires that it be writable.

> Btw, the driver itself could also alter the flag. Then we'd have a well
> defined way of setting a stable address.

The driver can't know whether an address assigned by the user is stable.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH] drop_monitor: Add error code to detect duplicate state changes
From: Neil Horman @ 2010-07-20 14:52 UTC (permalink / raw)
  To: netdev; +Cc: davem, nhorman

Hey-
	Patch to add -EAGAIN error to dropwatch netlink message handling code.
-EAGAIN will be returned anytime userspace attempts to transition the state of
the drop monitor service to a state that its already in.  That allows user space
to detect this condition, so it doesn't wait for a success ACK that will never
arrive.  Tested successfully by me

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 drop_monitor.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)


diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index ad41529..646ef3b 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -223,6 +223,11 @@ static int set_all_monitor_traces(int state)
 
 	spin_lock(&trace_state_lock);
 
+	if (state == trace_state) {
+		rc = -EAGAIN;
+		goto out_unlock;
+	}
+
 	switch (state) {
 	case TRACE_ON:
 		rc |= register_trace_kfree_skb(trace_kfree_skb_hit, NULL);
@@ -251,11 +256,12 @@ static int set_all_monitor_traces(int state)
 
 	if (!rc)
 		trace_state = state;
+	else
+		rc = -EINPROGRESS;
 
+out_unlock:
 	spin_unlock(&trace_state_lock);
 
-	if (rc)
-		return -EINPROGRESS;
 	return rc;
 }
 

^ permalink raw reply related

* [PATCH] e1000e: Fix irq_synchronize in MSI-X case
From: Jean Delvare @ 2010-07-20 15:12 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Bruce Allan, Jesse Brandeburg

Synchronize all IRQs when in MSI-X IRQ mode.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
---
I sent this patch to the e1000-devel list on June 8th, 2010, but
didn't receive any answer:
http://sourceforge.net/mailarchive/forum.php?thread_name=201006081818.59098.jdelvare%40suse.de&forum_name=e1000-devel

I don't know how critical synchronize_irq() is, so I don't know if
this patch should go to stable branches or not.

 drivers/net/e1000e/netdev.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1939,7 +1939,13 @@ static void e1000_irq_disable(struct e10
 	if (adapter->msix_entries)
 		ew32(EIAC_82574, 0);
 	e1e_flush();
-	synchronize_irq(adapter->pdev->irq);
+
+	if (adapter->msix_entries) {
+		synchronize_irq(adapter->msix_entries[0].vector);
+		synchronize_irq(adapter->msix_entries[1].vector);
+		synchronize_irq(adapter->msix_entries[2].vector);
+	} else
+		synchronize_irq(adapter->pdev->irq);
 }
 
 /**

-- 
Jean Delvare
Suse L3

^ permalink raw reply

* Re: [PATCH net-next-2.6] netlink: netlink_recvmsg() fix
From: Eric Dumazet @ 2010-07-20 15:20 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Johannes Berg
In-Reply-To: <1279631789.2498.71.camel@edumazet-laptop>

Le mardi 20 juillet 2010 à 15:16 +0200, Eric Dumazet a écrit :
> Please note following potential bug was discovered by code review, and
> my patch not even tested, please double check !
> 
> Thanks
> 
> [PATCH net-next-2.6] netlink: netlink_recvmsg() fix
> 
> commit 1dacc76d0014 
> (net/compat/wext: send different messages to compat tasks)
> introduced a race condition on netlink, in case MSG_PEEK is used.
> 
> An skb given by skb_recv_datagram() might be shared, we must clone it
> before any modification, or risk fatal corruption.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---

Oh well, skb_copy() or skb_unshare() is needed.

[PATCH net-next-2.6 v2] netlink: netlink_recvmsg() fix

commit 1dacc76d0014 
(net/compat/wext: send different messages to compat tasks)
introduced a race condition on netlink, in case MSG_PEEK is used.

An skb given by skb_recv_datagram() might be shared, we must copy it
before any modification, or risk fatal corruption.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/netlink/af_netlink.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 7aeaa83..1537fa5 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1405,7 +1405,7 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 	struct netlink_sock *nlk = nlk_sk(sk);
 	int noblock = flags&MSG_DONTWAIT;
 	size_t copied;
-	struct sk_buff *skb, *frag __maybe_unused = NULL;
+	struct sk_buff *skb;
 	int err;
 
 	if (flags&MSG_OOB)
@@ -1440,7 +1440,12 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 			kfree_skb(skb);
 			skb = compskb;
 		} else {
-			frag = skb_shinfo(skb)->frag_list;
+			skb = skb_unshare(skb, GFP_KERNEL);
+			if (!skb) {
+				err = -ENOMEM;
+				goto out;
+			}
+			kfree_skb(skb_shinfo(skb)->frag_list);
 			skb_shinfo(skb)->frag_list = NULL;
 		}
 	}
@@ -1477,10 +1482,6 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 	if (flags & MSG_TRUNC)
 		copied = skb->len;
 
-#ifdef CONFIG_COMPAT_NETLINK_MESSAGES
-	skb_shinfo(skb)->frag_list = frag;
-#endif
-
 	skb_free_datagram(sk, skb);
 
 	if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2)



^ permalink raw reply related

* Re: [PATCH] phy: add suspend/resume in the ic+
From: Randy Dunlap @ 2010-07-20 15:24 UTC (permalink / raw)
  To: Giuseppe CAVALLARO; +Cc: netdev
In-Reply-To: <1279609954-30274-1-git-send-email-peppe.cavallaro@st.com>

On Tue, 20 Jul 2010 09:12:34 +0200 Giuseppe CAVALLARO wrote:

> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> ---
>  drivers/net/phy/icplus.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/phy/icplus.c b/drivers/net/phy/icplus.c
> index 439adaf..3f2583f 100644
> --- a/drivers/net/phy/icplus.c
> +++ b/drivers/net/phy/icplus.c
> @@ -116,6 +116,8 @@ static struct phy_driver ip175c_driver = {
>  	.config_init	= &ip175c_config_init,
>  	.config_aneg	= &ip175c_config_aneg,
>  	.read_status	= &ip175c_read_status,
> +	.suspend	= genphy_suspend,
> +	.resume		= genphy_resume,
>  	.driver		= { .owner = THIS_MODULE,},
>  };
>  
> -- 

I was wondering how that works when CONFIG_PM is disabled, but I did a build
with it disabled and see that genphy_suspend and genphy_resume are always
built -- even when PM is disabled.  Interesting.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply

* [PATCH] e1000e: Drop a useless statement
From: Jean Delvare @ 2010-07-20 15:30 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Auke Kok, Bruce Allan, Jesse Brandeburg

err is set again a few lines below.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Auke Kok <auke-jan.h.kok@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
---
I sent this patch to the e1000-devel list on June 8th, 2010, but
didn't receive any answer:
http://sourceforge.net/mailarchive/forum.php?thread_name=201006081820.25381.jdelvare%40suse.de&forum_name=e1000-devel

 drivers/net/e1000e/netdev.c |    2 --
 1 file changed, 2 deletions(-)

--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -5557,8 +5557,6 @@ static int __devinit e1000_probe(struct
 	if (err)
 		goto err_sw_init;
 
-	err = -EIO;
-
 	memcpy(&hw->mac.ops, ei->mac_ops, sizeof(hw->mac.ops));
 	memcpy(&hw->nvm.ops, ei->nvm_ops, sizeof(hw->nvm.ops));
 	memcpy(&hw->phy.ops, ei->phy_ops, sizeof(hw->phy.ops));

-- 
Jean Delvare
Suse L3

^ permalink raw reply

* Re: [PATCH 11/19] drivers/net/irda: use for_each_pci_dev()
From: Jiri Kosina @ 2010-07-20 15:33 UTC (permalink / raw)
  To: Kulikov Vasiliy
  Cc: Kernel Janitors, Samuel Ortiz, David S. Miller, Stephen Hemminger,
	Joe Perches, Eric Dumazet, netdev, linux-kernel
In-Reply-To: <1278173056-11779-1-git-send-email-segooon@gmail.com>

On Sat, 3 Jul 2010, Kulikov Vasiliy wrote:

> Use for_each_pci_dev() to simplify the code.
> 
> Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
> ---
>  drivers/net/irda/smsc-ircc2.c |    5 +----
>  1 files changed, 1 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/irda/smsc-ircc2.c b/drivers/net/irda/smsc-ircc2.c
> index d67e484..850ca1c 100644
> --- a/drivers/net/irda/smsc-ircc2.c
> +++ b/drivers/net/irda/smsc-ircc2.c
> @@ -2848,9 +2848,7 @@ static int __init smsc_ircc_preconfigure_subsystems(unsigned short ircc_cfg,
>  	unsigned short ss_device = 0x0000;
>  	int ret = 0;
>  
> -	dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev);
> -
> -	while (dev != NULL) {
> +	for_each_pci_dev(dev) {
>  		struct smsc_ircc_subsystem_configuration *conf;
>  
>  		/*
> @@ -2899,7 +2897,6 @@ static int __init smsc_ircc_preconfigure_subsystems(unsigned short ircc_cfg,
>  					ret = -ENODEV;
>  			}
>  		}
> -		dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev);
>  	}
>  
>  	return ret;

Doesn't seem to hit linux-next as of today. Dave, are you going to merge 
it?

Thanks,

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply

* Re: [PATCH net-next 0/3] cxgb4vf: fixes for several small issues discovered by QA
From: Casey Leedom @ 2010-07-20 15:50 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100719.204002.219738044.davem@davemloft.net>

| From: David Miller <davem@davemloft.net>
| Date: Monday, July 19, 2010 08:40 pm
| 
| From: Casey Leedom <leedom@chelsio.com>
| Date: Mon, 19 Jul 2010 18:12:10 -0700
| 
| >   A couple of small (but important) fixes discovered by our QA people. 
| >   I've also
| > 
| > included a patch to add myself as the maintainer of cxgb4vf in the
| > MAINTAINERS file which I think is the protocol but please correct me if
| > changes to that file are usually performed by someone else.
| 
| Well, where are they?

  Oops!  Sorry!!  It was my first effort at using "git send-email" and in my manic 
attempt to make sure I wasn't screwing it up I had directed the patches to 
myself to verify that the process worked and then forgot to change the To: 
address to netdev@vger.kernel.org in my "real attempt."

  Patches coming forthwith.

Casey

^ permalink raw reply

* [PATCH net-next 1/3] cxgb4vf: Fix off-by-one error checking for the end of the mailbox delay array
From: Casey Leedom @ 2010-07-20 15:58 UTC (permalink / raw)
  To: netdev

>From 1d32860335fad8c67e23254aec7c30750276f2b4 Mon Sep 17 00:00:00 2001
From: Casey Leedom <leedom@chelsio.com>
Date: Mon, 19 Jul 2010 17:51:46 -0700
Subject: [PATCH net-next 1/3] cxgb4vf: Fix off-by-one error checking for the end of the mailbox delay array

Fix off-by-one error in checking for the end of the mailbox response delay
array.  We ended up walking off the end and, if we were unlucky, we'd end up
pulling in a 0 and never terminate the mailbox response delay loop ...

Signed-off-by: Casey Leedom <leedom@chelsio.com>
---
 drivers/net/cxgb4vf/t4vf_hw.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb4vf/t4vf_hw.c b/drivers/net/cxgb4vf/t4vf_hw.c
index 1ef2528..ea1c123 100644
--- a/drivers/net/cxgb4vf/t4vf_hw.c
+++ b/drivers/net/cxgb4vf/t4vf_hw.c
@@ -163,7 +163,7 @@ int t4vf_wr_mbox_core(struct adapter *adapter, const void *cmd, int size,
 	for (i = 0; i < 500; i += ms) {
 		if (sleep_ok) {
 			ms = delay[delay_idx];
-			if (delay_idx < ARRAY_SIZE(delay))
+			if (delay_idx < ARRAY_SIZE(delay) - 1)
 				delay_idx++;
 			msleep(ms);
 		} else
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH net-next 2/3] cxgb4vf: Fix bug where we were only allocating one queue in MSI mode
From: Casey Leedom @ 2010-07-20 15:59 UTC (permalink / raw)
  To: netdev

>From 7e141cafe989958267803791aa1bcacfffe5cfb2 Mon Sep 17 00:00:00 2001
From: Casey Leedom <leedom@chelsio.com>
Date: Mon, 19 Jul 2010 17:53:48 -0700
Subject: [PATCH net-next 2/3] cxgb4vf: Fix bug where we were only allocating one queue in MSI mode

Fix bug in setup_sge_queues() where we were incorrectly only allocating a
single "Queue Set" for MSI mode.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
---
 drivers/net/cxgb4vf/cxgb4vf_main.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/cxgb4vf/cxgb4vf_main.c b/drivers/net/cxgb4vf/cxgb4vf_main.c
index d065516..a165632 100644
--- a/drivers/net/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/cxgb4vf/cxgb4vf_main.c
@@ -533,10 +533,9 @@ static int setup_sge_queues(struct adapter *adapter)
 		struct port_info *pi = netdev_priv(dev);
 		struct sge_eth_rxq *rxq = &s->ethrxq[pi->first_qset];
 		struct sge_eth_txq *txq = &s->ethtxq[pi->first_qset];
-		int nqsets = (adapter->flags & USING_MSIX) ? pi->nqsets : 1;
 		int qs;
 
-		for (qs = 0; qs < nqsets; qs++, rxq++, txq++) {
+		for (qs = 0; qs < pi->nqsets; qs++, rxq++, txq++) {
 			err = t4vf_sge_alloc_rxq(adapter, &rxq->rspq, false,
 						 dev, msix++,
 						 &rxq->fl, t4vf_ethrx_handler);
@@ -565,10 +564,9 @@ static int setup_sge_queues(struct adapter *adapter)
 		struct port_info *pi = netdev_priv(dev);
 		struct sge_eth_rxq *rxq = &s->ethrxq[pi->first_qset];
 		struct sge_eth_txq *txq = &s->ethtxq[pi->first_qset];
-		int nqsets = (adapter->flags & USING_MSIX) ? pi->nqsets : 1;
 		int qs;
 
-		for (qs = 0; qs < nqsets; qs++, rxq++, txq++) {
+		for (qs = 0; qs < pi->nqsets; qs++, rxq++, txq++) {
 			IQ_MAP(s, rxq->rspq.abs_id) = &rxq->rspq;
 			EQ_MAP(s, txq->q.abs_id) = &txq->q;
 
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH net-next 3/3] cxgb4vf: add maintainer entry for cxgb4vf
From: Casey Leedom @ 2010-07-20 15:59 UTC (permalink / raw)
  To: netdev

>From 7710873beb494b46333fdf9841b4a117bdb66a5a Mon Sep 17 00:00:00 2001
From: Casey Leedom <leedom@chelsio.com>
Date: Mon, 19 Jul 2010 17:55:33 -0700
Subject: [PATCH net-next 3/3] cxgb4vf: add maintainer entry for cxgb4vf

Adding myself as the official maintainer of the Chelsio T4 Virtual function
Driver (cxgb4vf).

Signed-off-by: Casey Leedom <leedom@chelsio.com>
---
 MAINTAINERS |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index c05b499..aa73c9a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1767,6 +1767,13 @@ W:	http://www.openfabrics.org
 S:	Supported
 F:	drivers/infiniband/hw/cxgb4/
 
+CXGB4VF ETHERNET DRIVER (CXGB4VF)
+M:	Casey Leedom <leedom@chelsio.com>
+L:	netdev@vger.kernel.org
+W:	http://www.chelsio.com
+S:	Supported
+F:	drivers/net/cxgb4vf/
+
 CYBERPRO FB DRIVER
 M:	Russell King <linux@arm.linux.org.uk>
 L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
-- 
1.7.0.4


^ permalink raw reply related

* Re: [PATCH net-next] drivers/acpi/acpica/utmisc.c: Use printk extension %pV
From: Bjorn Helgaas @ 2010-07-20 16:08 UTC (permalink / raw)
  To: Joe Perches; +Cc: Len Brown, linux-acpi, linux-kernel, netdev, David Miller
In-Reply-To: <1279602367.19374.20.camel@Joe-Laptop.home>

On Monday, July 19, 2010 11:06:07 pm Joe Perches wrote:
> Consolidates the printk messages to a single
> call so the messages can not be interleaved.
> 
> Reduces text a bit.
> 
> $ size drivers/acpi/acpica/utmisc.o.*
>    text	   data	    bss	    dec	    hex	filename
>    7822	     56	   1832	   9710	   25ee	drivers/acpi/acpica/utmisc.o.old
>    7748	     56	   1736	   9540	   2544	drivers/acpi/acpica/utmisc.o.new
> 
> Depends on net-next commit 7db6f5fb65a82af03229eef104dc9899c5eecf33
> (vsprintf: Recursive vsnprintf: Add "%pV", struct va_format)

drivers/acpi/acpica/utmisc.c is part of the ACPI CA and is used in
several different OSes, but %pV sounds like a Linux-specific feature,
so I don't see how this patch can work.

Bjorn

^ permalink raw reply

* Re: [net-next-2.6 PATCH 2/5] ixgbe: drop support for UDP in RSS hash generation
From: Alexander Duyck @ 2010-07-20 16:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, netdev@vger.kernel.org,
	gospo@redhat.com, bphilips@novell.com, Skidmore, Donald C
In-Reply-To: <1279607980.2458.82.camel@edumazet-laptop>

Eric Dumazet wrote:
> Le lundi 19 juillet 2010 à 16:59 -0700, Jeff Kirsher a écrit :
>> From: Alexander Duyck <alexander.h.duyck@intel.com>
>>
>> This change removes UDP from the supported protocols for RSS hashing.  The
>> reason for removing this protocol is because IP fragmentation was causing a
>> network flow to be broken into two streams, one for fragmented, and one for
>> non-fragmented and this in turn was causing out-of-order issues.
>>
> 
> Jeff, does it mean all UDP packets are going to be delivered to a single
> queue ?
> 
> This would be a serious regression.
> 
> Many UDP applications try hard to not use fragments. 
> 
> They are going to pay the price because some application :
> - Use big segments, fragmented.
> - Is subject to OOO artifacts.
> 
> We would like some clarifications please :)
> 
> 
> 

The packets will still be hashed on source and destination IPv4/IPv6 
addresses.  The change just drops reading the UDP source/destination 
ports since in the case of fragmented packets they are not available and 
as such were being parsed as IPv4/IPv6 packets.  By making this change 
the queue selection is consistent between all packets in the UDP stream.

The only regression I would expect to see would be in testing between 
two fixed systems since the IP addresses of the two systems would be 
fixed and so running multiple flows between the two would yield the same 
  RSS hash for multiple UDP streams.  As long as multiple ip addresses 
are used  you should see multiple RSS hashes generated and as such the 
load should still be distributed.

Thanks,

Alex

^ permalink raw reply

* Re: [net-next-2.6 PATCH 2/5] ixgbe: drop support for UDP in RSS hash generation
From: Eric Dumazet @ 2010-07-20 16:39 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, netdev@vger.kernel.org,
	gospo@redhat.com, bphilips@novell.com, Skidmore, Donald C
In-Reply-To: <4C45CAC6.1050006@intel.com>

Le mardi 20 juillet 2010 à 09:11 -0700, Alexander Duyck a écrit :

> The packets will still be hashed on source and destination IPv4/IPv6 
> addresses.  The change just drops reading the UDP source/destination 
> ports since in the case of fragmented packets they are not available and 
> as such were being parsed as IPv4/IPv6 packets.  By making this change 
> the queue selection is consistent between all packets in the UDP stream.
> 

Excellent, this is perfect IMHO.

> The only regression I would expect to see would be in testing between 
> two fixed systems since the IP addresses of the two systems would be 
> fixed and so running multiple flows between the two would yield the same 
>   RSS hash for multiple UDP streams.  As long as multiple ip addresses 
> are used  you should see multiple RSS hashes generated and as such the 
> load should still be distributed.
> 

Ack. Fortunately, one can still use RPS to spread load onto multiple
cpus in this case.

This until ixgpe fills skb->rxhash with a non null value.
If it happens one day, we shall remind _not_ filling it for UDP packets.

BTW, this reminds me a netdev discussion we had for bnx2x

http://www.kerneltrap.com/mailarchive/linux-netdev/2010/4/23/6275415/thread

And now, I understand why Toepliz hash doesnt use src port/dst port,
since this is not available on fragments, obviously...

Thanks !



^ permalink raw reply

* [PATCH] drop_monitor: convert some kfree_skb call sites to consume_skb
From: Neil Horman @ 2010-07-20 16:45 UTC (permalink / raw)
  To: netdev; +Cc: davem, viro, eparis, nhorman

Convert a few calls from kfree_skb to consume_skb

Noticed while I was working on dropwatch that I was detecting lots of internal
skb drops in several places.  While some are legitimate, several were not,
freeing skbs that were at the end of their life, rather than being discarded due
to an error.  This patch converts those calls sites from using kfree_skb to
consume_skb, which quiets the in-kernel drop_monitor code from detecting them as
drops.  Tested successfully by myself

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Eric Paris <eparis@redhat.com>
CC: Al Viro <viro@zeniv.linux.org.uk>


 kernel/audit.c           |    2 +-
 net/netlink/af_netlink.c |    9 +++++----
 net/unix/af_unix.c       |    2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index c71bd26..8296aa5 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -407,7 +407,7 @@ static void kauditd_send_skb(struct sk_buff *skb)
 		audit_hold_skb(skb);
 	} else
 		/* drop the extra reference if sent ok */
-		kfree_skb(skb);
+		consume_skb(skb);
 }
 
 static int kauditd_thread(void *dummy)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 7aeaa83..8648a99 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1076,14 +1076,15 @@ int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, u32 pid,
 	sk_for_each_bound(sk, node, &nl_table[ssk->sk_protocol].mc_list)
 		do_one_broadcast(sk, &info);
 
-	kfree_skb(skb);
+	consume_skb(skb);
 
 	netlink_unlock_table();
 
-	kfree_skb(info.skb2);
-
-	if (info.delivery_failure)
+	if (info.delivery_failure) {
+		kfree_skb(info.skb2);
 		return -ENOBUFS;
+	} else
+		consume_skb(info.skb2);
 
 	if (info.delivered) {
 		if (info.congested && (allocation & __GFP_WAIT))
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 75ba48b..4414a18 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1906,7 +1906,7 @@ static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 				break;
 			}
 
-			kfree_skb(skb);
+			consume_skb(skb);
 
 			if (siocb->scm->fp)
 				break;

^ permalink raw reply related

* Re: [PATCH] net: Add batman-adv meshing protocol
From: David Miller @ 2010-07-20 16:59 UTC (permalink / raw)
  To: sven.eckelmann; +Cc: b.a.t.m.a.n, siwu, netdev
In-Reply-To: <201007201028.12470.sven.eckelmann@gmx.de>

From: Sven Eckelmann <sven.eckelmann@gmx.de>
Date: Tue, 20 Jul 2010 10:28:10 +0200

> I would leave that to the original author of those functions.

Whoever you leave it to, this needs to be fixed up before this
code can be integrated.  So if those people are not active or
don't have time, you very much should consider the reality that
you might have to do these fixups yourself.


^ permalink raw reply

* Re: [net-next-2.6 PATCH] e1000: allow option to limit number of descriptors down to 48 per ring
From: Brandeburg, Jesse @ 2010-07-20 17:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, netdev@vger.kernel.org,
	gospo@redhat.com, bphilips@novell.com, Duyck, Alexander H
In-Reply-To: <1279600823.2458.59.camel@edumazet-laptop>



On Mon, 19 Jul 2010, Eric Dumazet wrote:

> Le lundi 19 juillet 2010 à 16:43 -0700, Jeff Kirsher a écrit :
> > From: Alexander Duyck <alexander.h.duyck@intel.com>
> > 
> > This change makes it possible to limit the number of descriptors down to 48
> > per ring.  The reason for this change is to address a variation on hardware
> > errata 10 for 82546GB in which descriptors will be lost if more than 32
> > descriptors are fetched and the PCI-X MRBC is 512.
> > -#define E1000_MIN_TXD                       80
> > +#define E1000_MIN_TXD                       48
> >  #define E1000_MAX_82544_TXD               4096
> >  
> >  #define E1000_DEFAULT_RXD                  256
> >  #define E1000_MAX_RXD                      256
> > -#define E1000_MIN_RXD                       80
> > +#define E1000_MIN_RXD                       48
> >  #define E1000_MAX_82544_RXD               4096

> So this limit is a pure software one ?

Yes, the hardware will work fine (with the exception of some limits when 
certain performance registers are configured like [TR]XDCTL) all the way 
down to 1 descriptor, that said, the practical limit is probably more like 
8 for RX descriptors, and is 2 * (MAX_SKB_FRAGS + 1) for transmit if TSO 
is enabled.

if all offloads are disabled you could run with only 8 tx descriptors, and 
I believe it would work fine.

> Why not let an admin chose a lower limit if he wants to ?

I think in this case just to prevent odd interaction bugs with low values 
(there would have to be several driver changes and the testing work would 
increase to support very low values)

> I am asking because big ring sizes can be a latency source in some
> workloads.

yes, if you are trying to pace traffic in software.  That said, at least 
on transmit, most frames exit immediately from the hardware unless some 
external event like flow control is slowing down transmit.

Jesse

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox