* Re: multicast, interfaces, kernel 3.0+...
From: Michael Tokarev @ 2012-09-22 4:43 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20120922.003133.2272531217856660981.davem@davemloft.net>
On 22.09.2012 08:31, David Miller wrote:
> From: Michael Tokarev <mjt@tls.msk.ru>
> Date: Sat, 22 Sep 2012 08:21:52 +0400
>
>> The IP_ADD_MEMBERSHIP interface is apparently misdefined, because it
>> accepts an IP address of an interface, instead of an ifindex, or
>> ifname, or something like this, since there's no, obviously, 1:1
>> correspondence between ifaces and addresses, an iface can have no
>> addresses associated with it, or two ifaces can share one IP address
>> like in my case. But the "questionable" part is the "usualness" of
>> this setup I have here, with two ifaces having the same IP address.
>
> Can you at least look at the API specification for IP_ADD_MEMBERSHIP
> before making such claims?
As I mentioned in my previous email, I haven't dealt with multicast before,
so obviously I tried my best to learn before making any claims at all.
And the fine manual, http://tldp.org/HOWTO/Multicast-HOWTO-6.html ,
says:
6.4 IP_ADD_MEMBERSHIP.
The ip_mreq structure (taken from /usr/include/linux/in.h) has the following members:
struct ip_mreq
{
        struct in_addr imr_multiaddr; /* IP multicast address of group */
        struct in_addr imr_interface; /* local IP address of interface */
};
...
setsockopt (socket, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
Yes I probably should have read the mentioned header and the manpage too.
But the app does this:
setsockopt(567, SOL_IP, IP_ADD_MEMBERSHIP, "\344\5\6\7\nM\7Z", 8) = 0
setsockopt(567, SOL_IP, IP_ADD_MEMBERSHIP, "\344\5\6\7\nM\7Z", 8) = -1 EADDRINUSE (Address already in use)
so apparently sizeof(mreq) is 8 bytes for it, and I just didn't think there
may be additional fields in "real life". That was puzzling, so I asked.
(This is most likely a generic java interface to this facility, not
linux-specific).
As you can see, I at least tried. Apparently it wasn't successful, but
that isn't entirely my fault either: the "canonical" howto does not
mention that there might be more members in this structure. ;)
> The IP_ADD_MEMBERSHIP interface allows for the specification of a
> specific interface, the structure you pass into IP_ADD_MEMBERSHIP has
> an ->imr_ifindex field and this is the first key the call uses
> to pick a device.
Ok, after you mentioned this, I looked at the other sources and indeed
it does. I stand corrected, and my questions are answered.
> If you do not specify an explicit ifindex, and leave it at zero which
> I bet your application is doing, it picks the first device which has
> the specified address.
>
> As you have discovered, just specifying the address can cause unwanted
> effects when multiple devices have the same IP address, because the
> order of network devices in the system is not, and has never been,
> guaranteed.
>
> So the selection in this situation is essentially random because
> you haven't given the kernel enough information to choose things
> the way that you want it to.
Yes, that's what I thought too, it was just puzzling with the missing
bits of info. And yes I remember when order of addresses changed
in various places (routing table was one example) and people started
complaining, even when order had never been deterministic.
I'm not complaining, I'm just asking. And you answered my questions
perfectly, thank you!
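
For reference, the 8-byte argument in the strace output above decodes to group 228.5.6.7 and local address 10.77.7.90, i.e. the old two-member struct ip_mreq. A minimal sketch of the ifindex-based join described above might look like the following. It assumes the struct ip_mreqn layout documented in ip(7); the helper names fill_mreqn and join_group_on are made up for illustration and are not part of any API.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill in an ip_mreqn for joining `group` on the interface with index
 * `ifindex`. Returns 0 on success, -1 if `group` is not a valid IPv4
 * multicast address. (Hypothetical helper, for illustration only.)
 */
int fill_mreqn(struct ip_mreqn *req, const char *group, int ifindex)
{
	assert(req != NULL && group != NULL);

	memset(req, 0, sizeof(*req));
	if (inet_pton(AF_INET, group, &req->imr_multiaddr) != 1)
		return -1;
	if (!IN_MULTICAST(ntohl(req->imr_multiaddr.s_addr)))
		return -1;

	/* A non-zero ifindex selects the device; the address is left unset. */
	req->imr_address.s_addr = htonl(INADDR_ANY);
	req->imr_ifindex = ifindex;
	return 0;
}

/* Join `group` on the interface called `ifname` (hypothetical helper). */
int join_group_on(int fd, const char *group, const char *ifname)
{
	struct ip_mreqn req;
	unsigned int ifindex = if_nametoindex(ifname);

	if (ifindex == 0)
		return -1;	/* no such interface */
	if (fill_mreqn(&req, group, (int)ifindex) != 0)
		return -1;

	/* Passing sizeof(struct ip_mreqn) tells the kernel imr_ifindex is present. */
	return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &req, sizeof(req));
}
```

With two interfaces sharing one address, calling join_group_on() once per interface name would name each device explicitly instead of leaving the choice to device ordering.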
/mjt
^ permalink raw reply
* Re: multicast, interfaces, kernel 3.0+...
From: David Miller @ 2012-09-22 4:47 UTC (permalink / raw)
To: mjt; +Cc: netdev
In-Reply-To: <505D41DB.7060508@msgid.tls.msk.ru>
From: Michael Tokarev <mjt@tls.msk.ru>
Date: Sat, 22 Sep 2012 08:43:07 +0400
> And the fine manual, http://tldp.org/HOWTO/Multicast-HOWTO-6.html ,
You're reading a document that's 14 years old.
^ permalink raw reply
* Re: multicast, interfaces, kernel 3.0+...
From: David Miller @ 2012-09-22 4:50 UTC (permalink / raw)
To: mjt; +Cc: netdev
In-Reply-To: <20120922.004712.758145747373586331.davem@davemloft.net>
BTW, your site must be in a huge number of anti SPAM databases.
Because every time a mail is sent in this thread, as vger.kernel.org
postmaster I see hundreds of subscribers bounce.
It only happens for postings where your email address appears.
So you are basically invisible to much of the world, just FYI.
^ permalink raw reply
* Re: [PATCH net-next 3/3] ptp: derive the device name from the parent device
From: Richard Cochran @ 2012-09-22 5:43 UTC (permalink / raw)
To: Ben Hutchings
Cc: Keller, Jacob E, netdev@vger.kernel.org, David Miller,
Kirsher, Jeffrey T, John Stultz, Vick, Matthew
In-Reply-To: <1348254106.2521.56.camel@bwh-desktop.uk.solarflarecom.com>
On Fri, Sep 21, 2012 at 08:01:46PM +0100, Ben Hutchings wrote:
>
> The ethtool command is useful but setting the parent device may be even
> more useful, e.g. you will be able to write udev rules for PHC devices
> based on the parent device's identity.
Thinking about this a bit more, it makes no sense to put the parent
device name into clock_name, because that information is redundant.
# ls -l /sys/class/ptp/ptp0/
-r--r--r-- 1 root root 8192 Jan 1 00:00 clock_name
-r--r--r-- 1 root root 8192 Jan 1 00:00 dev
lrwxrwxrwx 1 root root 0 Jan 1 00:00 device -> ../../../fec-1:01
I agree that the parent device is useful, and I will add it. However,
I will leave the clock_name as it is, adding a bit more prose to the
ABI description.
The one case where clock_name becomes really important is when you
have a PHY clock, like in the above example, since it provides a way
to see that the clock is *not* related to the MAC.
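
As a rough user-space illustration of the two pieces of information discussed above, the sketch below reads the clock_name attribute and the base name of the device symlink from a PHC sysfs directory. The function names read_attr and read_parent are hypothetical, and the device link is only present where the class device has been given a parent.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Read a one-line sysfs-style attribute such as clock_name into `buf`,
 * stripping the trailing newline. `dir` would typically be something
 * like /sys/class/ptp/ptp0. Returns 0 on success, -1 on error.
 */
int read_attr(const char *dir, const char *attr, char *buf, size_t len)
{
	char path[4096];
	FILE *f;
	int n;

	assert(buf != NULL && len > 0);

	n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Resolve the "device" symlink in `dir` and copy the last path component
 * (the parent device's name) into `buf`. Returns 0 on success, -1 on error.
 */
int read_parent(const char *dir, char *buf, size_t len)
{
	char path[4096], target[4096];
	const char *base;
	ssize_t n;
	int m;

	assert(buf != NULL && len > 0);

	m = snprintf(path, sizeof(path), "%s/device", dir);
	if (m < 0 || (size_t)m >= sizeof(path))
		return -1;

	n = readlink(path, target, sizeof(target) - 1);
	if (n < 0)
		return -1;
	target[n] = '\0';	/* readlink() does not terminate the string */

	base = strrchr(target, '/');
	base = base ? base + 1 : target;
	if (strlen(base) >= len)
		return -1;
	strcpy(buf, base);
	return 0;
}
```

For the listing above, read_parent("/sys/class/ptp/ptp0", ...) would be expected to yield the parent named by the symlink, while clock_name stays a free-form description of the clock itself.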
Thanks,
Richard
^ permalink raw reply
* Re: [PATCH net-next 3/3] ptp: derive the device name from the parent device
From: Richard Cochran @ 2012-09-22 5:46 UTC (permalink / raw)
To: Ben Hutchings
Cc: Keller, Jacob E, netdev@vger.kernel.org, David Miller,
Kirsher, Jeffrey T, John Stultz, Vick, Matthew
In-Reply-To: <20120922054345.GA6124@netboy.at.omicron.at>
On Sat, Sep 22, 2012 at 07:43:46AM +0200, Richard Cochran wrote:
>
> The one case where clock_name becomes really important is when you
> have a PHY clock, like in the above example, since it provides a way
> to see that the clock is *not* related to the MAC.
I mean that a PHY clock is not a part of the MAC. Of course it is
related to the MAC by virtue of being connected to it via an MDIO bus.
Thanks,
Richard
^ permalink raw reply
* Re: Possible networking regression in 3.6.0
From: Chris Clayton @ 2012-09-22 6:26 UTC (permalink / raw)
To: Chris Clayton; +Cc: Eric Dumazet, netdev
In-Reply-To: <5059E40C.4070607@googlemail.com>
I guess you network developer folks are either very busy or this
regression is proving a bit troublesome to identify, so I've opened a
bugzilla report to keep track of it. The report number is 47761.
Chris
On 09/19/12 16:26, Chris Clayton wrote:
>>
>> It would help to have some traffic sample, maybe.
>>
>> Especially if the problem is not easily reproducible for us.
>>
>
> OK, I've used netsniff-ng to capture the traffic on all interfaces on
> the host (that would be tap0 and eth0, I guess) whilst attempting to
> ping the router from the WinXP KVM client. The result is a pcap file
> that I processed with tcpdump to produce:
>
> reading from file net-trace.pcap, link-type EN10MB (Ethernet)
> 14:56:31.406336 ARP, Request who-has 192.168.200.254 tell 192.168.200.1,
> length 28
> 0x0000: 0001 0800 0604 0001 5254 0c3b 1728 c0a8
> 0x0010: c801 0000 0000 0000 c0a8 c8fe
> 14:56:31.406357 ARP, Reply 192.168.200.254 is-at 46:83:93:8f:f0:7e,
> length 28
> 0x0000: 0001 0800 0604 0002 4683 938f f07e c0a8
> 0x0010: c8fe 5254 0c3b 1728 c0a8 c801
> 14:56:31.406534 IP 192.168.200.1 > 192.168.0.1: ICMP echo request, id
> 512, seq 4352, length 40
> 0x0000: 4500 003c 0195 0000 8001 efd8 c0a8 c801
> 0x0010: c0a8 0001 0800 3a5c 0200 1100 6162 6364
> 0x0020: 6566 6768 696a 6b6c 6d6e 6f70 7172 7374
> 0x0030: 7576 7761 6263 6465 6667 6869
> 14:56:31.406566 ARP, Request who-has 192.168.0.1 tell 192.168.0.40,
> length 28
> 0x0000: 0001 0800 0604 0001 5c9a d85c 6331 c0a8
> 0x0010: 0028 0000 0000 0000 c0a8 0001
> 14:56:31.410830 ARP, Reply 192.168.0.1 is-at 00:1f:33:80:09:44, length 46
> 0x0000: 0001 0800 0604 0002 001f 3380 0944 c0a8
> 0x0010: 0001 5c9a d85c 6331 c0a8 0028 c0a8 0001
> 0x0020: e000 0001 1164 ee9b 0000 0000 4500
> 14:56:31.410851 IP 192.168.0.40 > 192.168.0.1: ICMP echo request, id
> 512, seq 4352, length 40
> 0x0000: 4500 003c 0195 0000 7f01 b8b2 c0a8 0028
> 0x0010: c0a8 0001 0800 3a5c 0200 1100 6162 6364
> 0x0020: 6566 6768 696a 6b6c 6d6e 6f70 7172 7374
> 0x0030: 7576 7761 6263 6465 6667 6869
> 14:56:31.414474 IP 192.168.0.1 > 192.168.0.40: ICMP echo reply, id 512,
> seq 4352, length 40
> 0x0000: 4500 003c cf4f 0000 ff01 6af7 c0a8 0001
> 0x0010: c0a8 0028 0000 425c 0200 1100 6162 6364
> 0x0020: 6566 6768 696a 6b6c 6d6e 6f70 7172 7374
> 0x0030: 7576 7761 6263 6465 6667 6869
> 14:56:36.404781 ARP, Request who-has 192.168.0.40 tell 192.168.0.1,
> length 46
> 0x0000: 0001 0800 0604 0001 001f 3380 0944 c0a8
> 0x0010: 0001 0000 0000 0000 c0a8 0028 c0a8 0001
> 0x0020: c0a8 0028 0000 425c 0200 1100 6162
> 14:56:36.404806 ARP, Reply 192.168.0.40 is-at 5c:9a:d8:5c:63:31, length 28
> 0x0000: 0001 0800 0604 0002 5c9a d85c 6331 c0a8
> 0x0010: 0028 001f 3380 0944 c0a8 0001
> 14:56:36.689750 IP 192.168.200.1 > 192.168.0.1: ICMP echo request, id
> 512, seq 4608, length 40
> 0x0000: 4500 003c 0196 0000 8001 efd7 c0a8 c801
> 0x0010: c0a8 0001 0800 395c 0200 1200 6162 6364
> 0x0020: 6566 6768 696a 6b6c 6d6e 6f70 7172 7374
> 0x0030: 7576 7761 6263 6465 6667 6869
> 14:56:36.689774 IP 192.168.0.40 > 192.168.0.1: ICMP echo request, id
> 512, seq 4608, length 40
> 0x0000: 4500 003c 0196 0000 7f01 b8b1 c0a8 0028
> 0x0010: c0a8 0001 0800 395c 0200 1200 6162 6364
> 0x0020: 6566 6768 696a 6b6c 6d6e 6f70 7172 7374
> 0x0030: 7576 7761 6263 6465 6667 6869
> 14:56:36.693330 IP 192.168.0.1 > 192.168.0.40: ICMP echo reply, id 512,
> seq 4608, length 40
> 0x0000: 4500 003c cf50 0000 ff01 6af6 c0a8 0001
> 0x0010: c0a8 0028 0000 415c 0200 1200 6162 6364
> 0x0020: 6566 6768 696a 6b6c 6d6e 6f70 7172 7374
> 0x0030: 7576 7761 6263 6465 6667 6869
> 14:56:42.189424 IP 192.168.200.1 > 192.168.0.1: ICMP echo request, id
> 512, seq 4864, length 40
> 0x0000: 4500 003c 0197 0000 8001 efd6 c0a8 c801
> 0x0010: c0a8 0001 0800 385c 0200 1300 6162 6364
> 0x0020: 6566 6768 696a 6b6c 6d6e 6f70 7172 7374
> 0x0030: 7576 7761 6263 6465 6667 6869
> 14:56:42.189447 IP 192.168.0.40 > 192.168.0.1: ICMP echo request, id
> 512, seq 4864, length 40
> 0x0000: 4500 003c 0197 0000 7f01 b8b0 c0a8 0028
> 0x0010: c0a8 0001 0800 385c 0200 1300 6162 6364
> 0x0020: 6566 6768 696a 6b6c 6d6e 6f70 7172 7374
> 0x0030: 7576 7761 6263 6465 6667 6869
> 14:56:42.193029 IP 192.168.0.1 > 192.168.0.40: ICMP echo reply, id 512,
> seq 4864, length 40
> 0x0000: 4500 003c cf51 0000 ff01 6af5 c0a8 0001
> 0x0010: c0a8 0028 0000 405c 0200 1300 6162 6364
> 0x0020: 6566 6768 696a 6b6c 6d6e 6f70 7172 7374
> 0x0030: 7576 7761 6263 6465 6667 6869
> 14:56:47.689414 IP 192.168.200.1 > 192.168.0.1: ICMP echo request, id
> 512, seq 5120, length 40
> 0x0000: 4500 003c 0198 0000 8001 efd5 c0a8 c801
> 0x0010: c0a8 0001 0800 375c 0200 1400 6162 6364
> 0x0020: 6566 6768 696a 6b6c 6d6e 6f70 7172 7374
> 0x0030: 7576 7761 6263 6465 6667 6869
> 14:56:47.689439 IP 192.168.0.40 > 192.168.0.1: ICMP echo request, id
> 512, seq 5120, length 40
> 0x0000: 4500 003c 0198 0000 7f01 b8af c0a8 0028
> 0x0010: c0a8 0001 0800 375c 0200 1400 6162 6364
> 0x0020: 6566 6768 696a 6b6c 6d6e 6f70 7172 7374
> 0x0030: 7576 7761 6263 6465 6667 6869
> 14:56:47.693661 IP 192.168.0.1 > 192.168.0.40: ICMP echo reply, id 512,
> seq 5120, length 40
> 0x0000: 4500 003c cf52 0000 ff01 6af4 c0a8 0001
> 0x0010: c0a8 0028 0000 3f5c 0200 1400 6162 6364
> 0x0020: 6566 6768 696a 6b6c 6d6e 6f70 7172 7374
> 0x0030: 7576 7761 6263 6465 6667 6869
>
> Is this what you asked for?
>
> Chris
>
^ permalink raw reply
* [PATCH V2 net-next 0/4] Two new PTP Hardware Clock features
From: Richard Cochran @ 2012-09-22 7:42 UTC (permalink / raw)
To: netdev
Cc: Ben Hutchings, David Miller, Jacob Keller, Jeff Kirsher,
John Stultz, Matthew Vick
This patch series adds two new features to the PHC code.
The first two patches let a user program find out the previously
dialed frequency adjustment. This is primarily useful when restarting
a PTP service, since without this information, the previously dialed
(and presumably correct) adjustment will bias the new frequency
estimation.
The third patch links the phc class device to its parent device within
the driver model and sysfs.
The fourth patch adds a bit more documentation of the sysfs clock_name
attribute. This should help clarify the naming scheme.
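
From user space, the frequency query added by the first two patches could be exercised roughly as sketched below: open the PHC character device, turn the descriptor into a dynamic clock id, and call clock_adjtime() with timex.modes set to zero. This is a sketch under assumptions rather than part of the series. The helper names are hypothetical, and the clock id computation is an unsigned rewrite of the FD_TO_CLOCKID convention used by the kernel's PTP test program.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/timex.h>
#include <time.h>
#include <unistd.h>

/* Dynamic POSIX clock ids encode the file descriptor; 3 marks "fd based". */
#define CLOCKFD 3
#define FD_TO_CLOCKID(fd) ((clockid_t)((~(unsigned int)(fd) << 3) | CLOCKFD))

/* timex.freq is in ppm with a 16-bit binary fraction, so 65536 means 1 ppm. */
long long scaled_ppm_to_ppb(long long scaled_ppm)
{
	return scaled_ppm * 1000 / 65536;
}

/*
 * Ask the PHC behind `path` (for example /dev/ptp0) for the frequency
 * adjustment that was last dialed. Stores the raw timex.freq value in
 * *freq and returns 0 on success, or returns -1 on error.
 */
int phc_query_freq(const char *path, long *freq)
{
	struct timex tx;
	int fd, err;

	assert(path != NULL && freq != NULL);

	/* The adjtime path of a dynamic clock needs a writable descriptor. */
	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	memset(&tx, 0, sizeof(tx));	/* modes == 0 means "query only" */
	err = clock_adjtime(FD_TO_CLOCKID(fd), &tx);
	close(fd);
	if (err < 0)
		return -1;

	*freq = tx.freq;
	return 0;
}
```

A restarting PTP service could call phc_query_freq() once at startup and seed its servo with the returned value instead of assuming an adjustment of zero.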
Thanks,
Richard
Richard Cochran (4):
ptp: remember the adjusted frequency
ptp: provide the clock's adjusted frequency
ptp: link the phc device to its parent device
ptp: clarify the clock_name sysfs attribute
Documentation/ABI/testing/sysfs-ptp | 5 ++++-
drivers/net/ethernet/freescale/gianfar_ptp.c | 2 +-
drivers/net/ethernet/intel/igb/igb_ptp.c | 3 ++-
drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 3 ++-
drivers/net/ethernet/sfc/ptp.c | 3 ++-
drivers/net/phy/dp83640.c | 2 +-
drivers/ptp/ptp_clock.c | 11 +++++++++--
drivers/ptp/ptp_ixp46x.c | 2 +-
drivers/ptp/ptp_pch.c | 2 +-
drivers/ptp/ptp_private.h | 1 +
include/linux/ptp_clock_kernel.h | 7 +++++--
11 files changed, 29 insertions(+), 12 deletions(-)
--
1.7.2.5
^ permalink raw reply
* [PATCH V2 net-next 1/4] ptp: remember the adjusted frequency
From: Richard Cochran @ 2012-09-22 7:42 UTC (permalink / raw)
To: netdev
Cc: Ben Hutchings, David Miller, Jacob Keller, Jeff Kirsher,
John Stultz, Matthew Vick
In-Reply-To: <cover.1348299094.git.richardcochran@gmail.com>
This patch adds a field to the representation of a PTP hardware clock in
order to remember the frequency adjustment value dialed by the user.
Adding this field will let us answer queries in the manner of adjtimex
in a follow-on patch.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
---
drivers/ptp/ptp_clock.c | 1 +
drivers/ptp/ptp_private.h | 1 +
2 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 966875d..67e628e 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -147,6 +147,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
} else if (tx->modes & ADJ_FREQUENCY) {
err = ops->adjfreq(ops, scaled_ppm_to_ppb(tx->freq));
+ ptp->dialed_frequency = tx->freq;
}
return err;
diff --git a/drivers/ptp/ptp_private.h b/drivers/ptp/ptp_private.h
index 4d5b508..69d3207 100644
--- a/drivers/ptp/ptp_private.h
+++ b/drivers/ptp/ptp_private.h
@@ -45,6 +45,7 @@ struct ptp_clock {
dev_t devid;
int index; /* index into clocks.map */
struct pps_device *pps_source;
+ long dialed_frequency; /* remembers the frequency adjustment */
struct timestamp_event_queue tsevq; /* simple fifo for time stamps */
struct mutex tsevq_mux; /* one process at a time reading the fifo */
wait_queue_head_t tsev_wq;
--
1.7.2.5
^ permalink raw reply related
* [PATCH V2 net-next 2/4] ptp: provide the clock's adjusted frequency
From: Richard Cochran @ 2012-09-22 7:42 UTC (permalink / raw)
To: netdev
Cc: Ben Hutchings, David Miller, Jacob Keller, Jeff Kirsher,
John Stultz, Matthew Vick
In-Reply-To: <cover.1348299094.git.richardcochran@gmail.com>
If the timex.modes field indicates a query, then we provide the value of
the current frequency adjustment.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
---
drivers/ptp/ptp_clock.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 67e628e..6f7009a 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -148,6 +148,11 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
err = ops->adjfreq(ops, scaled_ppm_to_ppb(tx->freq));
ptp->dialed_frequency = tx->freq;
+
+ } else if (tx->modes == 0) {
+
+ tx->freq = ptp->dialed_frequency;
+ err = 0;
}
return err;
--
1.7.2.5
^ permalink raw reply related
* [PATCH V2 net-next 3/4] ptp: link the phc device to its parent device
From: Richard Cochran @ 2012-09-22 7:42 UTC (permalink / raw)
To: netdev
Cc: Ben Hutchings, David Miller, Jacob Keller, Jeff Kirsher,
John Stultz, Matthew Vick
In-Reply-To: <cover.1348299094.git.richardcochran@gmail.com>
PTP Hardware Clock devices appear as class devices in sysfs. This patch
changes the registration API to use the parent device, clarifying the
clock's relationship to the underlying device.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
---
drivers/net/ethernet/freescale/gianfar_ptp.c | 2 +-
drivers/net/ethernet/intel/igb/igb_ptp.c | 3 ++-
drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 3 ++-
drivers/net/ethernet/sfc/ptp.c | 3 ++-
drivers/net/phy/dp83640.c | 2 +-
drivers/ptp/ptp_clock.c | 5 +++--
drivers/ptp/ptp_ixp46x.c | 2 +-
drivers/ptp/ptp_pch.c | 2 +-
include/linux/ptp_clock_kernel.h | 7 +++++--
9 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/freescale/gianfar_ptp.c b/drivers/net/ethernet/freescale/gianfar_ptp.c
index c08e5d4..18762a3 100644
--- a/drivers/net/ethernet/freescale/gianfar_ptp.c
+++ b/drivers/net/ethernet/freescale/gianfar_ptp.c
@@ -510,7 +510,7 @@ static int gianfar_ptp_probe(struct platform_device *dev)
spin_unlock_irqrestore(&etsects->lock, flags);
- etsects->clock = ptp_clock_register(&etsects->caps);
+ etsects->clock = ptp_clock_register(&etsects->caps, &dev->dev);
if (IS_ERR(etsects->clock)) {
err = PTR_ERR(etsects->clock);
goto no_clock;
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c
index e13ba1d..ee21445 100644
--- a/drivers/net/ethernet/intel/igb/igb_ptp.c
+++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
@@ -752,7 +752,8 @@ void igb_ptp_init(struct igb_adapter *adapter)
wr32(E1000_IMS, E1000_IMS_TS);
}
- adapter->ptp_clock = ptp_clock_register(&adapter->ptp_caps);
+ adapter->ptp_clock = ptp_clock_register(&adapter->ptp_caps,
+ &adapter->pdev->dev);
if (IS_ERR(adapter->ptp_clock)) {
adapter->ptp_clock = NULL;
dev_err(&adapter->pdev->dev, "ptp_clock_register failed\n");
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
index 3456d56..39881cb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
@@ -960,7 +960,8 @@ void ixgbe_ptp_init(struct ixgbe_adapter *adapter)
/* (Re)start the overflow check */
adapter->flags2 |= IXGBE_FLAG2_OVERFLOW_CHECK_ENABLED;
- adapter->ptp_clock = ptp_clock_register(&adapter->ptp_caps);
+ adapter->ptp_clock = ptp_clock_register(&adapter->ptp_caps,
+ &adapter->pdev->dev);
if (IS_ERR(adapter->ptp_clock)) {
adapter->ptp_clock = NULL;
e_dev_err("ptp_clock_register failed\n");
diff --git a/drivers/net/ethernet/sfc/ptp.c b/drivers/net/ethernet/sfc/ptp.c
index 2b07a4e..3ed5d13 100644
--- a/drivers/net/ethernet/sfc/ptp.c
+++ b/drivers/net/ethernet/sfc/ptp.c
@@ -931,7 +931,8 @@ static int efx_ptp_probe_channel(struct efx_channel *channel)
ptp->phc_clock_info.settime = efx_phc_settime;
ptp->phc_clock_info.enable = efx_phc_enable;
- ptp->phc_clock = ptp_clock_register(&ptp->phc_clock_info);
+ ptp->phc_clock = ptp_clock_register(&ptp->phc_clock_info,
+ &channel->napi_dev->dev);
if (!ptp->phc_clock)
goto fail3;
diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index b0da022..24e05c4 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -980,7 +980,7 @@ static int dp83640_probe(struct phy_device *phydev)
if (choose_this_phy(clock, phydev)) {
clock->chosen = dp83640;
- clock->ptp_clock = ptp_clock_register(&clock->caps);
+ clock->ptp_clock = ptp_clock_register(&clock->caps, &phydev->dev);
if (IS_ERR(clock->ptp_clock)) {
err = PTR_ERR(clock->ptp_clock);
goto no_register;
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 6f7009a..b15a376 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -186,7 +186,8 @@ static void delete_ptp_clock(struct posix_clock *pc)
/* public interface */
-struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info)
+struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
+ struct device *parent)
{
struct ptp_clock *ptp;
int err = 0, index, major = MAJOR(ptp_devt);
@@ -219,7 +220,7 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info)
init_waitqueue_head(&ptp->tsev_wq);
/* Create a new device in our class. */
- ptp->dev = device_create(ptp_class, NULL, ptp->devid, ptp,
+ ptp->dev = device_create(ptp_class, parent, ptp->devid, ptp,
"ptp%d", ptp->index);
if (IS_ERR(ptp->dev))
goto no_device;
diff --git a/drivers/ptp/ptp_ixp46x.c b/drivers/ptp/ptp_ixp46x.c
index e03c406..d49b851 100644
--- a/drivers/ptp/ptp_ixp46x.c
+++ b/drivers/ptp/ptp_ixp46x.c
@@ -298,7 +298,7 @@ static int __init ptp_ixp_init(void)
ixp_clock.caps = ptp_ixp_caps;
- ixp_clock.ptp_clock = ptp_clock_register(&ixp_clock.caps);
+ ixp_clock.ptp_clock = ptp_clock_register(&ixp_clock.caps, NULL);
if (IS_ERR(ixp_clock.ptp_clock))
return PTR_ERR(ixp_clock.ptp_clock);
diff --git a/drivers/ptp/ptp_pch.c b/drivers/ptp/ptp_pch.c
index 3a9c17e..e624e4d 100644
--- a/drivers/ptp/ptp_pch.c
+++ b/drivers/ptp/ptp_pch.c
@@ -627,7 +627,7 @@ pch_probe(struct pci_dev *pdev, const struct pci_device_id *id)
}
chip->caps = ptp_pch_caps;
- chip->ptp_clock = ptp_clock_register(&chip->caps);
+ chip->ptp_clock = ptp_clock_register(&chip->caps, &pdev->dev);
if (IS_ERR(chip->ptp_clock))
return PTR_ERR(chip->ptp_clock);
diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h
index a644b29..56c71b2 100644
--- a/include/linux/ptp_clock_kernel.h
+++ b/include/linux/ptp_clock_kernel.h
@@ -21,6 +21,7 @@
#ifndef _PTP_CLOCK_KERNEL_H_
#define _PTP_CLOCK_KERNEL_H_
+#include <linux/device.h>
#include <linux/pps_kernel.h>
#include <linux/ptp_clock.h>
@@ -93,10 +94,12 @@ struct ptp_clock;
/**
* ptp_clock_register() - register a PTP hardware clock driver
*
- * @info: Structure describing the new clock.
+ * @info: Structure describing the new clock.
+ * @parent: Pointer to the parent device of the new clock.
*/
-extern struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info);
+extern struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
+ struct device *parent);
/**
* ptp_clock_unregister() - unregister a PTP hardware clock driver
--
1.7.2.5
^ permalink raw reply related
* [PATCH V2 net-next 4/4] ptp: clarify the clock_name sysfs attribute
From: Richard Cochran @ 2012-09-22 7:42 UTC (permalink / raw)
To: netdev
Cc: Ben Hutchings, David Miller, Jacob Keller, Jeff Kirsher,
John Stultz, Matthew Vick
In-Reply-To: <cover.1348299094.git.richardcochran@gmail.com>
There has been some confusion among PHC driver authors about the
intended purpose of the clock_name attribute. This patch expands the
documentation in order to clarify how the clock_name field should be
understood.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
---
Documentation/ABI/testing/sysfs-ptp | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-ptp b/Documentation/ABI/testing/sysfs-ptp
index d40d2b5..c906488 100644
--- a/Documentation/ABI/testing/sysfs-ptp
+++ b/Documentation/ABI/testing/sysfs-ptp
@@ -19,7 +19,10 @@ Date: September 2010
Contact: Richard Cochran <richardcochran@gmail.com>
Description:
This file contains the name of the PTP hardware clock
- as a human readable string.
+ as a human readable string. The purpose of this
+ attribute is to provide the user with a "friendly
+ name" and to help distinguish PHY based devices from
+ MAC based ones.
What: /sys/class/ptp/ptpN/max_adjustment
Date: September 2010
--
1.7.2.5
^ permalink raw reply related
* Re: [PATCH V2 net-next 0/4] Two new PTP Hardware Clock features
From: Richard Cochran @ 2012-09-22 7:45 UTC (permalink / raw)
To: netdev
Cc: Ben Hutchings, David Miller, Jacob Keller, Jeff Kirsher,
John Stultz, Matthew Vick
In-Reply-To: <cover.1348299094.git.richardcochran@gmail.com>
On Sat, Sep 22, 2012 at 09:42:52AM +0200, Richard Cochran wrote:
> This patch series adds two new features to the PHC code.
Forgot to say:
V2 preserves the clock_name attribute as it was meant to be, instead
of making any changes to it.
Thanks,
Richard
^ permalink raw reply
* Re: [PATCH net-next v1] net: use a per task frag allocator
From: Eric Dumazet @ 2012-09-21 21:11 UTC (permalink / raw)
To: Vijay Subramanian
Cc: David Miller, linux-kernel, netdev, Ben Hutchings,
Alexander Duyck
In-Reply-To: <CAGK4HS98RG78Auvaai3Ny6zz5c_hccO-ShD3GuT=moRRm4+RUA@mail.gmail.com>
On Fri, 2012-09-21 at 13:27 -0700, Vijay Subramanian wrote:
> I get the following compile error with the newer version of the patch
>
> net/sched/em_meta.c: In function ‘meta_int_sk_sendmsg_off’:
> net/sched/em_meta.c:464: error: ‘struct sock’ has no member named
> ‘sk_sndmsg_off’
> make[1]: *** [net/sched/em_meta.o] Error 1
> make: *** [net/sched/em_meta.o] Error 2
>
>
>
> Vijay
Oh well, I wonder what's the expected use of this crap...
Thanks, I'll fix this in v3!
^ permalink raw reply
* Re: [PATCH V2 net-next 0/4] Two new PTP Hardware Clock features
From: Richard Cochran @ 2012-09-22 9:11 UTC (permalink / raw)
To: netdev
Cc: Ben Hutchings, David Miller, Jacob Keller, Jeff Kirsher,
John Stultz, Matthew Vick
In-Reply-To: <20120922074553.GA4143@netboy.at.omicron.at>
On Sat, Sep 22, 2012 at 09:45:53AM +0200, Richard Cochran wrote:
> On Sat, Sep 22, 2012 at 09:42:52AM +0200, Richard Cochran wrote:
> > This patch series adds two new features to the PHC code.
>
> Forgot to say:
>
> V2 preserves the clock_name attribute as it was meant to be, instead
> of making any changes to it.
... and covers the registration API change in the brand new solarflare
phc device, which was overlooked in V1.
Thanks,
Richard
^ permalink raw reply
* [PATCH] pppoe: drop PPPOX_ZOMBIEs in pppoe_release
From: Xiaodong Xu @ 2012-09-22 9:53 UTC (permalink / raw)
To: linux-kernel; +Cc: netdev
From: Xiaodong Xu <stid.smth@gmail.com>
When PPPOE is running over a virtual ethernet interface (e.g., a
bonding interface) and the user tries to delete the interface while
the PPPOE state is ZOMBIE, the kernel will loop forever while
unregistering the net_device, because the reference count is never
decreased to zero; dev_put() should have been called to drop it.
Signed-off-by: Xiaodong Xu <stid.smth@gmail.com>
---
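
The effect of the one-line change can be modelled in user space with a toy reference count, as sketched below. This is only an illustration of the bitmask test: the state values are assumed to mirror the enum in include/linux/if_pppox.h of that era, and the toy_* types and helper names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Socket states; values assumed to match include/linux/if_pppox.h. */
enum {
	PPPOX_NONE      = 0,
	PPPOX_CONNECTED = 1,
	PPPOX_BOUND     = 2,
	PPPOX_RELAY     = 4,
	PPPOX_ZOMBIE    = 8,
	PPPOX_DEAD      = 16,
};

/* Toy stand-ins for a net_device and a PPPoE socket (hypothetical). */
struct toy_dev { int refcnt; };
struct toy_sock { int state; struct toy_dev *dev; };

/* State test before the patch: ZOMBIE sockets are not considered. */
bool holds_dev_old(int state)
{
	return (state & (PPPOX_CONNECTED | PPPOX_BOUND)) != 0;
}

/* State test after the patch: ZOMBIE sockets also drop their reference. */
bool holds_dev_new(int state)
{
	return (state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) != 0;
}

/* Model of the release path: drop the device reference if the test says so. */
void toy_release(struct toy_sock *sk, bool (*holds_dev)(int))
{
	assert(sk != NULL && holds_dev != NULL);

	if (sk->dev && holds_dev(sk->state)) {
		sk->dev->refcnt--;	/* stands in for dev_put() */
		sk->dev = NULL;
	}
	sk->state = PPPOX_DEAD;
}
```

Releasing a ZOMBIE socket with holds_dev_old leaves the toy count at one, which corresponds to the device that can never finish unregistering; with holds_dev_new the count reaches zero.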
--- drivers/net/ppp/pppoe.c.orig 2012-09-19 11:49:27.921826868 +0800
+++ drivers/net/ppp/pppoe.c 2012-09-22 17:44:03.642730082 +0800
@@ -570,7 +570,7 @@ static int pppoe_release(struct socket *
po = pppox_sk(sk);
- if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
+ if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
dev_put(po->pppoe_dev);
po->pppoe_dev = NULL;
}
--
Regards,
Xiaodong Xu
^ permalink raw reply
* [PATCH] ipv4: raw: fix icmp_filter()
From: Eric Dumazet @ 2012-09-22 10:08 UTC (permalink / raw)
To: David Miller; +Cc: netdev
From: Eric Dumazet <edumazet@google.com>
icmp_filter() should not modify its input, or else its caller
would need to recompute ip_hdr() if skb->head is reallocated.
Use skb_header_pointer() instead of pskb_may_pull() and
change the prototype to make clear both sk and skb are const.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
This is a minimal fix, to meet stable expectations.
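
The bitmask test at the heart of icmp_filter() can be tried out in user space as sketched below. This is an illustrative stand-alone model, not the kernel function: the helper names are made up, and the mask plays the role of the 32-bit filter data set through the raw socket ICMP_FILTER option. It also shows why an unsigned constant is used for the shift: with a plain int, shifting a 1 into bit 31 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if ICMP `type` is blocked by `mask`, 0 if it is delivered.
 * Types 32 and above are never blocked, matching the patched function.
 */
int icmp_type_blocked(uint32_t mask, uint8_t type)
{
	if (type < 32)
		return ((UINT32_C(1) << type) & mask) != 0;

	/* Do not block unknown ICMP types */
	return 0;
}

/* Build a mask that blocks every type below 32 except `type`. */
uint32_t block_all_except(uint8_t type)
{
	assert(type < 32);
	return ~(UINT32_C(1) << type);
}
```

A ping-like program would typically want block_all_except(0), so that only echo replies (type 0) among the low-numbered types reach its raw socket.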
net/ipv4/raw.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index ff0f071..d23c657 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -131,18 +131,20 @@ found:
* 0 - deliver
* 1 - block
*/
-static __inline__ int icmp_filter(struct sock *sk, struct sk_buff *skb)
+static int icmp_filter(const struct sock *sk, const struct sk_buff *skb)
{
- int type;
+ struct icmphdr _hdr;
+ const struct icmphdr *hdr;
- if (!pskb_may_pull(skb, sizeof(struct icmphdr)))
+ hdr = skb_header_pointer(skb, skb_transport_offset(skb),
+ sizeof(_hdr), &_hdr);
+ if (!hdr)
return 1;
- type = icmp_hdr(skb)->type;
- if (type < 32) {
+ if (hdr->type < 32) {
__u32 data = raw_sk(sk)->filter.data;
- return ((1 << type) & data) != 0;
+ return ((1U << hdr->type) & data) != 0;
}
/* Do not block unknown ICMP types */
^ permalink raw reply related
* [PATCH] pppoe: drop PPPOX_ZOMBIEs in pppoe_release
From: Xiaodong Xu @ 2012-09-22 10:09 UTC (permalink / raw)
To: linux-kernel; +Cc: netdev
From: Xiaodong Xu <stid.smth@gmail.com>
When PPPOE is running over a virtual ethernet interface (e.g., a
bonding interface) and the user tries to delete the interface while
the PPPOE state is ZOMBIE, the kernel will loop forever while
unregistering the net_device, because the reference count is never
decreased to zero; dev_put() should have been called to drop it.
Signed-off-by: Xiaodong Xu <stid.smth@gmail.com>
---
--- linux/drivers/net/ppp/pppoe.c.orig 2012-09-19 11:49:27.921826868 +0800
+++ linux/drivers/net/ppp/pppoe.c 2012-09-22 17:44:03.642730082 +0800
@@ -570,7 +570,7 @@ static int pppoe_release(struct socket *
po = pppox_sk(sk);
- if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
+ if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
dev_put(po->pppoe_dev);
po->pppoe_dev = NULL;
}
--
Regards,
Xiaodong Xu
^ permalink raw reply
* [net-next 0/7][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-09-22 10:30 UTC (permalink / raw)
To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann
This series contains updates to igb only.
The following are changes since commit abb17e6c0c7b27693201dc85f75dbb184279fd10:
netlink: use <linux/export.h> instead of <linux/module.h>
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
Alexander Duyck (5):
igb: Remove logic that was doing NUMA pseudo-aware allocations
igb: Change Tx cleanup loop to do/while instead of for
igb: Change how we populate the RSS indirection table
igb: Simplify how we populate the RSS key
igb: Use dma_unmap_addr and dma_unmap_len defines
Carolyn Wyborny (1):
igb: Fix stats output on i210/i211 parts.
Stefan Assmann (1):
igb: Change how we check for pre-existing and assigned VFs
drivers/net/ethernet/intel/igb/igb.h | 8 +-
drivers/net/ethernet/intel/igb/igb_main.c | 370 ++++++++++--------------------
2 files changed, 122 insertions(+), 256 deletions(-)
--
1.7.11.4
^ permalink raw reply
* [net-next 1/7] igb: Change how we check for pre-existing and assigned VFs
From: Jeff Kirsher @ 2012-09-22 10:30 UTC (permalink / raw)
To: davem; +Cc: Stefan Assmann, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348309836-7107-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Stefan Assmann <sassmann@kpanic.de>
Adapt the pre-existing and assigned VFs code to the ixgbe way introduced
in commit 9297127b9cdd8d30c829ef5fd28b7cc0323a7bcd.
Instead of searching the enabled VFs, we use pci_num_vf to determine the
number of enabled VFs. By checking which PF owns an assigned VF, it's
possible to decide whether to leave it enabled or not.
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Robert Garrett <robertx.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igb/igb.h | 1 -
drivers/net/ethernet/intel/igb/igb_main.c | 104 +++++++-----------------------
2 files changed, 22 insertions(+), 83 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 43c8e29..6f17f69 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -101,7 +101,6 @@ struct vf_data_storage {
u16 pf_vlan; /* When set, guest VLAN config not allowed. */
u16 pf_qos;
u16 tx_rate;
- struct pci_dev *vfdev;
};
#define IGB_VF_FLAG_CTS 0x00000001 /* VF is clear to send data */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 246646b..0730096 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -172,8 +172,7 @@ static void igb_check_vf_rate_limit(struct igb_adapter *);
#ifdef CONFIG_PCI_IOV
static int igb_vf_configure(struct igb_adapter *adapter, int vf);
-static int igb_find_enabled_vfs(struct igb_adapter *adapter);
-static int igb_check_vf_assignment(struct igb_adapter *adapter);
+static bool igb_vfs_are_assigned(struct igb_adapter *adapter);
#endif
#ifdef CONFIG_PM
@@ -2300,11 +2299,11 @@ static void __devexit igb_remove(struct pci_dev *pdev)
/* reclaim resources allocated to VFs */
if (adapter->vf_data) {
/* disable iov and allow time for transactions to clear */
- if (!igb_check_vf_assignment(adapter)) {
+ if (igb_vfs_are_assigned(adapter)) {
+ dev_info(&pdev->dev, "Unloading driver while VFs are assigned - VFs will not be deallocated\n");
+ } else {
pci_disable_sriov(pdev);
msleep(500);
- } else {
- dev_info(&pdev->dev, "VF(s) assigned to guests!\n");
}
kfree(adapter->vf_data);
@@ -2344,7 +2343,7 @@ static void __devinit igb_probe_vfs(struct igb_adapter * adapter)
#ifdef CONFIG_PCI_IOV
struct pci_dev *pdev = adapter->pdev;
struct e1000_hw *hw = &adapter->hw;
- int old_vfs = igb_find_enabled_vfs(adapter);
+ int old_vfs = pci_num_vf(adapter->pdev);
int i;
/* Virtualization features not supported on i210 family. */
@@ -5037,102 +5036,43 @@ static int igb_notify_dca(struct notifier_block *nb, unsigned long event,
static int igb_vf_configure(struct igb_adapter *adapter, int vf)
{
unsigned char mac_addr[ETH_ALEN];
- struct pci_dev *pdev = adapter->pdev;
- struct e1000_hw *hw = &adapter->hw;
- struct pci_dev *pvfdev;
- unsigned int device_id;
- u16 thisvf_devfn;
eth_random_addr(mac_addr);
igb_set_vf_mac(adapter, vf, mac_addr);
- switch (adapter->hw.mac.type) {
- case e1000_82576:
- device_id = IGB_82576_VF_DEV_ID;
- /* VF Stride for 82576 is 2 */
- thisvf_devfn = (pdev->devfn + 0x80 + (vf << 1)) |
- (pdev->devfn & 1);
- break;
- case e1000_i350:
- device_id = IGB_I350_VF_DEV_ID;
- /* VF Stride for I350 is 4 */
- thisvf_devfn = (pdev->devfn + 0x80 + (vf << 2)) |
- (pdev->devfn & 3);
- break;
- default:
- device_id = 0;
- thisvf_devfn = 0;
- break;
- }
-
- pvfdev = pci_get_device(hw->vendor_id, device_id, NULL);
- while (pvfdev) {
- if (pvfdev->devfn == thisvf_devfn)
- break;
- pvfdev = pci_get_device(hw->vendor_id,
- device_id, pvfdev);
- }
-
- if (pvfdev)
- adapter->vf_data[vf].vfdev = pvfdev;
- else
- dev_err(&pdev->dev,
- "Couldn't find pci dev ptr for VF %4.4x\n",
- thisvf_devfn);
- return pvfdev != NULL;
+ return 0;
}
-static int igb_find_enabled_vfs(struct igb_adapter *adapter)
+static bool igb_vfs_are_assigned(struct igb_adapter *adapter)
{
- struct e1000_hw *hw = &adapter->hw;
struct pci_dev *pdev = adapter->pdev;
- struct pci_dev *pvfdev;
- u16 vf_devfn = 0;
- u16 vf_stride;
- unsigned int device_id;
- int vfs_found = 0;
+ struct pci_dev *vfdev;
+ int dev_id;
switch (adapter->hw.mac.type) {
case e1000_82576:
- device_id = IGB_82576_VF_DEV_ID;
- /* VF Stride for 82576 is 2 */
- vf_stride = 2;
+ dev_id = IGB_82576_VF_DEV_ID;
break;
case e1000_i350:
- device_id = IGB_I350_VF_DEV_ID;
- /* VF Stride for I350 is 4 */
- vf_stride = 4;
+ dev_id = IGB_I350_VF_DEV_ID;
break;
default:
- device_id = 0;
- vf_stride = 0;
- break;
- }
-
- vf_devfn = pdev->devfn + 0x80;
- pvfdev = pci_get_device(hw->vendor_id, device_id, NULL);
- while (pvfdev) {
- if (pvfdev->devfn == vf_devfn &&
- (pvfdev->bus->number >= pdev->bus->number))
- vfs_found++;
- vf_devfn += vf_stride;
- pvfdev = pci_get_device(hw->vendor_id,
- device_id, pvfdev);
+ return false;
}
- return vfs_found;
-}
-
-static int igb_check_vf_assignment(struct igb_adapter *adapter)
-{
- int i;
- for (i = 0; i < adapter->vfs_allocated_count; i++) {
- if (adapter->vf_data[i].vfdev) {
- if (adapter->vf_data[i].vfdev->dev_flags &
- PCI_DEV_FLAGS_ASSIGNED)
+ /* loop through all the VFs to see if we own any that are assigned */
+ vfdev = pci_get_device(PCI_VENDOR_ID_INTEL, dev_id, NULL);
+ while (vfdev) {
+ /* if we don't own it we don't care */
+ if (vfdev->is_virtfn && vfdev->physfn == pdev) {
+ /* if it is assigned we cannot release it */
+ if (vfdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED)
return true;
}
+
+ vfdev = pci_get_device(PCI_VENDOR_ID_INTEL, dev_id, vfdev);
}
+
return false;
}
--
1.7.11.4
* [net-next 2/7] igb: Fix stats output on i210/i211 parts.
From: Jeff Kirsher @ 2012-09-22 10:30 UTC (permalink / raw)
To: davem; +Cc: Carolyn Wyborny, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348309836-7107-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Carolyn Wyborny <carolyn.wyborny@intel.com>
Due to a hardware issue, on i210 and i211 parts, the TNCRS statistic
provides an invalid value. This patch changes the update stats function
to increment the stat only for non-i210/i211 parts.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igb/igb_main.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 0730096..60cf3eb 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -4776,7 +4776,11 @@ void igb_update_stats(struct igb_adapter *adapter,
reg = rd32(E1000_CTRL_EXT);
if (!(reg & E1000_CTRL_EXT_LINK_MODE_MASK)) {
adapter->stats.rxerrc += rd32(E1000_RXERRC);
- adapter->stats.tncrs += rd32(E1000_TNCRS);
+
+ /* this stat has invalid values on i210/i211 */
+ if ((hw->mac.type != e1000_i210) &&
+ (hw->mac.type != e1000_i211))
+ adapter->stats.tncrs += rd32(E1000_TNCRS);
}
adapter->stats.tsctc += rd32(E1000_TSCTC);
--
1.7.11.4
* [net-next 3/7] igb: Remove logic that was doing NUMA pseudo-aware allocations
From: Jeff Kirsher @ 2012-09-22 10:30 UTC (permalink / raw)
To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348309836-7107-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Alexander Duyck <alexander.h.duyck@intel.com>
This change removes the code that was doing the NUMA allocations for the
q_vectors, rings, and ring resources. The problem is that the logic assumed
the NUMA nodes were always interleaved, and that is not always the case.
I hope to add this functionality back in a more controlled manner in the
future.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igb/igb.h | 3 -
drivers/net/ethernet/intel/igb/igb_main.c | 95 +++++--------------------------
2 files changed, 13 insertions(+), 85 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 6f17f69..9cad058 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -213,7 +213,6 @@ struct igb_q_vector {
struct igb_ring_container rx, tx;
struct napi_struct napi;
- int numa_node;
u16 itr_val;
u8 set_itr;
@@ -258,7 +257,6 @@ struct igb_ring {
};
/* Items past this point are only used during ring alloc / free */
dma_addr_t dma; /* phys address of the ring */
- int numa_node; /* node to alloc ring memory on */
};
enum e1000_ring_flags_t {
@@ -373,7 +371,6 @@ struct igb_adapter {
int vf_rate_link_speed;
u32 rss_queues;
u32 wvbr;
- int node;
u32 *shadow_vfta;
#ifdef CONFIG_IGB_PTP
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 60cf3eb..c9997d8 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -682,52 +682,29 @@ static int igb_alloc_queues(struct igb_adapter *adapter)
{
struct igb_ring *ring;
int i;
- int orig_node = adapter->node;
for (i = 0; i < adapter->num_tx_queues; i++) {
- if (orig_node == -1) {
- int cur_node = next_online_node(adapter->node);
- if (cur_node == MAX_NUMNODES)
- cur_node = first_online_node;
- adapter->node = cur_node;
- }
- ring = kzalloc_node(sizeof(struct igb_ring), GFP_KERNEL,
- adapter->node);
- if (!ring)
- ring = kzalloc(sizeof(struct igb_ring), GFP_KERNEL);
+ ring = kzalloc(sizeof(struct igb_ring), GFP_KERNEL);
if (!ring)
goto err;
ring->count = adapter->tx_ring_count;
ring->queue_index = i;
ring->dev = &adapter->pdev->dev;
ring->netdev = adapter->netdev;
- ring->numa_node = adapter->node;
/* For 82575, context index must be unique per ring. */
if (adapter->hw.mac.type == e1000_82575)
set_bit(IGB_RING_FLAG_TX_CTX_IDX, &ring->flags);
adapter->tx_ring[i] = ring;
}
- /* Restore the adapter's original node */
- adapter->node = orig_node;
for (i = 0; i < adapter->num_rx_queues; i++) {
- if (orig_node == -1) {
- int cur_node = next_online_node(adapter->node);
- if (cur_node == MAX_NUMNODES)
- cur_node = first_online_node;
- adapter->node = cur_node;
- }
- ring = kzalloc_node(sizeof(struct igb_ring), GFP_KERNEL,
- adapter->node);
- if (!ring)
- ring = kzalloc(sizeof(struct igb_ring), GFP_KERNEL);
+ ring = kzalloc(sizeof(struct igb_ring), GFP_KERNEL);
if (!ring)
goto err;
ring->count = adapter->rx_ring_count;
ring->queue_index = i;
ring->dev = &adapter->pdev->dev;
ring->netdev = adapter->netdev;
- ring->numa_node = adapter->node;
/* set flag indicating ring supports SCTP checksum offload */
if (adapter->hw.mac.type >= e1000_82576)
set_bit(IGB_RING_FLAG_RX_SCTP_CSUM, &ring->flags);
@@ -741,16 +718,12 @@ static int igb_alloc_queues(struct igb_adapter *adapter)
adapter->rx_ring[i] = ring;
}
- /* Restore the adapter's original node */
- adapter->node = orig_node;
igb_cache_ring_register(adapter);
return 0;
err:
- /* Restore the adapter's original node */
- adapter->node = orig_node;
igb_free_queues(adapter);
return -ENOMEM;
@@ -1116,24 +1089,10 @@ static int igb_alloc_q_vectors(struct igb_adapter *adapter)
struct igb_q_vector *q_vector;
struct e1000_hw *hw = &adapter->hw;
int v_idx;
- int orig_node = adapter->node;
for (v_idx = 0; v_idx < adapter->num_q_vectors; v_idx++) {
- if ((adapter->num_q_vectors == (adapter->num_rx_queues +
- adapter->num_tx_queues)) &&
- (adapter->num_rx_queues == v_idx))
- adapter->node = orig_node;
- if (orig_node == -1) {
- int cur_node = next_online_node(adapter->node);
- if (cur_node == MAX_NUMNODES)
- cur_node = first_online_node;
- adapter->node = cur_node;
- }
- q_vector = kzalloc_node(sizeof(struct igb_q_vector), GFP_KERNEL,
- adapter->node);
- if (!q_vector)
- q_vector = kzalloc(sizeof(struct igb_q_vector),
- GFP_KERNEL);
+ q_vector = kzalloc(sizeof(struct igb_q_vector),
+ GFP_KERNEL);
if (!q_vector)
goto err_out;
q_vector->adapter = adapter;
@@ -1142,14 +1101,10 @@ static int igb_alloc_q_vectors(struct igb_adapter *adapter)
netif_napi_add(adapter->netdev, &q_vector->napi, igb_poll, 64);
adapter->q_vector[v_idx] = q_vector;
}
- /* Restore the adapter's original node */
- adapter->node = orig_node;
return 0;
err_out:
- /* Restore the adapter's original node */
- adapter->node = orig_node;
igb_free_q_vectors(adapter);
return -ENOMEM;
}
@@ -2423,8 +2378,6 @@ static int __devinit igb_sw_init(struct igb_adapter *adapter)
VLAN_HLEN;
adapter->min_frame_size = ETH_ZLEN + ETH_FCS_LEN;
- adapter->node = -1;
-
spin_lock_init(&adapter->stats64_lock);
#ifdef CONFIG_PCI_IOV
switch (hw->mac.type) {
@@ -2671,13 +2624,11 @@ static int igb_close(struct net_device *netdev)
int igb_setup_tx_resources(struct igb_ring *tx_ring)
{
struct device *dev = tx_ring->dev;
- int orig_node = dev_to_node(dev);
int size;
size = sizeof(struct igb_tx_buffer) * tx_ring->count;
- tx_ring->tx_buffer_info = vzalloc_node(size, tx_ring->numa_node);
- if (!tx_ring->tx_buffer_info)
- tx_ring->tx_buffer_info = vzalloc(size);
+
+ tx_ring->tx_buffer_info = vzalloc(size);
if (!tx_ring->tx_buffer_info)
goto err;
@@ -2685,18 +2636,10 @@ int igb_setup_tx_resources(struct igb_ring *tx_ring)
tx_ring->size = tx_ring->count * sizeof(union e1000_adv_tx_desc);
tx_ring->size = ALIGN(tx_ring->size, 4096);
- set_dev_node(dev, tx_ring->numa_node);
tx_ring->desc = dma_alloc_coherent(dev,
tx_ring->size,
&tx_ring->dma,
GFP_KERNEL);
- set_dev_node(dev, orig_node);
- if (!tx_ring->desc)
- tx_ring->desc = dma_alloc_coherent(dev,
- tx_ring->size,
- &tx_ring->dma,
- GFP_KERNEL);
-
if (!tx_ring->desc)
goto err;
@@ -2707,8 +2650,8 @@ int igb_setup_tx_resources(struct igb_ring *tx_ring)
err:
vfree(tx_ring->tx_buffer_info);
- dev_err(dev,
- "Unable to allocate memory for the transmit descriptor ring\n");
+ tx_ring->tx_buffer_info = NULL;
+ dev_err(dev, "Unable to allocate memory for the Tx descriptor ring\n");
return -ENOMEM;
}
@@ -2825,34 +2768,23 @@ static void igb_configure_tx(struct igb_adapter *adapter)
int igb_setup_rx_resources(struct igb_ring *rx_ring)
{
struct device *dev = rx_ring->dev;
- int orig_node = dev_to_node(dev);
- int size, desc_len;
+ int size;
size = sizeof(struct igb_rx_buffer) * rx_ring->count;
- rx_ring->rx_buffer_info = vzalloc_node(size, rx_ring->numa_node);
- if (!rx_ring->rx_buffer_info)
- rx_ring->rx_buffer_info = vzalloc(size);
+
+ rx_ring->rx_buffer_info = vzalloc(size);
if (!rx_ring->rx_buffer_info)
goto err;
- desc_len = sizeof(union e1000_adv_rx_desc);
/* Round up to nearest 4K */
- rx_ring->size = rx_ring->count * desc_len;
+ rx_ring->size = rx_ring->count * sizeof(union e1000_adv_rx_desc);
rx_ring->size = ALIGN(rx_ring->size, 4096);
- set_dev_node(dev, rx_ring->numa_node);
rx_ring->desc = dma_alloc_coherent(dev,
rx_ring->size,
&rx_ring->dma,
GFP_KERNEL);
- set_dev_node(dev, orig_node);
- if (!rx_ring->desc)
- rx_ring->desc = dma_alloc_coherent(dev,
- rx_ring->size,
- &rx_ring->dma,
- GFP_KERNEL);
-
if (!rx_ring->desc)
goto err;
@@ -2864,8 +2796,7 @@ int igb_setup_rx_resources(struct igb_ring *rx_ring)
err:
vfree(rx_ring->rx_buffer_info);
rx_ring->rx_buffer_info = NULL;
- dev_err(dev, "Unable to allocate memory for the receive descriptor"
- " ring\n");
+ dev_err(dev, "Unable to allocate memory for the Rx descriptor ring\n");
return -ENOMEM;
}
--
1.7.11.4
* [net-next 4/7] igb: Change Tx cleanup loop to do/while instead of for
From: Jeff Kirsher @ 2012-09-22 10:30 UTC (permalink / raw)
To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348309836-7107-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Alexander Duyck <alexander.h.duyck@intel.com>
This change makes it so that Tx cleanup is done in a do/while loop instead
of a for loop. The main motivation behind this is the fact that we should
never be invoked with a budget less than 1 so we can skip checking the
budget before processing the first descriptor.
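The equivalence claimed above can be checked with a small userspace model. This is a sketch under stated assumptions: clean_for() and clean_do_while() are hypothetical names, and "pending" simply counts completed descriptors waiting to be cleaned rather than modelling the real ring.

```c
#include <assert.h>

/* Old shape: budget is tested before every descriptor, including the first. */
unsigned int clean_for(unsigned int pending, unsigned int budget)
{
	unsigned int cleaned = 0;

	for (; budget; budget--) {
		if (!pending)
			break;	/* no work pending */
		pending--;
		cleaned++;
	}
	return cleaned;
}

/* New shape: the first descriptor is handled without a budget test. */
unsigned int clean_do_while(unsigned int pending, unsigned int budget)
{
	unsigned int cleaned = 0;

	assert(budget >= 1);	/* precondition from the commit message */

	do {
		if (!pending)
			break;	/* no work pending */
		pending--;
		cleaned++;
		budget--;	/* update budget accounting */
	} while (budget);
	return cleaned;
}
```

For any budget of at least 1 the two loops clean the same number of descriptors. In this model a budget of 0 would make the do/while version wrap the unsigned counter after the first pass, which is why the precondition matters.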
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igb/igb_main.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index c9997d8..91f542c 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5690,7 +5690,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
struct igb_adapter *adapter = q_vector->adapter;
struct igb_ring *tx_ring = q_vector->tx.ring;
struct igb_tx_buffer *tx_buffer;
- union e1000_adv_tx_desc *tx_desc, *eop_desc;
+ union e1000_adv_tx_desc *tx_desc;
unsigned int total_bytes = 0, total_packets = 0;
unsigned int budget = q_vector->tx.work_limit;
unsigned int i = tx_ring->next_to_clean;
@@ -5702,16 +5702,16 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
tx_desc = IGB_TX_DESC(tx_ring, i);
i -= tx_ring->count;
- for (; budget; budget--) {
- eop_desc = tx_buffer->next_to_watch;
-
- /* prevent any other reads prior to eop_desc */
- rmb();
+ do {
+ union e1000_adv_tx_desc *eop_desc = tx_buffer->next_to_watch;
/* if next_to_watch is not set then there is no work pending */
if (!eop_desc)
break;
+ /* prevent any other reads prior to eop_desc */
+ rmb();
+
/* if DD is not set pending work has not been completed */
if (!(eop_desc->wb.status & cpu_to_le32(E1000_TXD_STAT_DD)))
break;
@@ -5767,7 +5767,13 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
tx_buffer = tx_ring->tx_buffer_info;
tx_desc = IGB_TX_DESC(tx_ring, 0);
}
- }
+
+ /* issue prefetch for next Tx descriptor */
+ prefetch(tx_desc);
+
+ /* update budget accounting */
+ budget--;
+ } while (likely(budget));
netdev_tx_completed_queue(txring_txq(tx_ring),
total_packets, total_bytes);
@@ -5783,12 +5789,10 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
if (test_bit(IGB_RING_FLAG_TX_DETECT_HANG, &tx_ring->flags)) {
struct e1000_hw *hw = &adapter->hw;
- eop_desc = tx_buffer->next_to_watch;
-
/* Detect a transmit hang in hardware, this serializes the
* check with the clearing of time_stamp and movement of i */
clear_bit(IGB_RING_FLAG_TX_DETECT_HANG, &tx_ring->flags);
- if (eop_desc &&
+ if (tx_buffer->next_to_watch &&
time_after(jiffies, tx_buffer->time_stamp +
(adapter->tx_timeout_factor * HZ)) &&
!(rd32(E1000_STATUS) & E1000_STATUS_TXOFF)) {
@@ -5812,9 +5816,9 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
tx_ring->next_to_use,
tx_ring->next_to_clean,
tx_buffer->time_stamp,
- eop_desc,
+ tx_buffer->next_to_watch,
jiffies,
- eop_desc->wb.status);
+ tx_buffer->next_to_watch->wb.status);
netif_stop_subqueue(tx_ring->netdev,
tx_ring->queue_index);
--
1.7.11.4
* [net-next 5/7] igb: Change how we populate the RSS indirection table
From: Jeff Kirsher @ 2012-09-22 10:30 UTC (permalink / raw)
To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348309836-7107-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Alexander Duyck <alexander.h.duyck@intel.com>
This patch cleans up our RSS indirection table configuration so that we
generate the same table regardless of CPU endianness. In addition, it
changes the table setup from a modulo-based scheme to a divisor-based one.
The advantage is that we should be able to take the Rx hash and compute the
Rx queue with very little CPU overhead if needed.
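A userspace sketch of the divisor-based layout may help when reading the packed arithmetic in the diff. The function names are hypothetical. The per-entry formula, entry n = ((n * num_rx_queues) >> 7) << shift, is derived from the patch's constants rather than quoted from it, and rx_queue_from_hash() assumes the low 7 bits of the hash index the 128-entry table.

```c
#include <assert.h>
#include <stdint.h>

/* Per-entry reference: entry n of the 128-entry table, divisor based. */
uint8_t reta_entry(uint32_t n, uint32_t num_rx_queues, uint32_t shift)
{
	assert(n < 128 && num_rx_queues >= 1 && num_rx_queues <= 16);
	return (uint8_t)(((n * num_rx_queues) >> 7) << shift);
}

/* Register j (0..31) built one byte at a time; entry 4j is the low byte. */
uint32_t reta_word_bytewise(uint32_t j, uint32_t num_rx_queues, uint32_t shift)
{
	uint32_t word = 0;
	uint32_t k;

	for (k = 0; k < 4; k++)
		word |= (uint32_t)reta_entry(4 * j + k, num_rx_queues, shift)
			<< (8 * k);
	return word;
}

/* The same register built with the packed arithmetic from the patch. */
uint32_t reta_word_packed(uint32_t j, uint32_t num_rx_queues, uint32_t shift)
{
	/* first pass generates n and n+2 */
	uint32_t base = ((j * 0x00040004) + 0x00020000) * num_rx_queues;
	uint32_t reta = (base & 0x07800780) >> (7 - shift);

	/* second pass generates n+1 and n+3 */
	base += 0x00010001 * num_rx_queues;
	reta |= (base & 0x07800780) << (1 + shift);
	return reta;
}

/* Queue lookup from a hash: one multiply and one shift, no table read. */
uint32_t rx_queue_from_hash(uint32_t hash, uint32_t num_rx_queues)
{
	return ((hash & 127) * num_rx_queues) >> 7;
}
```

With 4 queues and no shift, the last register holds queue 3 in all four bytes, and consecutive table entries map to the same queue in blocks of 32 instead of cycling 0, 1, 2, 3 as the old modulo scheme did.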
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igb/igb_main.c | 55 +++++++++++++++----------------
1 file changed, 26 insertions(+), 29 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 91f542c..27688d9 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2834,11 +2834,7 @@ static void igb_setup_mrqc(struct igb_adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
u32 mrqc, rxcsum;
- u32 j, num_rx_queues, shift = 0, shift2 = 0;
- union e1000_reta {
- u32 dword;
- u8 bytes[4];
- } reta;
+ u32 j, num_rx_queues, shift = 0;
static const u8 rsshash[40] = {
0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2, 0x41, 0x67,
0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0, 0xd0, 0xca, 0x2b, 0xcb,
@@ -2856,35 +2852,36 @@ static void igb_setup_mrqc(struct igb_adapter *adapter)
num_rx_queues = adapter->rss_queues;
- if (adapter->vfs_allocated_count) {
- /* 82575 and 82576 supports 2 RSS queues for VMDq */
- switch (hw->mac.type) {
- case e1000_i350:
- case e1000_82580:
- num_rx_queues = 1;
- shift = 0;
- break;
- case e1000_82576:
+ switch (hw->mac.type) {
+ case e1000_82575:
+ shift = 6;
+ break;
+ case e1000_82576:
+ /* 82576 supports 2 RSS queues for SR-IOV */
+ if (adapter->vfs_allocated_count) {
shift = 3;
num_rx_queues = 2;
- break;
- case e1000_82575:
- shift = 2;
- shift2 = 6;
- default:
- break;
}
- } else {
- if (hw->mac.type == e1000_82575)
- shift = 6;
+ break;
+ default:
+ break;
}
- for (j = 0; j < (32 * 4); j++) {
- reta.bytes[j & 3] = (j % num_rx_queues) << shift;
- if (shift2)
- reta.bytes[j & 3] |= num_rx_queues << shift2;
- if ((j & 3) == 3)
- wr32(E1000_RETA(j >> 2), reta.dword);
+ /*
+ * Populate the indirection table 4 entries at a time. To do this
+ * we are generating the results for n and n+2 and then interleaving
+ * those with the results with n+1 and n+3.
+ */
+ for (j = 0; j < 32; j++) {
+ /* first pass generates n and n+2 */
+ u32 base = ((j * 0x00040004) + 0x00020000) * num_rx_queues;
+ u32 reta = (base & 0x07800780) >> (7 - shift);
+
+ /* second pass generates n+1 and n+3 */
+ base += 0x00010001 * num_rx_queues;
+ reta |= (base & 0x07800780) << (1 + shift);
+
+ wr32(E1000_RETA(j), reta);
}
/*
--
1.7.11.4
* [net-next 6/7] igb: Simplify how we populate the RSS key
From: Jeff Kirsher @ 2012-09-22 10:30 UTC (permalink / raw)
To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348309836-7107-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Alexander Duyck <alexander.h.duyck@intel.com>
Instead of storing the RSS key as a character array we can simplify the
configuration by making it a u32 array. This allows us to just write one
value per register without any unnecessary operations to construct the
value.
This change produces exactly the same key. The only difference is that I
translated the u8 array to a u32 array, which will be correctly ordered on
writes to hardware by the cpu_to_le32 operations that are built into the
writel calls.
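The claim that both arrays encode the same key can be checked in userspace. In this sketch the two tables are copied from the diff below, and rss_pack_le() is a hypothetical helper that repeats the old loop's packing, least significant byte first.

```c
#include <assert.h>
#include <stdint.h>

/* Old representation: the key as 40 bytes (values copied from the patch). */
const uint8_t rsshash[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2, 0x41, 0x67,
	0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0, 0xd0, 0xca, 0x2b, 0xcb,
	0xae, 0x7b, 0x30, 0xb4, 0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30,
	0xf2, 0x0c, 0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa };

/* New representation: the same key as ten 32-bit words. */
const uint32_t rsskey[10] = { 0xDA565A6D, 0xC20E5B25, 0x3D256741,
			      0xB08FA343, 0xCB2BCAD0, 0xB4307BAE,
			      0xA32DCB77, 0x0CF23080, 0x3BB7426A,
			      0xFA01ACBE };

/* Pack bytes 4j..4j+3 into one word, least significant byte first. */
uint32_t rss_pack_le(const uint8_t *bytes, unsigned int j)
{
	assert(j < 10);
	return (uint32_t)bytes[j * 4] |
	       ((uint32_t)bytes[j * 4 + 1] << 8) |
	       ((uint32_t)bytes[j * 4 + 2] << 16) |
	       ((uint32_t)bytes[j * 4 + 3] << 24);
}
```

Comparing rss_pack_le(rsshash, j) with rsskey[j] for j from 0 to 9 confirms that each register receives the same value before and after the change.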
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igb/igb_main.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 27688d9..db6e456 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2835,20 +2835,14 @@ static void igb_setup_mrqc(struct igb_adapter *adapter)
struct e1000_hw *hw = &adapter->hw;
u32 mrqc, rxcsum;
u32 j, num_rx_queues, shift = 0;
- static const u8 rsshash[40] = {
- 0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2, 0x41, 0x67,
- 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0, 0xd0, 0xca, 0x2b, 0xcb,
- 0xae, 0x7b, 0x30, 0xb4, 0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30,
- 0xf2, 0x0c, 0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa };
+ static const u32 rsskey[10] = { 0xDA565A6D, 0xC20E5B25, 0x3D256741,
+ 0xB08FA343, 0xCB2BCAD0, 0xB4307BAE,
+ 0xA32DCB77, 0x0CF23080, 0x3BB7426A,
+ 0xFA01ACBE };
/* Fill out hash function seeds */
- for (j = 0; j < 10; j++) {
- u32 rsskey = rsshash[(j * 4)];
- rsskey |= rsshash[(j * 4) + 1] << 8;
- rsskey |= rsshash[(j * 4) + 2] << 16;
- rsskey |= rsshash[(j * 4) + 3] << 24;
- array_wr32(E1000_RSSRK(0), j, rsskey);
- }
+ for (j = 0; j < 10; j++)
+ wr32(E1000_RSSRK(j), rsskey[j]);
num_rx_queues = adapter->rss_queues;
--
1.7.11.4