* [net-next 4/6] ixgbe: ptp code cleanup
From: Jeff Kirsher @ 2012-06-14 10:18 UTC
To: davem; +Cc: Jacob Keller, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1339669089-27955-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
This patch fixes two minor nits pointed out by Richard Cochran. The first is
some overly aggressive line wrapping that was not necessary. The second
re-orders the flag checks for PPS support. Previously, the hardware (MAC type)
test was done first and the interrupt flag test second. Now the interrupt flag
is tested first, using the unlikely() macro.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 8 +++-----
drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 13 +++++++------
2 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 17ad6a3..1675b66 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -790,12 +790,10 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
total_packets += tx_buffer->gso_segs;
#ifdef CONFIG_IXGBE_PTP
- if (unlikely(tx_buffer->tx_flags &
- IXGBE_TX_FLAGS_TSTAMP))
- ixgbe_ptp_tx_hwtstamp(q_vector,
- tx_buffer->skb);
-
+ if (unlikely(tx_buffer->tx_flags & IXGBE_TX_FLAGS_TSTAMP))
+ ixgbe_ptp_tx_hwtstamp(q_vector, tx_buffer->skb);
#endif
+
/* free the skb */
dev_kfree_skb_any(tx_buffer->skb);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
index ddc6a4d..174f41f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
@@ -307,13 +307,14 @@ void ixgbe_ptp_check_pps_event(struct ixgbe_adapter *adapter, u32 eicr)
!(adapter->flags2 & IXGBE_FLAG2_PTP_PPS_ENABLED))
return;
- switch (hw->mac.type) {
- case ixgbe_mac_X540:
- if (eicr & IXGBE_EICR_TIMESYNC)
+ if (unlikely(eicr & IXGBE_EICR_TIMESYNC)) {
+ switch (hw->mac.type) {
+ case ixgbe_mac_X540:
ptp_clock_event(adapter->ptp_clock, &event);
- break;
- default:
- break;
+ break;
+ default:
+ break;
+ }
}
}
--
1.7.10.2
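A minimal userspace sketch of the reordered check in ixgbe_ptp_check_pps_event() from the patch above, assuming GCC or Clang for __builtin_expect. EICR_TIMESYNC, the mac_type enum and pps_event_pending() are hypothetical stand-ins for IXGBE_EICR_TIMESYNC, hw->mac.type and the driver function; the earlier PPS-enabled flag checks are omitted, and the return value replaces the call to ptp_clock_event().

```c
#include <assert.h>
#include <stdint.h>

/* Branch hints in the style of the kernel's likely()/unlikely() (GCC/Clang). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-ins for IXGBE_EICR_TIMESYNC and hw->mac.type. */
#define EICR_TIMESYNC (1u << 24)
enum mac_type { MAC_82599EB, MAC_X540 };

/*
 * Returns 1 if a PPS event would be reported, 0 otherwise.
 * The rarely-set interrupt cause bit is tested first, so the common
 * path exits after one well-predicted branch without consulting the
 * MAC type at all.
 */
static int pps_event_pending(enum mac_type mac, uint32_t eicr)
{
	if (unlikely(eicr & EICR_TIMESYNC)) {
		switch (mac) {
		case MAC_X540:
			return 1; /* the driver calls ptp_clock_event() here */
		default:
			break;
		}
	}
	return 0;
}
```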
* [net-next 5/6] ixgbe: PTP Fix hwtstamp mode settings
From: Jeff Kirsher @ 2012-06-14 10:18 UTC
To: davem; +Cc: Jacob Keller, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1339669089-27955-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
When enabling a hwtstamp mode for Rx timestamping, the V2 PTP message-type
specific modes (Sync and Delay Request) are now rolled into the V2 "all event
packets" modes, to more accurately represent what the hardware does. The
hardware always timestamps Path delay packets when any V2 mode is selected,
regardless of which message type was selected (so that Path delay mode is
always supported). This means the user-selectable modes for timestamping only
Sync or only Delay Request messages are not truly supported. This patch sets
the mode in the hwtstamp config accordingly and reports back to the user that
all V2 event packets will be timestamped.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 23 ++++++++---------------
1 file changed, 8 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
index 174f41f..5ed8cff 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
@@ -540,6 +540,11 @@ void ixgbe_ptp_rx_hwtstamp(struct ixgbe_q_vector *q_vector,
* type has to be specified. Matching the kind of event packet is
* not supported, with the exception of "all V2 events regardless of
* level 2 or 4".
+ *
+ * Since hardware always timestamps Path delay packets when timestamping V2
+ * packets, regardless of the type specified in the register, only use V2
+ * Event mode. This more accurately tells the user what the hardware is going
+ * to do anyways.
*/
int ixgbe_ptp_hwtstamp_ioctl(struct ixgbe_adapter *adapter,
struct ifreq *ifr, int cmd)
@@ -583,27 +588,15 @@ int ixgbe_ptp_hwtstamp_ioctl(struct ixgbe_adapter *adapter,
tsync_rx_mtrl = IXGBE_RXMTRL_V1_DELAY_REQ_MSG;
is_l4 = true;
break;
+ case HWTSTAMP_FILTER_PTP_V2_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
case HWTSTAMP_FILTER_PTP_V2_SYNC:
case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
- tsync_rx_ctl |= IXGBE_TSYNCRXCTL_TYPE_L2_L4_V2;
- tsync_rx_mtrl = IXGBE_RXMTRL_V2_SYNC_MSG;
- is_l2 = true;
- is_l4 = true;
- config.rx_filter = HWTSTAMP_FILTER_SOME;
- break;
case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
- tsync_rx_ctl |= IXGBE_TSYNCRXCTL_TYPE_L2_L4_V2;
- tsync_rx_mtrl = IXGBE_RXMTRL_V2_DELAY_REQ_MSG;
- is_l2 = true;
- is_l4 = true;
- config.rx_filter = HWTSTAMP_FILTER_SOME;
- break;
- case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
- case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
- case HWTSTAMP_FILTER_PTP_V2_EVENT:
tsync_rx_ctl |= IXGBE_TSYNCRXCTL_TYPE_EVENT_V2;
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
is_l2 = true;
--
1.7.10.2
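A small userspace sketch of the request-to-report mapping that ixgbe_ptp_hwtstamp_ioctl() ends up with after the patch above. The enum is a hypothetical local stand-in for HWTSTAMP_FILTER_* (its numeric values are not the kernel's), -1 stands in for -ERANGE, and reporting the "none" filter unchanged is an assumption, since that case is not shown in the hunk.

```c
#include <assert.h>

/* Hypothetical stand-ins for a subset of HWTSTAMP_FILTER_* from
 * <linux/net_tstamp.h>; the numeric values are NOT the kernel's. */
enum rx_filter {
	FILTER_NONE,
	FILTER_ALL,
	FILTER_V1_L4_EVENT,
	FILTER_V1_L4_SYNC,
	FILTER_V1_L4_DELAY_REQ,
	FILTER_V2_EVENT,
	FILTER_V2_L2_EVENT,
	FILTER_V2_L4_EVENT,
	FILTER_V2_SYNC,
	FILTER_V2_L2_SYNC,
	FILTER_V2_L4_SYNC,
	FILTER_V2_DELAY_REQ,
	FILTER_V2_L2_DELAY_REQ,
	FILTER_V2_L4_DELAY_REQ,
};

/*
 * Returns the rx_filter reported back to the user after the patch,
 * or -1 where the driver rejects the request with -ERANGE.
 */
static int reported_rx_filter(enum rx_filter req)
{
	switch (req) {
	case FILTER_NONE:            /* assumed to be reported unchanged */
	case FILTER_V1_L4_SYNC:
	case FILTER_V1_L4_DELAY_REQ:
		return req;
	case FILTER_V2_EVENT:
	case FILTER_V2_L2_EVENT:
	case FILTER_V2_L4_EVENT:
	case FILTER_V2_SYNC:
	case FILTER_V2_L2_SYNC:
	case FILTER_V2_L4_SYNC:
	case FILTER_V2_DELAY_REQ:
	case FILTER_V2_L2_DELAY_REQ:
	case FILTER_V2_L4_DELAY_REQ:
		/* Any V2 mode also stamps Path delay messages, so every V2
		 * request is widened to "all V2 events" (previously Sync and
		 * Delay Request were reported as HWTSTAMP_FILTER_SOME). */
		return FILTER_V2_EVENT;
	case FILTER_V1_L4_EVENT:
	case FILTER_ALL:
	default:
		return -1;
	}
}
```

The design choice is to tell the caller the truth about the wider filter rather than claim a narrower mode the hardware cannot enforce.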
* [net-next 6/6] ixgbe: Check PTP Rx timestamps via BPF filter
From: Jeff Kirsher @ 2012-06-14 10:18 UTC
To: davem; +Cc: Jacob Keller, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1339669089-27955-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
This patch fixes a potential Rx timestamp deadlock that can stall Rx
timestamping indefinitely. The issue occurs when a PTP packet is timestamped
by hardware but never reaches the Rx queue. The hardware will not timestamp
another packet until the RXSTMP(L/H) registers have been read to unlock them,
and previously they were only read when a timestamped packet reached
software, so a dropped packet left them locked for good. However, the
registers cannot simply be read early, because then there would be no way to
correlate the value with its packet.
This patch introduces a filter function that determines whether a packet
should have been timestamped. Using the filter configured through the
hwtstamp ioctl, it checks that the PTP protocol and message type match the
expected values. If a timestamp is latched and the packet matches, the
timestamp registers are read (which frees them). Then the descriptor's
timestamp bit is checked: if it is set, the packet correlates to the
timestamp stored in the RXSTMP registers. Otherwise, assume the timestamped
packet was dropped by the hardware and ignore this value; the registers have
at least been unlocked for future timestamping.
Because the driver may keep skb data in paged fragments, it cannot be
accessed directly. To work around this, the needed bytes are copied into a
linear buffer with skb_copy_bits(), from which they can be read correctly.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 2 +
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 3 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 113 ++++++++++++++++++++++---
3 files changed, 104 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 3ef3c52..41f9f6e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -561,6 +561,7 @@ struct ixgbe_adapter {
spinlock_t tmreg_lock;
struct cyclecounter cc;
struct timecounter tc;
+ int rx_hwtstamp_filter;
u32 base_incval;
u32 cycle_speed;
#endif /* CONFIG_IXGBE_PTP */
@@ -718,6 +719,7 @@ extern void ixgbe_ptp_overflow_check(struct ixgbe_adapter *adapter);
extern void ixgbe_ptp_tx_hwtstamp(struct ixgbe_q_vector *q_vector,
struct sk_buff *skb);
extern void ixgbe_ptp_rx_hwtstamp(struct ixgbe_q_vector *q_vector,
+ union ixgbe_adv_rx_desc *rx_desc,
struct sk_buff *skb);
extern int ixgbe_ptp_hwtstamp_ioctl(struct ixgbe_adapter *adapter,
struct ifreq *ifr, int cmd);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 1675b66..b0ddfd4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1397,8 +1397,7 @@ static void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
ixgbe_rx_checksum(rx_ring, rx_desc, skb);
#ifdef CONFIG_IXGBE_PTP
- if (ixgbe_test_staterr(rx_desc, IXGBE_RXDADV_STAT_TS))
- ixgbe_ptp_rx_hwtstamp(rx_ring->q_vector, skb);
+ ixgbe_ptp_rx_hwtstamp(rx_ring->q_vector, rx_desc, skb);
#endif
if ((dev->features & NETIF_F_HW_VLAN_RX) &&
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
index 5ed8cff..cb7d1b2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
@@ -26,6 +26,7 @@
*******************************************************************************/
#include "ixgbe.h"
#include <linux/export.h>
+#include <linux/ptp_classify.h>
/*
* The 82599 and the X540 do not have true 64bit nanosecond scale
@@ -100,6 +101,10 @@
#define NSECS_PER_SEC 1000000000ULL
#endif
+static struct sock_filter ptp_filter[] = {
+ PTP_FILTER
+};
+
/**
* ixgbe_ptp_read - read raw cycle counter (to be used by time counter)
* @cc - the cyclecounter structure
@@ -426,6 +431,68 @@ void ixgbe_ptp_overflow_check(struct ixgbe_adapter *adapter)
}
/**
+ * ixgbe_ptp_match - determine if this skb matches a ptp packet
+ * @skb: pointer to the skb
+ * @hwtstamp: pointer to the hwtstamp_config to check
+ *
+ * Determine whether the skb should have been timestamped, assuming the
+ * hwtstamp was set via the hwtstamp ioctl. Returns non-zero when the packet
+ * should have a timestamp waiting in the registers, and 0 otherwise.
+ *
+ * V1 packets have to check the version type to determine whether they are
+ * correct. However, we can't directly access the data because it might be
+ * fragmented in the SKB, in paged memory. In order to work around this, we
+ * use skb_copy_bits which will properly copy the data whether it is in the
+ * paged memory fragments or not. We have to copy the IP header as well as the
+ * message type.
+ */
+static int ixgbe_ptp_match(struct sk_buff *skb, int rx_filter)
+{
+ struct iphdr iph;
+ u8 msgtype;
+ unsigned int type, offset;
+
+ if (rx_filter == HWTSTAMP_FILTER_NONE)
+ return 0;
+
+ type = sk_run_filter(skb, ptp_filter);
+
+ if (likely(rx_filter == HWTSTAMP_FILTER_PTP_V2_EVENT))
+ return type & PTP_CLASS_V2;
+
+ /* For the remaining cases actually check message type */
+ switch (type) {
+ case PTP_CLASS_V1_IPV4:
+ skb_copy_bits(skb, OFF_IHL, &iph, sizeof(iph));
+ offset = ETH_HLEN + (iph.ihl << 2) + UDP_HLEN + OFF_PTP_CONTROL;
+ break;
+ case PTP_CLASS_V1_IPV6:
+ offset = OFF_PTP6 + OFF_PTP_CONTROL;
+ break;
+ default:
+ /* other cases invalid or handled above */
+ return 0;
+ }
+
+ /* Make sure our buffer is long enough */
+ if (skb->len < offset)
+ return 0;
+
+ skb_copy_bits(skb, offset, &msgtype, sizeof(msgtype));
+
+ switch (rx_filter) {
+ case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
+ return (msgtype == IXGBE_RXMTRL_V1_SYNC_MSG);
+ break;
+ case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
+ return (msgtype == IXGBE_RXMTRL_V1_DELAY_REQ_MSG);
+ break;
+ default:
+ return 0;
+ }
+}
+
+/**
* ixgbe_ptp_tx_hwtstamp - utility function which checks for TX time stamp
* @q_vector: structure containing interrupt and ring information
* @skb: particular skb to send timestamp with
@@ -474,6 +541,7 @@ void ixgbe_ptp_tx_hwtstamp(struct ixgbe_q_vector *q_vector,
/**
* ixgbe_ptp_rx_hwtstamp - utility function which checks for RX time stamp
* @q_vector: structure containing interrupt and ring information
+ * @rx_desc: the rx descriptor
* @skb: particular skb to send timestamp with
*
* if the timestamp is valid, we convert it into the timecounter ns
@@ -481,6 +549,7 @@ void ixgbe_ptp_tx_hwtstamp(struct ixgbe_q_vector *q_vector,
* is passed up the network stack
*/
void ixgbe_ptp_rx_hwtstamp(struct ixgbe_q_vector *q_vector,
+ union ixgbe_adv_rx_desc *rx_desc,
struct sk_buff *skb)
{
struct ixgbe_adapter *adapter;
@@ -498,21 +567,33 @@ void ixgbe_ptp_rx_hwtstamp(struct ixgbe_q_vector *q_vector,
hw = &adapter->hw;
tsyncrxctl = IXGBE_READ_REG(hw, IXGBE_TSYNCRXCTL);
+
+ /* Check if we have a valid timestamp and make sure the skb should
+ * have been timestamped */
+ if (likely(!(tsyncrxctl & IXGBE_TSYNCRXCTL_VALID) ||
+ !ixgbe_ptp_match(skb, adapter->rx_hwtstamp_filter)))
+ return;
+
+ /*
+ * Always read the registers, in order to clear a possible fault
+ * because of stagnant RX timestamp values for a packet that never
+ * reached the queue.
+ */
regval |= (u64)IXGBE_READ_REG(hw, IXGBE_RXSTMPL);
regval |= (u64)IXGBE_READ_REG(hw, IXGBE_RXSTMPH) << 32;
/*
- * If this bit is set, then the RX registers contain the time stamp. No
- * other packet will be time stamped until we read these registers, so
- * read the registers to make them available again. Because only one
- * packet can be time stamped at a time, we know that the register
- * values must belong to this one here and therefore we don't need to
- * compare any of the additional attributes stored for it.
+ * If the timestamp bit is set in the packet's descriptor, we know the
+ * timestamp belongs to this packet. No other packet can be
+ * timestamped until the registers for timestamping have been read.
+ * Therefor only one packet with this bit can be in the queue at a
+ * time, and the rx timestamp values that were in the registers belong
+ * to this packet.
*
* If nothing went wrong, then it should have a skb_shared_tx that we
* can turn into a skb_shared_hwtstamps.
*/
- if (!(tsyncrxctl & IXGBE_TSYNCRXCTL_VALID))
+ if (unlikely(!ixgbe_test_staterr(rx_desc, IXGBE_RXDADV_STAT_TS)))
return;
spin_lock_irqsave(&adapter->tmreg_lock, flags);
@@ -598,19 +679,20 @@ int ixgbe_ptp_hwtstamp_ioctl(struct ixgbe_adapter *adapter,
case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
tsync_rx_ctl |= IXGBE_TSYNCRXCTL_TYPE_EVENT_V2;
- config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
is_l2 = true;
is_l4 = true;
+ config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
break;
case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
case HWTSTAMP_FILTER_ALL:
default:
/*
- * register RXMTRL must be set, therefore it is not
- * possible to time stamp both V1 Sync and Delay_Req messages
- * and hardware does not support timestamping all packets
- * => return error
+ * register RXMTRL must be set in order to do V1 packets,
+ * therefore it is not possible to time stamp both V1 Sync and
+ * Delay_Req messages and hardware does not support
+ * timestamping all packets => return error
*/
+ config.rx_filter = HWTSTAMP_FILTER_NONE;
return -ERANGE;
}
@@ -620,6 +702,9 @@ int ixgbe_ptp_hwtstamp_ioctl(struct ixgbe_adapter *adapter,
return 0;
}
+ /* Store filter value for later use */
+ adapter->rx_hwtstamp_filter = config.rx_filter;
+
/* define ethertype filter for timestamped packets */
if (is_l2)
IXGBE_WRITE_REG(hw, IXGBE_ETQF(3),
@@ -855,6 +940,10 @@ void ixgbe_ptp_init(struct ixgbe_adapter *adapter)
return;
}
+ /* initialize the ptp filter */
+ if (ptp_filter_init(ptp_filter, ARRAY_SIZE(ptp_filter)))
+ e_dev_warn("ptp_filter_init failed\n");
+
spin_lock_init(&adapter->tmreg_lock);
ixgbe_ptp_start_cyclecounter(adapter);
--
1.7.10.2
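To see why the new ordering in ixgbe_ptp_rx_hwtstamp() cannot deadlock, here is a toy userspace model of the single Rx timestamp latch. It is a sketch under stated assumptions: struct ts_latch, hw_stamp(), read_and_unlock() and rx_hwtstamp() are hypothetical names, "valid" models the TSYNCRXCTL valid bit, "ptp_match" models ixgbe_ptp_match(), and "desc_ts" models the IXGBE_RXDADV_STAT_TS descriptor bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the Rx timestamp latch; all names are hypothetical. */
struct ts_latch {
	bool valid;     /* models IXGBE_TSYNCRXCTL_VALID */
	uint64_t stamp; /* models RXSTMPL/RXSTMPH */
};

/* Hardware side: a new stamp is latched only while the latch is free. */
static void hw_stamp(struct ts_latch *l, uint64_t t)
{
	if (!l->valid) {
		l->valid = true;
		l->stamp = t;
	}
}

/* Reading the stamp registers is what frees the latch. */
static uint64_t read_and_unlock(struct ts_latch *l)
{
	l->valid = false;
	return l->stamp;
}

/*
 * Per-packet Rx path in the order used by the patch. Returns true and
 * stores the stamp in *out only when it belongs to this packet.
 */
static bool rx_hwtstamp(struct ts_latch *l, bool ptp_match, bool desc_ts,
			uint64_t *out)
{
	uint64_t v;

	if (!l->valid || !ptp_match)
		return false;      /* nothing latched, or not a PTP packet */
	v = read_and_unlock(l);    /* always unlock for a matching packet */
	if (!desc_ts)
		return false;      /* stamp belonged to a dropped packet */
	*out = v;
	return true;
}
```

Usage: a stamp latched for a packet that the hardware then drops is discarded (and the latch freed) by the next matching packet, after which timestamping resumes instead of stalling forever.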
* Re: Regression on TX throughput when using bonding
From: David Miller @ 2012-06-14 10:31 UTC
To: eric.dumazet; +Cc: jhautbois, netdev
In-Reply-To: <1339668471.22704.714.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 14 Jun 2012 12:07:51 +0200
> We should have a way to properly park packets in Qdiscs, and only do the
> orphaning once skb given to real device for 'immediate or so'
> transmission.
Ok.
* Re: PPPoE performance regression
From: David Woodhouse @ 2012-06-14 10:35 UTC
To: Paul Mackerras
Cc: Nathan Williams, Karl Hiramoto, David S. Miller, netdev,
John Crispin, Benjamin LaHaise
In-Reply-To: <20120614061809.GA10453@drongo>
On Thu, 2012-06-14 at 16:18 +1000, Paul Mackerras wrote:
> Umm, how does ppp_output_wakeup() actually get called?
In fact I'm thinking of eliminating ppp_output_wakeup() in the general
case.
The idea (and it is *just* an idea so far) is to introduce
ppp_sent_queue(), ppp_completed_queue() and ppp_reset_queue()¹ functions
which take a ppp_chan and map onto the corresponding netdev_* functions
for BQL.
Having done that, we should be able to trigger the wakeup automatically
from the ppp_completed_queue() function, and there's no need for channel
drivers to call ppp_output_wakeup() directly. Not only do we get proper
holistic queue length management, we also move the flow control into PPP
and get rid of the horrid dependency on internal PPP locking that's
documented in commit 9d02daf75², and which we'd have to address on the
PPPoX side too.
And the overhead that Ben is concerned about should be fairly minimal.
--
dwmw2
¹ For ppp_reset_queue in the mlppp case it gets moderately non-trivial.
² Look for 'downl'. Ick.
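The idea above can be illustrated with a toy, fixed-limit byte queue in userspace. Everything here is hypothetical: chan_sent(), chan_completed() and chan_reset() are only analogues of the proposed ppp_sent_queue()/ppp_completed_queue()/ppp_reset_queue(), and real BQL (netdev_sent_queue(), netdev_completed_queue(), netdev_reset_queue()) adapts its limit at runtime rather than using a constant. The point shown is that the wakeup falls out of completion accounting, so channel drivers would not need to call a wakeup function themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-channel queue state; all names are hypothetical. */
struct chan_queue {
	unsigned int inflight; /* bytes handed to the channel, not completed */
	unsigned int limit;    /* fixed here; BQL sizes this dynamically */
	bool stopped;          /* generic layer stops feeding the channel */
	unsigned int wakeups;  /* counts automatic wakeups */
};

static void chan_sent(struct chan_queue *q, unsigned int bytes)
{
	q->inflight += bytes;
	if (q->inflight >= q->limit)
		q->stopped = true;
}

/* Completion drives flow control: the wakeup happens here. */
static void chan_completed(struct chan_queue *q, unsigned int bytes)
{
	q->inflight -= bytes; /* caller must not complete more than was sent */
	if (q->stopped && q->inflight < q->limit) {
		q->stopped = false;
		q->wakeups++; /* stands in for waking the transmit path */
	}
}

static void chan_reset(struct chan_queue *q)
{
	q->inflight = 0;
	q->stopped = false;
}
```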
* [PATCH 05/10] netfilter: merge tcpv[4,6]_net_init into tcp_net_init
From: Gao feng @ 2012-06-14 10:07 UTC
To: pablo; +Cc: netdev, netfilter-devel, Gao feng
In-Reply-To: <1339668445-23848-1-git-send-email-gaofeng@cn.fujitsu.com>
Merge tcpv4_init_net and tcpv6_init_net into tcp_init_net to
reduce redundant code.
Also use nf_proto_net.users to determine whether this is the first
time the nf_proto_net is used; if it is, initialize it.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
net/netfilter/nf_conntrack_proto_tcp.c | 57 ++++++++------------------------
1 files changed, 14 insertions(+), 43 deletions(-)
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 6db9d3c..e3d5427 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -1593,18 +1593,14 @@ static int tcp_kmemdup_compat_sysctl_table(struct nf_proto_net *pn)
return 0;
}
-static int tcpv4_init_net(struct net *net, u_int16_t proto)
+static int tcp_init_net(struct net *net, u_int16_t proto)
{
- int i;
int ret = 0;
struct nf_tcp_net *tn = tcp_pernet(net);
struct nf_proto_net *pn = (struct nf_proto_net *)tn;
-#ifdef CONFIG_SYSCTL
- if (!pn->ctl_table) {
-#else
- if (!pn->users++) {
-#endif
+ if (!pn->users) {
+ int i = 0;
for (i = 0; i < TCP_CONNTRACK_TIMEOUT_MAX; i++)
tn->timeouts[i] = tcp_timeouts[i];
@@ -1613,45 +1609,20 @@ static int tcpv4_init_net(struct net *net, u_int16_t proto)
tn->tcp_max_retrans = nf_ct_tcp_max_retrans;
}
- ret = tcp_kmemdup_compat_sysctl_table(pn);
+ if (proto == AF_INET) {
+ ret = tcp_kmemdup_compat_sysctl_table(pn);
+ if (ret < 0)
+ return ret;
- if (ret < 0)
- return ret;
+ ret = tcp_kmemdup_sysctl_table(pn);
+ if (ret < 0)
+ nf_ct_kfree_compat_sysctl_table(pn);
+ } else
+ ret = tcp_kmemdup_sysctl_table(pn);
- ret = tcp_kmemdup_sysctl_table(pn);
-
-#ifdef CONFIG_SYSCTL
-#ifdef CONFIG_NF_CONNTRACK_PROC_COMPAT
- if (ret < 0) {
- kfree(pn->ctl_compat_table);
- pn->ctl_compat_table = NULL;
- }
-#endif
-#endif
return ret;
}
-static int tcpv6_init_net(struct net *net, u_int16_t proto)
-{
- int i;
- struct nf_tcp_net *tn = tcp_pernet(net);
- struct nf_proto_net *pn = (struct nf_proto_net *)tn;
-
-#ifdef CONFIG_SYSCTL
- if (!pn->ctl_table) {
-#else
- if (!pn->users++) {
-#endif
- for (i = 0; i < TCP_CONNTRACK_TIMEOUT_MAX; i++)
- tn->timeouts[i] = tcp_timeouts[i];
- tn->tcp_loose = nf_ct_tcp_loose;
- tn->tcp_be_liberal = nf_ct_tcp_be_liberal;
- tn->tcp_max_retrans = nf_ct_tcp_max_retrans;
- }
-
- return tcp_kmemdup_sysctl_table(pn);
-}
-
struct nf_conntrack_l4proto nf_conntrack_l4proto_tcp4 __read_mostly =
{
.l3proto = PF_INET,
@@ -1684,7 +1655,7 @@ struct nf_conntrack_l4proto nf_conntrack_l4proto_tcp4 __read_mostly =
.nla_policy = tcp_timeout_nla_policy,
},
#endif /* CONFIG_NF_CT_NETLINK_TIMEOUT */
- .init_net = tcpv4_init_net,
+ .init_net = tcp_init_net,
};
EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_tcp4);
@@ -1720,6 +1691,6 @@ struct nf_conntrack_l4proto nf_conntrack_l4proto_tcp6 __read_mostly =
.nla_policy = tcp_timeout_nla_policy,
},
#endif /* CONFIG_NF_CT_NETLINK_TIMEOUT */
- .init_net = tcpv6_init_net,
+ .init_net = tcp_init_net,
};
EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_tcp6);
--
1.7.7.6
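A userspace sketch of the shape of the merged tcp_init_net() in the patch above, under stated assumptions: struct proto_net, alloc_table(), tcp_like_init_net() and DEFAULT_TIMEOUT are hypothetical, "timeout" stands in for the tn->timeouts[] defaults, the two pointers stand in for the sysctl and compat sysctl tables, and the "allocate only if absent" behaviour of alloc_table() is an assumption about the kmemdup helpers. The simulate_failure flag exists only to exercise the rollback path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>

#define DEFAULT_TIMEOUT 120 /* hypothetical default value */

/* Per-net data shared by the IPv4 and IPv6 protocols (hypothetical). */
struct proto_net {
	unsigned int users; /* set by registration, read here */
	int timeout;        /* stands in for tn->timeouts[] etc. */
	int *table;         /* stands in for pn->ctl_table */
	int *compat_table;  /* stands in for pn->ctl_compat_table (IPv4) */
};

static int alloc_table(int **t, int simulate_failure)
{
	if (*t != NULL)
		return 0; /* already allocated by an earlier caller */
	if (simulate_failure)
		return -ENOMEM;
	*t = calloc(4, sizeof(**t));
	return *t ? 0 : -ENOMEM;
}

static int tcp_like_init_net(struct proto_net *pn, int family,
			     int simulate_failure)
{
	int ret;

	/* Only the first user sets defaults; later users keep tuned values. */
	if (pn->users == 0)
		pn->timeout = DEFAULT_TIMEOUT;

	if (family == AF_INET) {
		ret = alloc_table(&pn->compat_table, 0);
		if (ret < 0)
			return ret;
		ret = alloc_table(&pn->table, simulate_failure);
		if (ret < 0) {
			/* roll back the compat table, as the patch does */
			free(pn->compat_table);
			pn->compat_table = NULL;
		}
	} else {
		ret = alloc_table(&pn->table, simulate_failure);
	}
	return ret;
}
```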
* [PATCH 04/10] netfilter: regard users as refcount for l4proto's per-net data
From: Gao feng @ 2012-06-14 10:07 UTC
To: pablo; +Cc: netdev, netfilter-devel, Gao feng
In-Reply-To: <1339668445-23848-1-git-send-email-gaofeng@cn.fujitsu.com>
Currently, the meaning of nf_proto_net's users field is confusing.
It should be regarded as the refcount for an l4proto's per-net data,
because two l4protos may share the same per-net data.
So increment pn->users when nf_conntrack_l4proto_register succeeds,
and decrement it in nf_conntrack_l4proto_unregister.
Because nf_conntrack_l3proto_ipv[4|6] do not share per-net data,
there is no need to add a refcount for their per-net data.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
net/netfilter/nf_conntrack_proto.c | 70 ++++++++++++++++++++++-------------
1 files changed, 44 insertions(+), 26 deletions(-)
diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
index c9df1b4..63f9430 100644
--- a/net/netfilter/nf_conntrack_proto.c
+++ b/net/netfilter/nf_conntrack_proto.c
@@ -39,16 +39,13 @@ static int
nf_ct_register_sysctl(struct net *net,
struct ctl_table_header **header,
const char *path,
- struct ctl_table *table,
- unsigned int *users)
+ struct ctl_table *table)
{
if (*header == NULL) {
*header = register_net_sysctl(net, path, table);
if (*header == NULL)
return -ENOMEM;
}
- if (users != NULL)
- (*users)++;
return 0;
}
@@ -58,7 +55,7 @@ nf_ct_unregister_sysctl(struct ctl_table_header **header,
struct ctl_table **table,
unsigned int *users)
{
- if (users != NULL && --*users > 0)
+ if (users != NULL && *users > 0)
return;
unregister_net_sysctl_table(*header);
@@ -191,8 +188,7 @@ static int nf_ct_l3proto_register_sysctl(struct net *net,
err = nf_ct_register_sysctl(net,
&in->ctl_table_header,
l3proto->ctl_table_path,
- in->ctl_table,
- NULL);
+ in->ctl_table);
if (err < 0) {
kfree(in->ctl_table);
@@ -330,20 +326,17 @@ static struct nf_proto_net *nf_ct_l4proto_net(struct net *net,
static
int nf_ct_l4proto_register_sysctl(struct net *net,
+ struct nf_proto_net *pn,
struct nf_conntrack_l4proto *l4proto)
{
int err = 0;
- struct nf_proto_net *pn = nf_ct_l4proto_net(net, l4proto);
- if (pn == NULL)
- return 0;
#ifdef CONFIG_SYSCTL
if (pn->ctl_table != NULL) {
err = nf_ct_register_sysctl(net,
&pn->ctl_table_header,
"net/netfilter",
- pn->ctl_table,
- &pn->users);
+ pn->ctl_table);
if (err < 0) {
if (!pn->users) {
kfree(pn->ctl_table);
@@ -357,8 +350,7 @@ int nf_ct_l4proto_register_sysctl(struct net *net,
err = nf_ct_register_sysctl(net,
&pn->ctl_compat_header,
"net/ipv4/netfilter",
- pn->ctl_compat_table,
- NULL);
+ pn->ctl_compat_table);
if (err == 0)
goto out;
nf_ct_kfree_compat_sysctl_table(pn);
@@ -374,11 +366,9 @@ out:
static
void nf_ct_l4proto_unregister_sysctl(struct net *net,
+ struct nf_proto_net *pn,
struct nf_conntrack_l4proto *l4proto)
{
- struct nf_proto_net *pn = nf_ct_l4proto_net(net, l4proto);
- if (pn == NULL)
- return;
#ifdef CONFIG_SYSCTL
if (pn->ctl_table_header != NULL)
nf_ct_unregister_sysctl(&pn->ctl_table_header,
@@ -391,8 +381,6 @@ void nf_ct_l4proto_unregister_sysctl(struct net *net,
&pn->ctl_compat_table,
NULL);
#endif /* CONFIG_NF_CONNTRACK_PROC_COMPAT */
-#else
- pn->users--;
#endif /* CONFIG_SYSCTL */
}
@@ -458,22 +446,33 @@ int nf_conntrack_l4proto_register(struct net *net,
struct nf_conntrack_l4proto *l4proto)
{
int ret = 0;
+
+ struct nf_proto_net *pn = NULL;
+
if (l4proto->init_net) {
ret = l4proto->init_net(net, l4proto->l3proto);
if (ret < 0)
- return ret;
+ goto out;
}
- ret = nf_ct_l4proto_register_sysctl(net, l4proto);
+ pn = nf_ct_l4proto_net(net, l4proto);
+ if (pn == NULL)
+ goto out;
+
+ ret = nf_ct_l4proto_register_sysctl(net, pn, l4proto);
if (ret < 0)
- return ret;
+ goto out;
if (net == &init_net) {
ret = nf_conntrack_l4proto_register_net(l4proto);
- if (ret < 0)
- nf_ct_l4proto_unregister_sysctl(net, l4proto);
+ if (ret < 0) {
+ nf_ct_l4proto_unregister_sysctl(net, pn, l4proto);
+ goto out;
+ }
}
-
+ /* increase the nf_proto_net's refcnt */
+ pn->users++;
+out:
return ret;
}
EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_register);
@@ -498,10 +497,18 @@ nf_conntrack_l4proto_unregister_net(struct nf_conntrack_l4proto *l4proto)
void nf_conntrack_l4proto_unregister(struct net *net,
struct nf_conntrack_l4proto *l4proto)
{
+ struct nf_proto_net *pn = NULL;
if (net == &init_net)
nf_conntrack_l4proto_unregister_net(l4proto);
- nf_ct_l4proto_unregister_sysctl(net, l4proto);
+ pn = nf_ct_l4proto_net(net, l4proto);
+ if (pn == NULL)
+ return;
+
+ /* decrease the nf_proto_net's refcnt */
+ pn->users--;
+ nf_ct_l4proto_unregister_sysctl(net, pn, l4proto);
+
/* Remove all contrack entries for this protocol */
rtnl_lock();
nf_ct_iterate_cleanup(net, kill_l4proto, l4proto);
@@ -513,11 +520,14 @@ int nf_conntrack_proto_init(struct net *net)
{
unsigned int i;
int err;
+ struct nf_proto_net *pn = nf_ct_l4proto_net(net,
+ &nf_conntrack_l4proto_generic);
err = nf_conntrack_l4proto_generic.init_net(net,
nf_conntrack_l4proto_generic.l3proto);
if (err < 0)
return err;
err = nf_ct_l4proto_register_sysctl(net,
+ pn,
&nf_conntrack_l4proto_generic);
if (err < 0)
return err;
@@ -527,13 +537,21 @@ int nf_conntrack_proto_init(struct net *net)
rcu_assign_pointer(nf_ct_l3protos[i],
&nf_conntrack_l3proto_generic);
}
+ /* increase generic proto's nf_proto_net refcnt */
+ pn->users++;
+
return 0;
}
void nf_conntrack_proto_fini(struct net *net)
{
unsigned int i;
+ struct nf_proto_net *pn = nf_ct_l4proto_net(net,
+ &nf_conntrack_l4proto_generic);
+ /* decrease generic proto's nf_proto_net refcnt */
+ pn->users--;
nf_ct_l4proto_unregister_sysctl(net,
+ pn,
&nf_conntrack_l4proto_generic);
if (net == &init_net) {
/* free l3proto protocol tables */
--
1.7.7.6
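A toy model of the refcount semantics described in the patch above, assuming two l4protos (for example TCP over IPv4 and IPv6) share one per-net structure. struct proto_net, l4proto_register() and l4proto_unregister() are hypothetical names; setup_ok stands in for init_net and sysctl registration succeeding. Only successful registrations are counted, and the shared sysctl is torn down when the last user goes away, mirroring the `*users > 0` early return in nf_ct_unregister_sysctl().

```c
#include <assert.h>
#include <stdbool.h>

/* Per-net data shared by two l4protos; names are hypothetical. */
struct proto_net {
	unsigned int users;     /* refcount of successful registrations */
	bool sysctl_registered; /* stands in for ctl_table_header != NULL */
};

static int l4proto_register(struct proto_net *pn, bool setup_ok)
{
	if (!setup_ok)
		return -1;            /* failure leaves the refcount untouched */
	pn->sysctl_registered = true; /* idempotent for the second user */
	pn->users++;                  /* count only on success */
	return 0;
}

/* Caller must balance each successful register with one unregister. */
static void l4proto_unregister(struct proto_net *pn)
{
	pn->users--;
	if (pn->users > 0)
		return;               /* another l4proto still uses the data */
	pn->sysctl_registered = false;
}
```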
* Re: [net-next.git 1/4 (v5)] phy: add the EEE support and the way to access to the MMD registers.
From: Giuseppe CAVALLARO @ 2012-06-14 10:51 UTC
To: Ben Hutchings; +Cc: netdev, eric.dumazet, rayagond, davem, yuvalmin
In-Reply-To: <1339630137.2612.83.camel@bwh-desktop.uk.solarflarecom.com>
On 6/14/2012 1:28 AM, Ben Hutchings wrote:
> On Wed, 2012-06-13 at 10:01 +0200, Giuseppe CAVALLARO wrote:
>> This patch adds the support for the Energy-Efficient Ethernet (EEE)
>> to the Physical Abstraction Layer.
>> To support the EEE we have to access to the MMD registers 3.20 and
>> 7.60/61. So two new functions have been added to read/write the MMD
>> registers (clause 45).
>>
>> An Ethernet driver (I tested the stmmac) can invoke the phy_init_eee to properly
>> check if the EEE is supported by the PHYs and it can also set the clock
>> stop enable bit in the 3.0 register.
>> The phy_get_eee_err can be used for reporting the number of time where
>> the PHY failed to complete its normal wake sequence.
>>
>> In the end, this patch also adds the EEE ethtool support implementing:
>> o phy_ethtool_set_eee
>> o phy_ethtool_get_eee
>>
>> v1: initial patch
>> v2: fixed some errors especially on naming convention
>> v3: renamed again the mmd read/write functions thank to Ben's feedback
>> v4: moved file to phy.c and added the ethtool support.
>> v5: fixed phy_adv_to_eee, phy_eee_to_supported, phy_eee_to_adv return
>> values according to ethtool API (thanks to Ben's feedback).
>> Renamed some macros to avoid too long names.
>
> Sorry, I spotted some more little issues:
No problem, I'll fix these too.
Many thanks
Regards
Peppe
* [PATCH] usbnet: sanitise overlong driver information strings
From: Phil Sutter @ 2012-06-14 11:18 UTC
To: netdev; +Cc: davem
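As seen on smsc75xx, a driver_info->description longer than 32
characters messes up 'ethtool -i' output. strncpy() does not
NUL-terminate the destination when the source fills the buffer,
so use strlcpy(), which truncates and always terminates.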
Signed-off-by: Phil Sutter <phil.sutter@viprinet.com>
---
drivers/net/usb/usbnet.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 9f58330..d4f7256 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -876,9 +876,9 @@ void usbnet_get_drvinfo (struct net_device *net, struct ethtool_drvinfo *info)
{
struct usbnet *dev = netdev_priv(net);
- strncpy (info->driver, dev->driver_name, sizeof info->driver);
- strncpy (info->version, DRIVER_VERSION, sizeof info->version);
- strncpy (info->fw_version, dev->driver_info->description,
+ strlcpy (info->driver, dev->driver_name, sizeof info->driver);
+ strlcpy (info->version, DRIVER_VERSION, sizeof info->version);
+ strlcpy (info->fw_version, dev->driver_info->description,
sizeof info->fw_version);
usb_make_path (dev->udev, info->bus_info, sizeof info->bus_info);
}
--
1.7.3.4
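To illustrate why the switch in the patch above matters: strncpy() leaves the destination without a terminating NUL when the source fills the buffer, while strlcpy() truncates and always terminates. The sketch uses a local my_strlcpy(), a hypothetical helper mirroring the kernel's strlcpy() semantics, because some C libraries (glibc before 2.38) lack strlcpy(); the 8-byte buffers stand in for the 32-byte ethtool_drvinfo string fields.

```c
#include <assert.h>
#include <string.h>

/*
 * Local reimplementation of strlcpy() semantics (hypothetical helper):
 * copy at most size - 1 bytes, always NUL-terminate when size > 0,
 * and return strlen(src) so callers can detect truncation.
 */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

Usage: with `char b[8]`, `my_strlcpy(b, "a description longer than eight bytes", sizeof b)` leaves `b` holding the terminated string `"a descr"`, whereas `strncpy()` into a same-sized buffer fills all 8 bytes with no terminator, so a reader of the field runs into whatever follows it.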
* [PATCH] c_can_pci: generic module for C_CAN/D_CAN on PCI
From: Federico Vaga @ 2012-06-14 11:43 UTC
To: Wolfgang Grandegger, Marc Kleine-Budde
Cc: Giancarlo Asnaghi, Alan Cox, linux-can, netdev, linux-kernel,
Bhupesh SHARMA, AnilKumar Chimata, Alessandro Rubini,
Federico Vaga
Signed-off-by: Federico Vaga <federico.vaga@gmail.com>
Acked-by: Giancarlo Asnaghi <giancarlo.asnaghi@st.com>
Cc: Alan Cox <alan@linux.intel.com>
---
drivers/net/can/c_can/Kconfig | 7 ++
drivers/net/can/c_can/Makefile | 1 +
drivers/net/can/c_can/c_can_pci.c | 236 +++++++++++++++++++++++++++++++++++++
3 files changed, 244 insertions(+)
create mode 100644 drivers/net/can/c_can/c_can_pci.c
diff --git a/drivers/net/can/c_can/Kconfig b/drivers/net/can/c_can/Kconfig
index 25d371c..3b83baf 100644
--- a/drivers/net/can/c_can/Kconfig
+++ b/drivers/net/can/c_can/Kconfig
@@ -13,4 +13,11 @@ config CAN_C_CAN_PLATFORM
boards from ST Microelectronics (http://www.st.com) like the
SPEAr1310 and SPEAr320 evaluation boards & TI (www.ti.com)
boards like am335x, dm814x, dm813x and dm811x.
+
+config CAN_C_CAN_PCI
+ tristate "Generic PCI Bus based C_CAN/D_CAN driver"
+ depends on PCI
+ ---help---
+ This driver adds support for the C_CAN/D_CAN chips connected
+ to the PCI bus.
endif
diff --git a/drivers/net/can/c_can/Makefile b/drivers/net/can/c_can/Makefile
index 9273f6d..ad1cc84 100644
--- a/drivers/net/can/c_can/Makefile
+++ b/drivers/net/can/c_can/Makefile
@@ -4,5 +4,6 @@
obj-$(CONFIG_CAN_C_CAN) += c_can.o
obj-$(CONFIG_CAN_C_CAN_PLATFORM) += c_can_platform.o
+obj-$(CONFIG_CAN_C_CAN_PCI) += c_can_pci.o
ccflags-$(CONFIG_CAN_DEBUG_DEVICES) := -DDEBUG
diff --git a/drivers/net/can/c_can/c_can_pci.c b/drivers/net/can/c_can/c_can_pci.c
new file mode 100644
index 0000000..7bdb793
--- /dev/null
+++ b/drivers/net/can/c_can/c_can_pci.c
@@ -0,0 +1,236 @@
+/*
+ * PCI bus driver for Bosch C_CAN/D_CAN controller
+ *
+ * Copyright (C) 2012 Federico Vaga <federico.vaga@gmail.com>
+ *
+ * Borrowed from c_can_platform.c
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/clk.h>
+#include <linux/pci.h>
+
+#include <linux/can/dev.h>
+
+#include "c_can.h"
+
+enum c_can_pci_reg_align {
+ C_CAN_REG_ALIGN_16,
+ C_CAN_REG_ALIGN_32,
+};
+
+struct c_can_pci_data {
+ /* Specify if is C_CAN or D_CAN */
+ enum c_can_dev_id type;
+ /* Set the register alignment in the memory */
+ enum c_can_pci_reg_align reg_align;
+ /* Set the frequency if clk is not usable */
+ unsigned int freq;
+};
+
+/*
+ * 16-bit c_can registers can be arranged differently in the memory
+ * architecture of different implementations. For example: 16-bit
+ * registers can be aligned to a 16-bit boundary or 32-bit boundary etc.
+ * Handle the same by providing a common read/write interface.
+ */
+static u16 c_can_pci_read_reg_aligned_to_16bit(struct c_can_priv *priv,
+ enum reg index)
+{
+ return readw(priv->base + priv->regs[index]);
+}
+
+static void c_can_pci_write_reg_aligned_to_16bit(struct c_can_priv *priv,
+ enum reg index, u16 val)
+{
+ writew(val, priv->base + priv->regs[index]);
+}
+
+static u16 c_can_pci_read_reg_aligned_to_32bit(struct c_can_priv *priv,
+ enum reg index)
+{
+ return readw(priv->base + 2 * priv->regs[index]);
+}
+
+static void c_can_pci_write_reg_aligned_to_32bit(struct c_can_priv *priv,
+ enum reg index, u16 val)
+{
+ writew(val, priv->base + 2 * priv->regs[index]);
+}
+
+static int __devinit c_can_pci_probe(struct pci_dev *pdev,
+ const struct pci_device_id *ent)
+{
+ struct c_can_pci_data *c_can_pci_data = (void *)ent->driver_data;
+ struct c_can_priv *priv;
+ struct net_device *dev;
+ void __iomem *addr;
+ struct clk *clk;
+ int ret;
+
+ ret = pci_enable_device(pdev);
+ if (ret) {
+ dev_err(&pdev->dev, "pci_enable_device FAILED\n");
+ goto out;
+ }
+
+ ret = pci_request_regions(pdev, KBUILD_MODNAME);
+ if (ret) {
+ dev_err(&pdev->dev, "pci_request_regions FAILED\n");
+ goto out_disable_device;
+ }
+
+ pci_set_master(pdev);
+ pci_enable_msi(pdev);
+
+ addr = pci_iomap(pdev, 0, pci_resource_len(pdev, 0));
+ if (!addr) {
+ dev_err(&pdev->dev,
+ "device has no PCI memory resources, "
+ "failing adapter\n");
+ ret = -ENOMEM;
+ goto out_release_regions;
+ }
+
+ /* allocate the c_can device */
+ dev = alloc_c_can_dev();
+ if (!dev) {
+ ret = -ENOMEM;
+ goto out_iounmap;
+ }
+
+ priv = netdev_priv(dev);
+ pci_set_drvdata(pdev, dev);
+ SET_NETDEV_DEV(dev, &pdev->dev);
+
+ dev->irq = pdev->irq;
+ priv->base = addr;
+
+ if (!c_can_pci_data->freq) {
+ /* get the appropriate clk */
+ clk = clk_get(&pdev->dev, NULL);
+ if (IS_ERR(clk)) {
+ dev_err(&pdev->dev, "no clock defined\n");
+ ret = -ENODEV;
+ goto out_free_c_can;
+ }
+ priv->can.clock.freq = clk_get_rate(clk);
+ priv->priv = clk;
+ } else {
+ priv->can.clock.freq = c_can_pci_data->freq;
+ priv->priv = NULL;
+ }
+
+ /* Configure CAN type */
+ switch (c_can_pci_data->type) {
+ case C_CAN_DEVTYPE:
+ priv->regs = reg_map_c_can;
+ break;
+ case D_CAN_DEVTYPE:
+ priv->regs = reg_map_d_can;
+ priv->can.ctrlmode_supported |= CAN_CTRLMODE_3_SAMPLES;
+ break;
+ default:
+ ret = -EINVAL;
+ goto out_free_clock;
+ }
+
+ /* Configure access to registers */
+ switch (c_can_pci_data->reg_align) {
+ case C_CAN_REG_ALIGN_32:
+ priv->read_reg = c_can_pci_read_reg_aligned_to_32bit;
+ priv->write_reg = c_can_pci_write_reg_aligned_to_32bit;
+ break;
+ case C_CAN_REG_ALIGN_16:
+ priv->read_reg = c_can_pci_read_reg_aligned_to_16bit;
+ priv->write_reg = c_can_pci_write_reg_aligned_to_16bit;
+ break;
+ default:
+ ret = -EINVAL;
+ goto out_free_clock;
+ }
+
+ ret = register_c_can_dev(dev);
+ if (ret) {
+ dev_err(&pdev->dev, "registering %s failed (err=%d)\n",
+ KBUILD_MODNAME, ret);
+ goto out_free_clock;
+ }
+
+ dev_dbg(&pdev->dev, "%s device registered (regs=%p, irq=%d)\n",
+ KBUILD_MODNAME, priv->regs, dev->irq);
+
+ return 0;
+
+out_free_clock:
+ if (priv->priv)
+ clk_put(priv->priv);
+out_free_c_can:
+ pci_set_drvdata(pdev, NULL);
+ free_c_can_dev(dev);
+out_iounmap:
+ pci_iounmap(pdev, priv->base);
+out_release_regions:
+ pci_disable_msi(pdev);
+ pci_clear_master(pdev);
+ pci_release_regions(pdev);
+out_disable_device:
+ pci_disable_device(pdev);
+out:
+ return ret;
+}
+
+static void __devexit c_can_pci_remove(struct pci_dev *pdev)
+{
+ struct net_device *dev = pci_get_drvdata(pdev);
+ struct c_can_priv *priv = netdev_priv(dev);
+
+ unregister_c_can_dev(dev);
+
+ if (priv->priv)
+ clk_put(priv->priv);
+
+ pci_set_drvdata(pdev, NULL);
+ free_c_can_dev(dev);
+
+ pci_iounmap(pdev, priv->base);
+ pci_disable_msi(pdev);
+ pci_clear_master(pdev);
+ pci_release_regions(pdev);
+ pci_disable_device(pdev);
+}
+
+static struct c_can_pci_data c_can_sta2x11= {
+ .type = C_CAN_DEVTYPE,
+ .reg_align = C_CAN_REG_ALIGN_32,
+ .freq = 52000000, /* 52 Mhz */
+};
+
+#define C_CAN_ID(_vend, _dev, _driverdata) { \
+ PCI_DEVICE(_vend, _dev), \
+ .driver_data = (unsigned long)&_driverdata, \
+}
+static DEFINE_PCI_DEVICE_TABLE(c_can_pci_tbl) = {
+ C_CAN_ID(PCI_VENDOR_ID_STMICRO, PCI_DEVICE_ID_STMICRO_CAN,
+ c_can_sta2x11),
+ {},
+};
+static struct pci_driver c_can_pci_driver = {
+ .name = KBUILD_MODNAME,
+ .id_table = c_can_pci_tbl,
+ .probe = c_can_pci_probe,
+ .remove = __devexit_p(c_can_pci_remove),
+};
+
+module_pci_driver(c_can_pci_driver);
+
+MODULE_AUTHOR("Federico Vaga <federico.vaga@gmail.com>");
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("PCI CAN bus driver for Bosch C_CAN/D_CAN controller");
+MODULE_DEVICE_TABLE(pci, c_can_pci_tbl);
--
1.7.10.2
^ permalink raw reply related
* Re: [PATCH] c_can_pci: generic module for C_CAN/D_CAN on PCI
From: Wolfgang Grandegger @ 2012-06-14 11:56 UTC (permalink / raw)
To: Federico Vaga
Cc: Marc Kleine-Budde, Giancarlo Asnaghi, Alan Cox, linux-can, netdev,
linux-kernel, Bhupesh SHARMA, AnilKumar Chimata,
Alessandro Rubini
In-Reply-To: <1339674222-27699-1-git-send-email-federico.vaga@gmail.com>
On 06/14/2012 01:43 PM, Federico Vaga wrote:
> Signed-off-by: Federico Vaga <federico.vaga@gmail.com>
> Acked-by: Giancarlo Asnaghi <giancarlo.asnaghi@st.com>
> Cc: Alan Cox <alan@linux.intel.com>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Thanks for your contribution.
Wolfgang.
^ permalink raw reply
* RE: [PATCH] c_can_pci: generic module for C_CAN/D_CAN on PCI
From: Bhupesh SHARMA @ 2012-06-14 11:58 UTC (permalink / raw)
To: Federico Vaga, Wolfgang Grandegger, Marc Kleine-Budde
Cc: Giancarlo ASNAGHI, Alan Cox, linux-can@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
AnilKumar Chimata, Alessandro Rubini
In-Reply-To: <1339674222-27699-1-git-send-email-federico.vaga@gmail.com>
Hi Federico,
Thanks for the patch.
> -----Original Message-----
> From: Federico Vaga [mailto:federico.vaga@gmail.com]
> Sent: Thursday, June 14, 2012 5:14 PM
> To: Wolfgang Grandegger; Marc Kleine-Budde
> Cc: Giancarlo ASNAGHI; Alan Cox; linux-can@vger.kernel.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Bhupesh SHARMA;
> AnilKumar Chimata; Alessandro Rubini; Federico Vaga
> Subject: [PATCH] c_can_pci: generic module for C_CAN/D_CAN on PCI
>
> Signed-off-by: Federico Vaga <federico.vaga@gmail.com>
> Acked-by: Giancarlo Asnaghi <giancarlo.asnaghi@st.com>
> Cc: Alan Cox <alan@linux.intel.com>
> ---
> drivers/net/can/c_can/Kconfig | 7 ++
> drivers/net/can/c_can/Makefile | 1 +
> drivers/net/can/c_can/c_can_pci.c | 236 +++++++++++++++++++++++++++++++++++++
> 3 files changed, 244 insertions(+)
> create mode 100644 drivers/net/can/c_can/c_can_pci.c
>
> diff --git a/drivers/net/can/c_can/Kconfig b/drivers/net/can/c_can/Kconfig
> index 25d371c..3b83baf 100644
> --- a/drivers/net/can/c_can/Kconfig
> +++ b/drivers/net/can/c_can/Kconfig
> @@ -13,4 +13,11 @@ config CAN_C_CAN_PLATFORM
> boards from ST Microelectronics (http://www.st.com) like the
> SPEAr1310 and SPEAr320 evaluation boards & TI (www.ti.com)
> boards like am335x, dm814x, dm813x and dm811x.
> +
> +config CAN_C_CAN_PCI
> + tristate "Generic PCI Bus based C_CAN/D_CAN driver"
> + depends on PCI
> + ---help---
> + This driver adds support for the C_CAN/D_CAN chips connected
> + to the PCI bus.
> endif
> diff --git a/drivers/net/can/c_can/Makefile b/drivers/net/can/c_can/Makefile
> index 9273f6d..ad1cc84 100644
> --- a/drivers/net/can/c_can/Makefile
> +++ b/drivers/net/can/c_can/Makefile
> @@ -4,5 +4,6 @@
>
> obj-$(CONFIG_CAN_C_CAN) += c_can.o
> obj-$(CONFIG_CAN_C_CAN_PLATFORM) += c_can_platform.o
> +obj-$(CONFIG_CAN_C_CAN_PCI) += c_can_pci.o
>
> ccflags-$(CONFIG_CAN_DEBUG_DEVICES) := -DDEBUG
> diff --git a/drivers/net/can/c_can/c_can_pci.c b/drivers/net/can/c_can/c_can_pci.c
> new file mode 100644
> index 0000000..7bdb793
> --- /dev/null
> +++ b/drivers/net/can/c_can/c_can_pci.c
> @@ -0,0 +1,236 @@
> +/*
> + * PCI bus driver for Bosch C_CAN/D_CAN controller
> + *
> + * Copyright (C) 2012 Federico Vaga <federico.vaga@gmail.com>
> + *
> + * Borrowed from c_can_platform.c
> + *
> + * This file is licensed under the terms of the GNU General Public
> + * License version 2. This program is licensed "as is" without any
> + * warranty of any kind, whether express or implied.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/netdevice.h>
> +#include <linux/clk.h>
> +#include <linux/pci.h>
> +
> +#include <linux/can/dev.h>
> +
> +#include "c_can.h"
> +
> +enum c_can_pci_reg_align {
> + C_CAN_REG_ALIGN_16,
> + C_CAN_REG_ALIGN_32,
> +};
> +
> +struct c_can_pci_data {
> + /* Specify if is C_CAN or D_CAN */
> + enum c_can_dev_id type;
> + /* Set the register alignment in the memory */
> + enum c_can_pci_reg_align reg_align;
> + /* Set the frequency if clk is not usable */
> + unsigned int freq;
> +};
> +
> +/*
> + * 16-bit c_can registers can be arranged differently in the memory
> + * architecture of different implementations. For example: 16-bit
> + * registers can be aligned to a 16-bit boundary or 32-bit boundary etc.
> + * Handle the same by providing a common read/write interface.
> + */
> +static u16 c_can_pci_read_reg_aligned_to_16bit(struct c_can_priv *priv,
> + enum reg index)
> +{
> + return readw(priv->base + priv->regs[index]);
> +}
> +
> +static void c_can_pci_write_reg_aligned_to_16bit(struct c_can_priv *priv,
> + enum reg index, u16 val)
> +{
> + writew(val, priv->base + priv->regs[index]);
> +}
> +
> +static u16 c_can_pci_read_reg_aligned_to_32bit(struct c_can_priv *priv,
> + enum reg index)
> +{
> + return readw(priv->base + 2 * priv->regs[index]);
> +}
> +
> +static void c_can_pci_write_reg_aligned_to_32bit(struct c_can_priv *priv,
> + enum reg index, u16 val)
> +{
> + writew(val, priv->base + 2 * priv->regs[index]);
> +}
> +
> +static int __devinit c_can_pci_probe(struct pci_dev *pdev,
> + const struct pci_device_id *ent)
> +{
> + struct c_can_pci_data *c_can_pci_data = (void *)ent->driver_data;
> + struct c_can_priv *priv;
> + struct net_device *dev;
> + void __iomem *addr;
> + struct clk *clk;
> + int ret;
> +
> + ret = pci_enable_device(pdev);
> + if (ret) {
> + dev_err(&pdev->dev, "pci_enable_device FAILED\n");
> + goto out;
> + }
> +
> + ret = pci_request_regions(pdev, KBUILD_MODNAME);
> + if (ret) {
> + dev_err(&pdev->dev, "pci_request_regions FAILED\n");
> + goto out_disable_device;
> + }
> +
> + pci_set_master(pdev);
> + pci_enable_msi(pdev);
> +
> + addr = pci_iomap(pdev, 0, pci_resource_len(pdev, 0));
> + if (!addr) {
> + dev_err(&pdev->dev,
> + "device has no PCI memory resources, "
> + "failing adapter\n");
> + ret = -ENOMEM;
> + goto out_release_regions;
> + }
> +
> + /* allocate the c_can device */
> + dev = alloc_c_can_dev();
> + if (!dev) {
> + ret = -ENOMEM;
> + goto out_iounmap;
> + }
> +
> + priv = netdev_priv(dev);
> + pci_set_drvdata(pdev, dev);
> + SET_NETDEV_DEV(dev, &pdev->dev);
> +
> + dev->irq = pdev->irq;
> + priv->base = addr;
> +
> + if (!c_can_pci_data->freq) {
> + /* get the appropriate clk */
> + clk = clk_get(&pdev->dev, NULL);
> + if (IS_ERR(clk)) {
> + dev_err(&pdev->dev, "no clock defined\n");
> + ret = -ENODEV;
> + goto out_free_c_can;
> + }
> + priv->can.clock.freq = clk_get_rate(clk);
> + priv->priv = clk;
> + } else {
> + priv->can.clock.freq = c_can_pci_data->freq;
> + priv->priv = NULL;
> + }
> +
> + /* Configure CAN type */
> + switch (c_can_pci_data->type) {
> + case C_CAN_DEVTYPE:
> + priv->regs = reg_map_c_can;
> + break;
> + case D_CAN_DEVTYPE:
> + priv->regs = reg_map_d_can;
> + priv->can.ctrlmode_supported |= CAN_CTRLMODE_3_SAMPLES;
> + break;
> + default:
> + ret = -EINVAL;
> + goto out_free_clock;
> + }
> +
> + /* Configure access to registers */
> + switch (c_can_pci_data->reg_align) {
> + case C_CAN_REG_ALIGN_32:
> + priv->read_reg = c_can_pci_read_reg_aligned_to_32bit;
> + priv->write_reg = c_can_pci_write_reg_aligned_to_32bit;
> + break;
> + case C_CAN_REG_ALIGN_16:
> + priv->read_reg = c_can_pci_read_reg_aligned_to_16bit;
> + priv->write_reg = c_can_pci_write_reg_aligned_to_16bit;
> + break;
> + default:
> + ret = -EINVAL;
> + goto out_free_clock;
> + }
> +
> + ret = register_c_can_dev(dev);
> + if (ret) {
> + dev_err(&pdev->dev, "registering %s failed (err=%d)\n",
> + KBUILD_MODNAME, ret);
> + goto out_free_clock;
> + }
> +
> + dev_dbg(&pdev->dev, "%s device registered (regs=%p, irq=%d)\n",
> + KBUILD_MODNAME, priv->regs, dev->irq);
> +
> + return 0;
> +
> +out_free_clock:
> + if (priv->priv)
> + clk_put(priv->priv);
> +out_free_c_can:
> + pci_set_drvdata(pdev, NULL);
> + free_c_can_dev(dev);
> +out_iounmap:
> + pci_iounmap(pdev, priv->base);
> +out_release_regions:
> + pci_disable_msi(pdev);
> + pci_clear_master(pdev);
> + pci_release_regions(pdev);
> +out_disable_device:
> + pci_disable_device(pdev);
> +out:
> + return ret;
> +}
> +
> +static void __devexit c_can_pci_remove(struct pci_dev *pdev)
> +{
> + struct net_device *dev = pci_get_drvdata(pdev);
> + struct c_can_priv *priv = netdev_priv(dev);
> +
> + unregister_c_can_dev(dev);
> +
> + if (priv->priv)
> + clk_put(priv->priv);
> +
> + pci_set_drvdata(pdev, NULL);
> + free_c_can_dev(dev);
> +
> + pci_iounmap(pdev, priv->base);
> + pci_disable_msi(pdev);
> + pci_clear_master(pdev);
> + pci_release_regions(pdev);
> + pci_disable_device(pdev);
> +}
> +
> +static struct c_can_pci_data c_can_sta2x11= {
> + .type = C_CAN_DEVTYPE,
> + .reg_align = C_CAN_REG_ALIGN_32,
> + .freq = 52000000, /* 52 Mhz */
> +};
> +
> +#define C_CAN_ID(_vend, _dev, _driverdata) { \
> + PCI_DEVICE(_vend, _dev), \
> + .driver_data = (unsigned long)&_driverdata, \
> +}
> +static DEFINE_PCI_DEVICE_TABLE(c_can_pci_tbl) = {
> + C_CAN_ID(PCI_VENDOR_ID_STMICRO, PCI_DEVICE_ID_STMICRO_CAN,
> + c_can_sta2x11),
> + {},
> +};
> +static struct pci_driver c_can_pci_driver = {
> + .name = KBUILD_MODNAME,
> + .id_table = c_can_pci_tbl,
> + .probe = c_can_pci_probe,
> + .remove = __devexit_p(c_can_pci_remove),
> +};
> +
> +module_pci_driver(c_can_pci_driver);
> +
> +MODULE_AUTHOR("Federico Vaga <federico.vaga@gmail.com>");
> +MODULE_LICENSE("GPL v2");
> +MODULE_DESCRIPTION("PCI CAN bus driver for Bosch C_CAN/D_CAN controller");
> +MODULE_DEVICE_TABLE(pci, c_can_pci_tbl);
> --
Acked-by: Bhupesh Sharma <bhupesh.sharma@st.com>
Regards,
Bhupesh
^ permalink raw reply
* Re: [net-next patch 8/12] bnx2x: Allow up to 63 RSS queues default 8 queues
From: Merav Sicron @ 2012-06-14 15:34 UTC (permalink / raw)
To: David Miller; +Cc: eilong, eric.dumazet, netdev
In-Reply-To: <20120613.153517.901328280865603627.davem@davemloft.net>
On Wed, 2012-06-13 at 15:35 -0700, David Miller wrote:
> From: "Eilon Greenstein" <eilong@broadcom.com>
> Date: Wed, 13 Jun 2012 16:53:29 +0300
>
> > Just to emphasize, since this is the patch series that enables the users
> > to control the number of queues, we can reduce the default number and
> > allow the user to increase it if he has a setup that needs more than 8
> > parallel CPUs to receive the traffic. When using a new FW on the board,
> > the number can be increased up to 64, so using the maximal number can be
> > an overkill (even if the machine has 64 CPUs, it does not mean that the
> > user would like us to consume 64 MSI-X vectors and all the memory to set
> > up 64 queues) - so a lower default value can be used to satisfy most
> > users while allowing them to increase the number if they wish.
>
> I think you should look at other drivers for guidance in this area.
>
> There is zero value in each and every driver author deciding what
> is a good default strategy, because this means the user gets a
> very inconsistent experience based purely upon the driver author's
> whims.
>
We looked at a few other drivers; their current behavior is similar to
what bnx2x had before this change: the minimum of the number of CPUs and
a defined maximum (probably the HW limit).
The bnx2x HW limit is 64 (in most other drivers it is smaller, but this can
change). The number of CPUs in new systems keeps growing,
and allocating that many RSS queues seems like a waste. With the
relatively new ethtool -L feature, the user can change the number of
queues. That's why we think (and so does Eric Dumazet) that it is better
to have a smaller default number, which is good for most cases.
Do you agree with that?
Thanks,
Merav
^ permalink raw reply
* Re: linux-next: manual merge of the net-next tree with the wireless tree
From: John W. Linville @ 2012-06-14 12:59 UTC (permalink / raw)
To: Mohammed Shafi Shajakhan
Cc: Stephen Rothwell, David Miller, netdev, linux-next, linux-kernel,
Sujith Manoharan
In-Reply-To: <4FD96D85.4010505@qca.qualcomm.com>
On Thu, Jun 14, 2012 at 10:20:13AM +0530, Mohammed Shafi Shajakhan wrote:
> Hi Stephen,
>
> On Thursday 14 June 2012 08:42 AM, Stephen Rothwell wrote:
> >Hi all,
> >
> >Today's linux-next merge of the net-next tree got a conflict in
> >drivers/net/wireless/ath/ath9k/main.c between commit bcb7ad7bcbef
> >("ath9k: Fix softlockup in AR9485") from the wireless tree and commit
> >ef1b6cd9a1ba ("ath9k: Group link monitoring logic") from the net-next
> >tree.
> >
> >The latter removes the code modified by the former, so I did that. The
> >fix from the former patch may be needed elsewhere now.
That sounds right.
> the backported version of this patch was recently sent
> http://www.spinics.net/lists/linux-wireless/msg92125.html
I have it in wireless-next already...thanks!
John
--
John W. Linville Someday the world will need a hero, and you
linville@tuxdriver.com might be all we have. Be ready.
^ permalink raw reply
* ipsec and snat: mtu question
From: Marco Berizzi @ 2012-06-14 13:48 UTC (permalink / raw)
To: netdev
Hello everybody.
I would like to ask for an explanation of the behavior
of a Linux IPsec gateway that SNATs packets.
Here is the network diagram:
customer private network 10.16.0.0/16
|
|
+ipsec customer gateway (checkpoint)
||
||---ipsec tunnel 10.16.0.0/16<->172.16.128.0/28 (des3/md5)
|| mtu=1446
||
++ linux_gw_snat ipsec gateway (SNAT all packets from 172.22.1.0/24 to 172.16.128.1)
||
||---ipsec tunnel 10.16.0.0/16<->172.22.1.0/24 (aes/sha1/ipcomp)
|| mtu=1430
||
+linux_final ipsec gateway
|
|
client 172.22.1.50
A SYN packet starts from behind linux_final (172.22.1.50)
toward 10.16.237.66 on the customer network. The MSS is 1460 bytes.
The DF flag is set on outgoing packets.
The packet travels inside the IPsec tunnel; the tunnel MTU is
1430.
At linux_gw_snat, the packet is decrypted, SNATed
(IP source changed from 172.22.1.50 to 172.16.128.1), and
encrypted again.
Packets are delivered to the Check Point gateway; the tunnel MTU is
1446.
Check Point delivers the decrypted packet to 10.16.237.66.
So far, so good.
At some point, 10.16.237.66 sends a 1500-byte
packet to 172.16.128.1: Check Point replies with
an ICMP "fragmentation needed" message advertising an MTU of 1446.
10.16.237.66 then sends a 1446-byte packet to
Check Point, which encrypts it and delivers it to
linux_gw_snat, which decrypts it and reverses the SNAT. Now
linux_gw_snat must send this 1446-byte packet to
172.22.1.50, but the MTU is only 1430: the packet is
dropped (DF is set).
Now, IMHO, linux_gw_snat should send an ICMP message
to 10.16.237.66 saying that the maximum MTU is 1430, but I
don't see any ICMP packet.
Is this the expected behaviour?
TIA
PS: linux_gw_snat runs kernel 3.3.5.
^ permalink raw reply
* Re: Regression on TX throughput when using bonding
From: Jean-Michel Hautbois @ 2012-06-14 14:14 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
In-Reply-To: <CAL8zT=icffEiz9MeaNwMtzteQnXiaT4k++s0TPWt5zsnHxFbmw@mail.gmail.com>
2012/6/14 Jean-Michel Hautbois <jhautbois@gmail.com>:
> 2012/6/14 Eric Dumazet <eric.dumazet@gmail.com>:
>> On Thu, 2012-06-14 at 11:22 +0200, Eric Dumazet wrote:
>>
>>> So you are saying that if you make skb_orphan_try() doing nothing, it
>>> solves your problem ?
>>
>> It probably does, if your application does a UDP flood, trying to send
>> more than the link bandwidth. I guess only benchmark workloads ever try
>> to do that.
>>
>> bonding has no way to give congestion back, it has no Qdisc by default.
>>
>> We probably can defer the skb_orphan_try() for bonding master, a bit
>> like the IFF_XMIT_DST_RELEASE
>>
>> drivers/net/bonding/bond_main.c | 2 +-
>> include/linux/if.h | 3 +++
>> net/core/dev.c | 5 +++--
>> 3 files changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 2ee8cf9..1b1e9c8 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -4343,7 +4343,7 @@ static void bond_setup(struct net_device *bond_dev)
>> bond_dev->tx_queue_len = 0;
>> bond_dev->flags |= IFF_MASTER|IFF_MULTICAST;
>> bond_dev->priv_flags |= IFF_BONDING;
>> - bond_dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
>> + bond_dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING | IFF_XMIT_ORPHAN);
>>
>> /* At first, we block adding VLANs. That's the only way to
>> * prevent problems that occur when adding VLANs over an
>> diff --git a/include/linux/if.h b/include/linux/if.h
>> index f995c66..a788e7b 100644
>> --- a/include/linux/if.h
>> +++ b/include/linux/if.h
>> @@ -81,6 +81,9 @@
>> #define IFF_UNICAST_FLT 0x20000 /* Supports unicast filtering */
>> #define IFF_TEAM_PORT 0x40000 /* device used as team port */
>> #define IFF_SUPP_NOFCS 0x80000 /* device supports sending custom FCS */
>> +#define IFF_XMIT_ORPHAN 0x100000 /* dev_hard_start_xmit() is allowed to
>> + * orphan skb
>> + */
>>
>>
>> #define IF_GET_IFACE 0x0001 /* for querying only */
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index cd09819..3435463 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2193,7 +2193,8 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>> if (!list_empty(&ptype_all))
>> dev_queue_xmit_nit(skb, dev);
>>
>> - skb_orphan_try(skb);
>> + if (dev->priv_flags & IFF_XMIT_ORPHAN)
>> + skb_orphan_try(skb);
>>
>> features = netif_skb_features(skb);
>>
>> @@ -5929,7 +5930,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
>> INIT_LIST_HEAD(&dev->napi_list);
>> INIT_LIST_HEAD(&dev->unreg_list);
>> INIT_LIST_HEAD(&dev->link_watch_list);
>> - dev->priv_flags = IFF_XMIT_DST_RELEASE;
>> + dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_ORPHAN;
>> setup(dev);
>>
>> dev->num_tx_queues = txqs;
>>
>>
>
> It works
For your information:
~# tc -s -d qdisc show dev eth1 > before_tc && sleep 10 && tc -s -d qdisc show dev eth1 > after_tc && ./beforeafter before_tc after_tc
qdisc mq 0: root
Sent 3185900568 bytes 788681 pkt (dropped 0, overlimits 0 requeues 620)
backlog 0b 0p requeues 620
As you can see, 2.5Gbps without any difficulties :).
Thanks,
JM
^ permalink raw reply
* Re: [PATCH 2/5] drivers/net/ethernet/dec/tulip: Use standard __set_bit_le() function
From: Takuya Yoshikawa @ 2012-06-14 14:28 UTC (permalink / raw)
To: Akinobu Mita
Cc: Grant Grundler, Takuya Yoshikawa, akpm, bhutchings, grundler,
arnd, benh, avi, mtosatti, linux-net-drivers, netdev,
linux-kernel, linux-arch, kvm
In-Reply-To: <CAC5umyjkdDqavCo3Dk+WOggCnH+_CZz5jrOr3SougS4HSgV3OA@mail.gmail.com>
On Thu, 14 Jun 2012 18:36:42 +0900
Akinobu Mita <akinobu.mita@gmail.com> wrote:
> >> 1) while I agree with Akinobu and thank him for pointing out a
> >> _potential_ alignment problem, this is a separate issue and your
> >> existing patch should go in anyway. There are probably other drivers
> >> with _potential_ alignment issues. Akinobu could get credit for
> >> finding them by submitting patches after reviewing calls to set_bit
> >> and set_bit_le() - similar to what you are doing now.
> >
> > I prefer approach 1.
> >
> > hash_table is local in build_setup_frame_hash(), so if further
> > improvement is also required, we can do that locally there later.
>
> This potential alignment problem is introduced by this patch, because
> the original set_bit_le() in the tulip driver can handle an unaligned bitmap.
> This is why I recommended it should be fixed in this patch.
The original set_bit_le() was used only in build_setup_frame_hash().
If it's clear that the table is aligned locally in the function, I do
not think the __potential__ problem is introduced by this patch.
As you can see from my response to Arnd in the v1 thread, I knew about the
alignment requirement at that time and checked the definition of
hash_table before using __set_bit_le().
> But please just ignore me if I'm too much paranoid. And I'll handle
> this issue if no one wants to do it.
I'm open to suggestions.
But now that the maintainer, who can test the driver on real hardware,
has suggested this patch should go in, I won't change the patch unless
there is a real issue.
I would appreciate it if you improved this driver later on top of that.
Thanks,
Takuya
^ permalink raw reply
* Re: Regression on TX throughput when using bonding
From: Eric Dumazet @ 2012-06-14 14:29 UTC (permalink / raw)
To: Jean-Michel Hautbois; +Cc: netdev
In-Reply-To: <CAL8zT=huqtqBKzH3DDwid_C8jH16SH=kjYEK6zjxp_spfnLxXA@mail.gmail.com>
On Thu, 2012-06-14 at 16:14 +0200, Jean-Michel Hautbois wrote:
> ~# tc -s -d qdisc show dev eth1 > before_tc && sleep 10 && tc -s -d qdisc show dev eth1 > after_tc && ./beforeafter before_tc after_tc
> qdisc mq 0: root
> Sent 3185900568 bytes 788681 pkt (dropped 0, overlimits 0 requeues 620)
> backlog 0b 0p requeues 620
>
> As you can see, 2.5Gbps without any difficulties :).
>
> Thanks,
> JM
I have no idea why the throughput on the Ethernet link changed.
There is another bug elsewhere: use a thousand sockets instead of
a few, and you'll hit it.
Orphaning skbs should not lower the speed of the device; it only drops excess
packets, instead of blocking the application while it waits for the socket's
wmem allocation to be freed by the skb destructors.
Are you playing with process priorities?
If ksoftirqd cannot run, that could explain the problem.
^ permalink raw reply
* Re: [PATCH] leds: Rename led_brightness_set() to led_set_brightness()
From: Shuah Khan @ 2012-06-14 14:52 UTC (permalink / raw)
To: bryan.wu
Cc: shuahkhan, rpurdie, johannes, linville, davem, LKML,
linux-wireless, netdev, linux-leds
In-Reply-To: <1339619670.13326.22.camel@lorien2>
On Wed, 2012-06-13 at 14:34 -0600, Shuah Khan wrote:
> Rename leds external interface led_brightness_set() to led_set_brightness().
> This is the second phase of the change to reduce confusion between the
> leds internal and external interfaces that set brightness. With this change,
> now the external interface is led_set_brightness(). The first phase renamed
> the internal interface led_set_brightness() to __led_set_brightness().
> There are no changes to the interface implementations.
>
> Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Bryan,
Just in case you didn't see this patch; cc'ing linux-leds.
-- Shuah
> ---
> drivers/leds/led-class.c | 2 +-
> drivers/leds/led-core.c | 4 ++--
> drivers/leds/led-triggers.c | 2 +-
> drivers/leds/ledtrig-oneshot.c | 2 +-
> drivers/leds/ledtrig-timer.c | 2 +-
> include/linux/leds.h | 4 ++--
> net/mac80211/led.c | 2 +-
> 7 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/leds/led-class.c b/drivers/leds/led-class.c
> index cb0a6eb..c599095 100644
> --- a/drivers/leds/led-class.c
> +++ b/drivers/leds/led-class.c
> @@ -222,7 +222,7 @@ void led_classdev_unregister(struct led_classdev *led_cdev)
> #endif
>
> /* Stop blinking */
> - led_brightness_set(led_cdev, LED_OFF);
> + led_set_brightness(led_cdev, LED_OFF);
>
> device_unregister(led_cdev->dev);
>
> diff --git a/drivers/leds/led-core.c b/drivers/leds/led-core.c
> index 176961b..8a09c5f 100644
> --- a/drivers/leds/led-core.c
> +++ b/drivers/leds/led-core.c
> @@ -103,7 +103,7 @@ void led_blink_set_oneshot(struct led_classdev *led_cdev,
> }
> EXPORT_SYMBOL(led_blink_set_oneshot);
>
> -void led_brightness_set(struct led_classdev *led_cdev,
> +void led_set_brightness(struct led_classdev *led_cdev,
> enum led_brightness brightness)
> {
> /* stop and clear soft-blink timer */
> @@ -113,4 +113,4 @@ void led_brightness_set(struct led_classdev *led_cdev,
>
> __led_set_brightness(led_cdev, brightness);
> }
> -EXPORT_SYMBOL(led_brightness_set);
> +EXPORT_SYMBOL(led_set_brightness);
> diff --git a/drivers/leds/led-triggers.c b/drivers/leds/led-triggers.c
> index f8b14dd..57721f2 100644
> --- a/drivers/leds/led-triggers.c
> +++ b/drivers/leds/led-triggers.c
> @@ -112,7 +112,7 @@ void led_trigger_set(struct led_classdev *led_cdev, struct led_trigger *trig)
> if (led_cdev->trigger->deactivate)
> led_cdev->trigger->deactivate(led_cdev);
> led_cdev->trigger = NULL;
> - led_brightness_set(led_cdev, LED_OFF);
> + led_set_brightness(led_cdev, LED_OFF);
> }
> if (trig) {
> write_lock_irqsave(&trig->leddev_list_lock, flags);
> diff --git a/drivers/leds/ledtrig-oneshot.c b/drivers/leds/ledtrig-oneshot.c
> index 5cbab41..2c029aa 100644
> --- a/drivers/leds/ledtrig-oneshot.c
> +++ b/drivers/leds/ledtrig-oneshot.c
> @@ -177,7 +177,7 @@ static void oneshot_trig_deactivate(struct led_classdev *led_cdev)
> }
>
> /* Stop blinking */
> - led_brightness_set(led_cdev, LED_OFF);
> + led_set_brightness(led_cdev, LED_OFF);
> }
>
> static struct led_trigger oneshot_led_trigger = {
> diff --git a/drivers/leds/ledtrig-timer.c b/drivers/leds/ledtrig-timer.c
> index 9010f7a..f774d05 100644
> --- a/drivers/leds/ledtrig-timer.c
> +++ b/drivers/leds/ledtrig-timer.c
> @@ -104,7 +104,7 @@ static void timer_trig_deactivate(struct led_classdev *led_cdev)
> }
>
> /* Stop blinking */
> - led_brightness_set(led_cdev, LED_OFF);
> + led_set_brightness(led_cdev, LED_OFF);
> }
>
> static struct led_trigger timer_led_trigger = {
> diff --git a/include/linux/leds.h b/include/linux/leds.h
> index dd93a22..3aade1d 100644
> --- a/include/linux/leds.h
> +++ b/include/linux/leds.h
> @@ -124,7 +124,7 @@ extern void led_blink_set_oneshot(struct led_classdev *led_cdev,
> unsigned long *delay_off,
> int invert);
> /**
> - * led_brightness_set - set LED brightness
> + * led_set_brightness - set LED brightness
> * @led_cdev: the LED to set
> * @brightness: the brightness to set it to
> *
> @@ -132,7 +132,7 @@ extern void led_blink_set_oneshot(struct led_classdev *led_cdev,
> * software blink timer that implements blinking when the
> * hardware doesn't.
> */
> -extern void led_brightness_set(struct led_classdev *led_cdev,
> +extern void led_set_brightness(struct led_classdev *led_cdev,
> enum led_brightness brightness);
>
> /*
> diff --git a/net/mac80211/led.c b/net/mac80211/led.c
> index 1bf7903..bcffa69 100644
> --- a/net/mac80211/led.c
> +++ b/net/mac80211/led.c
> @@ -276,7 +276,7 @@ static void ieee80211_stop_tpt_led_trig(struct ieee80211_local *local)
>
> read_lock(&tpt_trig->trig.leddev_list_lock);
> list_for_each_entry(led_cdev, &tpt_trig->trig.led_cdevs, trig_list)
> - led_brightness_set(led_cdev, LED_OFF);
> + led_set_brightness(led_cdev, LED_OFF);
> read_unlock(&tpt_trig->trig.leddev_list_lock);
> }
>
^ permalink raw reply
* Server Rental services in Hong Kong
From: trtr678678 @ 2012-06-14 15:22 UTC (permalink / raw)
Dear All,
We have our own datacenter in Hong Kong and provide email/application/web rental services to clients. We are an APNIC member and provide clean IPs to clients.
Dell(TM) PowerEdge(TM) Enterprise Rack Mount Server
-Intel(R) Xeon(R) E3-1240 Processor (3.3GHz, 8M Cache, Turbo, 4C/8T, 80W)
-8GB RAM, 2x4GB, 1333MHz, DDR-3, Dual Ranked UDIMMs
-500GB, 3.5", 6Gbps SAS x 2
-Raid 1 Mirroring Protection
-Remote KVM (iDRAC6 Enterprise)
Dell(TM) PowerEdge(TM) R410 Rack Mount Server
-Intel(R) Quad Core E5606 Xeon(R) CPU, 2.13GHz, 4M Cache, 4.86 GT/s QPI
-4GB Memory (2x2GB), 1333MHz Dual Ranked RDIMMs Fully-Buffered
-500GB 7.2K RPM SATAII 3.5" Hard Drive x 2
-iDRAC6 Enterprise or Express (Remote KVM Management)
Every Dedicated Server Hosting Solution Also Includes:
Software Specification
- CentOS / Fedora / Debian / FreeBSD / Ubuntu / Redhat Linux
- Full root-level access
- Data Center Facilities
- Shared Local & International Bandwidth
- 2 IP Addresses Allocation
- Un-interruptible Power Supply (UPS) backed up by private diesel generator
- FM200-based fire suppression system
- 24x7 CRAC Air Conditioning and Humidity Control
- 24x7 Security Control
- 24x7 Remote Hand Service
Please send us an email for further information. Thanks,
Ron
trtr678678@gmail.com
If you do not wish to receive further messages like this, email "trtr789789@gmail.com" to unsubscribe or to remove your address from the list.
^ permalink raw reply
* Re: Regression on TX throughput when using bonding
From: Jean-Michel Hautbois @ 2012-06-14 15:43 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1339684157.22704.722.camel@edumazet-glaptop>
2012/6/14 Eric Dumazet <eric.dumazet@gmail.com>:
> On Thu, 2012-06-14 at 16:14 +0200, Jean-Michel Hautbois wrote:
>
>> ~# tc -s -d qdisc show dev eth1 > before_tc && sleep 10 && tc -s -d qdisc show dev eth1 > after_tc && ./beforeafter before_tc after_tc
>> qdisc mq 0: root
>> Sent 3185900568 bytes 788681 pkt (dropped 0, overlimits 0 requeues 620)
>> backlog 0b 0p requeues 620
>>
>> As you can see, 2.5Gbps without any difficulties :).
>>
>> Thanks,
>> JM
>
> I have no idea why the throughput on the Ethernet link changes.
>
> There is another bug elsewhere. Use a thousand sockets instead of a
> few, and you'll hit the bug.
>
> Orphaning skbs should not lower the speed of the device; it only drops
> excess packets instead of blocking the application while it waits for
> the socket wmem allocations to be freed by destructors.
>
> Are you playing with process priorities?
>
> If ksoftirqd cannot run, this could explain the problem.
>
As suggested by Eric, here is a description that I hope is as precise as possible.
I send three raw video streams, 1920x1088@30fps, on three UDP sockets to
the same NIC.
Each stream is sent from its own thread, so I will focus on the numbers for one thread.
This generates bursts of send() calls: every 1/30 s, 3,133,440
bytes are sent to the Ethernet interface.
In practice it is something similar to this:
while (n > 0)
{
	size_t len = n < 4000 ? n : 4000;	/* the last datagram is shorter */
	sendto(sock, packet, len, 0, (struct sockaddr *)&dst, sizeof(dst));
	n -= len;
	packet += len;
}
My interface is a bond with a 10Gbps interface and the MTU set to 4096.
This means one thread sends 784 packets every 1/30 s on my
interface, then waits for the next burst, and so on.
The streams are not necessarily the same video, so the threads may or
may not send simultaneously.
My socket is in blocking mode.
JM
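A minimal sketch of the per-frame send loop described above, written so the chunking can be checked without a network: splitting is separated from transmission through a callback. The names `send_frame`, `emit_fn`, `udp_emit`, `struct udp_ctx`, `count_chunk` and `struct tally` are hypothetical, and `udp_emit` assumes an unconnected UDP socket whose destination address is prepared elsewhere. With a 3,133,440-byte frame (1920 x 1088 x 1.5 bytes) and 4000-byte datagrams, the loop yields 783 full datagrams plus one of 1440 bytes, which accounts for the 784 packets per burst.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Callback that transmits one datagram; returns 0 on success, -1 on error. */
typedef int (*emit_fn)(const unsigned char *buf, size_t len, void *ctx);

/* Split one frame into datagrams of at most `chunk` bytes.
 * Returns the number of datagrams emitted, or -1 if emit() failed. */
static long send_frame(const unsigned char *frame, size_t n, size_t chunk,
                       emit_fn emit, void *ctx)
{
    long count = 0;

    assert(chunk > 0);
    /* Loop on n > 0: the frame size need not be a multiple of chunk,
     * so stepping by a fixed amount until n == 0 could never terminate. */
    while (n > 0) {
        size_t len = n < chunk ? n : chunk;

        if (emit(frame, len, ctx) < 0)
            return -1;
        frame += len;
        n -= len;
        count++;
    }
    return count;
}

/* Hypothetical real transmit path: blocking sendto() on a UDP socket. */
struct udp_ctx { int fd; struct sockaddr_in dst; };

int udp_emit(const unsigned char *buf, size_t len, void *ctx)
{
    struct udp_ctx *u = ctx;
    ssize_t r = sendto(u->fd, buf, len, 0,
                       (const struct sockaddr *)&u->dst, sizeof(u->dst));
    return r == (ssize_t)len ? 0 : -1;
}

/* Test callback: records total bytes and the length of the last datagram. */
struct tally { size_t bytes; size_t last_len; };

static int count_chunk(const unsigned char *buf, size_t len, void *ctx)
{
    struct tally *t = ctx;

    (void)buf;
    t->bytes += len;
    t->last_len = len;
    return 0;
}
```

Keeping the `min(n, chunk)` step inside the loop means the same function works for any frame size, and swapping `count_chunk` for `udp_emit` is the only change needed to send for real.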
^ permalink raw reply
* Re: [PATCH 0/8] dcbnl: Major simplifications
From: John Fastabend @ 2012-06-14 16:06 UTC (permalink / raw)
To: tgraf, alexander.h.duyck; +Cc: David Miller, netdev, lucy.liu
In-Reply-To: <20120614075435.GA29185@canuck.infradead.org>
On 6/14/2012 12:54 AM, Thomas Graf wrote:
> On Wed, Jun 13, 2012 at 03:55:41PM -0700, David Miller wrote:
>> Lots of deleted code, I like it :-)
>>
>> Applied, but could you send a follow-on patch to use BUG_ON() instead
>> of that "if (!ptr) { /* ... */ BUG(); }" construct?
>
> Sure, I must have had a weak moment right there :)
>
Nice! I'm a bit late but dumped this into my dcbnl netlink test kit
and everything looks good so...
Tested-by: John Fastabend <john.r.fastabend@intel.com>
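For readers outside the kernel, a userspace sketch of the two constructs contrasted in the review. `BUG()` and `BUG_ON()` here are stand-ins built on `abort()` and the GCC/Clang `__builtin_expect` builtin (the real kernel `BUG()` raises an oops instead), and `struct ops`, `answer`, `call_get_old` and `call_get_new` are hypothetical names. The point is only that `BUG_ON(!ptr)` expresses the same check as the open-coded `if (!ptr) { ... BUG(); }` in a single line.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-ins; the kernel's BUG() raises an oops, not abort(). */
#define BUG() do { \
        fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
        abort(); \
} while (0)
#define BUG_ON(cond) do { if (__builtin_expect(!!(cond), 0)) BUG(); } while (0)

struct ops { int (*get)(void); };   /* hypothetical ops table */

static int answer(void) { return 42; }

/* Open-coded form the review asked to replace. */
static int call_get_old(const struct ops *ops)
{
        if (!ops) {
                /* ... */
                BUG();
        }
        return ops->get();
}

/* Equivalent check written with BUG_ON(). */
static int call_get_new(const struct ops *ops)
{
        BUG_ON(!ops);
        return ops->get();
}
```

Both functions behave identically for valid and NULL pointers; the `BUG_ON()` form is shorter and also marks the failure branch as unlikely.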
^ permalink raw reply
* [PATCH] net: remove skb_orphan_try()
From: Eric Dumazet @ 2012-06-14 16:42 UTC (permalink / raw)
To: David Miller; +Cc: jhautbois, netdev
In-Reply-To: <20120614.033153.258221733380821664.davem@davemloft.net>
From: Eric Dumazet <edumazet@google.com>
On Thu, 2012-06-14 at 03:31 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> > We should have a way to properly park packets in Qdiscs, and only do the
> > orphaning once skb given to real device for 'immediate or so'
> > transmission.
>
> Ok.
On the other hand, all this happens too late with BQL, since more
packets are parked in a Qdisc instead of being delivered with hot
caches.
Doing the orphaning once the packet has been enqueued, then dequeued, is
probably not worth adding yet another test in the fast path.
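The throttling argument in the changelog can be illustrated with a toy model. This is a hypothetical discrete-time simulation, not kernel code: `simulate_burst()` and its parameters are invented for illustration, the qdisc is a plain packet-count limit, the device drains a fixed number of packets per tick, and a blocking sender waits while the bytes charged to its socket are at or above `sndbuf` (loosely mirroring a `sk_wmem_alloc` vs `sk_sndbuf` check). With early orphaning the charge is released before the packet reaches the real device's qdisc, so nothing slows the sender down and the overflow is dropped; without it, the sender blocks instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcome of one simulated burst (toy model, not kernel code). */
struct sim_result {
    size_t sent;          /* packets the device transmitted */
    size_t dropped;       /* packets dropped because the qdisc was full */
    size_t blocked_ticks; /* ticks in which the sender had to wait */
};

/* orphan_early: the socket charge is released before the packet reaches
 * the real device's qdisc, as skb_orphan_try() effectively did. */
static struct sim_result simulate_burst(size_t burst, size_t pkt_bytes,
                                        size_t sndbuf, size_t qdisc_limit,
                                        size_t drain_per_tick, bool orphan_early)
{
    struct sim_result r = { 0, 0, 0 };
    size_t remaining = burst;   /* packets the application still wants to send */
    size_t queued = 0;          /* packets sitting in the device qdisc */
    size_t wmem_alloc = 0;      /* bytes still charged to the socket */

    assert(sndbuf > 0 && drain_per_tick > 0); /* otherwise no progress */

    while (remaining > 0 || queued > 0) {
        bool blocked = false;

        /* Sender: a blocking send waits once the socket is over budget. */
        while (remaining > 0) {
            if (wmem_alloc >= sndbuf) {
                blocked = true;
                break;
            }
            remaining--;
            if (queued < qdisc_limit) {
                queued++;
                if (!orphan_early)
                    wmem_alloc += pkt_bytes; /* held until TX completes */
            } else {
                r.dropped++;                 /* full qdisc: packet is lost */
            }
        }
        if (blocked)
            r.blocked_ticks++;

        /* Device: transmit up to drain_per_tick packets; freeing them
         * returns their memory to the socket. */
        size_t n = queued < drain_per_tick ? queued : drain_per_tick;
        queued -= n;
        r.sent += n;
        if (!orphan_early)
            wmem_alloc -= n * pkt_bytes;
    }
    return r;
}
```

For example, a 784-packet burst of 4000-byte datagrams with a 1,000,000-byte send buffer, a 300-packet qdisc and a device draining 100 packets per tick gives 300 sent and 484 dropped with early orphaning, and all 784 sent with no drops (the sender blocks for a few ticks instead) without it.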
[PATCH] net: remove skb_orphan_try()
Orphaning skbs in dev_hard_start_xmit() makes bonding behavior
unfriendly for applications sending big UDP bursts: once packets
pass the bonding device and reach the real device, they might hit a full
qdisc and be dropped. Without orphaning, the sender is automatically
throttled because sk->sk_wmem_alloc reaches sk->sk_sndbuf (assuming
sk_sndbuf is not too big).
We could try to defer the orphaning by adding another test in
dev_hard_start_xmit(), but all this seems of little gain
now that BQL tends to make packets more likely to be parked
in Qdisc queues instead of the NIC TX ring, in cases where performance
matters.
Reverts commits:
fc6055a5ba31 net: Introduce skb_orphan_try()
87fd308cfc6b net: skb_tx_hash() fix relative to skb_orphan_try()
and removes the SKBTX_DRV_NEEDS_SK_REF flag.
Reported-and-bisected-by: Jean-Michel Hautbois <jhautbois@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/skbuff.h | 7 ++-----
net/can/raw.c | 3 ---
net/core/dev.c | 23 +----------------------
net/iucv/af_iucv.c | 1 -
4 files changed, 3 insertions(+), 31 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b534a1b..642cb73 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -225,14 +225,11 @@ enum {
/* device driver is going to provide hardware time stamp */
SKBTX_IN_PROGRESS = 1 << 2,
- /* ensure the originating sk reference is available on driver level */
- SKBTX_DRV_NEEDS_SK_REF = 1 << 3,
-
/* device driver supports TX zero-copy buffers */
- SKBTX_DEV_ZEROCOPY = 1 << 4,
+ SKBTX_DEV_ZEROCOPY = 1 << 3,
/* generate wifi status information (where possible) */
- SKBTX_WIFI_STATUS = 1 << 5,
+ SKBTX_WIFI_STATUS = 1 << 4,
};
/*
diff --git a/net/can/raw.c b/net/can/raw.c
index cde1b4a..46cca3a 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -681,9 +681,6 @@ static int raw_sendmsg(struct kiocb *iocb, struct socket *sock,
if (err < 0)
goto free_skb;
- /* to be able to check the received tx sock reference in raw_rcv() */
- skb_shinfo(skb)->tx_flags |= SKBTX_DRV_NEEDS_SK_REF;
-
skb->dev = dev;
skb->sk = sk;
diff --git a/net/core/dev.c b/net/core/dev.c
index cd09819..6df2140 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2089,25 +2089,6 @@ static int dev_gso_segment(struct sk_buff *skb, netdev_features_t features)
return 0;
}
-/*
- * Try to orphan skb early, right before transmission by the device.
- * We cannot orphan skb if tx timestamp is requested or the sk-reference
- * is needed on driver level for other reasons, e.g. see net/can/raw.c
- */
-static inline void skb_orphan_try(struct sk_buff *skb)
-{
- struct sock *sk = skb->sk;
-
- if (sk && !skb_shinfo(skb)->tx_flags) {
- /* skb_tx_hash() wont be able to get sk.
- * We copy sk_hash into skb->rxhash
- */
- if (!skb->rxhash)
- skb->rxhash = sk->sk_hash;
- skb_orphan(skb);
- }
-}
-
static bool can_checksum_protocol(netdev_features_t features, __be16 protocol)
{
return ((features & NETIF_F_GEN_CSUM) ||
@@ -2193,8 +2174,6 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
if (!list_empty(&ptype_all))
dev_queue_xmit_nit(skb, dev);
- skb_orphan_try(skb);
-
features = netif_skb_features(skb);
if (vlan_tx_tag_present(skb) &&
@@ -2304,7 +2283,7 @@ u16 __skb_tx_hash(const struct net_device *dev, const struct sk_buff *skb,
if (skb->sk && skb->sk->sk_hash)
hash = skb->sk->sk_hash;
else
- hash = (__force u16) skb->protocol ^ skb->rxhash;
+ hash = (__force u16) skb->protocol;
hash = jhash_1word(hash, hashrnd);
return (u16) (((u64) hash * qcount) >> 32) + qoffset;
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 07d7d55..cd6f7a9 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -372,7 +372,6 @@ static int afiucv_hs_send(struct iucv_message *imsg, struct sock *sock,
skb_trim(skb, skb->dev->mtu);
}
skb->protocol = ETH_P_AF_IUCV;
- skb_shinfo(skb)->tx_flags |= SKBTX_DRV_NEEDS_SK_REF;
nskb = skb_clone(skb, GFP_ATOMIC);
if (!nskb)
return -ENOMEM;
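The return statement in the `__skb_tx_hash()` hunk maps a 32-bit hash onto the queue range `[qoffset, qoffset + qcount)` with a 64-bit multiply and a right shift by 32 rather than a modulo, presumably to keep a division off the transmit fast path. A standalone sketch of just that step follows; `jhash_1word()` is left out (the `hash` argument stands in for its result) and `scale_to_queue` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 32-bit hash into [qoffset, qoffset + qcount): treat hash / 2^32 as
 * a fraction in [0, 1) and multiply it by qcount, using a multiply and a
 * shift instead of a modulo. */
static uint16_t scale_to_queue(uint32_t hash, uint16_t qcount, uint16_t qoffset)
{
    return (uint16_t)((((uint64_t)hash * qcount) >> 32) + qoffset);
}
```

Because the hash acts as a fraction of 2^32, a hash of 0x80000000 lands halfway through the range (queue 4 of 8), and the largest possible hash still maps to the last queue, never past it.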
^ permalink raw reply related
* Re: Regression on TX throughput when using bonding
From: Rick Jones @ 2012-06-14 17:46 UTC (permalink / raw)
To: Jean-Michel Hautbois; +Cc: Eric Dumazet, netdev
In-Reply-To: <CAL8zT=joBA5pgXB7QfDM5qhOizmdneghXsSnwN5G74-yoGzg_Q@mail.gmail.com>
On 06/14/2012 08:43 AM, Jean-Michel Hautbois wrote:
> As suggested by Eric, here is a description that I hope is as precise as possible.
> I send three raw video streams, 1920x1088@30fps, on three UDP sockets to
> the same NIC.
> Each stream is sent from its own thread, so I will focus on the numbers for one thread.
>
> This generates bursts of send() calls: every 1/30 s, 3,133,440
> bytes are sent to the Ethernet interface.
> In practice it is something similar to this:
> while (n > 0)
> {
> 	size_t len = n < 4000 ? n : 4000;	/* the last datagram is shorter */
> 	sendto(sock, packet, len, 0, (struct sockaddr *)&dst, sizeof(dst));
> 	n -= len;
> 	packet += len;
> }
>
> My interface is a bond with a 10Gbps interface and the MTU set to 4096.
> This means one thread sends 784 packets every 1/30 s on my
> interface, then waits for the next burst, and so on.
> The streams are not necessarily the same video, so the threads may or
> may not send simultaneously.
>
> My socket is in blocking mode.
If desired, here is how to simulate that with netperf:
./configure --enable-intervals
make
And an example over loopback:
raj@tardy:~/netperf2_trunk$ src/netperf -l 10 -t UDP_STREAM -H localhost -w 33 -b 783 -- -s 1M -S 1M -m 4000
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to tardy (::1) port 0 AF_INET6 : interval
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

2097152    4000   9.99      260739      0     835.18
2097152           9.99      260442            834.23
Adjust the -s and/or -S options to match what Jean-Michel's application
uses for socket buffer sizes. Run another two simultaneous instances to
get the three streams. Adjust the run length with the -l option.
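As a quick sanity check on these parameters, a small helper (hypothetical name `payload_mbit_per_s`) computes the payload rate of a stream that sends a fixed number of bytes per interval, ignoring UDP/IP/Ethernet headers. Jean-Michel's stream is 3,133,440 bytes every 1/30 s, roughly 752 Mbit/s, while the netperf burst options above request 783 x 4000 bytes every 33 ms, roughly 759 Mbit/s, so the simulated load is close to the real one.

```c
#include <assert.h>
#include <stdint.h>

/* Payload rate in Mbit/s for a stream sending bytes_per_burst bytes every
 * interval_us microseconds (bits per microsecond == Mbit/s).
 * Protocol headers are not counted. */
static double payload_mbit_per_s(uint64_t bytes_per_burst, double interval_us)
{
    assert(interval_us > 0.0);
    return (double)bytes_per_burst * 8.0 / interval_us;
}
```

Three such streams therefore need about 2.26 Gbit/s of payload, well under the 10 Gbit/s link.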
happy benchmarking,
rick jones
^ permalink raw reply