* Re: [PATCH 2/2 fixed] ping6: Fix -F switch.
From: Jan Synacek @ 2012-12-10 10:12 UTC (permalink / raw)
To: yoshfuji; +Cc: netdev, jsynacek
In-Reply-To: <50C5B24F.5030900@redhat.com>
Even when the flowlabel is set correctly, ping6 exits with a warning. errno
should be checked only if the previous call returned a negative value. In this
case there is no need to check errno at all; checking for a negative return
value is enough.
Signed-off-by: Jan Synacek <jsynacek@redhat.com>
---
ping6.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/ping6.c b/ping6.c
index 358a035..94a24b0 100644
--- a/ping6.c
+++ b/ping6.c
@@ -724,8 +724,9 @@ int main(int argc, char *argv[])
while ((ch = getopt(argc, argv, COMMON_OPTSTR "F:N:")) != EOF) {
switch(ch) {
case 'F':
- flowlabel = hextoui(optarg);
- if (errno || (flowlabel & ~IPV6_FLOWINFO_FLOWLABEL)) {
+ err = hextoui(optarg);
+ flowlabel = (__u32)err;
+ if (err < 0 || (flowlabel & ~IPV6_FLOWINFO_FLOWLABEL)) {
fprintf(stderr, "ping: Invalid flowinfo %s\n", optarg);
exit(2);
}
--
1.8.0.1
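
A minimal userspace sketch of the return-value-based check described in this patch. It is not the ping6 code: parse_flowlabel() and SKETCH_FLOWLABEL_MASK are hypothetical names, and the 20-bit mask is an assumption about the flow label width. The point it illustrates is that errno is only meaningful when it was cleared immediately before the call, so the caller only ever tests the helper's return value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: the flow label occupies the low 20 bits of the value. */
#define SKETCH_FLOWLABEL_MASK 0x000FFFFFu

/*
 * Hypothetical helper, not part of ping6.c. Returns 0 and stores the
 * label in *out on success, -1 on any parse or range error.
 */
static int parse_flowlabel(const char *s, uint32_t *out)
{
	char *end;
	unsigned long v;

	assert(out != NULL);
	if (s == NULL || *s == '\0')
		return -1;

	errno = 0;			/* strtoul() never clears errno itself */
	v = strtoul(s, &end, 16);
	if (errno != 0 || *end != '\0')
		return -1;		/* overflow, no digits, or trailing garbage */
	if (v & ~(unsigned long)SKETCH_FLOWLABEL_MASK)
		return -1;		/* does not fit in the assumed 20 bits */

	*out = (uint32_t)v;
	return 0;
}
```

A caller would write "if (parse_flowlabel(optarg, &label) < 0)" and report the invalid argument, without looking at errno.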
^ permalink raw reply related
* Re: [PATCH net-next v3] tipc: sk_recv_queue size check only for connectionless sockets
From: Jon Maloy @ 2012-12-10 10:13 UTC (permalink / raw)
To: Ying Xue; +Cc: Paul.Gortmaker, tipc-discussion, nhorman, netdev
In-Reply-To: <1355131380-8542-1-git-send-email-ying.xue@windriver.com>
On 12/10/2012 04:23 AM, Ying Xue wrote:
> The sk_receive_queue limit control is currently performed for all
> arriving messages, disregarding socket and message type. But for
> connectionless sockets this check is redundant, since the protocol
> flow already makes queue overflow impossible.
>
> We move the sk_receive_queue limit control so that it's only performed
> for connectionless sockets, i.e. SOCK_RDM and SOCK_DGRAM type sockets.
>
> However, as Neil Horman specified, we cannot simply force the socket
> receive queue limit against connectionless sockets as it may create a
> DoS vulnerability. For example, if a sender floods a receiver with
> messages containing an invalid set of message importance bits or
> CRITICAL importance, we will queue messages indefinitely.
>
> To avoid DoS attack, socket receive queue will be marked as overflow
> if we receive messages with invalid message importances, meanwhile,
> we also set one new threshold for CRITICAL importance messages.
>
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
> v3 changes:
> - set new threshold for CRITICAL message
> - defined an importance factor table to avoid multiplication and
> division operations in rx_queue_full().
> - changed return value of rx_queue_full() from integer to boolean.
>
> net/tipc/socket.c | 44 +++++++++++++++++++-------------------------
> 1 files changed, 19 insertions(+), 25 deletions(-)
>
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 9b4e483..a18a757 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -43,7 +43,7 @@
> #define SS_LISTENING -1 /* socket is listening */
> #define SS_READY -2 /* socket is connectionless */
>
> -#define OVERLOAD_LIMIT_BASE 10000
> +#define OVERLOAD_LIMIT_BASE 5000
> #define CONN_TIMEOUT_DEFAULT 8000 /* default connect timeout = 8s */
>
> struct tipc_sock {
> @@ -73,6 +73,13 @@ static struct proto tipc_proto;
>
> static int sockets_enabled;
>
> +static const u32 msg_importance_factor[] = {
> + OVERLOAD_LIMIT_BASE, /* TIPC_LOW_IMPORTANCE limit */
> + OVERLOAD_LIMIT_BASE * 2, /* TIPC_MEDIUM_IMPORTANCE limit */
> + OVERLOAD_LIMIT_BASE * 100, /* TIPC_HIGH_IMPORTANCE limit */
> + OVERLOAD_LIMIT_BASE * 200 /* TIPC_CRITICAL_IMPORTANCE limit */
> + };
> +
> /*
> * Revised TIPC socket locking policy:
> *
> @@ -1158,28 +1165,17 @@ static void tipc_data_ready(struct sock *sk, int len)
> * rx_queue_full - determine if receive queue can accept another message
> * @msg: message to be added to queue
> * @queue_size: current size of queue
> - * @base: nominal maximum size of queue
> *
> - * Returns 1 if queue is unable to accept message, 0 otherwise
> + * Returns true if queue is unable to accept message, false otherwise
> */
> -static int rx_queue_full(struct tipc_msg *msg, u32 queue_size, u32 base)
> +static bool rx_queue_full(struct tipc_msg *msg, u32 queue_size)
> {
> - u32 threshold;
> u32 imp = msg_importance(msg);
>
> - if (imp == TIPC_LOW_IMPORTANCE)
> - threshold = base;
> - else if (imp == TIPC_MEDIUM_IMPORTANCE)
> - threshold = base * 2;
> - else if (imp == TIPC_HIGH_IMPORTANCE)
> - threshold = base * 100;
> - else
> - return 0;
> + if (unlikely(imp > TIPC_CRITICAL_IMPORTANCE))
> + return true;
This test is not necessary. Such messages have already been filtered out
in tipc_recv_msg() at link level.
The test msg_isdata(), which determines if a message should be sent up to
the port/socket level, is also an implicit test that
importance < TIPC_CRITICAL_IMPORTANCE.
>
> - if (msg_connected(msg))
> - threshold *= 4;
> -
> - return queue_size >= threshold;
> + return queue_size >= msg_importance_factor[imp];
Ok. Less optimal than my suggestion, but also lower risk until we know
the consequences of changing the multiplication factors.
> }
>
> /**
> @@ -1275,7 +1271,6 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
> {
> struct socket *sock = sk->sk_socket;
> struct tipc_msg *msg = buf_msg(buf);
> - u32 recv_q_len;
> u32 res = TIPC_OK;
>
> /* Reject message if it is wrong sort of message for socket */
> @@ -1285,19 +1280,18 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
> if (sock->state == SS_READY) {
> if (msg_connected(msg))
> return TIPC_ERR_NO_PORT;
> + /* Reject SOCK_DGRAM and SOCK_RDM message if there isn't room
> + * to queue it
> + */
> + if (unlikely(rx_queue_full(msg,
> + skb_queue_len(&sk->sk_receive_queue))))
> + return TIPC_ERR_OVERLOAD;
> } else {
> res = filter_connect(tipc_sk(sk), &buf);
> if (res != TIPC_OK || buf == NULL)
> return res;
> }
>
> - /* Reject message if there isn't room to queue it */
> - recv_q_len = skb_queue_len(&sk->sk_receive_queue);
> - if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
> - if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
> - return TIPC_ERR_OVERLOAD;
> - }
> -
> /* Enqueue message (finally!) */
> TIPC_SKB_CB(buf)->handle = 0;
> __skb_queue_tail(&sk->sk_receive_queue, buf);
>
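
For readers following the threshold discussion, here is a small userspace sketch of the table-lookup variant of rx_queue_full() from the quoted patch. The sketch_* names are hypothetical; the base value and multipliers are copied from the v3 diff, and the importance levels are assumed to be numbered 0 to 3. The out-of-range test is kept here because the sketch has no link layer to filter such messages first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_LIMIT_BASE 5000u

/* Assumption: importance levels are numbered 0..3. */
enum sketch_importance { SK_LOW = 0, SK_MEDIUM = 1, SK_HIGH = 2, SK_CRITICAL = 3 };

/* Per-importance queue limits, precomputed so the hot path is one lookup. */
static const uint32_t sketch_limit[] = {
	SKETCH_LIMIT_BASE,		/* low */
	SKETCH_LIMIT_BASE * 2,		/* medium */
	SKETCH_LIMIT_BASE * 100,	/* high */
	SKETCH_LIMIT_BASE * 200,	/* critical */
};

/* Returns true if a message of importance imp must be rejected. */
static bool sketch_rx_queue_full(uint32_t imp, uint32_t queue_size)
{
	assert(sizeof(sketch_limit) / sizeof(sketch_limit[0]) == SK_CRITICAL + 1);

	if (imp > SK_CRITICAL)		/* invalid importance: treat as overflow */
		return true;
	return queue_size >= sketch_limit[imp];
}
```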
^ permalink raw reply
* Re: [PATCHv6] virtio-spec: virtio network device multiqueue support
From: Michael S. Tsirkin @ 2012-12-10 10:36 UTC (permalink / raw)
To: Rusty Russell; +Cc: Jason Wang, virtualization, netdev, kvm, bhutchings
In-Reply-To: <87ip8aeqi7.fsf@rustcorp.com.au>
On Mon, Dec 10, 2012 at 10:18:32AM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > Add multiqueue support to virtio network device.
> > Add a new feature flag VIRTIO_NET_F_MQ for this feature, a new
> > configuration field max_virtqueue_pairs to detect supported number of
> > virtqueues as well as a new command VIRTIO_NET_CTRL_MQ to program
> > packet steering for unidirectional protocols.
>
> One trivial change: alter "8000h" to "0x8000" for consistency in the
> text.
>
> Could I have a Signed-off-by so I can apply it please?
>
> Thanks,
> Rusty.
Right away.
--
MST
^ permalink raw reply
* [PATCHv7] virtio-spec: virtio network device multiqueue support
From: Michael S. Tsirkin @ 2012-12-10 10:40 UTC (permalink / raw)
To: Rusty Russell; +Cc: bhutchings, netdev, kvm, virtualization
Add multiqueue support to virtio network device.
Add a new feature flag VIRTIO_NET_F_MQ for this feature, a new
configuration field max_virtqueue_pairs to detect supported number of
virtqueues as well as a new command VIRTIO_NET_CTRL_MQ to program
packet steering for unidirectional protocols.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes in v7:
- 8000h -> 0x8000 at Rusty's request
Changes in v6:
- rename RFS -> multiqueue to avoid confusion with RFS in linux
mention automatic receive steering as Rusty suggested
Changes in v5:
- Address Rusty's comments.
Changes are only in the text, not the ideas.
- Some minor formatting changes.
Changes in v4:
- address Jason's comments
- have configuration specify the number of VQ pairs and not pairs - 1
Changes in v3:
- rename multiqueue -> rfs this is what we support
- Be more explicit about what driver should do.
- Simplify layout making VQs functionality depend on feature.
- Remove unused commands, only leave in programming # of queues
Changes in v2:
Address Jason's comments on v2:
- Changed STEERING_HOST to STEERING_RX_FOLLOWS_TX:
this is both clearer and easier to support.
It does not look like we need a separate steering command
since host can just watch tx packets as they go.
- Moved RX and TX steering sections near each other.
- Add motivation for other changes in v2
Changes in v1 (from Jason's rfc):
- reserved vq 3: this makes all rx vqs even and tx vqs odd, which
looks nicer to me.
- documented packet steering, added a generalized steering programming
command. Current modes are single queue and host driven multiqueue,
but I envision support for guest driven multiqueue in the future.
- make default vqs unused when in mq mode - this wastes some memory
but makes it more efficient to switch between modes as
we can avoid this causing packet reordering.
diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 83f2771..6c09180 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -59,6 +59,7 @@
\author -608949062 "Rusty Russell,,,"
\author -385801441 "Cornelia Huck" cornelia.huck@de.ibm.com
\author 1531152142 "Paolo Bonzini,,,"
+\author 1986246365 "Michael S. Tsirkin"
\end_header
\begin_body
@@ -4170,9 +4171,46 @@ ID 1
\end_layout
\begin_layout Description
-Virtqueues 0:receiveq.
- 1:transmitq.
- 2:controlq
+Virtqueues 0:receiveq
+\change_inserted 1986246365 1352742829
+0
+\change_unchanged
+.
+ 1:transmitq
+\change_inserted 1986246365 1352742832
+0
+\change_deleted 1986246365 1352742947
+.
+
+\change_inserted 1986246365 1352742952
+.
+ ....
+ 2N
+\begin_inset Foot
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1354531595
+N=0 if VIRTIO_NET_F_MQ is not negotiated, otherwise N is derived from
+\emph on
+max_virtqueue_pairs
+\emph default
+ control
+\emph on
+
+\emph default
+field.
+
+\end_layout
+
+\end_inset
+
+: receiveqN.
+ 2N+1: transmitqN.
+ 2N+
+\change_unchanged
+2:controlq
\begin_inset Foot
status open
@@ -4343,6 +4381,16 @@ VIRTIO_NET_F_CTRL_VLAN
\begin_layout Description
VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets.
+\change_inserted 1986246365 1352742767
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1986246365 1352742808
+VIRTIO_NET_F_MQ(22) Device supports multiqueue with automatic receive steering.
+\change_unchanged
+
\end_layout
\end_deeper
@@ -4355,11 +4403,45 @@ configuration
\begin_inset space ~
\end_inset
-layout Two configuration fields are currently defined.
+layout
+\change_deleted 1986246365 1352743300
+Two
+\change_inserted 1986246365 1354531413
+Three
+\change_unchanged
+ configuration fields are currently defined.
The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC
is set), and the status field only exists if VIRTIO_NET_F_STATUS is set.
Two read-only bits are currently defined for the status field: VIRTIO_NET_S_LIN
K_UP and VIRTIO_NET_S_ANNOUNCE.
+
+\change_inserted 1986246365 1354531470
+ The following read-only field,
+\emph on
+max_virtqueue_pairs
+\emph default
+ only exists if VIRTIO_NET_F_MQ is set.
+ This field specifies the maximum number of each of transmit and receive
+ virtqueues (receiveq0..receiveq
+\emph on
+N
+\emph default
+ and transmitq0..transmitq
+\emph on
+N
+\emph default
+ respectively;
+\emph on
+N
+\emph default
+=
+\emph on
+max_virtqueue_pairs - 1
+\emph default
+) that can be configured once VIRTIO_NET_F_MQ is negotiated.
+ Legal values for this field are 1 to 0x8000.
+
+\change_unchanged
\begin_inset listings
inline false
@@ -4392,6 +4474,17 @@ struct virtio_net_config {
\begin_layout Plain Layout
u16 status;
+\change_inserted 1986246365 1354531427
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1354531437
+
+ u16 max_virtqueue_pairs;
+\change_unchanged
+
\end_layout
\begin_layout Plain Layout
@@ -4410,7 +4503,24 @@ Device Initialization
\begin_layout Enumerate
The initialization routine should identify the receive and transmission
- virtqueues.
+ virtqueues
+\change_inserted 1986246365 1352744077
+, up to N+1 of each kind
+\change_unchanged
+.
+
+\change_inserted 1986246365 1352743942
+ If VIRTIO_NET_F_MQ feature bit is negotiated,
+\emph on
+N=max_virtqueue_pairs-1
+\emph default
+, otherwise identify
+\emph on
+N=0
+\emph default
+.
+\change_unchanged
+
\end_layout
\begin_layout Enumerate
@@ -4452,10 +4562,33 @@ status
config field.
Otherwise, the link should be assumed active.
+\change_inserted 1986246365 1354529306
+
\end_layout
\begin_layout Enumerate
-The receive virtqueue should be filled with receive buffers.
+
+\change_inserted 1986246365 1354531717
+Only receiveq0, transmitq0 and controlq are used by default.
+ To use more queues driver must negotiate the VIRTIO_NET_F_MQ feature;
+ initialize up to
+\emph on
+max_virtqueue_pairs
+\emph default
+ of each of transmit and receive queues; execute_VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SE
+T command specifying the number of the transmit and receive queues that
+ is going to be used and wait until the device consumes the controlq buffer
+ and acks this command.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Enumerate
+The receive virtqueue
+\change_inserted 1986246365 1352743953
+s
+\change_unchanged
+ should be filled with receive buffers.
This is described in detail below in
\begin_inset Quotes eld
\end_inset
@@ -4550,8 +4683,15 @@ Device Operation
\end_layout
\begin_layout Standard
-Packets are transmitted by placing them in the transmitq, and buffers for
- incoming packets are placed in the receiveq.
+Packets are transmitted by placing them in the transmitq
+\change_inserted 1986246365 1353593685
+0..transmitqN
+\change_unchanged
+, and buffers for incoming packets are placed in the receiveq
+\change_inserted 1986246365 1353593692
+0..receiveqN
+\change_unchanged
+.
In each case, the packet itself is preceeded by a header:
\end_layout
@@ -4861,6 +5001,17 @@ If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at least the
struct virtio_net_hdr
\family default
.
+\change_inserted 1986246365 1353594518
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1353594638
+If VIRTIO_NET_F_MQ is negotiated, each of receiveq0...receiveqN that will
+ be used should be populated with receive buffers.
+\change_unchanged
+
\end_layout
\begin_layout Subsection*
@@ -5293,8 +5444,151 @@ Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control vq.
\end_layout
-\begin_layout Enumerate
+\begin_layout Subsection*
+
+\change_inserted 1986246365 1353593879
+Automatic receive steering in multiqueue mode
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1354528882
+If the driver negotiates the VIRTIO_NET_F_MQ feature bit (depends on VIRTIO_NET
+_F_CTRL_VQ), it can transmit outgoing packets on one of the multiple transmitq0..t
+ransmitqN and ask the device to queue incoming packets into one of the multiple
+ receiveq0..receiveqN depending on the packet flow.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1353594292
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594178
+
+struct virtio_net_ctrl_mq {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594212
+
+ u16 virtqueue_pairs;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594172
+
+};
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594172
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594263
+
+#define VIRTIO_NET_CTRL_MQ 1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594273
+
+ #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET 0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594273
+
+ #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN 1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594273
+
+ #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX 0x8000
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1354531492
+Multiqueue is disabled by default.
+ Driver enables multiqueue by executing the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command,
+ specifying the number of the transmit and receive queues that will be used;
+ thus transmitq0..transmitqn and receiveq0..receiveqn where
+\emph on
+n=virtqueue_pairs-1
+\emph default
+ will be used.
+ All these virtqueues must have been pre-configured in advance.
+ The range of legal values for the
+\emph on
+ virtqueue_pairs
+\emph off
+ field is between 1 and
+\emph on
+max_virtqueue_pairs
+\emph off
+.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1353595328
+When multiqueue is enabled, device uses automatic receive
+steering based on packet flow.
+Programming of the receive steering classificator is implicit.
+ Transmitting a packet of a specific flow on transmitqX will cause incoming
+ packets for this flow to be steered to receiveqX.
+ For uni-directional protocols, or where no packets have been transmitted
+ yet, device will steer a packet to a random queue out of the specified
+ receiveq0..receiveqn.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1354528710
+Multiqueue is disabled by setting
+\emph on
+virtqueue_pairs = 1
+\emph default
+ (this is the default).
+ After the command is consumed by the device, the device will not steer
+ new packets on virtqueues receiveq1..receiveqN (i.e.
+ other than receiveq0) nor read from transmitq1..transmitqN (i.e.
+ other than transmitq0); accordingly, driver should not transmit new packets
+ on virtqueues other than transmitq0.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Standard
+
+\change_deleted 1986246365 1353593873
.
+
+\change_unchanged
\end_layout
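
The virtqueue numbering in this spec change (receiveqN at index 2N, transmitqN at 2N+1, controlq at 2N+2 for the largest N) can be sketched in C as follows. The sketch_* helpers are hypothetical and only restate the arithmetic from the text; the 0x8000 upper bound is the one given for max_virtqueue_pairs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Index of receiveq<pair>: receive queues sit on even indices. */
static uint32_t sketch_rx_vq(uint16_t pair)
{
	assert(pair < 0x8000u);
	return 2u * pair;
}

/* Index of transmitq<pair>: transmit queues sit on odd indices. */
static uint32_t sketch_tx_vq(uint16_t pair)
{
	assert(pair < 0x8000u);
	return 2u * pair + 1u;
}

/*
 * Index of controlq. N is max_pairs - 1 when the multiqueue feature is
 * negotiated, otherwise 0, so controlq follows the last transmit queue.
 * The result can be 0x10000, hence the 32-bit return type.
 */
static uint32_t sketch_ctrl_vq(bool mq, uint16_t max_pairs)
{
	uint32_t n;

	assert(!mq || max_pairs >= 1);
	n = mq ? (uint32_t)max_pairs - 1u : 0u;
	return 2u * n + 2u;
}

/* A requested pair count is legal between 1 and max_pairs inclusive. */
static bool sketch_pairs_valid(uint16_t pairs, uint16_t max_pairs)
{
	return max_pairs >= 1 && max_pairs <= 0x8000u &&
	       pairs >= 1 && pairs <= max_pairs;
}
```

Without the feature negotiated this gives the legacy layout 0, 1, 2 for receiveq, transmitq and controlq.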
^ permalink raw reply related
* [PATCH] smsc75xx: only set mac address once on bind
From: Steve Glendinning @ 2012-12-10 11:01 UTC (permalink / raw)
To: netdev; +Cc: Steve Glendinning, Bjorn Mork, Dan Williams
This patch changes when we decide what the device's MAC address
is: once, when the device is connected, instead of on every
ifconfig up.
Without this patch, a manually forced device MAC is overwritten
on ifconfig down/up. Also, devices that have no EEPROM are
assigned a new random address on every ifconfig down/up instead
of keeping the same one.
Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
Reported-by: Robert Cunningham <rcunningham@nsmsurveillance.com>
Cc: Bjorn Mork <bjorn@mork.no>
Cc: Dan Williams <dcbw@redhat.com>
---
drivers/net/usb/smsc75xx.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/net/usb/smsc75xx.c b/drivers/net/usb/smsc75xx.c
index 1cbd936..251a335 100644
--- a/drivers/net/usb/smsc75xx.c
+++ b/drivers/net/usb/smsc75xx.c
@@ -1054,8 +1054,6 @@ static int smsc75xx_reset(struct usbnet *dev)
netif_dbg(dev, ifup, dev->net, "PHY reset complete\n");
- smsc75xx_init_mac_address(dev);
-
ret = smsc75xx_set_mac_address(dev);
if (ret < 0) {
netdev_warn(dev->net, "Failed to set mac address\n");
@@ -1422,6 +1420,14 @@ static int smsc75xx_bind(struct usbnet *dev, struct usb_interface *intf)
dev->net->hw_features = NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
NETIF_F_SG | NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_RXCSUM;
+ ret = smsc75xx_wait_ready(dev, 0);
+ if (ret < 0) {
+ netdev_warn(dev->net, "device not ready in smsc75xx_bind\n");
+ return ret;
+ }
+
+ smsc75xx_init_mac_address(dev);
+
/* Init all registers */
ret = smsc75xx_reset(dev);
if (ret < 0) {
--
1.7.10.4
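
The behaviour this patch aims for can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: struct sketch_nic and its helpers are hypothetical, the EEPROM read is replaced by a parameter, and a fixed locally administered address stands in for the random one a real driver would generate. The address is chosen once at bind time; reset only programs whatever address is currently stored, so a user override survives repeated resets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct sketch_nic {
	uint8_t mac[6];		/* address the stack believes the device has */
	uint8_t hw_mac[6];	/* address last programmed into the hardware */
};

/* Valid means: not all zeroes and not a multicast address. */
static bool sketch_mac_valid(const uint8_t m[6])
{
	static const uint8_t zero[6];

	return memcmp(m, zero, 6) != 0 && (m[0] & 0x01) == 0;
}

/* Called once when the device is connected. */
static void sketch_bind(struct sketch_nic *n, const uint8_t eeprom_mac[6])
{
	/* Stand-in for a random address (locally administered bit set). */
	static const uint8_t fallback[6] = { 0x02, 0, 0, 0, 0, 0x01 };

	assert(n != NULL);
	memcpy(n->mac, sketch_mac_valid(eeprom_mac) ? eeprom_mac : fallback, 6);
}

/* Called on every interface up: never re-derives the address. */
static void sketch_reset(struct sketch_nic *n)
{
	memcpy(n->hw_mac, n->mac, 6);
}
```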
^ permalink raw reply related
* [PATCH 1/2] smsc95xx: fix register dump of last register
From: Steve Glendinning @ 2012-12-10 11:03 UTC (permalink / raw)
To: netdev; +Cc: Steve Glendinning
This patch fixes the ethtool register dump for smsc95xx to dump
all 4 bytes of the final register (COE_CR) instead of just the
first byte.
Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
---
drivers/net/usb/smsc95xx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index f7e1e18..a00dcc4 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -691,7 +691,7 @@ static int smsc95xx_ethtool_set_eeprom(struct net_device *netdev,
static int smsc95xx_ethtool_getregslen(struct net_device *netdev)
{
/* all smsc95xx registers */
- return COE_CR - ID_REV + 1;
+ return COE_CR - ID_REV + sizeof(u32);
}
static void
--
1.7.10.4
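
The off-by-three in the old length calculation is easy to check with a short sketch. It assumes that register offsets are byte addresses and that every register is 32 bits wide; sketch_regs_len() is a hypothetical helper and the offsets used in the note below are placeholders, not values taken from the driver headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes needed to dump every 32-bit register from byte offset first to
 * byte offset last inclusive. "last - first + 1" would cover only the
 * first byte of the final register.
 */
static size_t sketch_regs_len(uint32_t first, uint32_t last)
{
	assert(last >= first);
	return (size_t)(last - first) + sizeof(uint32_t);
}
```

With placeholder offsets 0x00 and 0x130 the dump is 0x134 bytes, which is 77 whole registers.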
^ permalink raw reply related
* [PATCH 2/2] smsc95xx: fix async register writes on big endian platforms
From: Steve Glendinning @ 2012-12-10 11:03 UTC (permalink / raw)
To: netdev; +Cc: Steve Glendinning
In-Reply-To: <1355137388-2938-1-git-send-email-steve.glendinning@shawell.net>
This patch fixes a missing endian conversion which results in the
interface failing to come up on BE platforms.
It also removes an unnecessary pointer dereference from this
function.
Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
---
drivers/net/usb/smsc95xx.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index a00dcc4..9b73670 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -421,15 +421,19 @@ static int smsc95xx_write_eeprom(struct usbnet *dev, u32 offset, u32 length,
}
static int __must_check smsc95xx_write_reg_async(struct usbnet *dev, u16 index,
- u32 *data)
+ u32 data)
{
const u16 size = 4;
+ u32 buf;
int ret;
+ buf = data;
+ cpu_to_le32s(&buf);
+
ret = usbnet_write_cmd_async(dev, USB_VENDOR_REQUEST_WRITE_REGISTER,
USB_DIR_OUT | USB_TYPE_VENDOR |
USB_RECIP_DEVICE,
- 0, index, data, size);
+ 0, index, &buf, size);
if (ret < 0)
netdev_warn(dev->net, "Error write async cmd, sts=%d\n",
ret);
@@ -490,15 +494,15 @@ static void smsc95xx_set_multicast(struct net_device *netdev)
spin_unlock_irqrestore(&pdata->mac_cr_lock, flags);
/* Initiate async writes, as we can't wait for completion here */
- ret = smsc95xx_write_reg_async(dev, HASHH, &pdata->hash_hi);
+ ret = smsc95xx_write_reg_async(dev, HASHH, pdata->hash_hi);
if (ret < 0)
netdev_warn(dev->net, "failed to initiate async write to HASHH\n");
- ret = smsc95xx_write_reg_async(dev, HASHL, &pdata->hash_lo);
+ ret = smsc95xx_write_reg_async(dev, HASHL, pdata->hash_lo);
if (ret < 0)
netdev_warn(dev->net, "failed to initiate async write to HASHL\n");
- ret = smsc95xx_write_reg_async(dev, MAC_CR, &pdata->mac_cr);
+ ret = smsc95xx_write_reg_async(dev, MAC_CR, pdata->mac_cr);
if (ret < 0)
netdev_warn(dev->net, "failed to initiate async write to MAC_CR\n");
}
--
1.7.10.4
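
As an illustration of why the conversion matters, the following userspace sketch serializes a 32-bit register value into little-endian wire order without depending on host byte order. sketch_put_le32() is a hypothetical stand-in for what cpu_to_le32s() achieves in the patch; on a little-endian host it leaves the bytes as they are, on a big-endian host it swaps them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v into out[] least significant byte first, on any host. */
static void sketch_put_le32(uint32_t v, uint8_t out[4])
{
	assert(out != NULL);
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)((v >> 8) & 0xff);
	out[2] = (uint8_t)((v >> 16) & 0xff);
	out[3] = (uint8_t)((v >> 24) & 0xff);
}
```

Copying the value into a local buffer first, as the patch does by taking data by value, also means the caller's copy is never rewritten in device byte order.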
^ permalink raw reply related
* Re: [PATCH] ipv4: ip_check_defrag must not modify skb before unsharing
From: Eric Leblond @ 2012-12-10 11:02 UTC (permalink / raw)
To: Johannes Berg
Cc: David Miller, netdev, linux-wireless, linville, Eric Dumazet
In-Reply-To: <1355132466.9857.6.camel@jlt4.sipsolutions.net>
Hello,
On Mon, 2012-12-10 at 10:41 +0100, Johannes Berg wrote:
> From: Johannes Berg <johannes.berg@intel.com>
>
> ip_check_defrag() might be called from af_packet within the
> RX path where shared SKBs are used, so it must not modify
> the input SKB before it has unshared it for defragmentation.
> Use skb_copy_bits() to get the IP header and only pull in
> everything later.
>
> The same is true for the other caller in macvlan as it is
> called from dev->rx_handler which can also get a shared SKB.
I've applied the patch and built a new kernel. I could not get it
to crash with the two techniques (suspend to RAM and interface
down/up) that reliably crashed the kernel without the patch.
BR,
> Reported-by: Eric Leblond <eric@regit.org>
> Cc: stable@vger.kernel.org
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> ---
> For some versions of the kernel, this code goes into af_packet.c
>
> net/ipv4/ip_fragment.c | 19 +++++++++----------
> 1 file changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index 448e685..8d5cc75 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -707,28 +707,27 @@ EXPORT_SYMBOL(ip_defrag);
>
> struct sk_buff *ip_check_defrag(struct sk_buff *skb, u32 user)
> {
> - const struct iphdr *iph;
> + struct iphdr iph;
> u32 len;
>
> if (skb->protocol != htons(ETH_P_IP))
> return skb;
>
> - if (!pskb_may_pull(skb, sizeof(struct iphdr)))
> + if (skb_copy_bits(skb, 0, &iph, sizeof(iph)) < 0)
> return skb;
>
> - iph = ip_hdr(skb);
> - if (iph->ihl < 5 || iph->version != 4)
> + if (iph.ihl < 5 || iph.version != 4)
> return skb;
> - if (!pskb_may_pull(skb, iph->ihl*4))
> - return skb;
> - iph = ip_hdr(skb);
> - len = ntohs(iph->tot_len);
> - if (skb->len < len || len < (iph->ihl * 4))
> +
> + len = ntohs(iph.tot_len);
> + if (skb->len < len || len < (iph.ihl * 4))
> return skb;
>
> - if (ip_is_fragment(ip_hdr(skb))) {
> + if (ip_is_fragment(&iph)) {
> skb = skb_share_check(skb, GFP_ATOMIC);
> if (skb) {
> + if (!pskb_may_pull(skb, iph.ihl*4))
> + return skb;
> if (pskb_trim_rcsum(skb, len))
> return skb;
> memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
--
Eric Leblond <eric@regit.org>
Blog: https://home.regit.org/
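
The idea in the quoted patch, inspecting the IP header through a private copy so that a shared buffer is never modified, can be sketched in userspace as below. Everything named sketch_* is hypothetical; memcpy() plays the role of skb_copy_bits(), and like that function the helper returns 0 on success and a negative value on failure, so the caller tests for "< 0". The 0x3fff mask is the more-fragments flag plus the 13-bit fragment offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_iphdr {
	uint8_t version;
	uint8_t ihl;		/* header length in 32-bit words */
	uint16_t tot_len;	/* host byte order */
	uint16_t frag;		/* flags and fragment offset, host byte order */
};

/* Copies the fixed 20-byte header out of pkt; pkt itself is untouched. */
static int sketch_peek_ipv4(const uint8_t *pkt, size_t len,
			    struct sketch_iphdr *out)
{
	uint8_t hdr[20];

	assert(pkt != NULL && out != NULL);
	if (len < sizeof(hdr))
		return -1;
	memcpy(hdr, pkt, sizeof(hdr));

	out->version = hdr[0] >> 4;
	out->ihl = hdr[0] & 0x0f;
	out->tot_len = (uint16_t)((hdr[2] << 8) | hdr[3]);
	out->frag = (uint16_t)((hdr[6] << 8) | hdr[7]);
	return 0;
}

/* True if the more-fragments flag is set or the offset is non-zero. */
static int sketch_is_fragment(const struct sketch_iphdr *h)
{
	return (h->frag & 0x3fff) != 0;
}
```

Only once the copy says the packet is a fragment does the caller need to unshare the buffer and pull the header in for real.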
^ permalink raw reply
* [net-next:master 204/207] drivers/net/virtio_net.c:1312 virtnet_alloc_queues() error: potential null dereference 'vi->rq'. (kzalloc returns null)
From: kbuild test robot @ 2012-12-10 11:14 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, Krishna Kumar
tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head: 65d2897c0f1b240420d657f41e561239fa10ba94
commit: 986a4f4d452dec004697f667439d27c3fda9c928 [204/207] virtio_net: multiqueue support
smatch warnings:
+ drivers/net/virtio_net.c:1312 virtnet_alloc_queues() error: potential null dereference 'vi->rq'. (kzalloc returns null)
vim +1312 drivers/net/virtio_net.c
986a4f4d Jason Wang 2012-12-07 1296 return ret;
986a4f4d Jason Wang 2012-12-07 1297 }
986a4f4d Jason Wang 2012-12-07 1298
986a4f4d Jason Wang 2012-12-07 1299 static int virtnet_alloc_queues(struct virtnet_info *vi)
986a4f4d Jason Wang 2012-12-07 1300 {
986a4f4d Jason Wang 2012-12-07 1301 int i;
986a4f4d Jason Wang 2012-12-07 1302
986a4f4d Jason Wang 2012-12-07 1303 vi->sq = kzalloc(sizeof(*vi->sq) * vi->max_queue_pairs, GFP_KERNEL);
986a4f4d Jason Wang 2012-12-07 1304 if (!vi->sq)
986a4f4d Jason Wang 2012-12-07 1305 goto err_sq;
986a4f4d Jason Wang 2012-12-07 1306 vi->rq = kzalloc(sizeof(*vi->rq) * vi->max_queue_pairs, GFP_KERNEL);
986a4f4d Jason Wang 2012-12-07 1307 if (!vi->sq)
986a4f4d Jason Wang 2012-12-07 1308 goto err_rq;
986a4f4d Jason Wang 2012-12-07 1309
986a4f4d Jason Wang 2012-12-07 1310 INIT_DELAYED_WORK(&vi->refill, refill_work);
986a4f4d Jason Wang 2012-12-07 1311 for (i = 0; i < vi->max_queue_pairs; i++) {
986a4f4d Jason Wang 2012-12-07 @1312 vi->rq[i].pages = NULL;
986a4f4d Jason Wang 2012-12-07 1313 netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
986a4f4d Jason Wang 2012-12-07 1314 napi_weight);
986a4f4d Jason Wang 2012-12-07 1315
986a4f4d Jason Wang 2012-12-07 1316 sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
986a4f4d Jason Wang 2012-12-07 1317 sg_init_table(vi->sq[i].sg, ARRAY_SIZE(vi->sq[i].sg));
986a4f4d Jason Wang 2012-12-07 1318 }
986a4f4d Jason Wang 2012-12-07 1319
986a4f4d Jason Wang 2012-12-07 1320 return 0;
---
0-DAY kernel build testing backend Open Source Technology Center
Fengguang Wu, Yuanhan Liu Intel Corporation
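
The warning points at the second NULL check, which tests vi->sq again right after vi->rq was assigned. Presumably the check was meant for vi->rq. A userspace sketch of the intended pattern, with calloc() standing in for kzalloc() and hypothetical sketch_* names, looks like this:

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_queue { int placeholder; };

struct sketch_info {
	struct sketch_queue *sq;
	struct sketch_queue *rq;
	size_t pairs;
};

/* Returns 0 on success, -1 if either array could not be allocated. */
static int sketch_alloc_queues(struct sketch_info *vi, size_t pairs)
{
	assert(vi != NULL && pairs > 0);

	vi->sq = calloc(pairs, sizeof(*vi->sq));
	if (!vi->sq)
		goto err_sq;
	vi->rq = calloc(pairs, sizeof(*vi->rq));
	if (!vi->rq)		/* test the pointer that was just assigned */
		goto err_rq;

	vi->pairs = pairs;
	return 0;

err_rq:
	free(vi->sq);		/* unwind the first allocation */
	vi->sq = NULL;
err_sq:
	return -1;
}

static void sketch_free_queues(struct sketch_info *vi)
{
	free(vi->rq);
	free(vi->sq);
	vi->rq = NULL;
	vi->sq = NULL;
}
```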
^ permalink raw reply
* Re: [PATCH 1/1] net: ethernet: davinci_cpdma: Add boundary for rx and tx descriptors
From: Mugunthan V N @ 2012-12-10 11:29 UTC (permalink / raw)
To: Christian Riesch; +Cc: netdev, davem, linux-arm-kernel, linux-omap, s.hauer
In-Reply-To: <CABkLObqOEMKMD3df5pEDJWsGPA=oAj34Yz867D7p2hGdTr0WCA@mail.gmail.com>
On 12/10/2012 1:54 PM, Christian Riesch wrote:
> Hi again,
>
> On Mon, Dec 10, 2012 at 8:37 AM, Mugunthan V N <mugunthanvnm@ti.com> wrote:
>> When there is heavy transmission traffic in the CPDMA, then Rx descriptors
>> memory is also utilized as tx desc memory this leads to reduced rx desc memory
>> which leads to poor performance.
>>
> "poor performance" is an understatement, see Sascha's description of
> his patch. At initialization of the driver, half of the descriptors in
> the pool are allocated for rx. When a packet arrives, one of the rx
> descriptors is released and a new one is allocated. If tx allocates
> this descriptor in the meantime, it is lost for rx forever! If tx
> consumes all rx descriptors this way, the rx channel is dead!
>
> Regards, Christian
>
>> This patch adds boundary for tx and rx descriptors in bd ram dividing the
>> descriptor memory to ensure that during heavy transmission tx doesn't use
>> rx descriptors.
>>
>> This patch is already applied to davinci_emac driver, since CPSW and
>> davici_dmac uses the same CPDMA, moving the boundry seperation from
>> Davinci EMAC driver to CPDMA driver which was done in the following
>> commit
>>
>> commit 86d8c07ff2448eb4e860e50f34ef6ee78e45c40c
>> Author: Sascha Hauer <s.hauer@pengutronix.de>
>> Date: Tue Jan 3 05:27:47 2012 +0000
>>
>> net/davinci: do not use all descriptors for tx packets
>>
>> The driver uses a shared pool for both rx and tx descriptors.
>> During open it queues fixed number of 128 descriptors for receive
>> packets. For each received packet it tries to queue another
>> descriptor. If this fails the descriptor is lost for rx.
>> The driver has no limitation on tx descriptors to use, so it
>> can happen during a nmap / ping -f attack that the driver
>> allocates all descriptors for tx and looses all rx descriptors.
>> The driver stops working then.
>> To fix this limit the number of tx descriptors used to half of
>> the descriptors available, the rx path uses the other half.
>>
>> Tested on a custom board using nmap / ping -f to the board from
>> two different hosts.
>>
>> Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Will change the commit description and resubmit the patch.
Regards
Mugunthan V N
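
To make the failure mode discussed in this thread concrete, here is a small sketch of a shared descriptor pool with a hard boundary between tx and rx. It is not the CPDMA implementation; struct sketch_pool and its helpers are hypothetical, and the even split follows the quoted commit message. With the boundary in place, tx can exhaust its own half but can never take a descriptor that rx needs to requeue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_pool {
	unsigned int total;	/* descriptors in the shared pool */
	unsigned int tx_used;
	unsigned int rx_used;
};

/* Try to take one descriptor for tx or rx; false if that side is full. */
static bool sketch_desc_alloc(struct sketch_pool *p, bool is_tx)
{
	unsigned int half;
	unsigned int limit;
	unsigned int *used;

	assert(p != NULL);
	half = p->total / 2;
	limit = is_tx ? half : p->total - half;
	used = is_tx ? &p->tx_used : &p->rx_used;

	if (*used >= limit)
		return false;	/* this side has reached its boundary */
	(*used)++;
	return true;
}

/* Return one descriptor to the side it was taken from. */
static void sketch_desc_free(struct sketch_pool *p, bool is_tx)
{
	unsigned int *used = is_tx ? &p->tx_used : &p->rx_used;

	assert(*used > 0);
	(*used)--;
}
```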
^ permalink raw reply
* Re: [PATCH 1/1] net: ethernet: davinci_cpdma: Add boundary for rx and tx descriptors
From: Mugunthan V N @ 2012-12-10 11:28 UTC (permalink / raw)
To: Christian Riesch; +Cc: netdev, davem, linux-arm-kernel, linux-omap, s.hauer
In-Reply-To: <CABkLObp1cZrNR65KJmOJXCB+W_1ZriCJQdL1TEmkJCptXHFDXw@mail.gmail.com>
On 12/10/2012 1:41 PM, Christian Riesch wrote:
> Hi,
>
> On Mon, Dec 10, 2012 at 8:37 AM, Mugunthan V N <mugunthanvnm@ti.com> wrote:
>> When there is heavy transmission traffic in the CPDMA, then Rx descriptors
>> memory is also utilized as tx desc memory this leads to reduced rx desc memory
>> which leads to poor performance.
>>
>> This patch adds boundary for tx and rx descriptors in bd ram dividing the
>> descriptor memory to ensure that during heavy transmission tx doesn't use
>> rx descriptors.
>>
>> This patch is already applied to davinci_emac driver, since CPSW and
>> davici_dmac uses the same CPDMA, moving the boundry seperation from
>> Davinci EMAC driver to CPDMA driver which was done in the following
>> commit
>>
>> commit 86d8c07ff2448eb4e860e50f34ef6ee78e45c40c
>> Author: Sascha Hauer <s.hauer@pengutronix.de>
>> Date: Tue Jan 3 05:27:47 2012 +0000
>>
>> net/davinci: do not use all descriptors for tx packets
>>
>> The driver uses a shared pool for both rx and tx descriptors.
>> During open it queues fixed number of 128 descriptors for receive
>> packets. For each received packet it tries to queue another
>> descriptor. If this fails the descriptor is lost for rx.
>> The driver has no limitation on tx descriptors to use, so it
>> can happen during a nmap / ping -f attack that the driver
>> allocates all descriptors for tx and looses all rx descriptors.
>> The driver stops working then.
>> To fix this limit the number of tx descriptors used to half of
>> the descriptors available, the rx path uses the other half.
>>
>> Tested on a custom board using nmap / ping -f to the board from
>> two different hosts.
>>
>> Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
>> ---
>> drivers/net/ethernet/ti/davinci_cpdma.c | 20 ++++++++++++++------
>> drivers/net/ethernet/ti/davinci_emac.c | 8 --------
>> 2 files changed, 14 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
>> index 4995673..d37f546 100644
>> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
>> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
>> @@ -105,13 +105,13 @@ struct cpdma_ctlr {
>> };
>>
>> struct cpdma_chan {
>> + struct cpdma_desc __iomem *head, *tail;
>> + void __iomem *hdp, *cp, *rxfree;
>> enum cpdma_state state;
>> struct cpdma_ctlr *ctlr;
>> int chan_num;
>> spinlock_t lock;
>> - struct cpdma_desc __iomem *head, *tail;
>> int count;
>> - void __iomem *hdp, *cp, *rxfree;
> Why?
It's just a code clean-up to keep the iomem variables in one place.
>
>> u32 mask;
>> cpdma_handler_fn handler;
>> enum dma_data_direction dir;
>> @@ -217,7 +217,7 @@ desc_from_phys(struct cpdma_desc_pool *pool, dma_addr_t dma)
>> }
>>
>> static struct cpdma_desc __iomem *
>> -cpdma_desc_alloc(struct cpdma_desc_pool *pool, int num_desc)
>> +cpdma_desc_alloc(struct cpdma_desc_pool *pool, int num_desc, bool is_rx)
>> {
>> unsigned long flags;
>> int index;
>> @@ -225,8 +225,14 @@ cpdma_desc_alloc(struct cpdma_desc_pool *pool, int num_desc)
>>
>> spin_lock_irqsave(&pool->lock, flags);
>>
>> - index = bitmap_find_next_zero_area(pool->bitmap, pool->num_desc, 0,
>> - num_desc, 0);
>> + if (is_rx) {
>> + index = bitmap_find_next_zero_area(pool->bitmap,
>> + pool->num_desc/2, 0, num_desc, 0);
>> + } else {
>> + index = bitmap_find_next_zero_area(pool->bitmap,
>> + pool->num_desc, pool->num_desc/2, num_desc, 0);
>> + }
> Would it make sense to use two separate pools for rx and tx instead,
> struct cpdma_desc_pool *rxpool, *txpool? It would result in using
> separate spinlocks for rx and tx, could this be an advantage? (I am a
> newbie in this field...)
I don't think separate pools would give an advantage; the same effect is
achieved by splitting the pool into a first half and a last half.
Regards
Mugunthan V N
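
For readers following the thread, the split discussed above can be sketched in plain userspace C. This is only a model under stated assumptions: NUM_DESC, struct desc_pool, find_free_run() and desc_alloc() are hypothetical names, a bool array stands in for the kernel bitmap, and locking is left out. It is not the driver code.

```c
#include <stdbool.h>

#define NUM_DESC 8  /* hypothetical pool size, kept small for illustration */

struct desc_pool {
	bool used[NUM_DESC];  /* stands in for the kernel bitmap */
};

/* Find `count` contiguous free slots in [start, end); return index or -1. */
static int find_free_run(const struct desc_pool *pool, int start, int end,
			 int count)
{
	for (int i = start; i + count <= end; i++) {
		int j = 0;

		while (j < count && !pool->used[i + j])
			j++;
		if (j == count)
			return i;
	}
	return -1;
}

/* rx allocates from the first half of the pool, tx from the second half. */
static int desc_alloc(struct desc_pool *pool, int count, bool is_rx)
{
	int start = is_rx ? 0 : NUM_DESC / 2;
	int end = is_rx ? NUM_DESC / 2 : NUM_DESC;
	int index = find_free_run(pool, start, end, count);

	if (index < 0)
		return -1;
	for (int i = 0; i < count; i++)
		pool->used[index + i] = true;
	return index;
}
```

In this model a tx flood can exhaust only slots 4..7, so an rx allocation still succeeds afterwards. Both directions still share one pool structure (and, in the driver, one lock), which is the point raised in the question about separate pools.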
^ permalink raw reply
* RE: ipgre rss is broken since gro
From: Dmitry Kravkov @ 2012-12-10 11:32 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev@vger.kernel.org
In-Reply-To: <CANn89iKtWuN=CRn-JBzmmo8jCCJQhZjAT8jA+9dgfmup99O03g@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 2635 bytes --]
> -----Original Message-----
> From: Eric Dumazet [mailto:edumazet@google.com]
> Sent: Monday, December 10, 2012 1:27 AM
> To: Dmitry Kravkov
> Cc: netdev@vger.kernel.org
> Subject: Re: ipgre rss is broken since gro
>
> On Sun, Dec 9, 2012 at 12:49 PM, Dmitry Kravkov <dmitry@broadcom.com>
> wrote:
>
> > for this item: drop_watch does not show any drops (i've disable all
> > other interfaces for clear env)
> > I will explain a little bit more the setup:
> > bnx2x device (under testing) is configured for RSS for IPGRE packets.
> > Sending multiple (3) TCP_STREAM causes ip_gre interface to disappear
> > packets (even ICMP).
> > This is not happening with single TCP_STREAM, or before gro_cell
> > introduction.
> >
> I dont know, I tried a bnx2x setup, and 100 tcp flows, no special problem.
The current bnx2x does not apply RSS to GRE; non-GRE RSS works without problems.
>
> If you receive a lot of packets on a single RX queue, they might be
> dropped because cpu cant cope with the load
> (This has nothing to do with GRE or GRO )
>
CPU is not loaded at all
> cat /proc/net/softnet_stat
Please find attached.
For the gre interface, the RX and DROP counters advance together (by one for each ICMP request):
[root@ ~]# ifconfig gre
gre Link encap:UNSPEC HWaddr C0-A8-0A-40-73-72-83-D2-00-00-00-00-00-00-00-00
inet addr:8.0.0.1 P-t-P:8.0.0.1 Mask:255.255.255.0
inet6 addr: fe80::5efe:c0a8:a40/64 Scope:Link
UP POINTOPOINT RUNNING NOARP MTU:1476 Metric:1
RX packets:1646824 errors:0 dropped:51610 overruns:0 frame:0
TX packets:140519 errors:1 dropped:0 overruns:0 carrier:1
collisions:0 txqueuelen:0
RX bytes:2357650904 (2.1 GiB) TX bytes:7309072 (6.9 MiB)
[root@ ~]# ifconfig gre
gre Link encap:UNSPEC HWaddr C0-A8-0A-40-73-72-83-82-00-00-00-00-00-00-00-00
inet addr:8.0.0.1 P-t-P:8.0.0.1 Mask:255.255.255.0
inet6 addr: fe80::5efe:c0a8:a40/64 Scope:Link
UP POINTOPOINT RUNNING NOARP MTU:1476 Metric:1
RX packets:1646826 errors:0 dropped:51612 overruns:0 frame:0
TX packets:140519 errors:1 dropped:0 overruns:0 carrier:1
collisions:0 txqueuelen:0
RX bytes:2357651072 (2.1 GiB) TX bytes:7309072 (6.9 MiB)
[root@ ~]# tcpdump -i gre
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on gre, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
2 packets dropped by interface
[-- Attachment #2: stat1 --]
[-- Type: application/octet-stream, Size: 3600 bytes --]
001b96a3 00000000 00001cb5 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000036f 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000181 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000001e6 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000012e 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000007b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000004 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000000a 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000002 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000029e 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000043 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000015 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000000c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000010 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000008 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000007 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00008192 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00005682 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000066bd 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00002ab5 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00001d11 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000165cf 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000005 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000203 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000009b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000c914 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000003c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000000b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000788 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00001dab 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000972f 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000015 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000000c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[-- Attachment #3: stat2 --]
[-- Type: application/octet-stream, Size: 3600 bytes --]
001b96a3 00000000 00001cb5 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000036f 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000181 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000001e6 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000012e 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000007b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000004 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000000a 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000002 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000029e 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000043 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000015 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000000c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000010 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000008 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000007 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00008192 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00005682 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000066bd 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00002ab5 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00001d11 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000165cf 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000005 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000203 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000009b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000c919 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000003c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000000b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000789 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00001dab 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000972f 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000015 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000000c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
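
When comparing two snapshots like the attachments above, it can help to decode the columns rather than eyeball the hex. The sketch below parses the first three columns of one /proc/net/softnet_stat row (one row per CPU: packets processed, packets dropped, time_squeeze). struct softnet_row and parse_softnet_row() are hypothetical helper names for illustration only.

```c
#include <stdio.h>

struct softnet_row {
	unsigned int processed;     /* column 1: packets processed */
	unsigned int dropped;       /* column 2: dropped, backlog queue full */
	unsigned int time_squeeze;  /* column 3: softirq budget exhausted */
};

/* Parse the first three hex columns of one row; 0 on success, -1 if malformed. */
static int parse_softnet_row(const char *line, struct softnet_row *row)
{
	if (sscanf(line, "%x %x %x", &row->processed, &row->dropped,
		   &row->time_squeeze) != 3)
		return -1;
	return 0;
}
```

Applied to the two attachments, the second column is zero on every row in both snapshots, and only two rows differ in the first column (0000c914 to 0000c919, and 00000788 to 00000789). That matches the statement that the drops are counted on the gre interface and not in the per-CPU backlog.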
^ permalink raw reply
* [PATCH][RFC] smsc95xx: enable dynamic autosuspend (RFC)
From: Steve Glendinning @ 2012-12-10 11:51 UTC (permalink / raw)
To: netdev; +Cc: Ming Lei, Oliver Neukum, linux-usb, gregkh, Steve Glendinning
In-Reply-To: <1353607526-19307-6-git-send-email-steve.glendinning@shawell.net>
This is a work-in-progress patch. It's not yet complete but
I thought I'd share it for comments, feedback and testing.
This patch enables dynamic autosuspend for all devices
supported by the driver, but it will only actually work on
LAN9500A (as this has a new SUSPEND3 mode for this purpose).
Unfortunately we don't know if the connected device supports
this feature until we query its ID register at runtime.
On unsupported devices (LAN9500/9512/9514) this patch claims
to support the feature but if enabled it will always return
failure to the autosuspend call (and fill up the kernel log
with a message every 2s).
Suggestions on how best to indicate this capability at runtime
instead of compile-time would be appreciated, so we don't have
to repeatedly fail if accidentally enabled. Or maybe this is
actually the best way?
We should also be able to identify devices supporting
autosuspend from the USB VID/PID if this would help.
UPDATE: reposting this to a wider audience due to lack of
feedback last time round
Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
---
drivers/net/usb/smsc95xx.c | 136 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 135 insertions(+), 1 deletion(-)
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index 9b73670..d9c6674 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -55,6 +55,13 @@
#define FEATURE_PHY_NLP_CROSSOVER (0x02)
#define FEATURE_AUTOSUSPEND (0x04)
+#define SUSPEND_SUSPEND0 (0x01)
+#define SUSPEND_SUSPEND1 (0x02)
+#define SUSPEND_SUSPEND2 (0x04)
+#define SUSPEND_SUSPEND3 (0x08)
+#define SUSPEND_ALLMODES (SUSPEND_SUSPEND0 | SUSPEND_SUSPEND1 | \
+ SUSPEND_SUSPEND2 | SUSPEND_SUSPEND3)
+
struct smsc95xx_priv {
u32 mac_cr;
u32 hash_hi;
@@ -62,6 +69,7 @@ struct smsc95xx_priv {
u32 wolopts;
spinlock_t mac_cr_lock;
u8 features;
+ u8 suspend_flags;
};
static bool turbo_mode = true;
@@ -1341,6 +1349,8 @@ static int smsc95xx_enter_suspend0(struct usbnet *dev)
if (ret < 0)
netdev_warn(dev->net, "Error reading PM_CTRL\n");
+ pdata->suspend_flags |= SUSPEND_SUSPEND0;
+
return ret;
}
@@ -1393,11 +1403,14 @@ static int smsc95xx_enter_suspend1(struct usbnet *dev)
if (ret < 0)
netdev_warn(dev->net, "Error writing PM_CTRL\n");
+ pdata->suspend_flags |= SUSPEND_SUSPEND1;
+
return ret;
}
static int smsc95xx_enter_suspend2(struct usbnet *dev)
{
+ struct smsc95xx_priv *pdata = (struct smsc95xx_priv *)(dev->data[0]);
u32 val;
int ret;
@@ -1414,9 +1427,96 @@ static int smsc95xx_enter_suspend2(struct usbnet *dev)
if (ret < 0)
netdev_warn(dev->net, "Error writing PM_CTRL\n");
+ pdata->suspend_flags |= SUSPEND_SUSPEND2;
+
return ret;
}
+static int smsc95xx_enter_suspend3(struct usbnet *dev)
+{
+ struct smsc95xx_priv *pdata = (struct smsc95xx_priv *)(dev->data[0]);
+ u32 val;
+ int ret;
+
+ ret = smsc95xx_read_reg_nopm(dev, RX_FIFO_INF, &val);
+ if (ret < 0) {
+ netdev_warn(dev->net, "Error reading RX_FIFO_INF");
+ return ret;
+ }
+
+ if (val & 0xFFFF) {
+ netdev_info(dev->net, "rx fifo not empty in autosuspend");
+ return -EBUSY;
+ }
+
+ ret = smsc95xx_read_reg_nopm(dev, PM_CTRL, &val);
+ if (ret < 0) {
+ netdev_warn(dev->net, "Error reading PM_CTRL");
+ return ret;
+ }
+
+ val &= ~(PM_CTL_SUS_MODE_ | PM_CTL_WUPS_ | PM_CTL_PHY_RST_);
+ val |= PM_CTL_SUS_MODE_3 | PM_CTL_RES_CLR_WKP_STS;
+
+ ret = smsc95xx_write_reg_nopm(dev, PM_CTRL, val);
+ if (ret < 0) {
+ netdev_warn(dev->net, "Error writing PM_CTRL");
+ return ret;
+ }
+
+ /* clear wol status */
+ val &= ~PM_CTL_WUPS_;
+ val |= PM_CTL_WUPS_WOL_;
+
+ ret = smsc95xx_write_reg_nopm(dev, PM_CTRL, val);
+ if (ret < 0) {
+ netdev_warn(dev->net, "Error writing PM_CTRL");
+ return ret;
+ }
+
+ pdata->suspend_flags |= SUSPEND_SUSPEND3;
+
+ return 0;
+}
+
+static int smsc95xx_autosuspend(struct usbnet *dev, u32 link_up)
+{
+ int ret;
+
+ if (!netif_running(dev->net)) {
+ /* interface is ifconfig down so fully power down hw */
+ netdev_dbg(dev->net, "autosuspend entering SUSPEND2");
+ return smsc95xx_enter_suspend2(dev);
+ }
+
+ if (!link_up) {
+ /* link is down so enter EDPD mode */
+ netdev_dbg(dev->net, "autosuspend entering SUSPEND1");
+
+ /* enable PHY wakeup events for if cable is attached */
+ ret = smsc95xx_enable_phy_wakeup_interrupts(dev,
+ PHY_INT_MASK_ANEG_COMP_);
+ if (ret < 0) {
+ netdev_warn(dev->net, "error enabling PHY wakeup ints");
+ return ret;
+ }
+
+ netdev_info(dev->net, "entering SUSPEND1 mode");
+ return smsc95xx_enter_suspend1(dev);
+ }
+
+ /* enable PHY wakeup events so we remote wakeup if cable is pulled */
+ ret = smsc95xx_enable_phy_wakeup_interrupts(dev,
+ PHY_INT_MASK_LINK_DOWN_);
+ if (ret < 0) {
+ netdev_warn(dev->net, "error enabling PHY wakeup ints");
+ return ret;
+ }
+
+ netdev_dbg(dev->net, "autosuspend entering SUSPEND3");
+ return smsc95xx_enter_suspend3(dev);
+}
+
static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
{
struct usbnet *dev = usb_get_intfdata(intf);
@@ -1424,15 +1524,35 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
u32 val, link_up;
int ret;
+ /* TODO: don't indicate this feature to usb framework if
+ * our current hardware doesn't have the capability
+ */
+ if ((message.event == PM_EVENT_AUTO_SUSPEND) &&
+ (!(pdata->features & FEATURE_AUTOSUSPEND))) {
+ netdev_warn(dev->net, "autosuspend not supported");
+ return -EBUSY;
+ }
+
ret = usbnet_suspend(intf, message);
if (ret < 0) {
netdev_warn(dev->net, "usbnet_suspend error\n");
return ret;
}
+ if (pdata->suspend_flags) {
+ netdev_warn(dev->net, "error during last resume");
+ pdata->suspend_flags = 0;
+ }
+
/* determine if link is up using only _nopm functions */
link_up = smsc95xx_link_ok_nopm(dev);
+ if (message.event == PM_EVENT_AUTO_SUSPEND) {
+ ret = smsc95xx_autosuspend(dev, link_up);
+ goto done;
+ }
+
+ /* if we get this far we're not autosuspending */
/* if no wol options set, or if link is down and we're not waking on
* PHY activity, enter lowest power SUSPEND2 mode
*/
@@ -1694,12 +1814,18 @@ static int smsc95xx_resume(struct usb_interface *intf)
{
struct usbnet *dev = usb_get_intfdata(intf);
struct smsc95xx_priv *pdata = (struct smsc95xx_priv *)(dev->data[0]);
+ u8 suspend_flags = pdata->suspend_flags;
int ret;
u32 val;
BUG_ON(!dev);
- if (pdata->wolopts) {
+ netdev_dbg(dev->net, "resume suspend_flags=0x%02x", suspend_flags);
+
+ /* do this first to ensure it's cleared even in error case */
+ pdata->suspend_flags = 0;
+
+ if (suspend_flags & SUSPEND_ALLMODES) {
/* clear wake-up sources */
ret = smsc95xx_read_reg_nopm(dev, WUCSR, &val);
if (ret < 0) {
@@ -1891,6 +2017,12 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
return skb;
}
+static int smsc95xx_manage_power(struct usbnet *dev, int on)
+{
+ dev->intf->needs_remote_wakeup = on;
+ return 0;
+}
+
static const struct driver_info smsc95xx_info = {
.description = "smsc95xx USB 2.0 Ethernet",
.bind = smsc95xx_bind,
@@ -1900,6 +2032,7 @@ static const struct driver_info smsc95xx_info = {
.rx_fixup = smsc95xx_rx_fixup,
.tx_fixup = smsc95xx_tx_fixup,
.status = smsc95xx_status,
+ .manage_power = smsc95xx_manage_power,
.flags = FLAG_ETHER | FLAG_SEND_ZLP | FLAG_LINK_INTR,
};
@@ -2007,6 +2140,7 @@ static struct usb_driver smsc95xx_driver = {
.reset_resume = smsc95xx_resume,
.disconnect = usbnet_disconnect,
.disable_hub_initiated_lpm = 1,
+ .supports_autosuspend = 1,
};
module_usb_driver(smsc95xx_driver);
--
1.7.10.4
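
The mode selection in smsc95xx_autosuspend() above can be summarised as a small decision function. The following is a userspace sketch for review purposes only; enum suspend_mode and pick_autosuspend_mode() are hypothetical names, and the register programming and PHY interrupt setup done by the real patch are left out.

```c
#include <stdbool.h>

enum suspend_mode {
	MODE_SUSPEND1 = 1,  /* link down: wake when a cable is attached */
	MODE_SUSPEND2 = 2,  /* interface down: power the hardware down fully */
	MODE_SUSPEND3 = 3,  /* link up: wake if the link is lost */
};

/* Mirrors the order of checks in the patch: interface state first, then link. */
static enum suspend_mode pick_autosuspend_mode(bool netif_running, bool link_up)
{
	if (!netif_running)
		return MODE_SUSPEND2;
	if (!link_up)
		return MODE_SUSPEND1;
	return MODE_SUSPEND3;
}
```

Note that the interface check comes first, so a downed interface selects SUSPEND2 regardless of link state. SUSPEND0 is never chosen on the autosuspend path in this patch.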
^ permalink raw reply related
* Re: [PATCH][RFC] smsc95xx: enable dynamic autosuspend (RFC)
From: Oliver Neukum @ 2012-12-10 12:09 UTC (permalink / raw)
To: Steve Glendinning
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Ming Lei,
linux-usb-u79uwXL29TY76Z2rM5mHXA,
gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r
In-Reply-To: <1355140301-10518-1-git-send-email-steve.glendinning-nksJyM/082jR7s880joybQ@public.gmane.org>
On Monday 10 December 2012 11:51:41 Steve Glendinning wrote:
> This is a work in-progress patch. It's not yet complete but
> I thought I'd share it for comments, feedback and testing.
>
> This patch enables dynamic autosuspend for all devices
> supported by the driver, but it will only actually work on
> LAN9500A (as this has a new SUSPEND3 mode for this purpose).
So this is a problem with remote wakeup on older hardware?
> Unfortunately we don't know if the connected device supports
> this feature until we query its ID register at runtime.
>
> On unsupported devices (LAN9500/9512/9514) this patch claims
> to support the feature but if enabled it will always return
> failure to the autosuspend call (and fill up the kernel log
> with a message every 2s).
>
> Suggestions on how best to indicate this capability at runtime
> instead of compile-time would be appreciated, so we don't have
> to repeatedly fail if accidentally enabled. Or maybe this is
> actually the best way?
If this is a problem with remote wakeup, you should up the
pm counter (usb_autopm_get_noresume()) in .manage_power.
That was the reason I implemented this as a callback and not as
a helper in usbnet.
Regards
Oliver
^ permalink raw reply
* [Patch net-next] bridge: fix seq check in br_mdb_dump()
From: Cong Wang @ 2012-12-10 12:15 UTC (permalink / raw)
To: netdev
Cc: Herbert Xu, Stephen Hemminger, David S. Miller, Thomas Graf,
Jesper Dangaard Brouer, Cong Wang
From: Cong Wang <amwang@redhat.com>
In case of rehashing, introduce a global variable 'br_mdb_rehash_seq'
which gets increased every time when rehashing, and assign
net->dev_base_seq + br_mdb_rehash_seq to cb->seq.
In theory cb->seq could be wrapped to zero, but this is not
easy to fix, as net->dev_base_seq is not visible inside
br_mdb_rehash(). In practice, this is rare.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
net/bridge/br_mdb.c | 6 ++----
net/bridge/br_multicast.c | 2 ++
net/bridge/br_private.h | 1 +
3 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index edc0d73..ccc43a9 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -117,10 +117,8 @@ static int br_mdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
rcu_read_lock();
- /* TODO: in case of rehashing, we need to check
- * consistency for dumping.
- */
- cb->seq = net->dev_base_seq;
+ /* In theory this could be wrapped to 0... */
+ cb->seq = net->dev_base_seq + br_mdb_rehash_seq;
for_each_netdev_rcu(net, dev) {
if (dev->priv_flags & IFF_EBRIDGE) {
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 68e375a..847b98a1 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -37,6 +37,7 @@
rcu_dereference_protected(X, lockdep_is_held(&br->multicast_lock))
static void br_multicast_start_querier(struct net_bridge *br);
+unsigned int br_mdb_rehash_seq;
#if IS_ENABLED(CONFIG_IPV6)
static inline int ipv6_is_transient_multicast(const struct in6_addr *addr)
@@ -338,6 +339,7 @@ static int br_mdb_rehash(struct net_bridge_mdb_htable __rcu **mdbp, int max,
return err;
}
+ br_mdb_rehash_seq++;
call_rcu_bh(&mdb->rcu, br_mdb_free);
out:
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index ae0a6ec..f95b766 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -411,6 +411,7 @@ extern int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd, void __us
/* br_multicast.c */
#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+extern unsigned int br_mdb_rehash_seq;
extern int br_multicast_rcv(struct net_bridge *br,
struct net_bridge_port *port,
struct sk_buff *skb);
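
To illustrate the sequence check and the wrap-to-zero caveat from the description, here is a userspace model. It assumes the dump consistency test has the shape "previous sequence is non-zero and differs from the current one"; struct dump_cb, dump_seq() and dump_inconsistent() are hypothetical names, not kernel API.

```c
#include <stdbool.h>

struct dump_cb {
	unsigned int seq;       /* generation seen by the current dump pass */
	unsigned int prev_seq;  /* generation seen by the previous pass, 0 if none */
};

/* Combined generation: device list changes plus mdb rehashes. */
static unsigned int dump_seq(unsigned int dev_base_seq, unsigned int rehash_seq)
{
	return dev_base_seq + rehash_seq;  /* unsigned, so it may wrap to 0 */
}

/* Returns true if the data changed between two passes of the same dump. */
static bool dump_inconsistent(struct dump_cb *cb)
{
	bool changed = cb->prev_seq && cb->seq != cb->prev_seq;

	cb->prev_seq = cb->seq;
	return changed;
}
```

In this model a rehash between two passes bumps the sum and is detected. If the sum ever wraps to exactly 0, the next pass sees prev_seq == 0 and skips the comparison, which is the rare case the commit message mentions.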
^ permalink raw reply related
* [Patch net-next] virtio_net: fix a typo in virtnet_alloc_queues()
From: Cong Wang @ 2012-12-10 12:24 UTC (permalink / raw)
To: netdev; +Cc: Jason Wang, David S. Miller, Cong Wang
Obviously it should check !vi->rq.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a644eeb..68d64f0 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1347,7 +1347,7 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
if (!vi->sq)
goto err_sq;
vi->rq = kzalloc(sizeof(*vi->rq) * vi->max_queue_pairs, GFP_KERNEL);
- if (!vi->sq)
+ if (!vi->rq)
goto err_rq;
INIT_DELAYED_WORK(&vi->refill, refill_work);
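
The pattern being fixed is "allocate B, then test A again". A minimal userspace sketch of the corrected shape follows, with calloc() standing in for kzalloc() and struct queues and alloc_queues() as hypothetical names.

```c
#include <stdlib.h>

struct queues {
	int *sq;  /* send queues */
	int *rq;  /* receive queues */
};

/* Allocate both arrays; on failure free what was allocated and return -1. */
static int alloc_queues(struct queues *q, size_t pairs)
{
	q->sq = calloc(pairs, sizeof(*q->sq));
	if (!q->sq)
		goto err_sq;

	q->rq = calloc(pairs, sizeof(*q->rq));
	if (!q->rq)  /* test the pointer just allocated, not sq again */
		goto err_rq;

	return 0;

err_rq:
	free(q->sq);
	q->sq = NULL;
err_sq:
	return -1;
}
```

With the original typo, a failed rq allocation would pass the check (sq is already non-NULL) and a later access through rq would dereference NULL.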
^ permalink raw reply related
* Re: [net-next:master 195/198] net/bridge/br_mdb.c:79:35: sparse: incompatible types in comparison expression (different address spaces)
From: Cong Wang @ 2012-12-10 12:36 UTC (permalink / raw)
To: kbuild test robot; +Cc: netdev
In-Reply-To: <50c3fd5e.qB65a7/5e+IZT2ix%fengguang.wu@intel.com>
On Sun, 2012-12-09 at 10:54 +0800, kbuild test robot wrote:
> tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
> head: 9ecb9aabaf634677c77af467f4e3028b09d7bcda
> commit: ee07c6e7a6f8a25c18f0a6b18152fbd7499245f6 [195/198] bridge: export multicast database via netlink
>
>
> sparse warnings:
>
> + net/bridge/br_mdb.c:79:35: sparse: incompatible types in comparison expression (different address spaces)
Hi, Fengguang,
I am not sure if I understand this warning correctly. Does the following
patch fix it?
Thanks!
---------------->
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 2528328..0bc0e13 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -77,7 +77,7 @@ static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
}
for (pp = &mp->ports;
- (p = rcu_dereference(*pp)) != NULL;
+ (p = rcu_dereference_protected(*pp, 1)) != NULL;
pp = &p->next) {
port = p->port;
if (port) {
^ permalink raw reply related
* Re: [net-next:master 195/198] net/bridge/br_mdb.c:79:35: sparse: incompatible types in comparison expression (different address spaces)
From: Cong Wang @ 2012-12-10 12:44 UTC (permalink / raw)
To: kbuild test robot; +Cc: netdev
In-Reply-To: <1355143004.11752.19.camel@cr0>
On Mon, 2012-12-10 at 20:36 +0800, Cong Wang wrote:
> On Sun, 2012-12-09 at 10:54 +0800, kbuild test robot wrote:
> > tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
> > head: 9ecb9aabaf634677c77af467f4e3028b09d7bcda
> > commit: ee07c6e7a6f8a25c18f0a6b18152fbd7499245f6 [195/198] bridge: export multicast database via netlink
> >
> >
> > sparse warnings:
> >
> > + net/bridge/br_mdb.c:79:35: sparse: incompatible types in comparison expression (different address spaces)
>
> Hi, Fengguang,
>
> I am not sure if I understand this warning correctly. Does the following
> patch fix it?
>
Hmm, no, probably this one:
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 2528328..cd6735c 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -63,7 +63,7 @@ static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
for (i = 0; i < mdb->max; i++) {
struct hlist_node *h;
struct net_bridge_mdb_entry *mp;
- struct net_bridge_port_group *p, **pp;
+ struct net_bridge_port_group __rcu *p, **pp;
struct net_bridge_port *port;
hlist_for_each_entry_rcu(mp, h, &mdb->mhash[i],
hlist[mdb->ver]) {
^ permalink raw reply related
* Re: [PATCH][RFC] smsc95xx: enable dynamic autosuspend (RFC)
From: Ming Lei @ 2012-12-10 13:59 UTC (permalink / raw)
To: Steve Glendinning
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Oliver Neukum,
linux-usb-u79uwXL29TY76Z2rM5mHXA,
gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r
In-Reply-To: <1355140301-10518-1-git-send-email-steve.glendinning-nksJyM/082jR7s880joybQ@public.gmane.org>
On Mon, Dec 10, 2012 at 7:51 PM, Steve Glendinning
<steve.glendinning-nksJyM/082jR7s880joybQ@public.gmane.org> wrote:
> This is a work in-progress patch. It's not yet complete but
> I thought I'd share it for comments, feedback and testing.
>
> This patch enables dynamic autosuspend for all devices
> supported by the driver, but it will only actually work on
> LAN9500A (as this has a new SUSPEND3 mode for this purpose).
> Unfortunately we don't know if the connected device supports
> this feature until we query its ID register at runtime.
>
> On unsupported devices (LAN9500/9512/9514) this patch claims
> to support the feature but if enabled it will always return
> failure to the autosuspend call (and fill up the kernel log
> with a message every 2s).
>
> Suggestions on how best to indicate this capability at runtime
> instead of compile-time would be appreciated, so we don't have
> to repeatedly fail if accidentally enabled. Or maybe this is
> actually the best way?
The ID register can be read inside bind(), so you may set
smsc95xx_info.manage_power as smsc95xx_manage_power
only for LAN9500A devices.
One disadvantage of the above idea is that link down can't trigger
runtime suspend via the manage_power path (USB auto-suspend), but
we can still introduce explicit link-change-based runtime suspend for
non-LAN9500A devices.
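
A rough userspace model of the bind-time idea above, assuming per-device ops rather than a shared const table. All names and the chip ID values here are hypothetical placeholders; the real IDs would come from the device's ID register.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chip IDs, for illustration only. */
#define CHIP_ID_AUTOSUSPEND_CAPABLE 0x1u
#define CHIP_ID_LEGACY              0x2u

struct dev_ops {
	int (*manage_power)(int on);  /* NULL means autosuspend is not offered */
};

static int manage_power(int on)
{
	(void)on;  /* the real callback would set needs_remote_wakeup */
	return 0;
}

/* Called once at bind time, after the chip ID has been read. */
static void bind_ops(struct dev_ops *ops, unsigned int chip_id)
{
	ops->manage_power = (chip_id == CHIP_ID_AUTOSUSPEND_CAPABLE) ?
			    manage_power : NULL;
}

static bool can_autosuspend(const struct dev_ops *ops)
{
	return ops->manage_power != NULL;
}
```

The design point is that the decision is made once per device, so unsupported hardware never reaches the suspend path and there is nothing to fail repeatedly. In the driver itself smsc95xx_info is a single const structure shared by all devices, so a real implementation would need some per-device way to express this.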
>
> We should also be able to identify devices supporting
> autosuspend from the USB VID/PID if this would help.
>
> UPDATE: reposting this to a wider audience due to lack of
> feedback last time round
>
> Signed-off-by: Steve Glendinning <steve.glendinning-nksJyM/082jR7s880joybQ@public.gmane.org>
> ---
> drivers/net/usb/smsc95xx.c | 136 +++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 135 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
> index 9b73670..d9c6674 100644
> --- a/drivers/net/usb/smsc95xx.c
> +++ b/drivers/net/usb/smsc95xx.c
> @@ -55,6 +55,13 @@
> #define FEATURE_PHY_NLP_CROSSOVER (0x02)
> #define FEATURE_AUTOSUSPEND (0x04)
>
> +#define SUSPEND_SUSPEND0 (0x01)
> +#define SUSPEND_SUSPEND1 (0x02)
> +#define SUSPEND_SUSPEND2 (0x04)
> +#define SUSPEND_SUSPEND3 (0x08)
> +#define SUSPEND_ALLMODES (SUSPEND_SUSPEND0 | SUSPEND_SUSPEND1 | \
> + SUSPEND_SUSPEND2 | SUSPEND_SUSPEND3)
> +
> struct smsc95xx_priv {
> u32 mac_cr;
> u32 hash_hi;
> @@ -62,6 +69,7 @@ struct smsc95xx_priv {
> u32 wolopts;
> spinlock_t mac_cr_lock;
> u8 features;
> + u8 suspend_flags;
> };
>
> static bool turbo_mode = true;
> @@ -1341,6 +1349,8 @@ static int smsc95xx_enter_suspend0(struct usbnet *dev)
> if (ret < 0)
> netdev_warn(dev->net, "Error reading PM_CTRL\n");
>
> + pdata->suspend_flags |= SUSPEND_SUSPEND0;
> +
> return ret;
> }
>
> @@ -1393,11 +1403,14 @@ static int smsc95xx_enter_suspend1(struct usbnet *dev)
> if (ret < 0)
> netdev_warn(dev->net, "Error writing PM_CTRL\n");
>
> + pdata->suspend_flags |= SUSPEND_SUSPEND1;
> +
> return ret;
> }
>
> static int smsc95xx_enter_suspend2(struct usbnet *dev)
> {
> + struct smsc95xx_priv *pdata = (struct smsc95xx_priv *)(dev->data[0]);
> u32 val;
> int ret;
>
> @@ -1414,9 +1427,96 @@ static int smsc95xx_enter_suspend2(struct usbnet *dev)
> if (ret < 0)
> netdev_warn(dev->net, "Error writing PM_CTRL\n");
>
> + pdata->suspend_flags |= SUSPEND_SUSPEND2;
> +
> return ret;
> }
>
> +static int smsc95xx_enter_suspend3(struct usbnet *dev)
> +{
> + struct smsc95xx_priv *pdata = (struct smsc95xx_priv *)(dev->data[0]);
> + u32 val;
> + int ret;
> +
> + ret = smsc95xx_read_reg_nopm(dev, RX_FIFO_INF, &val);
> + if (ret < 0) {
> + netdev_warn(dev->net, "Error reading RX_FIFO_INF");
> + return ret;
> + }
> +
> + if (val & 0xFFFF) {
> + netdev_info(dev->net, "rx fifo not empty in autosuspend");
> + return -EBUSY;
> + }
> +
> + ret = smsc95xx_read_reg_nopm(dev, PM_CTRL, &val);
> + if (ret < 0) {
> + netdev_warn(dev->net, "Error reading PM_CTRL");
> + return ret;
> + }
> +
> + val &= ~(PM_CTL_SUS_MODE_ | PM_CTL_WUPS_ | PM_CTL_PHY_RST_);
> + val |= PM_CTL_SUS_MODE_3 | PM_CTL_RES_CLR_WKP_STS;
> +
> + ret = smsc95xx_write_reg_nopm(dev, PM_CTRL, val);
> + if (ret < 0) {
> + netdev_warn(dev->net, "Error writing PM_CTRL");
> + return ret;
> + }
> +
> + /* clear wol status */
> + val &= ~PM_CTL_WUPS_;
> + val |= PM_CTL_WUPS_WOL_;
> +
> + ret = smsc95xx_write_reg_nopm(dev, PM_CTRL, val);
> + if (ret < 0) {
> + netdev_warn(dev->net, "Error writing PM_CTRL");
> + return ret;
> + }
> +
> + pdata->suspend_flags |= SUSPEND_SUSPEND3;
> +
> + return 0;
> +}
> +
> +static int smsc95xx_autosuspend(struct usbnet *dev, u32 link_up)
> +{
> + int ret;
> +
> + if (!netif_running(dev->net)) {
> + /* interface is ifconfig down so fully power down hw */
> + netdev_dbg(dev->net, "autosuspend entering SUSPEND2");
> + return smsc95xx_enter_suspend2(dev);
> + }
> +
> + if (!link_up) {
> + /* link is down so enter EDPD mode */
> + netdev_dbg(dev->net, "autosuspend entering SUSPEND1");
> +
> + /* enable PHY wakeup events for if cable is attached */
> + ret = smsc95xx_enable_phy_wakeup_interrupts(dev,
> + PHY_INT_MASK_ANEG_COMP_);
> + if (ret < 0) {
> + netdev_warn(dev->net, "error enabling PHY wakeup ints");
> + return ret;
> + }
> +
> + netdev_info(dev->net, "entering SUSPEND1 mode");
> + return smsc95xx_enter_suspend1(dev);
> + }
> +
> + /* enable PHY wakeup events so we remote wakeup if cable is pulled */
> + ret = smsc95xx_enable_phy_wakeup_interrupts(dev,
> + PHY_INT_MASK_LINK_DOWN_);
> + if (ret < 0) {
> + netdev_warn(dev->net, "error enabling PHY wakeup ints");
> + return ret;
> + }
> +
> + netdev_dbg(dev->net, "autosuspend entering SUSPEND3");
> + return smsc95xx_enter_suspend3(dev);
> +}
> +
> static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
> {
> struct usbnet *dev = usb_get_intfdata(intf);
> @@ -1424,15 +1524,35 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
> u32 val, link_up;
> int ret;
>
> + /* TODO: don't indicate this feature to usb framework if
> + * our current hardware doesn't have the capability
> + */
> + if ((message.event == PM_EVENT_AUTO_SUSPEND) &&
> + (!(pdata->features & FEATURE_AUTOSUSPEND))) {
> + netdev_warn(dev->net, "autosuspend not supported");
> + return -EBUSY;
> + }
> +
> ret = usbnet_suspend(intf, message);
> if (ret < 0) {
> netdev_warn(dev->net, "usbnet_suspend error\n");
> return ret;
> }
>
> + if (pdata->suspend_flags) {
> + netdev_warn(dev->net, "error during last resume");
> + pdata->suspend_flags = 0;
> + }
> +
> /* determine if link is up using only _nopm functions */
> link_up = smsc95xx_link_ok_nopm(dev);
>
> + if (message.event == PM_EVENT_AUTO_SUSPEND) {
> + ret = smsc95xx_autosuspend(dev, link_up);
> + goto done;
> + }
> +
> + /* if we get this far we're not autosuspending */
> /* if no wol options set, or if link is down and we're not waking on
> * PHY activity, enter lowest power SUSPEND2 mode
> */
> @@ -1694,12 +1814,18 @@ static int smsc95xx_resume(struct usb_interface *intf)
> {
> struct usbnet *dev = usb_get_intfdata(intf);
> struct smsc95xx_priv *pdata = (struct smsc95xx_priv *)(dev->data[0]);
> + u8 suspend_flags = pdata->suspend_flags;
> int ret;
> u32 val;
>
> BUG_ON(!dev);
>
> - if (pdata->wolopts) {
> + netdev_dbg(dev->net, "resume suspend_flags=0x%02x", suspend_flags);
> +
> + /* do this first to ensure it's cleared even in error case */
> + pdata->suspend_flags = 0;
> +
> + if (suspend_flags & SUSPEND_ALLMODES) {
> /* clear wake-up sources */
> ret = smsc95xx_read_reg_nopm(dev, WUCSR, &val);
> if (ret < 0) {
> @@ -1891,6 +2017,12 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
> return skb;
> }
>
> +static int smsc95xx_manage_power(struct usbnet *dev, int on)
> +{
> + dev->intf->needs_remote_wakeup = on;
> + return 0;
> +}
> +
> static const struct driver_info smsc95xx_info = {
> .description = "smsc95xx USB 2.0 Ethernet",
> .bind = smsc95xx_bind,
> @@ -1900,6 +2032,7 @@ static const struct driver_info smsc95xx_info = {
> .rx_fixup = smsc95xx_rx_fixup,
> .tx_fixup = smsc95xx_tx_fixup,
> .status = smsc95xx_status,
> + .manage_power = smsc95xx_manage_power,
> .flags = FLAG_ETHER | FLAG_SEND_ZLP | FLAG_LINK_INTR,
> };
>
> @@ -2007,6 +2140,7 @@ static struct usb_driver smsc95xx_driver = {
> .reset_resume = smsc95xx_resume,
> .disconnect = usbnet_disconnect,
> .disable_hub_initiated_lpm = 1,
> + .supports_autosuspend = 1,
> };
>
> module_usb_driver(smsc95xx_driver);
> --
> 1.7.10.4
>
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH][RFC] smsc95xx: enable dynamic autosuspend (RFC)
From: Steve Glendinning @ 2012-12-10 14:18 UTC (permalink / raw)
To: Oliver Neukum
Cc: Steve Glendinning, netdev, Ming Lei, linux-usb,
Greg Kroah-Hartman
In-Reply-To: <1404210.ufpq7a3oeI@linux-lqwf.site>
On 10 December 2012 12:09, Oliver Neukum <oliver@neukum.org> wrote:
> So this is a problem with remote wakeup on older hardware?
Exactly, the older hardware revisions can't reliably do it.
>> Unfortunately we don't know if the connected device supports
>> this feature until we query its ID register at runtime.
>>
>> Suggestions on how best to indicate this capability at runtime
>> instead of compile-time would be appreciated, so we don't have
>> to repeatedly fail if accidentally enabled. Or maybe this is
>> actually the best way?
>
> If this is a problem with remote wakeup, you should up the
> pm counter (usb_autopm_get_noresume()) in .manage_power
> That was the reason I implemented this as a callback and not as
> a helper in usbnet.
Thanks, so something like this should do the job?
static int smsc95xx_manage_power(struct usbnet *dev, int on)
{
	struct smsc95xx_priv *pdata = (struct smsc95xx_priv *)(dev->data[0]);

	dev->intf->needs_remote_wakeup = on;

	if (pdata->features & FEATURE_AUTOSUSPEND)
		return 0;

	/* this chip revision doesn't support autosuspend */
	netdev_info(dev->net, "hardware doesn't support USB autosuspend\n");

	if (on)
		usb_autopm_get_interface_no_resume(dev->intf);
	else
		usb_autopm_put_interface_no_suspend(dev->intf);

	return 0;
}
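
To make the counter balancing in the function above easier to check, here is a minimal userspace model in C. It is only a sketch: pm_dev, pm_intf, pm_manage_power and pm_may_autosuspend are hypothetical names, and the plain integer usage_count stands in for the interface's PM usage counter, under the assumption that autosuspend is only possible while that counter is zero.

```c
/* Same bit value as FEATURE_AUTOSUSPEND in the patch. */
#define PM_FEATURE_AUTOSUSPEND 0x04u

struct pm_intf {
	int usage_count;          /* modelled PM usage counter */
	int needs_remote_wakeup;
};

struct pm_dev {
	struct pm_intf intf;
	unsigned int features;
};

/* Mirrors the proposed manage_power logic with the counter made explicit. */
static int pm_manage_power(struct pm_dev *dev, int on)
{
	dev->intf.needs_remote_wakeup = on;

	/* Capable hardware: leave the counter alone, autosuspend may proceed. */
	if (dev->features & PM_FEATURE_AUTOSUSPEND)
		return 0;

	/* Older hardware: hold a reference while the interface is in use. */
	if (on)
		dev->intf.usage_count++;
	else
		dev->intf.usage_count--;

	return 0;
}

/* Assumption: autosuspend is only allowed while no reference is held. */
static int pm_may_autosuspend(const struct pm_dev *dev)
{
	return dev->intf.usage_count == 0;
}
```

The model only stays balanced if every on=1 call is eventually paired with an on=0 call, which the real callback would also rely on.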
^ permalink raw reply
* Re: [PATCH net-next 03/10] tipc: sk_recv_queue size check only for connectionless sockets
From: Neil Horman @ 2012-12-10 14:22 UTC (permalink / raw)
To: Ying Xue; +Cc: Jon Maloy, Paul Gortmaker, David Miller, netdev
In-Reply-To: <50C580E6.7030905@windriver.com>
On Mon, Dec 10, 2012 at 02:27:50PM +0800, Ying Xue wrote:
> Neil Horman wrote:
> >On Fri, Dec 07, 2012 at 05:30:11PM -0500, Jon Maloy wrote:
> >>On 12/07/2012 02:20 PM, Neil Horman wrote:
> >>>On Fri, Dec 07, 2012 at 09:28:11AM -0500, Paul Gortmaker wrote:
> >>>>From: Ying Xue <ying.xue@windriver.com>
> >>>>
> >>>>The sk_receive_queue limit control is currently performed for
> >>>>all arriving messages, disregarding socket and message type.
> >>>>But for connected sockets this check is redundant, since the protocol
> >>>>flow control already makes queue overflow impossible.
> >>>>
> >>>Can you explain where that occurs?
> >>It happens in the functions port_dispatcher_sigh() and
> >>tipc_send(), among other places. Both are to be found in the
> >>file port.c, which was supposed to contain the 'generic' (i.e.,
> >>API independent) part of the send/receive code.
> >>Now that we have only one API left, the socket API, we are
> >>planning to merge the code in socket.c and port.c, and get rid
> >>of some code overhead.
> >>
> >>The flow control in TIPC is message based, where the sender
> >>requires to receive an explicit acknowledge message for each 512
> >>message the receiver reads to user space.
> >>If the sender has more than 1024 messages outstanding without having
> >>received an acknowledge he will be suspended or receive EAGAIN
> >>until he does.
> >>The plan going forward is to replace this mechanism with a more
> >>standard looking byte based flow control, while maintaining
> >>backwards compatibility.
> >>
> >Ok, That makes more sense, thank you. Although I still don't think this is
> >safe (but the problem may not be solely introduced by this patch). Using a
> >global limit that assumes the sender will block when the congestion window is
> >reached just doesn't seem sane to me. It clearly works with the Linux
> >implementation, as it conforms to your expectations, but an alternate
> >implementation could create a DOS situation by simply ignoring the window limit,
> >and continuing to send. I see that we drop frames over the global limit in
> >filter_rcv, but the check in rx_queue_full bumps up that limit based on the
> >value of msg_importance(msg), but that threshold is ignored if the value of
> >msg_importance is invalid. All a sender needs to do is flood a receiver with
> >frames containing an invalid set of message importance bits, and you will queue
> >frames indefinately. In fact that will also happen if you send message of
> >CRITICAL importance as well, so you don't even need to supply an invalid value
> >here.
> >
>
> You are absolutely right. I will correct these drawbacks in next version.
>
> >>>I see where the tipc dispatch function calls
> >>>sk_add_backlog, which checks the per socket recieve queue (regardless of weather
> >>>the receiving socket is connection oriented or connectionless), but if the
> >>>receiver doesn't call receive very often, This just adds a check against your
> >>>global limit, doing nothing for your per-socket limits.
> >>OVERLOAD_LIMIT_BASE is tested against a per-socket message counter, so it _is_
> >>our per-socket limit. In fact, TIPC connectionless overflow
> >>control currently is a kind of a hybrid, based on a message
> >>counter when the socket is not locked, and based on
> >>sk_rcv_queue's byte limit when a message has to be added to the
> >>backlog.
> >>We are planning to fix this inconsistency too.
> >Good, thank you, that was seeming quite wrong to me.
> >
> >> In fact it seems to
> >>>repeat the same check twice, as in the worst case of the incomming message being
> >>>TIPC_LOW_IMPORTANCE, its just going to check that the global limit is exactly
> >>>OVERLOAD_LIMIT_BASE/2 again.
> >>Yes, you are right. The intention is that only the first test,
> >>if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2)){..}
> >>will be run for the vast majority of messages, since we must assume
> >>that there is no overload most of the time.
> >>An inelegant optimization, perhaps, but not logically wrong.
> >No, not logically wrong, but not an optimization either. With this change,
> >your only use of rx_queue_full passes OVERLOAD_LIMIT_BASE/2 as the base value to
> >rx_queue_full, and then you do some multiplication based on that. If you really
> >want to optimize this, leave OVERLOAD_LIMIT_BASE where it is (rather than
> >doubling it like this patch series does), mark rx_queue_full as inline, and just
> >pass OVERLOAD_LIMIT_BASE as the argument, it will save you a division opration,
> >the conditional branch and a call instruction. If you add a multiplication
> >factor table, you can eliminate the if/else clauses in rx_queue_full as well.
> >
>
> Good suggestion with a factor table. Maybe it's unnecessary to
> explicitly mark rx_queue_full as inline. Currently it sounds like we
> let complier decide whether a function is defined as inline or not.
>
That's correct, the compiler usually decides whether something should be inlined
(unless you use the __always_inline attribute). In this case, given a single
call site, it will most likely just inline it anyway. But if you're interested in
optimizing here, it might be worth taking the extra steps to make sure. In
fact, since this is your only call site, it may be worthwhile to just remove the
function entirely, and manually inline the check.
Neil
> Regards,
> Ying
>
> >Neil
> >
> >>///jon
> >>
> >>>Neil
> >>>
> >
>
>
^ permalink raw reply
* Re: [PATCH net-next v3] tipc: sk_recv_queue size check only for connectionless sockets
From: Neil Horman @ 2012-12-10 14:51 UTC (permalink / raw)
To: Ying Xue; +Cc: Paul.Gortmaker, jon.maloy, erik.hugne, netdev, tipc-discussion
In-Reply-To: <1355131380-8542-1-git-send-email-ying.xue@windriver.com>
On Mon, Dec 10, 2012 at 05:23:00PM +0800, Ying Xue wrote:
> The sk_receive_queue limit control is currently performed for all
> arriving messages, disregarding socket and message type. But for
> connected sockets this check is redundant, since the protocol
> flow control already makes queue overflow impossible.
>
> We move the sk_receive_queue limit control so that it's only performed
> for connectionless sockets, i.e. SOCK_RDM and SOCK_DGRAM type sockets.
>
> However, as Neil Horman specified, we cannot simply force the socket
> receive queue limit against connectionless sockets as it may create a
> DoS vulnerability. For example, if a sender floods a receiver with
> messages containing an invalid set of message importance bits or
> CRITICAL importance, we will queue messages indefinitely.
>
> To avoid DoS attack, socket receive queue will be marked as overflow
> if we receive messages with invalid message importances, meanwhile,
> we also set one new threshold for CRITICAL importance messages.
>
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
> v3 changes:
> - set new threshold for CRITICAL message
> - defined an importance factor table to avoid multiplication and
> division operations in rx_queue_full().
> - changed return value of rx_queue_full() from integer to boolean.
>
> net/tipc/socket.c | 44 +++++++++++++++++++-------------------------
> 1 files changed, 19 insertions(+), 25 deletions(-)
>
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 9b4e483..a18a757 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -43,7 +43,7 @@
> #define SS_LISTENING -1 /* socket is listening */
> #define SS_READY -2 /* socket is connectionless */
>
> -#define OVERLOAD_LIMIT_BASE 10000
> +#define OVERLOAD_LIMIT_BASE 5000
> #define CONN_TIMEOUT_DEFAULT 8000 /* default connect timeout = 8s */
>
> struct tipc_sock {
> @@ -73,6 +73,13 @@ static struct proto tipc_proto;
>
> static int sockets_enabled;
>
> +static const u32 msg_importance_factor[] = {
> + OVERLOAD_LIMIT_BASE, /* TIPC_LOW_IMPORTANCE limit */
> + OVERLOAD_LIMIT_BASE * 2, /* TIPC_MEDIUM_IMPORTANCE limit */
> + OVERLOAD_LIMIT_BASE * 100, /* TIPC_HIGH_IMPORTANCE limit */
> + OVERLOAD_LIMIT_BASE * 200 /* TIPC_CRITICAL_IMPORTANCE limit */
> + };
> +
> /*
> * Revised TIPC socket locking policy:
> *
> @@ -1158,28 +1165,17 @@ static void tipc_data_ready(struct sock *sk, int len)
> * rx_queue_full - determine if receive queue can accept another message
> * @msg: message to be added to queue
> * @queue_size: current size of queue
> - * @base: nominal maximum size of queue
> *
> - * Returns 1 if queue is unable to accept message, 0 otherwise
> + * Returns true if queue is unable to accept message, false otherwise
> */
> -static int rx_queue_full(struct tipc_msg *msg, u32 queue_size, u32 base)
> +static bool rx_queue_full(struct tipc_msg *msg, u32 queue_size)
> {
> - u32 threshold;
> u32 imp = msg_importance(msg);
>
> - if (imp == TIPC_LOW_IMPORTANCE)
> - threshold = base;
> - else if (imp == TIPC_MEDIUM_IMPORTANCE)
> - threshold = base * 2;
> - else if (imp == TIPC_HIGH_IMPORTANCE)
> - threshold = base * 100;
> - else
> - return 0;
> + if (unlikely(imp > TIPC_CRITICAL_IMPORTANCE))
> + return true;
>
> - if (msg_connected(msg))
> - threshold *= 4;
> -
> - return queue_size >= threshold;
> + return queue_size >= msg_importance_factor[imp];
> }
>
> /**
> @@ -1275,7 +1271,6 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
> {
> struct socket *sock = sk->sk_socket;
> struct tipc_msg *msg = buf_msg(buf);
> - u32 recv_q_len;
> u32 res = TIPC_OK;
>
> /* Reject message if it is wrong sort of message for socket */
> @@ -1285,19 +1280,18 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
> if (sock->state == SS_READY) {
> if (msg_connected(msg))
> return TIPC_ERR_NO_PORT;
> + /* Reject SOCK_DGRAM and SOCK_RDM message if there isn't room
> + * to queue it
> + */
> + if (unlikely(rx_queue_full(msg,
> + skb_queue_len(&sk->sk_receive_queue))))
> + return TIPC_ERR_OVERLOAD;
> } else {
> res = filter_connect(tipc_sk(sk), &buf);
> if (res != TIPC_OK || buf == NULL)
> return res;
> }
>
> - /* Reject message if there isn't room to queue it */
> - recv_q_len = skb_queue_len(&sk->sk_receive_queue);
> - if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
> - if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
> - return TIPC_ERR_OVERLOAD;
> - }
> -
> /* Enqueue message (finally!) */
> TIPC_SKB_CB(buf)->handle = 0;
> __skb_queue_tail(&sk->sk_receive_queue, buf);
> --
> 1.7.1
>
>
That looks more reasonable, thanks.
Acked-by: Neil Horman <nhorman@tuxdriver.com>
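
For readers comparing the two versions of rx_queue_full(), here is a small self-contained C sketch of the behaviour before and after this patch for connectionless sockets. It works on a bare importance number instead of a struct tipc_msg, the RXQ_* names are hypothetical, the numeric importance values 0..3 are an assumption about the TIPC constants, and the old version's extra factor for connected messages is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed numeric values of the TIPC importance levels. */
enum { RXQ_LOW = 0, RXQ_MEDIUM = 1, RXQ_HIGH = 2, RXQ_CRITICAL = 3 };

#define RXQ_LIMIT_BASE 5000u

/* Per-importance queue limits, as in the v3 factor table. */
static const uint32_t rxq_limit[] = {
	RXQ_LIMIT_BASE,        /* low */
	RXQ_LIMIT_BASE * 2,    /* medium */
	RXQ_LIMIT_BASE * 100,  /* high */
	RXQ_LIMIT_BASE * 200   /* critical */
};

/* Old logic: critical and invalid importance values are never rejected. */
static bool rxq_full_old(uint32_t imp, uint32_t queue_size, uint32_t base)
{
	uint32_t threshold;

	if (imp == RXQ_LOW)
		threshold = base;
	else if (imp == RXQ_MEDIUM)
		threshold = base * 2;
	else if (imp == RXQ_HIGH)
		threshold = base * 100;
	else
		return false;

	return queue_size >= threshold;
}

/* New logic: invalid importance is always rejected, critical is bounded. */
static bool rxq_full_new(uint32_t imp, uint32_t queue_size)
{
	if (imp > RXQ_CRITICAL)
		return true;

	return queue_size >= rxq_limit[imp];
}
```

In this model a sender that sets importance to 7, or to critical, can grow the queue without bound under the old check, while the new check rejects the first case outright and caps the second at 200 times the base limit.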
^ permalink raw reply
* Re: [PATCH RESEND] net: remove obsolete simple_strto<foo>
From: Neil Horman @ 2012-12-10 15:03 UTC (permalink / raw)
To: Abhijit Pawar
Cc: David S. Miller, Pablo Neira Ayuso, Patrick McHardy,
Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
John W. Linville, Johannes Berg, Cong Wang, Eric Dumazet,
Joe Perches, netdev, linux-kernel, netfilter-devel, netfilter,
coreteam, linux-wireless
In-Reply-To: <1355130748-7828-1-git-send-email-abhi.c.pawar@gmail.com>
On Mon, Dec 10, 2012 at 02:42:28PM +0530, Abhijit Pawar wrote:
> This patch replace the obsolete simple_strto<foo> with kstrto<foo>
>
> Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
^ permalink raw reply
* RE: ipgre rss is broken since gro
From: Eric Dumazet @ 2012-12-10 15:17 UTC (permalink / raw)
To: Dmitry Kravkov; +Cc: Eric Dumazet, netdev@vger.kernel.org
In-Reply-To: <504C9EFCA2D0054393414C9CB605C37F1BFC104B@SJEXCHMB06.corp.ad.broadcom.com>
On Mon, 2012-12-10 at 11:32 +0000, Dmitry Kravkov wrote:
> Current bnx2x do not apply RSS for GRE, non GRE RSS is working w/o problem.
How about posting the changes you made?
^ permalink raw reply
* [PATCH net-next v3 01/22] bnx2x: Support probing and removing of VF device
From: Ariel Elior @ 2012-12-10 15:46 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Ariel Elior, Eilon Greenstein
In-Reply-To: <1355154406-10855-1-git-send-email-ariele@broadcom.com>
To support probing and removing of a bnx2x virtual function
the following were added:
1. add bnx2x_vfpf.h: defines the VF to PF channel
2. add bnx2x_sriov.h: header for bnx2x SR-IOV functionality
3. enumerate VF hw types (identify VFs)
4. if driving a VF, map VF bar
5. if driving a VF, allocate VF to PF channel
6. refactor interrupt flows to include VF
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 21 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 25 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 2 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 391 +++++++++++++--------
drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h | 9 +
drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h | 27 ++
drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.h | 37 ++
7 files changed, 354 insertions(+), 158 deletions(-)
create mode 100644 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h
create mode 100644 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.h
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index 9a3b81e..b2f7425 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -49,6 +49,12 @@
#include "bnx2x_dcb.h"
#include "bnx2x_stats.h"
+enum bnx2x_int_mode {
+ BNX2X_INT_MODE_MSIX,
+ BNX2X_INT_MODE_INTX,
+ BNX2X_INT_MODE_MSI
+};
+
/* error/debug prints */
#define DRV_MODULE_NAME "bnx2x"
@@ -954,6 +960,9 @@ struct bnx2x_port {
extern struct workqueue_struct *bnx2x_wq;
#define BNX2X_MAX_NUM_OF_VFS 64
+#define BNX2X_VF_CID_WND 0
+#define BNX2X_CIDS_PER_VF (1 << BNX2X_VF_CID_WND)
+#define BNX2X_VF_CIDS (BNX2X_MAX_NUM_OF_VFS * BNX2X_CIDS_PER_VF)
#define BNX2X_VF_ID_INVALID 0xFF
/*
@@ -1231,6 +1240,10 @@ struct bnx2x {
(vn) * ((CHIP_IS_E1x(bp) || (CHIP_MODE_IS_4_PORT(bp))) ? 2 : 1))
#define BP_FW_MB_IDX(bp) BP_FW_MB_IDX_VN(bp, BP_VN(bp))
+ /* vf pf channel mailbox contains request and response buffers */
+ struct bnx2x_vf_mbx_msg *vf2pf_mbox;
+ dma_addr_t vf2pf_mbox_mapping;
+
struct net_device *dev;
struct pci_dev *pdev;
@@ -1318,8 +1331,6 @@ struct bnx2x {
#define DISABLE_MSI_FLAG (1 << 7)
#define TPA_ENABLE_FLAG (1 << 8)
#define NO_MCP_FLAG (1 << 9)
-
-#define BP_NOMCP(bp) (bp->flags & NO_MCP_FLAG)
#define GRO_ENABLE_FLAG (1 << 10)
#define MF_FUNC_DIS (1 << 11)
#define OWN_CNIC_IRQ (1 << 12)
@@ -1330,6 +1341,11 @@ struct bnx2x {
#define BC_SUPPORTS_FCOE_FEATURES (1 << 19)
#define USING_SINGLE_MSIX_FLAG (1 << 20)
#define BC_SUPPORTS_DCBX_MSG_NON_PMF (1 << 21)
+#define IS_VF_FLAG (1 << 22)
+
+#define BP_NOMCP(bp) ((bp)->flags & NO_MCP_FLAG)
+#define IS_VF(bp) ((bp)->flags & IS_VF_FLAG)
+#define IS_PF(bp) (!((bp)->flags & IS_VF_FLAG))
#define NO_ISCSI(bp) ((bp)->flags & NO_ISCSI_FLAG)
#define NO_ISCSI_OOO(bp) ((bp)->flags & NO_ISCSI_OOO_FLAG)
@@ -1432,6 +1448,7 @@ struct bnx2x {
u8 igu_sb_cnt;
u8 min_msix_vec_cnt;
+ u32 igu_base_addr;
dma_addr_t def_status_blk_mapping;
struct bnx2x_slowpath *slowpath;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 67baddd..0a493f4 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -1424,12 +1424,15 @@ void bnx2x_free_irq(struct bnx2x *bp)
int bnx2x_enable_msix(struct bnx2x *bp)
{
- int msix_vec = 0, i, rc, req_cnt;
+ int msix_vec = 0, i, rc;
- bp->msix_table[msix_vec].entry = msix_vec;
- BNX2X_DEV_INFO("msix_table[0].entry = %d (slowpath)\n",
- bp->msix_table[0].entry);
- msix_vec++;
+ /* VFs don't have a default status block */
+ if (IS_PF(bp)) {
+ bp->msix_table[msix_vec].entry = msix_vec;
+ BNX2X_DEV_INFO("msix_table[0].entry = %d (slowpath)\n",
+ bp->msix_table[0].entry);
+ msix_vec++;
+ }
/* Cnic requires an msix vector for itself */
if (CNIC_SUPPORT(bp)) {
@@ -1447,9 +1450,10 @@ int bnx2x_enable_msix(struct bnx2x *bp)
msix_vec++;
}
- req_cnt = BNX2X_NUM_ETH_QUEUES(bp) + CNIC_SUPPORT(bp) + 1;
+ DP(BNX2X_MSG_SP, "about to request enable msix with %d vectors",
+ msix_vec);
- rc = pci_enable_msix(bp->pdev, &bp->msix_table[0], req_cnt);
+ rc = pci_enable_msix(bp->pdev, &bp->msix_table[0], msix_vec);
/*
* reconfigure number of tx/rx queues according to available
@@ -1457,7 +1461,7 @@ int bnx2x_enable_msix(struct bnx2x *bp)
*/
if (rc >= BNX2X_MIN_MSIX_VEC_CNT(bp)) {
/* how less vectors we will have? */
- int diff = req_cnt - rc;
+ int diff = msix_vec - rc;
BNX2X_DEV_INFO("Trying to use less MSI-X vectors: %d\n", rc);
@@ -3889,7 +3893,10 @@ int bnx2x_alloc_mem_bp(struct bnx2x *bp)
* The biggest MSI-X table we might need is as a maximum number of fast
* path IGU SBs plus default SB (for PF).
*/
- msix_table_size = bp->igu_sb_cnt + 1;
+ msix_table_size = bp->igu_sb_cnt;
+ if (IS_PF(bp))
+ msix_table_size++;
+ BNX2X_DEV_INFO("msix_table_size %d", msix_table_size);
/* fp array: RSS plus CNIC related L2 queues */
fp_array_size = BNX2X_MAX_RSS_COUNT(bp) + CNIC_SUPPORT(bp);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 0991534..bca371e 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -863,7 +863,7 @@ static inline void bnx2x_del_all_napi(struct bnx2x *bp)
netif_napi_del(&bnx2x_fp(bp, i, napi));
}
-void bnx2x_set_int_mode(struct bnx2x *bp);
+int bnx2x_set_int_mode(struct bnx2x *bp);
static inline void bnx2x_disable_msi(struct bnx2x *bp)
{
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 940ef85..b9bc677 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -59,6 +59,8 @@
#include "bnx2x_init.h"
#include "bnx2x_init_ops.h"
#include "bnx2x_cmn.h"
+#include "bnx2x_vfpf.h"
+#include "bnx2x_sriov.h"
#include "bnx2x_dcb.h"
#include "bnx2x_sp.h"
@@ -133,39 +135,49 @@ enum bnx2x_board_type {
BCM57711E,
BCM57712,
BCM57712_MF,
+ BCM57712_VF,
BCM57800,
BCM57800_MF,
+ BCM57800_VF,
BCM57810,
BCM57810_MF,
- BCM57840_O,
+ BCM57810_VF,
BCM57840_4_10,
BCM57840_2_20,
- BCM57840_MFO,
BCM57840_MF,
+ BCM57840_VF,
BCM57811,
- BCM57811_MF
+ BCM57811_MF,
+ BCM57840_O,
+ BCM57840_MFO,
+ BCM57811_VF
};
/* indexed by board_type, above */
static struct {
char *name;
} board_info[] = {
- { "Broadcom NetXtreme II BCM57710 10 Gigabit PCIe [Everest]" },
- { "Broadcom NetXtreme II BCM57711 10 Gigabit PCIe" },
- { "Broadcom NetXtreme II BCM57711E 10 Gigabit PCIe" },
- { "Broadcom NetXtreme II BCM57712 10 Gigabit Ethernet" },
- { "Broadcom NetXtreme II BCM57712 10 Gigabit Ethernet Multi Function" },
- { "Broadcom NetXtreme II BCM57800 10 Gigabit Ethernet" },
- { "Broadcom NetXtreme II BCM57800 10 Gigabit Ethernet Multi Function" },
- { "Broadcom NetXtreme II BCM57810 10 Gigabit Ethernet" },
- { "Broadcom NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function" },
- { "Broadcom NetXtreme II BCM57840 10/20 Gigabit Ethernet" },
- { "Broadcom NetXtreme II BCM57840 10 Gigabit Ethernet" },
- { "Broadcom NetXtreme II BCM57840 20 Gigabit Ethernet" },
- { "Broadcom NetXtreme II BCM57840 10/20 Gigabit Ethernet Multi Function"},
- { "Broadcom NetXtreme II BCM57840 10/20 Gigabit Ethernet Multi Function"},
- { "Broadcom NetXtreme II BCM57811 10 Gigabit Ethernet"},
- { "Broadcom NetXtreme II BCM57811 10 Gigabit Ethernet Multi Function"},
+ [BCM57710] = { "Broadcom NetXtreme II BCM57710 10 Gigabit PCIe [Everest]" },
+ [BCM57711] = { "Broadcom NetXtreme II BCM57711 10 Gigabit PCIe" },
+ [BCM57711E] = { "Broadcom NetXtreme II BCM57711E 10 Gigabit PCIe" },
+ [BCM57712] = { "Broadcom NetXtreme II BCM57712 10 Gigabit Ethernet" },
+ [BCM57712_MF] = { "Broadcom NetXtreme II BCM57712 10 Gigabit Ethernet Multi Function" },
+ [BCM57712_VF] = { "Broadcom NetXtreme II BCM57712 10 Gigabit Ethernet Virtual Function" },
+ [BCM57800] = { "Broadcom NetXtreme II BCM57800 10 Gigabit Ethernet" },
+ [BCM57800_MF] = { "Broadcom NetXtreme II BCM57800 10 Gigabit Ethernet Multi Function" },
+ [BCM57800_VF] = { "Broadcom NetXtreme II BCM57800 10 Gigabit Ethernet Virtual Function" },
+ [BCM57810] = { "Broadcom NetXtreme II BCM57810 10 Gigabit Ethernet" },
+ [BCM57810_MF] = { "Broadcom NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function" },
+ [BCM57810_VF] = { "Broadcom NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function" },
+ [BCM57840_4_10] = { "Broadcom NetXtreme II BCM57840 10 Gigabit Ethernet" },
+ [BCM57840_2_20] = { "Broadcom NetXtreme II BCM57840 20 Gigabit Ethernet" },
+ [BCM57840_MF] = { "Broadcom NetXtreme II BCM57840 10/20 Gigabit Ethernet Multi Function" },
+ [BCM57840_VF] = { "Broadcom NetXtreme II BCM57840 10/20 Gigabit Ethernet Virtual Function" },
+ [BCM57811] = { "Broadcom NetXtreme II BCM57811 10 Gigabit Ethernet" },
+ [BCM57811_MF] = { "Broadcom NetXtreme II BCM57811 10 Gigabit Ethernet Multi Function" },
+ [BCM57840_O] = { "Broadcom NetXtreme II BCM57840 10/20 Gigabit Ethernet" },
+ [BCM57840_MFO] = { "Broadcom NetXtreme II BCM57840 10/20 Gigabit Ethernet Multi Function" },
+ [BCM57811_VF] = { "Broadcom NetXtreme II BCM57840 10/20 Gigabit Ethernet Virtual Function" }
};
#ifndef PCI_DEVICE_ID_NX2_57710
@@ -7792,41 +7804,49 @@ int bnx2x_setup_leading(struct bnx2x *bp)
*
* In case of MSI-X it will also try to enable MSI-X.
*/
-void bnx2x_set_int_mode(struct bnx2x *bp)
+int bnx2x_set_int_mode(struct bnx2x *bp)
{
+ int rc = 0;
+
+ if (IS_VF(bp) && int_mode != BNX2X_INT_MODE_MSIX)
+ return -EINVAL;
+
switch (int_mode) {
- case INT_MODE_MSI:
+ case BNX2X_INT_MODE_MSIX:
+ /* attempt to enable msix */
+ rc = bnx2x_enable_msix(bp);
+
+ /* msix attained */
+ if (!rc)
+ return 0;
+
+ /* vfs use only msix */
+ if (rc && IS_VF(bp))
+ return rc;
+
+ /* failed to enable multiple MSI-X */
+ BNX2X_DEV_INFO("Failed to enable multiple MSI-X (%d), set number of queues to %d\n",
+ bp->num_queues,
+ 1 + bp->num_cnic_queues);
+
+ /* falling through... */
+ case BNX2X_INT_MODE_MSI:
bnx2x_enable_msi(bp);
+
/* falling through... */
- case INT_MODE_INTx:
+ case BNX2X_INT_MODE_INTX:
bp->num_ethernet_queues = 1;
bp->num_queues = bp->num_ethernet_queues + bp->num_cnic_queues;
BNX2X_DEV_INFO("set number of queues to 1\n");
break;
default:
- /* if we can't use MSI-X we only need one fp,
- * so try to enable MSI-X with the requested number of fp's
- * and fallback to MSI or legacy INTx with one fp
- */
- if (bnx2x_enable_msix(bp) ||
- bp->flags & USING_SINGLE_MSIX_FLAG) {
- /* failed to enable multiple MSI-X */
- BNX2X_DEV_INFO("Failed to enable multiple MSI-X (%d), set number of queues to %d\n",
- bp->num_queues,
- 1 + bp->num_cnic_queues);
-
- bp->num_queues = 1 + bp->num_cnic_queues;
-
- /* Try to enable MSI */
- if (!(bp->flags & USING_SINGLE_MSIX_FLAG) &&
- !(bp->flags & DISABLE_MSI_FLAG))
- bnx2x_enable_msi(bp);
- }
- break;
+ BNX2X_DEV_INFO("unknown value in int_mode module parameter\n");
+ return -EINVAL;
}
+ return 0;
}
-/* must be called prioir to any HW initializations */
+/* must be called prior to any HW initializations */
static inline u16 bnx2x_cid_ilt_lines(struct bnx2x *bp)
{
return L2_ILT_LINES(bp);
@@ -11081,9 +11101,13 @@ static int bnx2x_init_bp(struct bnx2x *bp)
INIT_DELAYED_WORK(&bp->sp_task, bnx2x_sp_task);
INIT_DELAYED_WORK(&bp->sp_rtnl_task, bnx2x_sp_rtnl_task);
INIT_DELAYED_WORK(&bp->period_task, bnx2x_period_task);
- rc = bnx2x_get_hwinfo(bp);
- if (rc)
- return rc;
+ if (IS_PF(bp)) {
+ rc = bnx2x_get_hwinfo(bp);
+ if (rc)
+ return rc;
+ } else {
+ random_ether_addr(bp->dev->dev_addr);
+ }
bnx2x_set_modes_bitmap(bp);
@@ -11096,7 +11120,7 @@ static int bnx2x_init_bp(struct bnx2x *bp)
func = BP_FUNC(bp);
/* need to reset chip if undi was active */
- if (!BP_NOMCP(bp)) {
+ if (IS_PF(bp) && !BP_NOMCP(bp)) {
/* init fw_seq */
bp->fw_seq =
SHMEM_RD(bp, func_mb[BP_FW_MB_IDX(bp)].drv_mb_header) &
@@ -11133,6 +11157,8 @@ static int bnx2x_init_bp(struct bnx2x *bp)
bp->mrrs = mrrs;
bp->tx_ring_size = IS_MF_FCOE_AFEX(bp) ? 0 : MAX_TX_AVAIL;
+ if (IS_VF(bp))
+ bp->rx_ring_size = MAX_RX_AVAIL;
/* make sure that the numbers are in the right granularity */
bp->tx_ticks = (50 / BNX2X_BTR) * BNX2X_BTR;
@@ -11161,12 +11187,18 @@ static int bnx2x_init_bp(struct bnx2x *bp)
bp->cnic_base_cl_id = FP_SB_MAX_E2;
/* multiple tx priority */
- if (CHIP_IS_E1x(bp))
+ if (IS_VF(bp))
+ bp->max_cos = 1;
+ else if (CHIP_IS_E1x(bp))
bp->max_cos = BNX2X_MULTI_TX_COS_E1X;
- if (CHIP_IS_E2(bp) || CHIP_IS_E3A0(bp))
+ else if (CHIP_IS_E2(bp) || CHIP_IS_E3A0(bp))
bp->max_cos = BNX2X_MULTI_TX_COS_E2_E3A0;
- if (CHIP_IS_E3B0(bp))
+ else if (CHIP_IS_E3B0(bp))
bp->max_cos = BNX2X_MULTI_TX_COS_E3B0;
+ else
+ BNX2X_ERR("unknown chip %x revision %x\n",
+ CHIP_NUM(bp), CHIP_REV(bp));
+ pr_info("set bp->max_cos to %d\n", bp->max_cos);
/* We need at least one default status block for slow-path events,
* second status block for the L2 queue, and a third status block for
@@ -11551,10 +11583,9 @@ static int bnx2x_set_coherency_mask(struct bnx2x *bp)
return 0;
}
-static int bnx2x_init_dev(struct pci_dev *pdev, struct net_device *dev,
- unsigned long board_type)
+static int bnx2x_init_dev(struct bnx2x *bp, struct pci_dev *pdev,
+ struct net_device *dev, unsigned long board_type)
{
- struct bnx2x *bp;
int rc;
u32 pci_cfg_dword;
bool chip_is_e1x = (board_type == BCM57710 ||
@@ -11562,11 +11593,9 @@ static int bnx2x_init_dev(struct pci_dev *pdev, struct net_device *dev,
board_type == BCM57711E);
SET_NETDEV_DEV(dev, &pdev->dev);
- bp = netdev_priv(dev);
bp->dev = dev;
bp->pdev = pdev;
- bp->flags = 0;
rc = pci_enable_device(pdev);
if (rc) {
@@ -11582,9 +11611,8 @@ static int bnx2x_init_dev(struct pci_dev *pdev, struct net_device *dev,
goto err_out_disable;
}
- if (!(pci_resource_flags(pdev, 2) & IORESOURCE_MEM)) {
- dev_err(&bp->pdev->dev, "Cannot find second PCI device"
- " base address, aborting\n");
+ if (IS_PF(bp) && !(pci_resource_flags(pdev, 2) & IORESOURCE_MEM)) {
+ dev_err(&bp->pdev->dev, "Cannot find second PCI device base address, aborting\n");
rc = -ENODEV;
goto err_out_disable;
}
@@ -11609,12 +11637,14 @@ static int bnx2x_init_dev(struct pci_dev *pdev, struct net_device *dev,
pci_save_state(pdev);
}
- bp->pm_cap = pci_find_capability(pdev, PCI_CAP_ID_PM);
- if (bp->pm_cap == 0) {
- dev_err(&bp->pdev->dev,
- "Cannot find power management capability, aborting\n");
- rc = -EIO;
- goto err_out_release;
+ if (IS_PF(bp)) {
+ bp->pm_cap = pci_find_capability(pdev, PCI_CAP_ID_PM);
+ if (bp->pm_cap == 0) {
+ dev_err(&bp->pdev->dev,
+ "Cannot find power management capability, aborting\n");
+ rc = -EIO;
+ goto err_out_release;
+ }
}
if (!pci_is_pcie(pdev)) {
@@ -11665,25 +11695,28 @@ static int bnx2x_init_dev(struct pci_dev *pdev, struct net_device *dev,
* Clean the following indirect addresses for all functions since it
* is not used by the driver.
*/
- REG_WR(bp, PXP2_REG_PGL_ADDR_88_F0, 0);
- REG_WR(bp, PXP2_REG_PGL_ADDR_8C_F0, 0);
- REG_WR(bp, PXP2_REG_PGL_ADDR_90_F0, 0);
- REG_WR(bp, PXP2_REG_PGL_ADDR_94_F0, 0);
+ if (IS_PF(bp)) {
+ REG_WR(bp, PXP2_REG_PGL_ADDR_88_F0, 0);
+ REG_WR(bp, PXP2_REG_PGL_ADDR_8C_F0, 0);
+ REG_WR(bp, PXP2_REG_PGL_ADDR_90_F0, 0);
+ REG_WR(bp, PXP2_REG_PGL_ADDR_94_F0, 0);
+
+ if (chip_is_e1x) {
+ REG_WR(bp, PXP2_REG_PGL_ADDR_88_F1, 0);
+ REG_WR(bp, PXP2_REG_PGL_ADDR_8C_F1, 0);
+ REG_WR(bp, PXP2_REG_PGL_ADDR_90_F1, 0);
+ REG_WR(bp, PXP2_REG_PGL_ADDR_94_F1, 0);
+ }
- if (chip_is_e1x) {
- REG_WR(bp, PXP2_REG_PGL_ADDR_88_F1, 0);
- REG_WR(bp, PXP2_REG_PGL_ADDR_8C_F1, 0);
- REG_WR(bp, PXP2_REG_PGL_ADDR_90_F1, 0);
- REG_WR(bp, PXP2_REG_PGL_ADDR_94_F1, 0);
+ /* Enable internal target-read (in case we are probed after PF
+ * FLR). Must be done prior to any BAR read access. Only for
+ * 57712 and up
+ */
+ if (!chip_is_e1x)
+ REG_WR(bp,
+ PGLUE_B_REG_INTERNAL_PFID_ENABLE_TARGET_READ, 1);
}
- /*
- * Enable internal target-read (in case we are probed after PF FLR).
- * Must be done prior to any BAR read access. Only for 57712 and up
- */
- if (!chip_is_e1x)
- REG_WR(bp, PGLUE_B_REG_INTERNAL_PFID_ENABLE_TARGET_READ, 1);
-
dev->watchdog_timeo = TX_TIMEOUT;
dev->netdev_ops = &bnx2x_netdev_ops;
@@ -11734,7 +11767,8 @@ err_out:
static void bnx2x_get_pcie_width_speed(struct bnx2x *bp, int *width, int *speed)
{
- u32 val = REG_RD(bp, PCICFG_OFFSET + PCICFG_LINK_CONTROL);
+ u32 val = 0;
+ pci_read_config_dword(bp->pdev, PCICFG_LINK_CONTROL, &val);
*width = (val & PCICFG_LINK_WIDTH) >> PCICFG_LINK_WIDTH_SHIFT;
@@ -12012,10 +12046,10 @@ static int bnx2x_set_qm_cid_count(struct bnx2x *bp)
*
*/
static int bnx2x_get_num_non_def_sbs(struct pci_dev *pdev,
- int cnic_cnt)
+ int cnic_cnt, bool is_vf)
{
- int pos;
- u16 control;
+ int pos, index;
+ u16 control = 0;
pos = pci_find_capability(pdev, PCI_CAP_ID_MSIX);
@@ -12023,85 +12057,115 @@ static int bnx2x_get_num_non_def_sbs(struct pci_dev *pdev,
* If MSI-X is not supported - return number of SBs needed to support
* one fast path queue: one FP queue + SB for CNIC
*/
- if (!pos)
+ if (!pos) {
+ pr_info("no msix capability found\n");
return 1 + cnic_cnt;
+ }
+
+ pr_info("msix capability found\n");
/*
* The value in the PCI configuration space is the index of the last
* entry, namely one less than the actual size of the table, which is
* exactly what we want to return from this function: number of all SBs
* without the default SB.
+ * For VFs there is no default SB, so we return (index + 1).
*/
pci_read_config_word(pdev, pos + PCI_MSI_FLAGS, &control);
- return control & PCI_MSIX_FLAGS_QSIZE;
-}
-struct cnic_eth_dev *bnx2x_cnic_probe(struct net_device *);
+ index = control & PCI_MSIX_FLAGS_QSIZE;
-static int bnx2x_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
-{
- struct net_device *dev = NULL;
- struct bnx2x *bp;
- int pcie_width, pcie_speed;
- int rc, max_non_def_sbs;
- int rx_count, tx_count, rss_count, doorbell_size;
- int cnic_cnt;
- /*
- * An estimated maximum supported CoS number according to the chip
- * version.
- * We will try to roughly estimate the maximum number of CoSes this chip
- * may support in order to minimize the memory allocated for Tx
- * netdev_queue's. This number will be accurately calculated during the
- * initialization of bp->max_cos based on the chip versions AND chip
- * revision in the bnx2x_init_bp().
- */
- u8 max_cos_est = 0;
+ return is_vf ? index + 1 : index;
+}
- switch (ent->driver_data) {
+static int set_max_cos_est(int chip_id)
+{
+ switch (chip_id) {
case BCM57710:
case BCM57711:
case BCM57711E:
- max_cos_est = BNX2X_MULTI_TX_COS_E1X;
- break;
-
+ return BNX2X_MULTI_TX_COS_E1X;
case BCM57712:
case BCM57712_MF:
- max_cos_est = BNX2X_MULTI_TX_COS_E2_E3A0;
- break;
-
+ case BCM57712_VF:
+ return BNX2X_MULTI_TX_COS_E2_E3A0;
case BCM57800:
case BCM57800_MF:
+ case BCM57800_VF:
case BCM57810:
case BCM57810_MF:
- case BCM57840_O:
case BCM57840_4_10:
case BCM57840_2_20:
+ case BCM57840_O:
case BCM57840_MFO:
+ case BCM57810_VF:
case BCM57840_MF:
+ case BCM57840_VF:
case BCM57811:
case BCM57811_MF:
- max_cos_est = BNX2X_MULTI_TX_COS_E3B0;
- break;
-
+ case BCM57811_VF:
+ return BNX2X_MULTI_TX_COS_E3B0;
+ return 1;
default:
- pr_err("Unknown board_type (%ld), aborting\n",
- ent->driver_data);
+ pr_err("Unknown board_type (%d), aborting\n", chip_id);
return -ENODEV;
}
+}
- cnic_cnt = 1;
- max_non_def_sbs = bnx2x_get_num_non_def_sbs(pdev, cnic_cnt);
+static int set_is_vf(int chip_id)
+{
+ switch (chip_id) {
+ case BCM57712_VF:
+ case BCM57800_VF:
+ case BCM57810_VF:
+ case BCM57840_VF:
+ case BCM57811_VF:
+ return true;
+ default:
+ return false;
+ }
+}
+
+struct cnic_eth_dev *bnx2x_cnic_probe(struct net_device *dev);
+
+static int bnx2x_init_one(struct pci_dev *pdev,
+ const struct pci_device_id *ent)
+{
+ struct net_device *dev = NULL;
+ struct bnx2x *bp;
+ int pcie_width, pcie_speed;
+ int rc, max_non_def_sbs;
+ int rx_count, tx_count, rss_count, doorbell_size;
+ int max_cos_est;
+ bool is_vf;
+ int cnic_cnt;
+
+ /* An estimated maximum supported CoS number according to the chip
+ * version.
+ * We will try to roughly estimate the maximum number of CoSes this chip
+ * may support in order to minimize the memory allocated for Tx
+ * netdev_queue's. This number will be accurately calculated during the
+ * initialization of bp->max_cos based on the chip versions AND chip
+ * revision in the bnx2x_init_bp().
+ */
+ max_cos_est = set_max_cos_est(ent->driver_data);
+ if (max_cos_est < 0)
+ return max_cos_est;
+ is_vf = set_is_vf(ent->driver_data);
+ cnic_cnt = is_vf ? 0 : 1;
- WARN_ON(!max_non_def_sbs);
+ max_non_def_sbs = bnx2x_get_num_non_def_sbs(pdev, cnic_cnt, is_vf);
/* Maximum number of RSS queues: one IGU SB goes to CNIC */
- rss_count = max_non_def_sbs - cnic_cnt;
+ rss_count = is_vf ? 1 : max_non_def_sbs - cnic_cnt;
+
+ if (rss_count < 1)
+ return -EINVAL;
/* Maximum number of netdev Rx queues: RSS + FCoE L2 */
rx_count = rss_count + cnic_cnt;
- /*
- * Maximum number of netdev Tx queues:
+ /* Maximum number of netdev Tx queues:
* Maximum TSS queues * Maximum supported number of CoS + FCoE L2
*/
tx_count = rss_count * max_cos_est + cnic_cnt;
@@ -12113,22 +12177,28 @@ static int bnx2x_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
bp = netdev_priv(dev);
+ bp->flags = 0;
+ if (is_vf)
+ bp->flags |= IS_VF_FLAG;
+
bp->igu_sb_cnt = max_non_def_sbs;
+ bp->igu_base_addr = IS_VF(bp) ? PXP_VF_ADDR_IGU_START : BAR_IGU_INTMEM;
bp->msg_enable = debug;
bp->cnic_support = cnic_cnt;
bp->cnic_probe = bnx2x_cnic_probe;
pci_set_drvdata(pdev, dev);
- rc = bnx2x_init_dev(pdev, dev, ent->driver_data);
+ rc = bnx2x_init_dev(bp, pdev, dev, ent->driver_data);
if (rc < 0) {
free_netdev(dev);
return rc;
}
+ BNX2X_DEV_INFO("This is a %s function\n",
+ IS_PF(bp) ? "physical" : "virtual");
BNX2X_DEV_INFO("Cnic support is %s\n", CNIC_SUPPORT(bp) ? "on" : "off");
- BNX2X_DEV_INFO("max_non_def_sbs %d\n", max_non_def_sbs);
-
+ BNX2X_DEV_INFO("Max num of status blocks %d\n", max_non_def_sbs);
BNX2X_DEV_INFO("Allocated netdev with %d tx and %d rx queues\n",
tx_count, rx_count);
@@ -12136,19 +12206,28 @@ static int bnx2x_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
if (rc)
goto init_one_exit;
- /*
- * Map doorbels here as we need the real value of bp->max_cos which
- * is initialized in bnx2x_init_bp().
+ /* Map doorbells here as we need the real value of bp->max_cos which
+ * is initialized in bnx2x_init_bp() to determine the number of
+ * l2 connections.
*/
- doorbell_size = BNX2X_L2_MAX_CID(bp) * (1 << BNX2X_DB_SHIFT);
- if (doorbell_size > pci_resource_len(pdev, 2)) {
- dev_err(&bp->pdev->dev,
- "Cannot map doorbells, bar size too small, aborting\n");
- rc = -ENOMEM;
- goto init_one_exit;
+ if (IS_VF(bp)) {
+ /* vf doorbells are embedded within the regview */
+ bp->doorbells = bp->regview + PXP_VF_ADDR_DB_START;
+
+ /* allocate vf2pf mailbox for vf to pf channel */
+ BNX2X_PCI_ALLOC(bp->vf2pf_mbox, &bp->vf2pf_mbox_mapping,
+ sizeof(struct bnx2x_vf_mbx_msg));
+ } else {
+ doorbell_size = BNX2X_L2_MAX_CID(bp) * (1 << BNX2X_DB_SHIFT);
+ if (doorbell_size > pci_resource_len(pdev, 2)) {
+ dev_err(&bp->pdev->dev,
+ "Cannot map doorbells, bar size too small, aborting\n");
+ rc = -ENOMEM;
+ goto init_one_exit;
+ }
+ bp->doorbells = ioremap_nocache(pci_resource_start(pdev, 2),
+ doorbell_size);
}
- bp->doorbells = ioremap_nocache(pci_resource_start(pdev, 2),
- doorbell_size);
if (!bp->doorbells) {
dev_err(&bp->pdev->dev,
"Cannot map doorbell space, aborting\n");
@@ -12158,6 +12237,7 @@ static int bnx2x_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
/* calc qm_cid_count */
bp->qm_cid_count = bnx2x_set_qm_cid_count(bp);
+ BNX2X_DEV_INFO("qm_cid_count %d\n", bp->qm_cid_count);
/* disable FCOE L2 queue for E1x*/
if (CHIP_IS_E1x(bp))
@@ -12179,13 +12259,19 @@ static int bnx2x_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
/* Configure interrupt mode: try to enable MSI-X/MSI if
* needed.
*/
- bnx2x_set_int_mode(bp);
+ rc = bnx2x_set_int_mode(bp);
+ if (rc) {
+ dev_err(&pdev->dev, "Cannot set interrupts\n");
+ goto init_one_exit;
+ }
+ /* register the net device */
rc = register_netdev(dev);
if (rc) {
dev_err(&pdev->dev, "Cannot register net device\n");
goto init_one_exit;
}
+ BNX2X_DEV_INFO("device name after netdev register %s\n", dev->name);
if (!NO_FCOE(bp)) {
@@ -12196,6 +12282,8 @@ static int bnx2x_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
}
bnx2x_get_pcie_width_speed(bp, &pcie_width, &pcie_speed);
+ BNX2X_DEV_INFO("got pcie width %d and speed %d\n",
+ pcie_width, pcie_speed);
BNX2X_DEV_INFO(
"%s (%c%d) PCI-E x%d %s found at mem %lx, IRQ %d, node addr %pM\n",
@@ -12209,11 +12297,16 @@ static int bnx2x_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
return 0;
+alloc_mem_err:
+ BNX2X_PCI_FREE(bp->vf2pf_mbox, bp->vf2pf_mbox_mapping,
+ sizeof(struct bnx2x_vf_mbx_msg));
+ rc = -ENOMEM;
+
init_one_exit:
if (bp->regview)
iounmap(bp->regview);
- if (bp->doorbells)
+ if (IS_PF(bp) && bp->doorbells)
iounmap(bp->doorbells);
free_netdev(dev);
@@ -12253,13 +12346,15 @@ static void bnx2x_remove_one(struct pci_dev *pdev)
unregister_netdev(dev);
/* Power on: we can't let PCI layer write to us while we are in D3 */
- bnx2x_set_power_state(bp, PCI_D0);
+ if (IS_PF(bp))
+ bnx2x_set_power_state(bp, PCI_D0);
/* Disable MSI/MSI-X */
bnx2x_disable_msi(bp);
/* Power off */
- bnx2x_set_power_state(bp, PCI_D3hot);
+ if (IS_PF(bp))
+ bnx2x_set_power_state(bp, PCI_D3hot);
/* Make sure RESET task is not scheduled before continuing */
cancel_delayed_work_sync(&bp->sp_rtnl_task);
@@ -12267,11 +12362,15 @@ static void bnx2x_remove_one(struct pci_dev *pdev)
if (bp->regview)
iounmap(bp->regview);
- if (bp->doorbells)
- iounmap(bp->doorbells);
-
- bnx2x_release_firmware(bp);
+ /* For a VF, doorbells are part of the regview and were unmapped along
+ * with it. FW is only loaded by the PF.
+ */
+ if (IS_PF(bp)) {
+ if (bp->doorbells)
+ iounmap(bp->doorbells);
+ bnx2x_release_firmware(bp);
+ }
bnx2x_free_mem_bp(bp);
free_netdev(dev);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
index bc2f65b..463a984 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
@@ -6554,6 +6554,15 @@
(7L<<ME_REG_ABS_PF_NUM_SHIFT) /* Absolute PF Num */
+#define PXP_VF_ADDR_IGU_START 0
+#define PXP_VF_ADDR_IGU_SIZE 0x3000
+#define PXP_VF_ADDR_IGU_END\
+ ((PXP_VF_ADDR_IGU_START) + (PXP_VF_ADDR_IGU_SIZE) - 1)
+#define PXP_VF_ADDR_DB_START 0x7c00
+#define PXP_VF_ADDR_DB_SIZE 0x200
+#define PXP_VF_ADDR_DB_END\
+ ((PXP_VF_ADDR_DB_START) + (PXP_VF_ADDR_DB_SIZE) - 1)
+
#define MDIO_REG_BANK_CL73_IEEEB0 0x0
#define MDIO_CL73_IEEEB0_CL73_AN_CONTROL 0x0
#define MDIO_CL73_IEEEB0_CL73_AN_CONTROL_RESTART_AN 0x0200
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h
new file mode 100644
index 0000000..1b14745
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h
@@ -0,0 +1,27 @@
+/* bnx2x_sriov.h: Broadcom Everest network driver.
+ *
+ * Copyright 2009-2012 Broadcom Corporation
+ *
+ * Unless you and Broadcom execute a separate written software license
+ * agreement governing use of this software, this software is licensed to you
+ * under the terms of the GNU General Public License version 2, available
+ * at http://www.gnu.org/licenses/old-licenses/gpl-2.0.html (the "GPL").
+ *
+ * Notwithstanding the above, under no circumstances may you combine this
+ * software in any way with any other Broadcom software provided under a
+ * license other than the GPL, without Broadcom's express prior written
+ * consent.
+ *
+ * Maintained by: Eilon Greenstein <eilong@broadcom.com>
+ * Written by: Shmulik Ravid <shmulikr@broadcom.com>
+ * Ariel Elior <ariele@broadcom.com>
+ */
+#ifndef BNX2X_SRIOV_H
+#define BNX2X_SRIOV_H
+
+struct bnx2x_vf_mbx_msg {
+ union vfpf_tlvs req;
+ union pfvf_tlvs resp;
+};
+
+#endif /* bnx2x_sriov.h */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.h
new file mode 100644
index 0000000..bb37675
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.h
@@ -0,0 +1,37 @@
+/* bnx2x_vfpf.h: Broadcom Everest network driver.
+ *
+ * Copyright (c) 2011-2012 Broadcom Corporation
+ *
+ * Unless you and Broadcom execute a separate written software license
+ * agreement governing use of this software, this software is licensed to you
+ * under the terms of the GNU General Public License version 2, available
+ * at http://www.gnu.org/licenses/old-licenses/gpl-2.0.html (the "GPL").
+ *
+ * Notwithstanding the above, under no circumstances may you combine this
+ * software in any way with any other Broadcom software provided under a
+ * license other than the GPL, without Broadcom's express prior written
+ * consent.
+ *
+ * Maintained by: Eilon Greenstein <eilong@broadcom.com>
+ * Written by: Ariel Elior <ariele@broadcom.com>
+ */
+#ifndef VF_PF_IF_H
+#define VF_PF_IF_H
+
+/* HW VF-PF channel definitions
+ * A.K.A VF-PF mailbox
+ */
+#define TLV_BUFFER_SIZE 1024
+
+struct tlv_buffer_size {
+ u8 tlv_buffer[TLV_BUFFER_SIZE];
+};
+
+union vfpf_tlvs {
+ struct tlv_buffer_size tlv_buf_size;
+};
+
+union pfvf_tlvs {
+ struct tlv_buffer_size tlv_buf_size;
+};
+#endif /* VF_PF_IF_H */
--
1.7.9.GIT
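
A note on the queue accounting in bnx2x_init_one() and
bnx2x_get_num_non_def_sbs() above. What follows is a minimal user-space
sketch in C, not driver code, of how the patch appears to derive the
status-block and queue counts. The sketch_* names and the struct are
hypothetical and exist only for illustration. The mask mirrors
PCI_MSIX_FLAGS_QSIZE (0x07FF), and the -1 return stands in for the
patch's -EINVAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Table Size field of the MSI-X Message Control word.
 * Mirrors PCI_MSIX_FLAGS_QSIZE (0x07FF); the name here is hypothetical.
 */
#define SKETCH_MSIX_QSIZE_MASK 0x07FFu

/* Hypothetical container for the three counts computed at probe time. */
struct sketch_queue_counts {
	int rss;	/* RSS (fast-path) queues */
	int rx;		/* netdev Rx queues: RSS + FCoE L2 */
	int tx;		/* netdev Tx queues: RSS * CoS + FCoE L2 */
};

/* Number of status blocks excluding the default SB.
 * The Table Size field holds the index of the last entry (size - 1).
 * A PF gives one entry to the default SB, so the index is the answer.
 * A VF has no default SB, so all (index + 1) entries count.
 */
static int sketch_num_non_def_sbs(bool has_msix, uint16_t control,
				  int cnic_cnt, bool is_vf)
{
	int index;

	/* Without MSI-X: one fast-path queue plus the CNIC SB, if any. */
	if (!has_msix)
		return 1 + cnic_cnt;

	index = control & SKETCH_MSIX_QSIZE_MASK;
	return is_vf ? index + 1 : index;
}

/* Fill *out with the queue counts.
 * Returns 0 on success, -1 if no RSS queue is left over.
 */
static int sketch_queue_counts(int max_non_def_sbs, int cnic_cnt,
			       int max_cos_est, bool is_vf,
			       struct sketch_queue_counts *out)
{
	/* A VF is pinned to a single RSS queue in this patch. */
	int rss = is_vf ? 1 : max_non_def_sbs - cnic_cnt;

	assert(max_cos_est >= 1);
	if (rss < 1)
		return -1;

	out->rss = rss;
	out->rx = rss + cnic_cnt;
	out->tx = rss * max_cos_est + cnic_cnt;
	return 0;
}
```

For example, a PF whose Message Control word reads 0x8010 gets 16
non-default SBs; with one CNIC SB and a CoS estimate of 3 that gives
15 RSS, 16 Rx and 46 Tx queues. A VF with the same word gets 17 SBs
but is still limited to a single RSS queue.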
^ permalink raw reply related