Netdev List

Netdev List
 help / color / mirror / Atom feed

* RE: 82571EB: Detected Hardware Unit Hang
From: Dave, Tushar N @ 2012-07-11 18:51 UTC (permalink / raw)
  To: Joe Jin
  Cc: e1000-devel@lists.sf.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Dave, Tushar N
In-Reply-To: <4FFD0908.6080506@oracle.com>

>-----Original Message-----
>From: Joe Jin [mailto:joe.jin@oracle.com]
>Sent: Tuesday, July 10, 2012 10:03 PM
>To: Dave, Tushar N
>Cc: e1000-devel@lists.sf.net; netdev@vger.kernel.org; linux-
>kernel@vger.kernel.org
>Subject: Re: 82571EB: Detected Hardware Unit Hang
>
>On 07/11/12 12:05, Dave, Tushar N wrote:
>> When you said you had this issue with RHEL5 and RHEL6 drivers, have you
>install RHEl5/6 kernel and reproduced it? If so I think I should install
>RHEL6 and try reproduce it locally!
>>
>Yes I reproduced this on both RHEL5 and RHEL6.
>
>So far I tried to scp big file (~1GB) will hit it at once.
>
>Thanks,
>Joe

Joe,

I see couple of errors in lspci output.
Device capability status register shows UnCorrectable PCIe error. This means there is certainly something went wrong. The only way to recover from Uncorrectable errors is reset.

	DevSta:	CorrErr- *UncorrErr+ FatalErr+ UnsuppReq+ AuxPwr+ TransPend-

Also AER sections in lspci output shows PCIe completion timeout.

	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- *CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq+ ACSViol-

I suggest you should load AER driver and check for any error messages in log. Also please check any error message reported by system in BIOS log. Are there any machine check errors? 

When did you notice this issue? have 82571 ever been working before on this server?

One more thing, Cache line size 256 is little unusual( I never seen this value before, mostly it's 64). Does BIOS settings have been changed? Are you using default BIOS setting?

Thanks.

-Tushar

^ permalink raw reply

* [PATCH net-next] netxen: fix link notification order
From: Flavio Leitner @ 2012-07-11 18:56 UTC (permalink / raw)
  To: netdev; +Cc: Sony Chacko, Rajesh Borundia, Flavio Leitner

First update the adapter variables with the current speed and
mode before fire the notification. Otherwise, the get_settings()
may provide old values.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
---
 drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
index b2c1b676..bc165f4 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
@@ -1437,8 +1437,6 @@ netxen_handle_linkevent(struct netxen_adapter *adapter, nx_fw_msg_t *msg)
 				netdev->name, cable_len);
 	}
 
-	netxen_advert_link_change(adapter, link_status);
-
 	/* update link parameters */
 	if (duplex == LINKEVENT_FULL_DUPLEX)
 		adapter->link_duplex = DUPLEX_FULL;
@@ -1447,6 +1445,8 @@ netxen_handle_linkevent(struct netxen_adapter *adapter, nx_fw_msg_t *msg)
 	adapter->module_type = module;
 	adapter->link_autoneg = autoneg;
 	adapter->link_speed = link_speed;
+
+	netxen_advert_link_change(adapter, link_status);
 }
 
 static void
-- 
1.7.10.4

^ permalink raw reply related

* Re: pull request: wireless-next 2012-07-11
From: Rafał Miłecki @ 2012-07-11 19:00 UTC (permalink / raw)
  To: John W. Linville
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20120711183320.GB1937-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>

2012/7/11 John W. Linville <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>:
> Ugh, forgot the signature again...
>
> On Wed, Jul 11, 2012 at 02:17:22PM -0400, John W. Linville wrote:
>> Rafał Miłecki (2):
>>       b43: N-PHY: fix RSSI calibration
>>       bcma: use custom printing functions

MIPS users: bcma patch breaks compilation for you (sorry!), please apply
bcma: fix CC driver compilation on MIPS
locally before it gets merged.

-- 
Rafał
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Send your details to claim £950,000.00 that has been awarded to your e-mail by Benz Promotional Company. Send Name:, Address:, and Tel:
From: Benz Promo @ 2012-07-11 19:17 UTC (permalink / raw)




^ permalink raw reply

* Re: [PATCH v3 net-next] tcp: TCP Small Queues
From: Nandita Dukkipati @ 2012-07-11 19:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, codel, ncardwell, David Miller, mattmathis
In-Reply-To: <1342021831.3265.8174.camel@edumazet-glaptop>


[-- Attachment #1.1: Type: text/plain, Size: 19657 bytes --]

I have a couple of high level questions on the two solutions for maintain
small queues at qdisc: 1) using codel's feedback versus 2) the approach in
TSQ to explicitly limit bytes per-flow to tcp_limit_output_bytes.

1. multiple flows case: codel feedback (be it drop/ECN or direct feedback
like in your patch with fq_codel: report congestion notification at enqueue
time) seems to work in maintaining small queues regardless of the number of
connections. TSQ probably won't achieve this goal when number of flows is
large. Is this correct?

2. cwnd adjustment: codel feedback directly and explicitly controls
snd_cwnd which in turn will also adjust the send buffer size to appropriate
value. That's a nice property, which TSQ probably won't have because it
doesn't explicitly adjust snd_cwnd.

Considering these two points, why TSQ over the Codel feedback approach?

On Wed, Jul 11, 2012 at 8:50 AM, Eric Dumazet <eric.dumazet@gmail.com>wrote:

> This introduce TSQ (TCP Small Queues)
>
> TSQ goal is to reduce number of TCP packets in xmit queues (qdisc &
> device queues), to reduce RTT and cwnd bias, part of the bufferbloat
> problem.
>
> sk->sk_wmem_alloc not allowed to grow above a given limit,
> allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
> given time.
>
> TSO packets are sized/capped to half the limit, so that we have two
> TSO packets in flight, allowing better bandwidth use.
>
> As a side effect, setting the limit to 40000 automatically reduces the
> standard gso max limit (65536) to 40000/2 : It can help to reduce
> latencies of high prio packets, having smaller TSO packets.
>
> This means we divert sock_wfree() to a tcp_wfree() handler, to
> queue/send following frames when skb_orphan() [2] is called for the
> already queued skbs.
>
> Results on my dev machines (tg3/ixgbe nics) are really impressive,
> using standard pfifo_fast, and with or without TSO/GSO.
>
> Without reduction of nominal bandwidth, we have reduction of buffering
> per bulk sender :
> < 1ms on Gbit (instead of 50ms with TSO)
> < 8ms on 100Mbit (instead of 132 ms)
>
> I no longer have 4 MBytes backlogged in qdisc by a single netperf
> session, and both side socket autotuning no longer use 4 Mbytes.
>
> As skb destructor cannot restart xmit itself ( as qdisc lock might be
> taken at this point ), we delegate the work to a tasklet. We use one
> tasklest per cpu for performance reasons.
>
> If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag.
> This flag is tested in a new protocol method called from release_sock(),
> to eventually send new segments.
>
> [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
> [2] skb_orphan() is usually called at TX completion time,
>   but some drivers call it in their start_xmit() handler.
>   These drivers should at least use BQL, or else a single TCP
>   session can still fill the whole NIC TX ring, since TSQ will
>   have no effect.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Dave Taht <dave.taht@bufferbloat.net>
> Cc: Tom Herbert <therbert@google.com>
> Cc: Matt Mathis <mattmathis@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Nandita Dukkipati <nanditad@google.com>
> ---
>  Documentation/networking/ip-sysctl.txt |   14 ++
>  include/linux/tcp.h                    |    9 +
>  include/net/sock.h                     |    2
>  include/net/tcp.h                      |    4
>  net/core/sock.c                        |    4
>  net/ipv4/sysctl_net_ipv4.c             |    7 +
>  net/ipv4/tcp.c                         |    6
>  net/ipv4/tcp_ipv4.c                    |    1
>  net/ipv4/tcp_minisocks.c               |    1
>  net/ipv4/tcp_output.c                  |  154 ++++++++++++++++++++++-
>  net/ipv6/tcp_ipv6.c                    |    1
>  11 files changed, 202 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/networking/ip-sysctl.txt
> b/Documentation/networking/ip-sysctl.txt
> index 47b6c79..e20c17a 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -551,6 +551,20 @@ tcp_thin_dupack - BOOLEAN
>         Documentation/networking/tcp-thin.txt
>         Default: 0
>
> +tcp_limit_output_bytes - INTEGER
> +       Controls TCP Small Queue limit per tcp socket.
> +       TCP bulk sender tends to increase packets in flight until it
> +       gets losses notifications. With SNDBUF autotuning, this can
> +       result in a large amount of packets queued in qdisc/device
> +       on the local machine, hurting latency of other flows, for
> +       typical pfifo_fast qdiscs.
> +       tcp_limit_output_bytes limits the number of bytes on qdisc
> +       or device to reduce artificial RTT/cwnd and reduce bufferbloat.
> +       Note: For GSO/TSO enabled flows, we try to have at least two
> +       packets in flight. Reducing tcp_limit_output_bytes might also
> +       reduce the size of individual GSO packet (64KB being the max)
> +       Default: 131072
> +
>  UDP variables:
>
>  udp_mem - vector of 3 INTEGERs: min, pressure, max
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index 2de9cf4..1888169 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -339,6 +339,9 @@ struct tcp_sock {
>         u32     rcv_tstamp;     /* timestamp of last received ACK (for
> keepalives) */
>         u32     lsndtime;       /* timestamp of last sent data packet (for
> restart window) */
>
> +       struct list_head tsq_node; /* anchor in tsq_tasklet.head list */
> +       unsigned long   tsq_flags;
> +
>         /* Data for direct copy to user */
>         struct {
>                 struct sk_buff_head     prequeue;
> @@ -494,6 +497,12 @@ struct tcp_sock {
>         struct tcp_cookie_values  *cookie_values;
>  };
>
> +enum tsq_flags {
> +       TSQ_THROTTLED,
> +       TSQ_QUEUED,
> +       TSQ_OWNED, /* tcp_tasklet_func() found socket was locked */
> +};
> +
>  static inline struct tcp_sock *tcp_sk(const struct sock *sk)
>  {
>         return (struct tcp_sock *)sk;
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 640432a..eefce84 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -858,6 +858,8 @@ struct proto {
>         int                     (*backlog_rcv) (struct sock *sk,
>                                                 struct sk_buff *skb);
>
> +       void            (*release_cb)(struct sock *sk);
> +
>         /* Keeping track of sk's, looking them up, and port selection
> methods. */
>         void                    (*hash)(struct sock *sk);
>         void                    (*unhash)(struct sock *sk);
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 3618fef..439984b 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -253,6 +253,7 @@ extern int sysctl_tcp_cookie_size;
>  extern int sysctl_tcp_thin_linear_timeouts;
>  extern int sysctl_tcp_thin_dupack;
>  extern int sysctl_tcp_early_retrans;
> +extern int sysctl_tcp_limit_output_bytes;
>
>  extern atomic_long_t tcp_memory_allocated;
>  extern struct percpu_counter tcp_sockets_allocated;
> @@ -321,6 +322,8 @@ extern struct proto tcp_prot;
>
>  extern void tcp_init_mem(struct net *net);
>
> +extern void tcp_tasklet_init(void);
> +
>  extern void tcp_v4_err(struct sk_buff *skb, u32);
>
>  extern void tcp_shutdown (struct sock *sk, int how);
> @@ -334,6 +337,7 @@ extern int tcp_sendmsg(struct kiocb *iocb, struct sock
> *sk, struct msghdr *msg,
>                        size_t size);
>  extern int tcp_sendpage(struct sock *sk, struct page *page, int offset,
>                         size_t size, int flags);
> +extern void tcp_release_cb(struct sock *sk);
>  extern int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg);
>  extern int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
>                                  const struct tcphdr *th, unsigned int
> len);
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 929bdcc..24039ac 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2159,6 +2159,10 @@ void release_sock(struct sock *sk)
>         spin_lock_bh(&sk->sk_lock.slock);
>         if (sk->sk_backlog.tail)
>                 __release_sock(sk);
> +
> +       if (sk->sk_prot->release_cb)
> +               sk->sk_prot->release_cb(sk);
> +
>         sk->sk_lock.owned = 0;
>         if (waitqueue_active(&sk->sk_lock.wq))
>                 wake_up(&sk->sk_lock.wq);
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index 12aa0c5..70730f7 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -598,6 +598,13 @@ static struct ctl_table ipv4_table[] = {
>                 .mode           = 0644,
>                 .proc_handler   = proc_dointvec
>         },
> +       {
> +               .procname       = "tcp_limit_output_bytes",
> +               .data           = &sysctl_tcp_limit_output_bytes,
> +               .maxlen         = sizeof(int),
> +               .mode           = 0644,
> +               .proc_handler   = proc_dointvec
> +       },
>  #ifdef CONFIG_NET_DMA
>         {
>                 .procname       = "tcp_dma_copybreak",
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index d902da9..4252cd8 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -376,6 +376,7 @@ void tcp_init_sock(struct sock *sk)
>         skb_queue_head_init(&tp->out_of_order_queue);
>         tcp_init_xmit_timers(sk);
>         tcp_prequeue_init(tp);
> +       INIT_LIST_HEAD(&tp->tsq_node);
>
>         icsk->icsk_rto = TCP_TIMEOUT_INIT;
>         tp->mdev = TCP_TIMEOUT_INIT;
> @@ -796,6 +797,10 @@ static unsigned int tcp_xmit_size_goal(struct sock
> *sk, u32 mss_now,
>                                   inet_csk(sk)->icsk_ext_hdr_len -
>                                   tp->tcp_header_len);
>
> +               /* TSQ : try to have two TSO segments in flight */
> +               xmit_size_goal = min_t(u32, xmit_size_goal,
> +                                      sysctl_tcp_limit_output_bytes >> 1);
> +
>                 xmit_size_goal = tcp_bound_to_half_wnd(tp, xmit_size_goal);
>
>                 /* We try hard to avoid divides here */
> @@ -3574,4 +3579,5 @@ void __init tcp_init(void)
>         tcp_secret_primary = &tcp_secret_one;
>         tcp_secret_retiring = &tcp_secret_two;
>         tcp_secret_secondary = &tcp_secret_two;
> +       tcp_tasklet_init();
>  }
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index ddefd39..01545a3 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -2588,6 +2588,7 @@ struct proto tcp_prot = {
>         .sendmsg                = tcp_sendmsg,
>         .sendpage               = tcp_sendpage,
>         .backlog_rcv            = tcp_v4_do_rcv,
> +       .release_cb             = tcp_release_cb,
>         .hash                   = inet_hash,
>         .unhash                 = inet_unhash,
>         .get_port               = inet_csk_get_port,
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index 6560886..c66f2ed 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -424,6 +424,7 @@ struct sock *tcp_create_openreq_child(struct sock *sk,
> struct request_sock *req,
>                         treq->snt_isn + 1 + tcp_s_data_size(oldtp);
>
>                 tcp_prequeue_init(newtp);
> +               INIT_LIST_HEAD(&newtp->tsq_node);
>
>                 tcp_init_wl(newtp, treq->rcv_isn);
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index c465d3e..03854ab 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -50,6 +50,9 @@ int sysctl_tcp_retrans_collapse __read_mostly = 1;
>   */
>  int sysctl_tcp_workaround_signed_windows __read_mostly = 0;
>
> +/* Default TSQ limit of two TSO segments */
> +int sysctl_tcp_limit_output_bytes __read_mostly = 131072;
> +
>  /* This limits the percentage of the congestion window which we
>   * will allow a single TSO frame to consume.  Building TSO frames
>   * which are too large can cause TCP streams to be bursty.
> @@ -65,6 +68,8 @@ int sysctl_tcp_slow_start_after_idle __read_mostly = 1;
>  int sysctl_tcp_cookie_size __read_mostly = 0; /* TCP_COOKIE_MAX */
>  EXPORT_SYMBOL_GPL(sysctl_tcp_cookie_size);
>
> +static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int
> nonagle,
> +                          int push_one, gfp_t gfp);
>
>  /* Account for new data that has been sent to the network. */
>  static void tcp_event_new_data_sent(struct sock *sk, const struct sk_buff
> *skb)
> @@ -783,6 +788,140 @@ static unsigned int tcp_established_options(struct
> sock *sk, struct sk_buff *skb
>         return size;
>  }
>
> +
> +/* TCP SMALL QUEUES (TSQ)
> + *
> + * TSQ goal is to keep small amount of skbs per tcp flow in tx queues
> (qdisc+dev)
> + * to reduce RTT and bufferbloat.
> + * We do this using a special skb destructor (tcp_wfree).
> + *
> + * Its important tcp_wfree() can be replaced by sock_wfree() in the event
> skb
> + * needs to be reallocated in a driver.
> + * The invariant being skb->truesize substracted from sk->sk_wmem_alloc
> + *
> + * Since transmit from skb destructor is forbidden, we use a tasklet
> + * to process all sockets that eventually need to send more skbs.
> + * We use one tasklet per cpu, with its own queue of sockets.
> + */
> +struct tsq_tasklet {
> +       struct tasklet_struct   tasklet;
> +       struct list_head        head; /* queue of tcp sockets */
> +};
> +static DEFINE_PER_CPU(struct tsq_tasklet, tsq_tasklet);
> +
> +/*
> + * One tasklest per cpu tries to send more skbs.
> + * We run in tasklet context but need to disable irqs when
> + * transfering tsq->head because tcp_wfree() might
> + * interrupt us (non NAPI drivers)
> + */
> +static void tcp_tasklet_func(unsigned long data)
> +{
> +       struct tsq_tasklet *tsq = (struct tsq_tasklet *)data;
> +       LIST_HEAD(list);
> +       unsigned long flags;
> +       struct list_head *q, *n;
> +       struct tcp_sock *tp;
> +       struct sock *sk;
> +
> +       local_irq_save(flags);
> +       list_splice_init(&tsq->head, &list);
> +       local_irq_restore(flags);
> +
> +       list_for_each_safe(q, n, &list) {
> +               tp = list_entry(q, struct tcp_sock, tsq_node);
> +               list_del(&tp->tsq_node);
> +
> +               sk = (struct sock *)tp;
> +               bh_lock_sock(sk);
> +
> +               if (!sock_owned_by_user(sk)) {
> +                       if ((1 << sk->sk_state) &
> +                           (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 |
> +                            TCPF_CLOSING | TCPF_CLOSE_WAIT))
> +                               tcp_write_xmit(sk,
> +                                              tcp_current_mss(sk),
> +                                              0, 0,
> +                                              GFP_ATOMIC);
> +               } else {
> +                       /* defer the work to tcp_release_cb() */
> +                       set_bit(TSQ_OWNED, &tp->tsq_flags);
> +               }
> +               bh_unlock_sock(sk);
> +
> +               clear_bit(TSQ_QUEUED, &tp->tsq_flags);
> +               sk_free(sk);
> +       }
> +}
> +
> +/**
> + * tcp_release_cb - tcp release_sock() callback
> + * @sk: socket
> + *
> + * called from release_sock() to perform protocol dependent
> + * actions before socket release.
> + */
> +void tcp_release_cb(struct sock *sk)
> +{
> +       struct tcp_sock *tp = tcp_sk(sk);
> +
> +       if (test_and_clear_bit(TSQ_OWNED, &tp->tsq_flags)) {
> +               if ((1 << sk->sk_state) &
> +                   (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 |
> +                    TCPF_CLOSING | TCPF_CLOSE_WAIT))
> +                       tcp_write_xmit(sk,
> +                                      tcp_current_mss(sk),
> +                                      0, 0,
> +                                      GFP_ATOMIC);
> +       }
> +}
> +EXPORT_SYMBOL(tcp_release_cb);
> +
> +void __init tcp_tasklet_init(void)
> +{
> +       int i;
> +
> +       for_each_possible_cpu(i) {
> +               struct tsq_tasklet *tsq = &per_cpu(tsq_tasklet, i);
> +
> +               INIT_LIST_HEAD(&tsq->head);
> +               tasklet_init(&tsq->tasklet,
> +                            tcp_tasklet_func,
> +                            (unsigned long)tsq);
> +       }
> +}
> +
> +/*
> + * Write buffer destructor automatically called from kfree_skb.
> + * We cant xmit new skbs from this context, as we might already
> + * hold qdisc lock.
> + */
> +void tcp_wfree(struct sk_buff *skb)
> +{
> +       struct sock *sk = skb->sk;
> +       struct tcp_sock *tp = tcp_sk(sk);
> +
> +       if (test_and_clear_bit(TSQ_THROTTLED, &tp->tsq_flags) &&
> +           !test_and_set_bit(TSQ_QUEUED, &tp->tsq_flags)) {
> +               unsigned long flags;
> +               struct tsq_tasklet *tsq;
> +
> +               /* Keep a ref on socket.
> +                * This last ref will be released in tcp_tasklet_func()
> +                */
> +               atomic_sub(skb->truesize - 1, &sk->sk_wmem_alloc);
> +
> +               /* queue this socket to tasklet queue */
> +               local_irq_save(flags);
> +               tsq = &__get_cpu_var(tsq_tasklet);
> +               list_add(&tp->tsq_node, &tsq->head);
> +               tasklet_schedule(&tsq->tasklet);
> +               local_irq_restore(flags);
> +       } else {
> +               sock_wfree(skb);
> +       }
> +}
> +
>  /* This routine actually transmits TCP packets queued in by
>   * tcp_do_sendmsg().  This is used by both the initial
>   * transmission and possible later retransmissions.
> @@ -844,7 +983,12 @@ static int tcp_transmit_skb(struct sock *sk, struct
> sk_buff *skb, int clone_it,
>
>         skb_push(skb, tcp_header_size);
>         skb_reset_transport_header(skb);
> -       skb_set_owner_w(skb, sk);
> +
> +       skb_orphan(skb);
> +       skb->sk = sk;
> +       skb->destructor = (sysctl_tcp_limit_output_bytes > 0) ?
> +                         tcp_wfree : sock_wfree;
> +       atomic_add(skb->truesize, &sk->sk_wmem_alloc);
>
>         /* Build TCP header and checksum it. */
>         th = tcp_hdr(skb);
> @@ -1780,6 +1924,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned
> int mss_now, int nonagle,
>         while ((skb = tcp_send_head(sk))) {
>                 unsigned int limit;
>
> +
>                 tso_segs = tcp_init_tso_segs(sk, skb, mss_now);
>                 BUG_ON(!tso_segs);
>
> @@ -1800,6 +1945,13 @@ static bool tcp_write_xmit(struct sock *sk,
> unsigned int mss_now, int nonagle,
>                                 break;
>                 }
>
> +               /* TSQ : sk_wmem_alloc accounts skb truesize,
> +                * including skb overhead. But thats OK.
> +                */
> +               if (atomic_read(&sk->sk_wmem_alloc) >=
> sysctl_tcp_limit_output_bytes) {
> +                       set_bit(TSQ_THROTTLED, &tp->tsq_flags);
> +                       break;
> +               }
>                 limit = mss_now;
>                 if (tso_segs > 1 && !tcp_urg_mode(tp))
>                         limit = tcp_mss_split_point(sk, skb, mss_now,
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 61175cb..70458a9 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -1970,6 +1970,7 @@ struct proto tcpv6_prot = {
>         .sendmsg                = tcp_sendmsg,
>         .sendpage               = tcp_sendpage,
>         .backlog_rcv            = tcp_v6_do_rcv,
> +       .release_cb             = tcp_release_cb,
>         .hash                   = tcp_v6_hash,
>         .unhash                 = inet_unhash,
>         .get_port               = inet_csk_get_port,
>
>
>

[-- Attachment #1.2: Type: text/html, Size: 22042 bytes --]

[-- Attachment #2: Type: text/plain, Size: 140 bytes --]

_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply

* Re: [PATCH 4/4] asix: Add a new driver for the AX88172A
From: Michael Riesch @ 2012-07-11 19:23 UTC (permalink / raw)
  To: Christian Riesch
  Cc: Ben Hutchings, netdev, Oliver Neukum, Eric Dumazet, Allan Chou,
	Mark Lord, Grant Grundler, Ming Lei
In-Reply-To: <CABkLObq+JtgECK=Q6dxOgDEZ++GxfLzutHGbWzt2uD5LStKjwg@mail.gmail.com>

On Wed, 2012-07-11 at 17:10 +0200, Christian Riesch wrote:
> Hi again,
[...]
> >>> Using the ifindex sounds good to me,
> >>>
> >>> snprintf(priv->mdio->id, MII_BUS_ID_SIZE, "asix-%d",
> >>>         dev->net->ifindex);
> >>>
> >>> works on any system with less than 10^12 network interfaces.
> >>
> >> Ok, I'll change that to use ifindex.
> >
> > No, I won't.
> > At the time the mdio bus is registered, ifindex is not yet set, so the
> > snprintf would always result in "asix-0".
> 
> What do you think about
> snprintf(priv->mdio->id, MII_BUS_ID_SIZE, "usb-%03d:%03d",
> dev->udev->bus->busnum, dev->udev->devnum);
> ??
> 
> This would use the busnum/devnum identifier as reported by lsusb and
> would be short enough for an mdio bus name.

IMHO this would be a good solution - it is unique and there you can tell
which MDIO bus belongs to which USB device by the name (not sure if
someone will ever need to do that, but it is neat).

Regards, Michael

^ permalink raw reply

* 3.5rc6 modprobe -r ip_vs oopses
From: Dave Jones @ 2012-07-11 19:07 UTC (permalink / raw)
  To: netdev; +Cc: wensong, horms, ja

Triggered during modprobe -r ip_vs
(W taint flag came from something unrelated: 80211 trying >MAX_ORDER alloc failing).

BUG: unable to handle kernel NULL pointer dereference at 0000000000000750
IP: [<ffffffffa4c9280b>] ip_vs_dst_event+0x18b/0x200 [ip_vs]
PGD ae993067 PUD 35cbb067 PMD 0 
Oops: 0000 [#1] SMP 
CPU 2 

Pid: 11534, comm: modprobe Tainted: G        WC   3.5.0-rc6+ #78
 Dell Inc.                 Precision WorkStation 490    
/0DT031

RIP: 0010:[<ffffffffa4c9280b>]  [<ffffffffa4c9280b>] ip_vs_dst_event+0x18b/0x200 [ip_vs]
RSP: 0000:ffff8800c92b7e28  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880222fa22a0 RCX: 0000000000000006
RDX: ffffffffa4caba50 RSI: 2222222222222222 RDI: 2222222222222222
RBP: ffff8800c92b7e78 R08: 2222222222222222 R09: 2222222222222222
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82bec540
R13: ffffffff82bec638 R14: ffffffff82bec540 R15: ffffffffa4caba40
FS:  00007f652be40700(0000) GS:ffff880226a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000750 CR3: 0000000035ea4000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 11534, threadinfo ffff8800c92b6000, task ffff8801278d4900)
Stack:
 2222222222222222 0000000000000100 ffffffffa4caba50 ffffffffa4caa7a0
 ffffffffa4ca8750 ffff880222fa22a0 ffffffffa4ca8750 ffffffff82bec638
 ffffffff82bec540 0000000000000000 ffff8800c92b7eb8 ffffffff8156adcf

Call Trace:
 [<ffffffff8156adcf>] unregister_netdevice_notifier+0x7f/0x100
 [<ffffffffa4c96005>] ip_vs_control_cleanup+0x15/0x20 [ip_vs]
 [<ffffffffa4ca33b2>] ip_vs_cleanup+0x41/0xc8f [ip_vs]
 [<ffffffff810e0b8c>] sys_delete_module+0x19c/0x2c0
 [<ffffffff816c1615>] ? sysret_check+0x22/0x5d
 [<ffffffff8133c04e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff816c15e9>] system_call_fastpath+0x16/0x1b
Code: 18 75 cd 4c 89 ef e8 35 48 00 00 eb c3 0f 1f 00 48 83 45 b8 01 48 81 
7d b8 00 01 00 00 0f 85 cd fe ff ff 49 8b 84 24 c0 0c 00 00 <4c> 8b a8 50 07 
00 00 48 05 50 07 00 00 49 39 c5 75 38 48 c7 c7 

RIP 
 [<ffffffffa4c9280b>] ip_vs_dst_event+0x18b/0x200 [ip_vs]
 RSP <ffff8800c92b7e28>
CR2: 0000000000000750
---[ end trace 83bf880b95bd63b8 ]---

That's here...

        list_for_each_entry(dest, &net_ipvs(net)->dest_trash, n_list) {
    226b:       4c 8b a8 50 07 00 00    mov    0x750(%rax),%r13

^ permalink raw reply

* Re: pull request: wireless-next 2012-07-11
From: John W. Linville @ 2012-07-11 20:19 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CACna6rwN+tBu=BG0TR2udq3PH9qvPg44jJQPGc4G3jH_t2=GjQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wed, Jul 11, 2012 at 09:00:50PM +0200, Rafał Miłecki wrote:
> 2012/7/11 John W. Linville <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>:
> > Ugh, forgot the signature again...
> >
> > On Wed, Jul 11, 2012 at 02:17:22PM -0400, John W. Linville wrote:
> >> Rafał Miłecki (2):
> >>       b43: N-PHY: fix RSSI calibration
> >>       bcma: use custom printing functions
> 
> MIPS users: bcma patch breaks compilation for you (sorry!), please apply
> bcma: fix CC driver compilation on MIPS
> locally before it gets merged.

Ugh...

Dave, I've got that patch queued-up now.  Disregard this request and
I'll send you another one tomorrow or so.

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org			might be all we have.  Be ready.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* YOUR DIPLOMAT HAS NOW ARRIVE WITH YOUR CASH CONSIGNMENT BOX AT J.F.KENNEDY INTERNATIONAL AIRPORT NEW YORK, CALL HIM NOW ( 1-347 441 3406) email: (john4321@yahoo.cn)
From: OFFICE MAIL @ 2012-07-11 20:37 UTC (permalink / raw)
  To: Recipients

This is to notify you that your diplomat JOHN BENARD has now arrive with your cash consignment box loaded with $usd1.5m at J.F.Kennedy international airport new york city,from Dr.sanusi lamido the Governor of Central Bank of Nigeria.Contact him now with your Name home,address also your valid phone number through his email (john4321@yahoo.cn) for the release of your fund.

^ permalink raw reply

* Re: 3.5rc6 modprobe -r ip_vs oopses
From: Julian Anastasov @ 2012-07-11 20:59 UTC (permalink / raw)
  To: Dave Jones; +Cc: netdev, wensong, horms
In-Reply-To: <20120711190722.GA31185@redhat.com>


	Hello,

On Wed, 11 Jul 2012, Dave Jones wrote:

> Triggered during modprobe -r ip_vs
> (W taint flag came from something unrelated: 80211 trying >MAX_ORDER alloc failing).
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000750
> IP: [<ffffffffa4c9280b>] ip_vs_dst_event+0x18b/0x200 [ip_vs]
> PGD ae993067 PUD 35cbb067 PMD 0 
> Oops: 0000 [#1] SMP 
> CPU 2 
> 
> Pid: 11534, comm: modprobe Tainted: G        WC   3.5.0-rc6+ #78
>  Dell Inc.                 Precision WorkStation 490    
> /0DT031
> 
> RIP: 0010:[<ffffffffa4c9280b>]  [<ffffffffa4c9280b>] ip_vs_dst_event+0x18b/0x200 [ip_vs]
> RSP: 0000:ffff8800c92b7e28  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff880222fa22a0 RCX: 0000000000000006
> RDX: ffffffffa4caba50 RSI: 2222222222222222 RDI: 2222222222222222
> RBP: ffff8800c92b7e78 R08: 2222222222222222 R09: 2222222222222222
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82bec540
> R13: ffffffff82bec638 R14: ffffffff82bec540 R15: ffffffffa4caba40
> FS:  00007f652be40700(0000) GS:ffff880226a00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000750 CR3: 0000000035ea4000 CR4: 00000000000007e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process modprobe (pid: 11534, threadinfo ffff8800c92b6000, task ffff8801278d4900)
> Stack:
>  2222222222222222 0000000000000100 ffffffffa4caba50 ffffffffa4caa7a0
>  ffffffffa4ca8750 ffff880222fa22a0 ffffffffa4ca8750 ffffffff82bec638
>  ffffffff82bec540 0000000000000000 ffff8800c92b7eb8 ffffffff8156adcf
> 
> Call Trace:
>  [<ffffffff8156adcf>] unregister_netdevice_notifier+0x7f/0x100
>  [<ffffffffa4c96005>] ip_vs_control_cleanup+0x15/0x20 [ip_vs]
>  [<ffffffffa4ca33b2>] ip_vs_cleanup+0x41/0xc8f [ip_vs]
>  [<ffffffff810e0b8c>] sys_delete_module+0x19c/0x2c0
>  [<ffffffff816c1615>] ? sysret_check+0x22/0x5d
>  [<ffffffff8133c04e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff816c15e9>] system_call_fastpath+0x16/0x1b
> Code: 18 75 cd 4c 89 ef e8 35 48 00 00 eb c3 0f 1f 00 48 83 45 b8 01 48 81 
> 7d b8 00 01 00 00 0f 85 cd fe ff ff 49 8b 84 24 c0 0c 00 00 <4c> 8b a8 50 07 
> 00 00 48 05 50 07 00 00 49 39 c5 75 38 48 c7 c7 
> 
> RIP 
>  [<ffffffffa4c9280b>] ip_vs_dst_event+0x18b/0x200 [ip_vs]
>  RSP <ffff8800c92b7e28>
> CR2: 0000000000000750
> ---[ end trace 83bf880b95bd63b8 ]---
> 
> That's here...
> 
>         list_for_each_entry(dest, &net_ipvs(net)->dest_trash, n_list) {
>     226b:       4c 8b a8 50 07 00 00    mov    0x750(%rax),%r13
> 

	Yes, I got the same oops 4 days ago, we already
have a fix:

http://archive.linuxvirtualserver.org/html/lvs-devel/2012-07/msg00002.html
http://archive.linuxvirtualserver.org/html/lvs-devel/2012-07/msg00020.html

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* [PATCH 1/1] net: sched: add ipset ematch
From: Florian Westphal @ 2012-07-11 20:56 UTC (permalink / raw)
  To: netdev; +Cc: kadlec, Florian Westphal

Can be used to match packets against netfilter ip sets created via ipset(8).
skb->sk_iif is used as 'incoming interface', skb->dev is 'outgoing interface'.

Since ipset is usually called from netfilter, the ematch
initializes a fake xt_action_param, pulls the ip header into the
linear area and also sets skb->data to the IP header (otherwise
matching Layer 4 set types doesn't work).

Tested-by: Mr Dash Four <mr.dash.four@googlemail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/pkt_cls.h |    3 +-
 net/sched/Kconfig       |   10 ++++
 net/sched/Makefile      |    1 +
 net/sched/em_ipset.c    |  135 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 148 insertions(+), 1 deletions(-)
 create mode 100644 net/sched/em_ipset.c

diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index 38fbd4b..082eafa 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -453,7 +453,8 @@ enum {
 #define	TCF_EM_TEXT		5
 #define	TCF_EM_VLAN		6
 #define	TCF_EM_CANID		7
-#define	TCF_EM_MAX		7
+#define	TCF_EM_IPSET		8
+#define	TCF_EM_MAX		8
 
 enum {
 	TCF_EM_PROG_TC
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 4a5d2bd..62fb51f 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -517,6 +517,16 @@ config NET_EMATCH_CANID
 	  To compile this code as a module, choose M here: the
 	  module will be called em_canid.
 
+config NET_EMATCH_IPSET
+	tristate "IPset"
+	depends on NET_EMATCH && IP_SET
+	---help---
+	  Say Y here if you want to be able to classify packets based on
+	  ipset membership.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called em_ipset.
+
 config NET_CLS_ACT
 	bool "Actions"
 	---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index bcada75..978cbf0 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -56,3 +56,4 @@ obj-$(CONFIG_NET_EMATCH_U32)	+= em_u32.o
 obj-$(CONFIG_NET_EMATCH_META)	+= em_meta.o
 obj-$(CONFIG_NET_EMATCH_TEXT)	+= em_text.o
 obj-$(CONFIG_NET_EMATCH_CANID)	+= em_canid.o
+obj-$(CONFIG_NET_EMATCH_IPSET)	+= em_ipset.o
diff --git a/net/sched/em_ipset.c b/net/sched/em_ipset.c
new file mode 100644
index 0000000..3130320
--- /dev/null
+++ b/net/sched/em_ipset.c
@@ -0,0 +1,135 @@
+/*
+ * net/sched/em_ipset.c	ipset ematch
+ *
+ * Copyright (c) 2012 Florian Westphal <fw@strlen.de>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ */
+
+#include <linux/gfp.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/skbuff.h>
+#include <linux/netfilter/xt_set.h>
+#include <linux/ipv6.h>
+#include <net/ip.h>
+#include <net/pkt_cls.h>
+
+static int em_ipset_change(struct tcf_proto *tp, void *data, int data_len,
+			   struct tcf_ematch *em)
+{
+	struct xt_set_info *set = data;
+	ip_set_id_t index;
+
+	if (data_len != sizeof(*set))
+		return -EINVAL;
+
+	index = ip_set_nfnl_get_byindex(set->index);
+	if (index == IPSET_INVALID_ID)
+		return -ENOENT;
+
+	em->datalen = sizeof(*set);
+	em->data = (unsigned long)kmemdup(data, em->datalen, GFP_KERNEL);
+	if (em->data)
+		return 0;
+
+	ip_set_nfnl_put(index);
+	return -ENOMEM;
+}
+
+static void em_ipset_destroy(struct tcf_proto *p, struct tcf_ematch *em)
+{
+	const struct xt_set_info *set = (const void *) em->data;
+	if (set) {
+		ip_set_nfnl_put(set->index);
+		kfree((void *) em->data);
+	}
+}
+
+static int em_ipset_match(struct sk_buff *skb, struct tcf_ematch *em,
+			  struct tcf_pkt_info *info)
+{
+	struct ip_set_adt_opt opt;
+	struct xt_action_param acpar;
+	const struct xt_set_info *set = (const void *) em->data;
+	struct net_device *dev, *indev = NULL;
+	int ret, network_offset;
+
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		acpar.family = NFPROTO_IPV4;
+		if (!pskb_network_may_pull(skb, sizeof(struct iphdr)))
+			return 0;
+		acpar.thoff = ip_hdrlen(skb);
+		break;
+	case htons(ETH_P_IPV6):
+		acpar.family = NFPROTO_IPV6;
+		if (!pskb_network_may_pull(skb, sizeof(struct ipv6hdr)))
+			return 0;
+		/* doesn't call ipv6_find_hdr() because ipset doesn't use thoff, yet */
+		acpar.thoff = sizeof(struct ipv6hdr);
+		break;
+	default:
+		return 0;
+	}
+
+	acpar.hooknum = 0;
+
+	opt.family = acpar.family;
+	opt.dim = set->dim;
+	opt.flags = set->flags;
+	opt.cmdflags = 0;
+	opt.timeout = ~0u;
+
+	network_offset = skb_network_offset(skb);
+	skb_pull(skb, network_offset);
+
+	dev = skb->dev;
+
+	rcu_read_lock();
+
+	if (dev && skb->skb_iif)
+		indev = dev_get_by_index_rcu(dev_net(dev), skb->skb_iif);
+
+	acpar.in      = indev ? indev : dev;
+	acpar.out     = dev;
+
+	ret = ip_set_test(set->index, skb, &acpar, &opt);
+
+	rcu_read_unlock();
+
+	skb_push(skb, network_offset);
+	return ret;
+}
+
+static struct tcf_ematch_ops em_ipset_ops = {
+	.kind	  = TCF_EM_IPSET,
+	.change	  = em_ipset_change,
+	.destroy  = em_ipset_destroy,
+	.match	  = em_ipset_match,
+	.owner	  = THIS_MODULE,
+	.link	  = LIST_HEAD_INIT(em_ipset_ops.link)
+};
+
+static int __init init_em_ipset(void)
+{
+	return tcf_em_register(&em_ipset_ops);
+}
+
+static void __exit exit_em_ipset(void)
+{
+	tcf_em_unregister(&em_ipset_ops);
+}
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Florian Westphal <fw@strlen.de>");
+MODULE_DESCRIPTION("TC extended match for IP sets");
+
+module_init(init_em_ipset);
+module_exit(exit_em_ipset);
+
+MODULE_ALIAS_TCF_EMATCH(TCF_EM_IPSET);
-- 
1.7.3.4

^ permalink raw reply related

* Re: [RFC PATCH 09/10] ixgbe: Add support for displaying the number of Tx/Rx channels
From: Alexander Duyck @ 2012-07-11 21:00 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev, davem, jeffrey.t.kirsher, edumazet, therbert,
	alexander.duyck
In-Reply-To: <1342030903.2613.52.camel@bwh-desktop.uk.solarflarecom.com>

On 07/11/2012 11:21 AM, Ben Hutchings wrote:
> On Fri, 2012-06-29 at 17:16 -0700, Alexander Duyck wrote:
>> This patch adds support for the ethtool get_channels operation.
>>
>> Since the ixgbe driver has to support DCB as well as the other modes the
>> assumption I made here is that the number of channels in DCB modes refers
>> to the number of queues per traffic class, not the number of queues total.
> [...]
>
> When MSI-X is enabled, a 'channel' is an MSI-X vector and the associated
> queues, i.e. total number of channels reported should be the total
> number of MSI-X vectors in use.  (That was my intended interpretation,
> anyway.  It may be that there is too much variation in the way queues
> and interrupts are associated for these operations to be defined in a
> general way.)
>
> Ben.
>
The problem with the MSI-X interpretation is that ixgbe has that type of
control reversed.  We base everything on the number of queues, and then
from that you can end up determining the number of MSI-X vectors.  So
for example we could tell ixgbe via this interface to generate 64
queues, but if the system only has 8 CPUs we would end up with 8 MSI-X
vectors each with 8 queues.

Also as I mentioned in the case of DCB things get even more
complicated.  We need to have a symmetric number of queues per traffic
class based on the way we currently have DCB implemented.  The way I saw
it I could go two routes, the first being to force channels to be a
multiple of TCs which would have been complicated to deal with, or the
simpler approach I chose which was to apply 'channel' to be per TC. 
This way if DCB is then disabled we can easily revert to the standard
interpretation which would mean we would only have as many queues as the
channels specified.

Thanks,

Alex

^ permalink raw reply

* Re: [RFC PATCH 07/10] ixgbe: Add function for setting XPS queue mapping
From: Alexander Duyck @ 2012-07-11 21:12 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev, davem, jeffrey.t.kirsher, edumazet, therbert,
	alexander.duyck
In-Reply-To: <1342030507.2613.46.camel@bwh-desktop.uk.solarflarecom.com>

On 07/11/2012 11:15 AM, Ben Hutchings wrote:
> On Fri, 2012-06-29 at 17:16 -0700, Alexander Duyck wrote:
>> This change adds support for ixgbe to configure the XPS queue mapping on
>> load.  The result of this change is that on open we will now be resetting
>> the number of Tx queues, and then setting the default configuration for XPS
>> based on if ATR is enabled or disabled.
> [...]
>
> I didn't see where you're resetting the number of TX queues; was that
> actually added in an earlier patch?
>
> It seems strange to be resetting XPS configuration on open; normally net
> device configuration persists as long as the device is registered.
> Maybe only do this if the number of TX queues has to change?
>
> Ben.
>
Actually I am working on top of a set of patches for ixgbe that haven't
been submitted upstream.  In one of those patches I moved our call to
netif_set_real_num_tx_queues into ixgbe_open.  The call is only one or
two lines above the call I make to ixgbe_set_xps_mapping.

I will see what I can do about resetting the settings only when we
change the number of queues.

Thanks,

Alex

^ permalink raw reply

* [ethtool PATCH] ethtool: Resolve use of uninitialized memory in rxclass_get_dev_info
From: Alexander Duyck @ 2012-07-11 21:16 UTC (permalink / raw)
  To: netdev; +Cc: bhutchings, jeffrey.t.kirsher, alexander.duyck

The ethtool function for getting the rule count was not zeroing out the
data field before passing it to the kernel.  As a result the value started
uninitialized and was incorrectly returning a result indicating that
devices supported setting new rule indexes.  In order to correct this I am
adding a one line fix that sets data to zero before we pass the command to
the kernel.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 rxclass.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/rxclass.c b/rxclass.c
index 4d49aa6..e1633a8 100644
--- a/rxclass.c
+++ b/rxclass.c
@@ -207,6 +207,7 @@ static int rxclass_get_dev_info(struct cmd_context *ctx, __u32 *count,
 	int err;

 	nfccmd.cmd = ETHTOOL_GRXCLSRLCNT;
+	nfccmd.data = 0;
 	err = send_ioctl(ctx, &nfccmd);
 	*count = nfccmd.rule_cnt;
 	if (driver_select)

^ permalink raw reply related

* Re: [PATCH net-next] r8169: Remove rtl_ocpdr_cond
From: Francois Romieu @ 2012-07-11 22:00 UTC (permalink / raw)
  To: Hayes Wang; +Cc: netdev, linux-kernel
In-Reply-To: <1342009916-1519-1-git-send-email-hayeswang@realtek.com>

Hayes Wang <hayeswang@realtek.com> :
> No waiting is needed for mac_ocp_{write / read}. And the bit 31 of
> OCPDR would not change, so rtl_udelay_loop_wait_high always return
> false. That is, the r8168_mac_ocp_read always retuen ~0.

(testing with davem's 48ee3569f31d91084dc694fef5517eb782428083)

It seemed right at first sight but testing without firmware produces
unexpected results. While it is not exactly fast if I ask it to emit
packets with ping -qf, it turns really slow when I use the -l preload
option: at most a few hundreds pps.

An old pcie 8168b in the same computer behaves correctly with the same
kernel.

I'll try again tomorrow evening.

-- 
Ueimor

^ permalink raw reply

* limit allocation size in ieee802154_sock_sendmsg
From: Dave Jones @ 2012-07-11 22:12 UTC (permalink / raw)
  To: netdev

Sasha reported this last November (https://lkml.org/lkml/2011/11/22/41)
and I keep walking into it while fuzz testing.

Length of the packet is passed in from userspace, and passed down to the
page allocator unchecked. If the allocation size is bigger than MAX_ORDER,
we get this backtrace from the page allocator.

 WARNING: at mm/page_alloc.c:2298 __alloc_pages_nodemask+0x46d/0xb00()
 Call Trace:
  [<ffffffff81064b2f>] warn_slowpath_common+0x7f/0xc0
  [<ffffffff81064b8a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8116105d>] __alloc_pages_nodemask+0x46d/0xb00
  [<ffffffff8155ed3b>] ? __alloc_skb+0x4b/0x230
  [<ffffffff810d43dd>] ? trace_hardirqs_on+0xd/0x10
  [<ffffffff816ace3f>] kmalloc_large_node+0x66/0xab
  [<ffffffff811ad9b5>] __kmalloc_node_track_caller+0x1b5/0x200
  [<ffffffff81559701>] ? sock_alloc_send_pskb+0x271/0x3e0
  [<ffffffff8155ed68>] __alloc_skb+0x78/0x230
  [<ffffffff81559701>] sock_alloc_send_pskb+0x271/0x3e0
  [<ffffffff8156d331>] ? dev_getfirstbyhwtype+0x161/0x250
  [<ffffffff8156d1d0>] ? rps_may_expire_flow+0x220/0x220
  [<ffffffff81559885>] sock_alloc_send_skb+0x15/0x20
  [<ffffffffa4a73347>] dgram_sendmsg+0xe7/0x340 [af_802154]
  [<ffffffffa4a72044>] ieee802154_sock_sendmsg+0x14/0x20 [af_802154]
  [<ffffffff81553787>] sock_sendmsg+0x117/0x130
  [<ffffffff810cdb7d>] ? trace_hardirqs_off+0xd/0x10
  [<ffffffff816b81c7>] ? _raw_spin_unlock_irqrestore+0x77/0x80
  [<ffffffff8134312b>] ? debug_object_activate+0x12b/0x1a0
  [<ffffffff816b81c7>] ? _raw_spin_unlock_irqrestore+0x77/0x80
  [<ffffffff81556b60>] sys_sendto+0x140/0x190
  [<ffffffff816c5014>] ? bad_to_user+0x5e/0x80a
  [<ffffffff8133c04e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff816c15e9>] system_call_fastpath+0x16/0x1b

Signed-off-by: Dave Jones <davej@redhat.com>

diff --git a/net/ieee802154/af_ieee802154.c b/net/ieee802154/af_ieee802154.c
index 40e606f..1da228a 100644
--- a/net/ieee802154/af_ieee802154.c
+++ b/net/ieee802154/af_ieee802154.c
@@ -108,6 +108,9 @@ static int ieee802154_sock_sendmsg(struct kiocb *iocb, struct socket *sock,
 {
 	struct sock *sk = sock->sk;
 
+	if (len > MAX_ORDER_NR_PAGES * PAGE_SIZE)
+		return -EINVAL;
+
 	return sk->sk_prot->sendmsg(iocb, sk, msg, len);
 }
 

^ permalink raw reply related

* Re: UDP ordering when using multiple rx queue
From: Chris Friesen @ 2012-07-11 22:50 UTC (permalink / raw)
  To: Jean-Michel Hautbois; +Cc: netdev
In-Reply-To: <CAL8zT=g5nHd6FprhxFc21XTgfXeaokt4QN1fS4ekcNVkuvZE9g@mail.gmail.com>

On 07/11/2012 01:53 AM, Jean-Michel Hautbois wrote:
> On receiver side, I need to get the packets ordered, or the
> application will consider the packets are late (and then, lost).
> (Yes, the application is badly written on that specific part, but it
> is not mine :)).

Not the first such app I've seen.

> Several tests lead to a simple conclusion : when the NIC has only one
> RX queue, everything is ok (like be2net for instance), but when it has
> more than one RX queue, then I can have "lost packets".
> This is the case for bnx2x or mlx4 for instance.

This depends on the hardware.  The Intel NICs for example (and others, 
I'm just most familiar with them) support multiple queues but they do 
hardware hashing of "flows" such that a given flow will be routed to a 
specific queue and thus stay in order.

> Here are my questions :
> - Is it possible to force a driver to use only one rx queue, even if
> it can use more without reloading the driver (and this is feasible
> only when a parameter exists for that !) ?

This depends on the driver, but generally I would expect this to be a 
module parameter.

> - Is it possible to "force" the network stack to give the packets on
> the correct order (I would say no, as this is not specified in the
> protocol) ?

No, it's up to the hardware/driver.


> My only bet is the first one (forcing one rx queue).
> The last and desperate solution would be rewriting the application,
> not easy to make it accepted.

Depending on the hardware/driver you may be able to enable flow hashing.

Chris

^ permalink raw reply

* pull request: sfc-next 2012-07-11
From: Ben Hutchings @ 2012-07-11 23:13 UTC (permalink / raw)
  To: David Miller; +Cc: linux-net-drivers, netdev

[-- Attachment #1: Type: text/plain, Size: 2546 bytes --]

The following changes since commit c56bf6fe785abbd83751a462f0c7067f7145b97a:

  ipv6: fix a bad cast in ip6_dst_lookup_tail() (2012-07-06 00:23:41 -0700)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next.git for-davem

(commit 9a46fccdd28f9939c10f81748e1430afbb95138d)

1. 128-bit MMIO writes for TX push, by Stuart Hodgson.  We previously
   tried to do this with write-combining, but that turned out to be
   unsafe.  Now we use SSE (on x86_64 only).
2. Fix potential badness when running a self-test with SR-IOV enabled.
3. Fix calculation of some interface statistics that could run backward.
4. Miscellaneous cleanup.

Ben.

Ben Hutchings (10):
      sfc: Work around bogus 'uninitialised variable' warning
      sfc: Use generic DMA API, not PCI-DMA API
      sfc: Remove dead write to tso_state::packet_space
      sfc: Stop changing header offsets on TX
      sfc: Use strlcpy() to copy ethtool stats names
      sfc: Use dev_kfree_skb() in efx_end_loopback()
      sfc: Explain why efx_mcdi_exit_assertion() ignores result of efx_mcdi_rpc()
      sfc: Disable VF queues during register self-test
      sfc: Fix interface statistics running backward
      sfc: Correct some comments on enum reset_type

Stuart Hodgson (1):
      sfc: Implement 128-bit writes for efx_writeo_page

 drivers/net/ethernet/sfc/bitfield.h    |    5 ++
 drivers/net/ethernet/sfc/efx.c         |   10 ++--
 drivers/net/ethernet/sfc/enum.h        |    8 ++--
 drivers/net/ethernet/sfc/ethtool.c     |    2 +-
 drivers/net/ethernet/sfc/falcon.c      |   35 +++++++++++--
 drivers/net/ethernet/sfc/falcon_xmac.c |   12 ++--
 drivers/net/ethernet/sfc/filter.c      |    2 +-
 drivers/net/ethernet/sfc/io.h          |   51 +++++++++++++++++
 drivers/net/ethernet/sfc/mcdi.c        |   11 +++-
 drivers/net/ethernet/sfc/net_driver.h  |    9 ++-
 drivers/net/ethernet/sfc/nic.c         |   11 ++---
 drivers/net/ethernet/sfc/nic.h         |   18 ++++++
 drivers/net/ethernet/sfc/rx.c          |   22 ++++----
 drivers/net/ethernet/sfc/selftest.c    |   64 ++++++----------------
 drivers/net/ethernet/sfc/siena.c       |   37 ++++++++++---
 drivers/net/ethernet/sfc/tx.c          |   93 ++++++++++++++------------------
 16 files changed, 237 insertions(+), 153 deletions(-)

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* [PATCH net-next 01/11] sfc: Implement 128-bit writes for efx_writeo_page
From: Ben Hutchings @ 2012-07-11 23:15 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

Add support for writing a TX descriptor to the NIC in one PCIe
transaction on x86_64 machines.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/bitfield.h |    5 +++
 drivers/net/ethernet/sfc/io.h       |   51 +++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/sfc/bitfield.h b/drivers/net/ethernet/sfc/bitfield.h
index b26a954..8f8af3e 100644
--- a/drivers/net/ethernet/sfc/bitfield.h
+++ b/drivers/net/ethernet/sfc/bitfield.h
@@ -69,6 +69,10 @@
 	((width) == 32 ? ~((u32) 0) :		\
 	 (((((u32) 1) << (width))) - 1))
 
+typedef struct {
+        __le64 b, a;
+} efx_le_128;
+
 /* A doubleword (i.e. 4 byte) datatype - little-endian in HW */
 typedef union efx_dword {
 	__le32 u32[1];
@@ -83,6 +87,7 @@ typedef union efx_qword {
 
 /* An octword (eight-word, i.e. 16 byte) datatype - little-endian in HW */
 typedef union efx_oword {
+	efx_le_128 u128;
 	__le64 u64[2];
 	efx_qword_t qword[2];
 	__le32 u32[4];
diff --git a/drivers/net/ethernet/sfc/io.h b/drivers/net/ethernet/sfc/io.h
index 751d1ec..0868aef 100644
--- a/drivers/net/ethernet/sfc/io.h
+++ b/drivers/net/ethernet/sfc/io.h
@@ -57,10 +57,57 @@
  *   current state.
  */
 
+#ifdef CONFIG_X86_64
+#define EFX_USE_SSE_IO 1
+#endif
+
 #if BITS_PER_LONG == 64
 #define EFX_USE_QWORD_IO 1
 #endif
 
+#ifdef EFX_USE_SSE_IO
+
+static inline void _efx_writeo(struct efx_nic *efx, efx_le_128 value,
+			       unsigned int reg)
+{
+	unsigned long cr0;
+	efx_le_128 xmm_save[1];
+	void __iomem *addr = efx->membase + reg;
+
+	preempt_disable();
+
+	/* Save the xmm0 register to stack */
+	asm volatile(
+		"movq %%cr0,%0          ;\n\t"
+		"clts                   ;\n\t"
+		"movups %%xmm0,(%1)     ;\n\t"
+		: "=&r" (cr0)
+		: "r" (xmm_save)
+		: "memory");
+
+	/* First read the data into register xmm0
+	 * Then write this out to the address that was given
+	 */
+	asm volatile(
+		"movdqu %1,%%xmm0       ;\n\t"
+		"movdqa %%xmm0,%0       ;\n\t"
+		: "=m" (*(unsigned long __force *)addr)
+		: "m" (value)
+		: "memory");
+
+	/* Restore the xmm0 register */
+	asm volatile(
+		"sfence                 ;\n\t"
+		"movups (%1),%%xmm0     ;\n\t"
+		"movq   %0,%%cr0        ;\n\t"
+		:
+		: "r" (cr0), "r" (xmm_save)
+		: "memory");
+
+	preempt_enable();
+}
+#endif /* EFX_USE_SSE_IO */
+
 #ifdef EFX_USE_QWORD_IO
 static inline void _efx_writeq(struct efx_nic *efx, __le64 value,
 				  unsigned int reg)
@@ -235,6 +282,9 @@ static inline void _efx_writeo_page(struct efx_nic *efx, efx_oword_t *value,
 		   "writing register %x with " EFX_OWORD_FMT "\n", reg,
 		   EFX_OWORD_VAL(*value));
 
+#ifdef EFX_USE_SSE_IO
+	_efx_writeo(efx, value->u128, reg + 0);
+#else
 #ifdef EFX_USE_QWORD_IO
 	_efx_writeq(efx, value->u64[0], reg + 0);
 	_efx_writeq(efx, value->u64[1], reg + 8);
@@ -244,6 +294,7 @@ static inline void _efx_writeo_page(struct efx_nic *efx, efx_oword_t *value,
 	_efx_writed(efx, value->u32[2], reg + 8);
 	_efx_writed(efx, value->u32[3], reg + 12);
 #endif
+#endif
 }
 #define efx_writeo_page(efx, value, reg, page)				\
 	_efx_writeo_page(efx, value,					\
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 02/11] sfc: Work around bogus 'uninitialised variable' warning
From: Ben Hutchings @ 2012-07-11 23:15 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

With some gcc versions & optimisations, the compiler will warn that
'depth' in efx_filter_insert_filter() may be used without being
initialised, although this is not the case.

This is related to inlining of efx_filter_search(), which only has
one caller since commit 8db182f4a8a6e2dcb8b65905ea4af56210e65430
('sfc: Remove now-unused filter function').

Shut the compiler up by initialising it to 0.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/filter.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/sfc/filter.c b/drivers/net/ethernet/sfc/filter.c
index fea7f73..c3fd61f 100644
--- a/drivers/net/ethernet/sfc/filter.c
+++ b/drivers/net/ethernet/sfc/filter.c
@@ -662,7 +662,7 @@ s32 efx_filter_insert_filter(struct efx_nic *efx, struct efx_filter_spec *spec,
 	struct efx_filter_table *table = efx_filter_spec_table(state, spec);
 	struct efx_filter_spec *saved_spec;
 	efx_oword_t filter;
-	unsigned int filter_idx, depth;
+	unsigned int filter_idx, depth = 0;
 	u32 key;
 	int rc;
 
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 03/11] sfc: Use generic DMA API, not PCI-DMA API
From: Ben Hutchings @ 2012-07-11 23:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/efx.c        |   10 ++--
 drivers/net/ethernet/sfc/net_driver.h |    2 +-
 drivers/net/ethernet/sfc/nic.c        |    8 ++--
 drivers/net/ethernet/sfc/rx.c         |   22 ++++----
 drivers/net/ethernet/sfc/tx.c         |   83 ++++++++++++++++-----------------
 5 files changed, 62 insertions(+), 63 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index b95f2e1..70554a1 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1103,8 +1103,8 @@ static int efx_init_io(struct efx_nic *efx)
 	 * masks event though they reject 46 bit masks.
 	 */
 	while (dma_mask > 0x7fffffffUL) {
-		if (pci_dma_supported(pci_dev, dma_mask)) {
-			rc = pci_set_dma_mask(pci_dev, dma_mask);
+		if (dma_supported(&pci_dev->dev, dma_mask)) {
+			rc = dma_set_mask(&pci_dev->dev, dma_mask);
 			if (rc == 0)
 				break;
 		}
@@ -1117,10 +1117,10 @@ static int efx_init_io(struct efx_nic *efx)
 	}
 	netif_dbg(efx, probe, efx->net_dev,
 		  "using DMA mask %llx\n", (unsigned long long) dma_mask);
-	rc = pci_set_consistent_dma_mask(pci_dev, dma_mask);
+	rc = dma_set_coherent_mask(&pci_dev->dev, dma_mask);
 	if (rc) {
-		/* pci_set_consistent_dma_mask() is not *allowed* to
-		 * fail with a mask that pci_set_dma_mask() accepted,
+		/* dma_set_coherent_mask() is not *allowed* to
+		 * fail with a mask that dma_set_mask() accepted,
 		 * but just in case...
 		 */
 		netif_err(efx, probe, efx->net_dev,
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 0e57535..8a9f6d4 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -100,7 +100,7 @@ struct efx_special_buffer {
  * @len: Length of this fragment.
  *	This field is zero when the queue slot is empty.
  * @continuation: True if this fragment is not the end of a packet.
- * @unmap_single: True if pci_unmap_single should be used.
+ * @unmap_single: True if dma_unmap_single should be used.
  * @unmap_len: Length of this fragment to unmap
  */
 struct efx_tx_buffer {
diff --git a/drivers/net/ethernet/sfc/nic.c b/drivers/net/ethernet/sfc/nic.c
index 4a9a5be..287738d 100644
--- a/drivers/net/ethernet/sfc/nic.c
+++ b/drivers/net/ethernet/sfc/nic.c
@@ -308,8 +308,8 @@ efx_free_special_buffer(struct efx_nic *efx, struct efx_special_buffer *buffer)
 int efx_nic_alloc_buffer(struct efx_nic *efx, struct efx_buffer *buffer,
 			 unsigned int len)
 {
-	buffer->addr = pci_alloc_consistent(efx->pci_dev, len,
-					    &buffer->dma_addr);
+	buffer->addr = dma_alloc_coherent(&efx->pci_dev->dev, len,
+					  &buffer->dma_addr, GFP_ATOMIC);
 	if (!buffer->addr)
 		return -ENOMEM;
 	buffer->len = len;
@@ -320,8 +320,8 @@ int efx_nic_alloc_buffer(struct efx_nic *efx, struct efx_buffer *buffer,
 void efx_nic_free_buffer(struct efx_nic *efx, struct efx_buffer *buffer)
 {
 	if (buffer->addr) {
-		pci_free_consistent(efx->pci_dev, buffer->len,
-				    buffer->addr, buffer->dma_addr);
+		dma_free_coherent(&efx->pci_dev->dev, buffer->len,
+				  buffer->addr, buffer->dma_addr);
 		buffer->addr = NULL;
 	}
 }
diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 243e91f..6d1c6cf 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -155,11 +155,11 @@ static int efx_init_rx_buffers_skb(struct efx_rx_queue *rx_queue)
 		rx_buf->len = skb_len - NET_IP_ALIGN;
 		rx_buf->flags = 0;
 
-		rx_buf->dma_addr = pci_map_single(efx->pci_dev,
+		rx_buf->dma_addr = dma_map_single(&efx->pci_dev->dev,
 						  skb->data, rx_buf->len,
-						  PCI_DMA_FROMDEVICE);
-		if (unlikely(pci_dma_mapping_error(efx->pci_dev,
-						   rx_buf->dma_addr))) {
+						  DMA_FROM_DEVICE);
+		if (unlikely(dma_mapping_error(&efx->pci_dev->dev,
+					       rx_buf->dma_addr))) {
 			dev_kfree_skb_any(skb);
 			rx_buf->u.skb = NULL;
 			return -EIO;
@@ -200,10 +200,10 @@ static int efx_init_rx_buffers_page(struct efx_rx_queue *rx_queue)
 				   efx->rx_buffer_order);
 		if (unlikely(page == NULL))
 			return -ENOMEM;
-		dma_addr = pci_map_page(efx->pci_dev, page, 0,
+		dma_addr = dma_map_page(&efx->pci_dev->dev, page, 0,
 					efx_rx_buf_size(efx),
-					PCI_DMA_FROMDEVICE);
-		if (unlikely(pci_dma_mapping_error(efx->pci_dev, dma_addr))) {
+					DMA_FROM_DEVICE);
+		if (unlikely(dma_mapping_error(&efx->pci_dev->dev, dma_addr))) {
 			__free_pages(page, efx->rx_buffer_order);
 			return -EIO;
 		}
@@ -247,14 +247,14 @@ static void efx_unmap_rx_buffer(struct efx_nic *efx,
 
 		state = page_address(rx_buf->u.page);
 		if (--state->refcnt == 0) {
-			pci_unmap_page(efx->pci_dev,
+			dma_unmap_page(&efx->pci_dev->dev,
 				       state->dma_addr,
 				       efx_rx_buf_size(efx),
-				       PCI_DMA_FROMDEVICE);
+				       DMA_FROM_DEVICE);
 		}
 	} else if (!(rx_buf->flags & EFX_RX_BUF_PAGE) && rx_buf->u.skb) {
-		pci_unmap_single(efx->pci_dev, rx_buf->dma_addr,
-				 rx_buf->len, PCI_DMA_FROMDEVICE);
+		dma_unmap_single(&efx->pci_dev->dev, rx_buf->dma_addr,
+				 rx_buf->len, DMA_FROM_DEVICE);
 	}
 }
 
diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index 94d0365..18860f2 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -36,15 +36,15 @@ static void efx_dequeue_buffer(struct efx_tx_queue *tx_queue,
 			       unsigned int *bytes_compl)
 {
 	if (buffer->unmap_len) {
-		struct pci_dev *pci_dev = tx_queue->efx->pci_dev;
+		struct device *dma_dev = &tx_queue->efx->pci_dev->dev;
 		dma_addr_t unmap_addr = (buffer->dma_addr + buffer->len -
 					 buffer->unmap_len);
 		if (buffer->unmap_single)
-			pci_unmap_single(pci_dev, unmap_addr, buffer->unmap_len,
-					 PCI_DMA_TODEVICE);
+			dma_unmap_single(dma_dev, unmap_addr, buffer->unmap_len,
+					 DMA_TO_DEVICE);
 		else
-			pci_unmap_page(pci_dev, unmap_addr, buffer->unmap_len,
-				       PCI_DMA_TODEVICE);
+			dma_unmap_page(dma_dev, unmap_addr, buffer->unmap_len,
+				       DMA_TO_DEVICE);
 		buffer->unmap_len = 0;
 		buffer->unmap_single = false;
 	}
@@ -138,7 +138,7 @@ efx_max_tx_len(struct efx_nic *efx, dma_addr_t dma_addr)
 netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 {
 	struct efx_nic *efx = tx_queue->efx;
-	struct pci_dev *pci_dev = efx->pci_dev;
+	struct device *dma_dev = &efx->pci_dev->dev;
 	struct efx_tx_buffer *buffer;
 	skb_frag_t *fragment;
 	unsigned int len, unmap_len = 0, fill_level, insert_ptr;
@@ -167,17 +167,17 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 	fill_level = tx_queue->insert_count - tx_queue->old_read_count;
 	q_space = efx->txq_entries - 1 - fill_level;
 
-	/* Map for DMA.  Use pci_map_single rather than pci_map_page
+	/* Map for DMA.  Use dma_map_single rather than dma_map_page
 	 * since this is more efficient on machines with sparse
 	 * memory.
 	 */
 	unmap_single = true;
-	dma_addr = pci_map_single(pci_dev, skb->data, len, PCI_DMA_TODEVICE);
+	dma_addr = dma_map_single(dma_dev, skb->data, len, PCI_DMA_TODEVICE);
 
 	/* Process all fragments */
 	while (1) {
-		if (unlikely(pci_dma_mapping_error(pci_dev, dma_addr)))
-			goto pci_err;
+		if (unlikely(dma_mapping_error(dma_dev, dma_addr)))
+			goto dma_err;
 
 		/* Store fields for marking in the per-fragment final
 		 * descriptor */
@@ -246,7 +246,7 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 		i++;
 		/* Map for DMA */
 		unmap_single = false;
-		dma_addr = skb_frag_dma_map(&pci_dev->dev, fragment, 0, len,
+		dma_addr = skb_frag_dma_map(dma_dev, fragment, 0, len,
 					    DMA_TO_DEVICE);
 	}
 
@@ -261,7 +261,7 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 
 	return NETDEV_TX_OK;
 
- pci_err:
+ dma_err:
 	netif_err(efx, tx_err, efx->net_dev,
 		  " TX queue %d could not map skb with %d bytes %d "
 		  "fragments for DMA\n", tx_queue->queue, skb->len,
@@ -284,11 +284,11 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 	/* Free the fragment we were mid-way through pushing */
 	if (unmap_len) {
 		if (unmap_single)
-			pci_unmap_single(pci_dev, unmap_addr, unmap_len,
-					 PCI_DMA_TODEVICE);
+			dma_unmap_single(dma_dev, unmap_addr, unmap_len,
+					 DMA_TO_DEVICE);
 		else
-			pci_unmap_page(pci_dev, unmap_addr, unmap_len,
-				       PCI_DMA_TODEVICE);
+			dma_unmap_page(dma_dev, unmap_addr, unmap_len,
+				       DMA_TO_DEVICE);
 	}
 
 	return rc;
@@ -684,20 +684,19 @@ static __be16 efx_tso_check_protocol(struct sk_buff *skb)
  */
 static int efx_tsoh_block_alloc(struct efx_tx_queue *tx_queue)
 {
-
-	struct pci_dev *pci_dev = tx_queue->efx->pci_dev;
+	struct device *dma_dev = &tx_queue->efx->pci_dev->dev;
 	struct efx_tso_header *tsoh;
 	dma_addr_t dma_addr;
 	u8 *base_kva, *kva;
 
-	base_kva = pci_alloc_consistent(pci_dev, PAGE_SIZE, &dma_addr);
+	base_kva = dma_alloc_coherent(dma_dev, PAGE_SIZE, &dma_addr, GFP_ATOMIC);
 	if (base_kva == NULL) {
 		netif_err(tx_queue->efx, tx_err, tx_queue->efx->net_dev,
 			  "Unable to allocate page for TSO headers\n");
 		return -ENOMEM;
 	}
 
-	/* pci_alloc_consistent() allocates pages. */
+	/* dma_alloc_coherent() allocates pages. */
 	EFX_BUG_ON_PARANOID(dma_addr & (PAGE_SIZE - 1u));
 
 	for (kva = base_kva; kva < base_kva + PAGE_SIZE; kva += TSOH_STD_SIZE) {
@@ -714,7 +713,7 @@ static int efx_tsoh_block_alloc(struct efx_tx_queue *tx_queue)
 /* Free up a TSO header, and all others in the same page. */
 static void efx_tsoh_block_free(struct efx_tx_queue *tx_queue,
 				struct efx_tso_header *tsoh,
-				struct pci_dev *pci_dev)
+				struct device *dma_dev)
 {
 	struct efx_tso_header **p;
 	unsigned long base_kva;
@@ -731,7 +730,7 @@ static void efx_tsoh_block_free(struct efx_tx_queue *tx_queue,
 			p = &(*p)->next;
 	}
 
-	pci_free_consistent(pci_dev, PAGE_SIZE, (void *)base_kva, base_dma);
+	dma_free_coherent(dma_dev, PAGE_SIZE, (void *)base_kva, base_dma);
 }
 
 static struct efx_tso_header *
@@ -743,11 +742,11 @@ efx_tsoh_heap_alloc(struct efx_tx_queue *tx_queue, size_t header_len)
 	if (unlikely(!tsoh))
 		return NULL;
 
-	tsoh->dma_addr = pci_map_single(tx_queue->efx->pci_dev,
+	tsoh->dma_addr = dma_map_single(&tx_queue->efx->pci_dev->dev,
 					TSOH_BUFFER(tsoh), header_len,
-					PCI_DMA_TODEVICE);
-	if (unlikely(pci_dma_mapping_error(tx_queue->efx->pci_dev,
-					   tsoh->dma_addr))) {
+					DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(&tx_queue->efx->pci_dev->dev,
+				       tsoh->dma_addr))) {
 		kfree(tsoh);
 		return NULL;
 	}
@@ -759,9 +758,9 @@ efx_tsoh_heap_alloc(struct efx_tx_queue *tx_queue, size_t header_len)
 static void
 efx_tsoh_heap_free(struct efx_tx_queue *tx_queue, struct efx_tso_header *tsoh)
 {
-	pci_unmap_single(tx_queue->efx->pci_dev,
+	dma_unmap_single(&tx_queue->efx->pci_dev->dev,
 			 tsoh->dma_addr, tsoh->unmap_len,
-			 PCI_DMA_TODEVICE);
+			 DMA_TO_DEVICE);
 	kfree(tsoh);
 }
 
@@ -892,13 +891,13 @@ static void efx_enqueue_unwind(struct efx_tx_queue *tx_queue)
 			unmap_addr = (buffer->dma_addr + buffer->len -
 				      buffer->unmap_len);
 			if (buffer->unmap_single)
-				pci_unmap_single(tx_queue->efx->pci_dev,
+				dma_unmap_single(&tx_queue->efx->pci_dev->dev,
 						 unmap_addr, buffer->unmap_len,
-						 PCI_DMA_TODEVICE);
+						 DMA_TO_DEVICE);
 			else
-				pci_unmap_page(tx_queue->efx->pci_dev,
+				dma_unmap_page(&tx_queue->efx->pci_dev->dev,
 					       unmap_addr, buffer->unmap_len,
-					       PCI_DMA_TODEVICE);
+					       DMA_TO_DEVICE);
 			buffer->unmap_len = 0;
 		}
 		buffer->len = 0;
@@ -954,9 +953,9 @@ static int tso_get_head_fragment(struct tso_state *st, struct efx_nic *efx,
 	int hl = st->header_len;
 	int len = skb_headlen(skb) - hl;
 
-	st->unmap_addr = pci_map_single(efx->pci_dev, skb->data + hl,
-					len, PCI_DMA_TODEVICE);
-	if (likely(!pci_dma_mapping_error(efx->pci_dev, st->unmap_addr))) {
+	st->unmap_addr = dma_map_single(&efx->pci_dev->dev, skb->data + hl,
+					len, DMA_TO_DEVICE);
+	if (likely(!dma_mapping_error(&efx->pci_dev->dev, st->unmap_addr))) {
 		st->unmap_single = true;
 		st->unmap_len = len;
 		st->in_len = len;
@@ -1008,7 +1007,7 @@ static int tso_fill_packet_with_fragment(struct efx_tx_queue *tx_queue,
 		buffer->continuation = !end_of_packet;
 
 		if (st->in_len == 0) {
-			/* Transfer ownership of the pci mapping */
+			/* Transfer ownership of the DMA mapping */
 			buffer->unmap_len = st->unmap_len;
 			buffer->unmap_single = st->unmap_single;
 			st->unmap_len = 0;
@@ -1181,18 +1180,18 @@ static int efx_enqueue_skb_tso(struct efx_tx_queue *tx_queue,
 
  mem_err:
 	netif_err(efx, tx_err, efx->net_dev,
-		  "Out of memory for TSO headers, or PCI mapping error\n");
+		  "Out of memory for TSO headers, or DMA mapping error\n");
 	dev_kfree_skb_any(skb);
 
  unwind:
 	/* Free the DMA mapping we were in the process of writing out */
 	if (state.unmap_len) {
 		if (state.unmap_single)
-			pci_unmap_single(efx->pci_dev, state.unmap_addr,
-					 state.unmap_len, PCI_DMA_TODEVICE);
+			dma_unmap_single(&efx->pci_dev->dev, state.unmap_addr,
+					 state.unmap_len, DMA_TO_DEVICE);
 		else
-			pci_unmap_page(efx->pci_dev, state.unmap_addr,
-				       state.unmap_len, PCI_DMA_TODEVICE);
+			dma_unmap_page(&efx->pci_dev->dev, state.unmap_addr,
+				       state.unmap_len, DMA_TO_DEVICE);
 	}
 
 	efx_enqueue_unwind(tx_queue);
@@ -1216,5 +1215,5 @@ static void efx_fini_tso(struct efx_tx_queue *tx_queue)
 
 	while (tx_queue->tso_headers_free != NULL)
 		efx_tsoh_block_free(tx_queue, tx_queue->tso_headers_free,
-				    tx_queue->efx->pci_dev);
+				    &tx_queue->efx->pci_dev->dev);
 }
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 04/11] sfc: Remove dead write to tso_state::packet_space
From: Ben Hutchings @ 2012-07-11 23:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

tso_state::packet_space is always set in tso_start_packet(); the
value set in tso_start() is not used, and is also incorrect.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/tx.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index 18860f2..cfa5f6d 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -926,7 +926,6 @@ static void tso_start(struct tso_state *st, const struct sk_buff *skb)
 	EFX_BUG_ON_PARANOID(tcp_hdr(skb)->syn);
 	EFX_BUG_ON_PARANOID(tcp_hdr(skb)->rst);
 
-	st->packet_space = st->full_packet_size;
 	st->out_len = skb->len - st->header_len;
 	st->unmap_len = 0;
 	st->unmap_single = false;
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 05/11] sfc: Stop changing header offsets on TX
From: Ben Hutchings @ 2012-07-11 23:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

There is nothing in the VLAN driver or core VLAN support that
invalidates the TCP and IP header offsets.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/tx.c |    9 ---------
 1 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index cfa5f6d..9b225a7 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -651,17 +651,8 @@ static __be16 efx_tso_check_protocol(struct sk_buff *skb)
 	EFX_BUG_ON_PARANOID(((struct ethhdr *)skb->data)->h_proto !=
 			    protocol);
 	if (protocol == htons(ETH_P_8021Q)) {
-		/* Find the encapsulated protocol; reset network header
-		 * and transport header based on that. */
 		struct vlan_ethhdr *veh = (struct vlan_ethhdr *)skb->data;
 		protocol = veh->h_vlan_encapsulated_proto;
-		skb_set_network_header(skb, sizeof(*veh));
-		if (protocol == htons(ETH_P_IP))
-			skb_set_transport_header(skb, sizeof(*veh) +
-						 4 * ip_hdr(skb)->ihl);
-		else if (protocol == htons(ETH_P_IPV6))
-			skb_set_transport_header(skb, sizeof(*veh) +
-						 sizeof(struct ipv6hdr));
 	}
 
 	if (protocol == htons(ETH_P_IP)) {
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 06/11] sfc: Use strlcpy() to copy ethtool stats names
From: Ben Hutchings @ 2012-07-11 23:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

Fix CID 113703 in the Coverity report on Linux.

ethtool stats names are limited to 32 bytes including a null
terminator.  Use strlcpy() to ensure that we will always include the
null terminator even if a source string becomes longer than this.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/ethtool.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c
index 03ded36..10536f9 100644
--- a/drivers/net/ethernet/sfc/ethtool.c
+++ b/drivers/net/ethernet/sfc/ethtool.c
@@ -453,7 +453,7 @@ static void efx_ethtool_get_strings(struct net_device *net_dev,
 	switch (string_set) {
 	case ETH_SS_STATS:
 		for (i = 0; i < EFX_ETHTOOL_NUM_STATS; i++)
-			strncpy(ethtool_strings[i].name,
+			strlcpy(ethtool_strings[i].name,
 				efx_ethtool_stats[i].name,
 				sizeof(ethtool_strings[i].name));
 		break;
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 07/11] sfc: Use dev_kfree_skb() in efx_end_loopback()
From: Ben Hutchings @ 2012-07-11 23:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

Fix CID 102619 in the Coverity report on Linux.

efx_end_loopback() iterates over an array of skb pointers of which
some may be null (if efx_begin_loopback() failed).  It should not use
dev_kfree_skb_irq(), which requires non-null pointers.  In practice
this is safe because it does not run in interrupt context and
therefore always ends up calling dev_kfree_skb(), which does allow
null pointers.  But we should make that explicit.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/selftest.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/sfc/selftest.c b/drivers/net/ethernet/sfc/selftest.c
index de4c006..ccc428f 100644
--- a/drivers/net/ethernet/sfc/selftest.c
+++ b/drivers/net/ethernet/sfc/selftest.c
@@ -488,7 +488,7 @@ static int efx_end_loopback(struct efx_tx_queue *tx_queue,
 		skb = state->skbs[i];
 		if (skb && !skb_shared(skb))
 			++tx_done;
-		dev_kfree_skb_any(skb);
+		dev_kfree_skb(skb);
 	}
 
 	netif_tx_unlock_bh(efx->net_dev);
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox