* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Jean-Michel Hautbois @ 2012-06-06 13:07 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Sathya.Perla, netdev
In-Reply-To: <CAL8zT=hHDPnvRFcQ0w+D=AP+QK6ic4X=tva6Yw_XGwuTbAYjhQ@mail.gmail.com>
2012/6/6 Jean-Michel Hautbois <jhautbois@gmail.com>:
> 2012/6/6 Eric Dumazet <eric.dumazet@gmail.com>:
>> On Wed, 2012-06-06 at 12:04 +0200, Jean-Michel Hautbois wrote:
>>
>>> Well, well, well, after having tested several configurations, several
>>> drivers, I have a big difference between an old 2.6.26 kernel and a
>>> newer one (I tried 3.2 and 3.4).
>>>
>>> Here is my stream : UDP packets (multicast), 4000 bytes length, MTU
>>> set to 4096. I am sending packets only, nothing on RX.
>>> I send from 1Gbps upto 2.4Gbps and I see no drops in tc with 2.6.26
>>> kernel, but a lot of drops with a newer kernel.
>>> So, I don't know if I missed something in my kernel configuration, but
>>> I have used the 2.6.26 one as a reference, in order to set the same
>>> options (DMA related, etc).
>>>
>>> I easily reproduce this problem and setting a bigger txqueuelen solves
>>> it partially.
>>> 1Gbps requires a txqueulen of 9000, 2.4Gbps requires more than 20000 !
>>>
>>> If you have any idea, I am interested, as this is a big issue for my use case.
>>>
>>
>> Yep.
>>
>> This driver wants to limit number of tx completions, thats just wrong.
>>
>> Fix and dirty patch:
>>
>>
>> diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
>> index c5c4c0e..1e8f8a6 100644
>> --- a/drivers/net/ethernet/emulex/benet/be.h
>> +++ b/drivers/net/ethernet/emulex/benet/be.h
>> @@ -105,7 +105,7 @@ static inline char *nic_name(struct pci_dev *pdev)
>> #define MAX_TX_QS 8
>> #define MAX_ROCE_EQS 5
>> #define MAX_MSIX_VECTORS (MAX_RSS_QS + MAX_ROCE_EQS) /* RSS qs + RoCE */
>> -#define BE_TX_BUDGET 256
>> +#define BE_TX_BUDGET 65535
>> #define BE_NAPI_WEIGHT 64
>> #define MAX_RX_POST BE_NAPI_WEIGHT /* Frags posted at a time */
>> #define RX_FRAGS_REFILL_WM (RX_Q_LEN - MAX_RX_POST)
>>
>
> I will try that in a few minutes.
> I also have a mlx4 driver (mlx4_en) which has a similar behaviour, and
> a broadcom (bnx2x).
>
And it is not really better, still need about 18000 at 2.4Gbps in
order to avoid drops...
I really think there is something in the networking stack or in my
configuration (DMA ? Something else ?)...
As it doesn't seem to be driver related as I said...
JM
^ permalink raw reply
* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Eric Dumazet @ 2012-06-06 13:10 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <20120606111357.GA15070@redhat.com>
On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:
> We currently do all stats either on napi callback or from
> start_xmit callback.
> This makes them safe, yes?
Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
include/linux/u64_stats_sync.h section 6)
* 6) If counter might be written by an interrupt, readers should block interrupts.
* (On UP, there is no seqcount_t protection, a reader allowing interrupts could
* read partial values)
Yes, its tricky...
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5214b1e..705aaa7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -703,12 +703,12 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
u64 tpackets, tbytes, rpackets, rbytes;
do {
- start = u64_stats_fetch_begin(&stats->syncp);
+ start = u64_stats_fetch_begin_bh(&stats->syncp);
tpackets = stats->tx_packets;
tbytes = stats->tx_bytes;
rpackets = stats->rx_packets;
rbytes = stats->rx_bytes;
- } while (u64_stats_fetch_retry(&stats->syncp, start));
+ } while (u64_stats_fetch_retry_bh(&stats->syncp, start));
tot->rx_packets += rpackets;
tot->tx_packets += tpackets;
^ permalink raw reply related
* [PATCH 1/1] block/nbd: micro-optimization in nbd request completion
From: Chetan Loke @ 2012-06-06 14:15 UTC (permalink / raw)
To: Paul.Clements, axboe, linux-kernel; +Cc: netdev, Chetan Loke
Add in-flight cmds to the tail. That way while searching(during request completion),we will always get a hit on the first element.
Signed-off-by: Chetan Loke <loke.chetan@gmail.com>
---
drivers/block/nbd.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 061427a..8957b9f 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -481,7 +481,7 @@ static void nbd_handle_req(struct nbd_device *nbd, struct request *req)
nbd_end_request(req);
} else {
spin_lock(&nbd->queue_lock);
- list_add(&req->queuelist, &nbd->queue_head);
+ list_add_tail(&req->queuelist, &nbd->queue_head);
spin_unlock(&nbd->queue_lock);
}
--
1.7.5.2
^ permalink raw reply related
* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Jean-Michel Hautbois @ 2012-06-06 14:36 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Sathya.Perla, netdev
In-Reply-To: <CAL8zT=jGHo82mo-s8Tfs9LWzfu2GkrS4eZJoeOpHhpXHMr6csg@mail.gmail.com>
2012/6/6 Jean-Michel Hautbois <jhautbois@gmail.com>:
> 2012/6/6 Jean-Michel Hautbois <jhautbois@gmail.com>:
>> 2012/6/6 Eric Dumazet <eric.dumazet@gmail.com>:
>>> On Wed, 2012-06-06 at 12:04 +0200, Jean-Michel Hautbois wrote:
>>>
>>>> Well, well, well, after having tested several configurations, several
>>>> drivers, I have a big difference between an old 2.6.26 kernel and a
>>>> newer one (I tried 3.2 and 3.4).
>>>>
>>>> Here is my stream : UDP packets (multicast), 4000 bytes length, MTU
>>>> set to 4096. I am sending packets only, nothing on RX.
>>>> I send from 1Gbps upto 2.4Gbps and I see no drops in tc with 2.6.26
>>>> kernel, but a lot of drops with a newer kernel.
>>>> So, I don't know if I missed something in my kernel configuration, but
>>>> I have used the 2.6.26 one as a reference, in order to set the same
>>>> options (DMA related, etc).
>>>>
>>>> I easily reproduce this problem and setting a bigger txqueuelen solves
>>>> it partially.
>>>> 1Gbps requires a txqueulen of 9000, 2.4Gbps requires more than 20000 !
>>>>
>>>> If you have any idea, I am interested, as this is a big issue for my use case.
>>>>
>>>
>>> Yep.
>>>
>>> This driver wants to limit number of tx completions, thats just wrong.
>>>
>>> Fix and dirty patch:
>>>
>>>
>>> diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
>>> index c5c4c0e..1e8f8a6 100644
>>> --- a/drivers/net/ethernet/emulex/benet/be.h
>>> +++ b/drivers/net/ethernet/emulex/benet/be.h
>>> @@ -105,7 +105,7 @@ static inline char *nic_name(struct pci_dev *pdev)
>>> #define MAX_TX_QS 8
>>> #define MAX_ROCE_EQS 5
>>> #define MAX_MSIX_VECTORS (MAX_RSS_QS + MAX_ROCE_EQS) /* RSS qs + RoCE */
>>> -#define BE_TX_BUDGET 256
>>> +#define BE_TX_BUDGET 65535
>>> #define BE_NAPI_WEIGHT 64
>>> #define MAX_RX_POST BE_NAPI_WEIGHT /* Frags posted at a time */
>>> #define RX_FRAGS_REFILL_WM (RX_Q_LEN - MAX_RX_POST)
>>>
>>
>> I will try that in a few minutes.
>> I also have a mlx4 driver (mlx4_en) which has a similar behaviour, and
>> a broadcom (bnx2x).
>>
>
> And it is not really better, still need about 18000 at 2.4Gbps in
> order to avoid drops...
> I really think there is something in the networking stack or in my
> configuration (DMA ? Something else ?)...
> As it doesn't seem to be driver related as I said...
>
If it can help, on a 3.0 kernel a txqueuelen of 9000 is sufficient in
order to get this bandwith on TX.
JM
^ permalink raw reply
* Re: [PATCH] ip.7: Improve explanation about calling listen or connect
From: Flavio Leitner @ 2012-06-06 14:44 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: Peter Schiffer, linux-man-u79uwXL29TY76Z2rM5mHXA, netdev
In-Reply-To: <4FBF66D8.7060007-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Hi,
Could someone tell me what's the patch current state?
It has been a month already with no feedback.
thanks,
fbl
On Fri, 25 May 2012 13:02:48 +0200
Peter Schiffer <pschiffe-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> Hi Michael,
>
> do you have any comments for this update? Or do you need some supporting
> info?
>
> peter
>
> On 05/09/2012 02:30 PM, Flavio Leitner wrote:
> > Signed-off-by: Flavio Leitner<fbl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > ---
> > man7/ip.7 | 15 +++++++++------
> > 1 files changed, 9 insertions(+), 6 deletions(-)
> >
> > diff --git a/man7/ip.7 b/man7/ip.7
> > index 9f560df..84fe32d 100644
> > --- a/man7/ip.7
> > +++ b/man7/ip.7
> > @@ -69,12 +69,11 @@ For
> > you may specify a valid IANA IP protocol defined in
> > RFC\ 1700 assigned numbers.
> > .PP
> > -.\" FIXME ip current does an autobind in listen, but I'm not sure
> > -.\" if that should be documented.
> > When a process wants to receive new incoming packets or connections, it
> > should bind a socket to a local interface address using
> > .BR bind (2).
> > -Only one IP socket may be bound to any given local (address, port) pair.
> > +In this case, only one IP socket may be bound to any given local
> > +(address, port) pair.
> > When
> > .B INADDR_ANY
> > is specified in the bind call, the socket will be bound to
> > @@ -82,10 +81,14 @@ is specified in the bind call, the socket will be bound to
> > local interfaces.
> > When
> > .BR listen (2)
> > -or
> > +is called on an unbound socket, the socket is automatically bound
> > +to a random free port with the local address set to
> > +.BR INADDR_ANY .
> > +When
> > .BR connect (2)
> > -are called on an unbound socket, it is automatically bound to a
> > -random free port with the local address set to
> > +is called on an unbound socket, the socket is automatically bound
> > +to a random free port or an usable shared port with the local address
> > +set to
> > .BR INADDR_ANY .
> >
> > A TCP local socket address that has been bound is unavailable for
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Possible deadlock in ipv6?
From: Vladimir Davydov @ 2012-06-06 14:49 UTC (permalink / raw)
To: netdev
I'm not familiar with the linux net subsystem, so I would appreciate if
someone could clarify if the following call chain is possible:
addrconf_ifdown() calls neigh_ifdown(nd_tbl) which locks nd_tbl.lock for
writing and calls
pneigh_ifdown
pndisc_destructor
ipv6_dev_mc_dec
__ipv6_dev_mc_dec
igmp6_group_dropped
igmp6_leave_group
igmp6_send
icmp6_dst_alloc
ip6_neigh_lookup
neigh_create
and neigh_create() locks nd_tbl.lock for writing again resulting in a
deadlock.
Thank you.
^ permalink raw reply
* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Michael S. Tsirkin @ 2012-06-06 14:49 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <1338988210.2760.4485.camel@edumazet-glaptop>
On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:
>
> > We currently do all stats either on napi callback or from
> > start_xmit callback.
> > This makes them safe, yes?
>
> Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
> include/linux/u64_stats_sync.h section 6)
>
> * 6) If counter might be written by an interrupt, readers should block interrupts.
> * (On UP, there is no seqcount_t protection, a reader allowing interrupts could
> * read partial values)
>
> Yes, its tricky...
Sounds good, but I have a question: this realies on counters
being atomic on 64 bit.
Would not it be better to always use a seqlock even on 64 bit?
This way counters would actually be correct and in sync.
As it is if we want e.g. average packet size,
we can not rely e.g. on it being bytes/packets.
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 5214b1e..705aaa7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -703,12 +703,12 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
> u64 tpackets, tbytes, rpackets, rbytes;
>
> do {
> - start = u64_stats_fetch_begin(&stats->syncp);
> + start = u64_stats_fetch_begin_bh(&stats->syncp);
> tpackets = stats->tx_packets;
> tbytes = stats->tx_bytes;
> rpackets = stats->rx_packets;
> rbytes = stats->rx_bytes;
> - } while (u64_stats_fetch_retry(&stats->syncp, start));
> + } while (u64_stats_fetch_retry_bh(&stats->syncp, start));
>
> tot->rx_packets += rpackets;
> tot->tx_packets += tpackets;
>
^ permalink raw reply
* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Stephen Hemminger @ 2012-06-06 15:14 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Eric Dumazet, Jason Wang, netdev, rusty, linux-kernel,
virtualization
In-Reply-To: <20120606144941.GA17092@redhat.com>
On Wed, 6 Jun 2012 17:49:42 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> Sounds good, but I have a question: this realies on counters
> being atomic on 64 bit.
> Would not it be better to always use a seqlock even on 64 bit?
> This way counters would actually be correct and in sync.
> As it is if we want e.g. average packet size,
> we can not rely e.g. on it being bytes/packets.
This has not been a requirement on real physical devices; therefore
the added overhead is not really justified.
Many network cards use counters in hardware to count packets/bytes
and there is no expectation of atomic access there.
^ permalink raw reply
* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Eric Dumazet @ 2012-06-06 15:19 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <20120606144941.GA17092@redhat.com>
On Wed, 2012-06-06 at 17:49 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote:
> > On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:
> >
> > > We currently do all stats either on napi callback or from
> > > start_xmit callback.
> > > This makes them safe, yes?
> >
> > Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
> > include/linux/u64_stats_sync.h section 6)
> >
> > * 6) If counter might be written by an interrupt, readers should block interrupts.
> > * (On UP, there is no seqcount_t protection, a reader allowing interrupts could
> > * read partial values)
> >
> > Yes, its tricky...
>
> Sounds good, but I have a question: this realies on counters
> being atomic on 64 bit.
> Would not it be better to always use a seqlock even on 64 bit?
> This way counters would actually be correct and in sync.
> As it is if we want e.g. average packet size,
> we can not rely e.g. on it being bytes/packets.
When this stuff was discussed, we chose to have a nop on 64bits.
Your point has little to do with 64bit stats, it was already like that
with 'long int' counters.
Consider average driver doing :
dev->stats.rx_bytes += skb->len;
dev->stats.rx_packets++;
A concurrent reader can read an updated rx_bytes and a 'previous'
rx_packets one.
'fixing' this requires a lot of work and memory barriers (in all
drivers), for a very litle gain (at most one packet error)
u64_stats_sync was really meant to be 0-cost on 64bit arches.
^ permalink raw reply
* Re: [net-next PATCH v2 1/3] Added kernel support in EEE Ethtool commands
From: Ben Hutchings @ 2012-06-06 15:20 UTC (permalink / raw)
To: Yuval Mintz; +Cc: davem, netdev, eilong, peppe.cavallaro
In-Reply-To: <1338973098-16439-2-git-send-email-yuvalmin@broadcom.com>
On Wed, 2012-06-06 at 11:58 +0300, Yuval Mintz wrote:
> This patch extends the kernel's ethtool interface by adding support
> for 2 new EEE commands - get_eee and set_eee.
>
> Thanks goes to Giuseppe Cavallaro for his original patch adding this support.
>
> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
> ---
> include/linux/ethtool.h | 32 ++++++++++++++++++++++++++++++++
> net/core/ethtool.c | 40 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 72 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index e17fa71..6250e1f 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -137,6 +137,32 @@ struct ethtool_eeprom {
> };
>
> /**
> + * struct ethtool_eee - Energy Efficient Ethernet information
> + * @cmd: ETHTOOL_{G,S}EEE
> + * @supported: Link speeds for which there is eee support.
> + * @advertised: Link speeds the interface advertises (AN) as eee capable.
> + * @lp_advertised: Link speeds the link partner advertised as eee capable.
And these are bitmasks of SUPPORTED_* & ADVERTISED_* flags, right?
Maybe 'link modes' not 'link speeds'?
Otherwise, this all looks good to me (with limited knowledge of EEE).
Ben.
> + * @eee_active: Result of the eee auto negotiation.
> + * @eee_enabled: EEE configured mode (enabled/disabled).
> + * @tx_lpi_enabled: Whether the interface should assert its tx lpi, given
> + * that eee was negotiated.
> + * @tx_lpi_timer: Time in microseconds the interface delays prior to asserting
> + * its tx lpi (after reaching 'idle' state). Effective only when eee
> + * was negotiated and tx_lpi_enabled was set.
> + */
> +struct ethtool_eee {
[...]
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: [PATCH 1/2] Ethtool: Add EEE support
From: Ben Hutchings @ 2012-06-06 15:48 UTC (permalink / raw)
To: Yuval Mintz; +Cc: netdev, eilong, peppe.cavallaro
In-Reply-To: <1338878470-24784-2-git-send-email-yuvalmin@broadcom.com>
On Tue, 2012-06-05 at 09:41 +0300, Yuval Mintz wrote:
> This patch adds 2 new ethtool commands which can be
> used to manipulate network interfaces' support in
> EEE.
>
> Output of 'get' has the following form:
>
> EEE Settings for p2p1:
> EEE status: enabled - active
> Tx LPI: 1000 (u)
> Supported EEE link modes: 10000baseT/Full
> Advertised EEE link modes: 10000baseT/Full
> Link partner advertised EEE link modes: 10000baseT/Full
>
> Thanks goes to Giuseppe Cavallaro for his original patch.
[...]
> diff --git a/ethtool.c b/ethtool.c
> index f18f611..063e72b 100644
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -359,7 +359,8 @@ static int do_version(struct cmd_context *ctx)
> return 0;
> }
>
> -static void dump_link_caps(const char *prefix, const char *an_prefix, u32 mask);
> +static void dump_link_caps(const char *prefix, const char *an_prefix, u32 mask,
> + u8 all);
For now, use int for booleans. At some point I would like to see a
thorough cleanup of ethtool to use bool where appropriate - but that's
independent of this.
[...]
> /* Print link capability flags (supported, advertised or lp_advertised).
> * Assumes that the corresponding SUPPORTED and ADVERTISED flags are equal.
> */
> static void
> -dump_link_caps(const char *prefix, const char *an_prefix, u32 mask)
> +dump_link_caps(const char *prefix, const char *an_prefix, u32 mask, u8 all)
> {
> int indent;
> int did1;
> @@ -456,24 +457,26 @@ dump_link_caps(const char *prefix, const char *an_prefix, u32 mask)
> fprintf(stdout, "Not reported");
> fprintf(stdout, "\n");
>
> - fprintf(stdout, " %s pause frame use: ", prefix);
> - if (mask & ADVERTISED_Pause) {
> - fprintf(stdout, "Symmetric");
> - if (mask & ADVERTISED_Asym_Pause)
> - fprintf(stdout, " Receive-only");
> - fprintf(stdout, "\n");
> - } else {
> - if (mask & ADVERTISED_Asym_Pause)
> - fprintf(stdout, "Transmit-only\n");
> + if (all) {
It might be clearer to invert this flag and name it something like
'link_mode_only'.
> + fprintf(stdout, " %s pause frame use: ", prefix);
> + if (mask & ADVERTISED_Pause) {
> + fprintf(stdout, "Symmetric");
> + if (mask & ADVERTISED_Asym_Pause)
> + fprintf(stdout, " Receive-only");
> + fprintf(stdout, "\n");
> + } else {
> + if (mask & ADVERTISED_Asym_Pause)
> + fprintf(stdout, "Transmit-only\n");
> + else
> + fprintf(stdout, "No\n");
> + }
> +
> + fprintf(stdout, " %s auto-negotiation: ", an_prefix);
> + if (mask & ADVERTISED_Autoneg)
> + fprintf(stdout, "Yes\n");
> else
> fprintf(stdout, "No\n");
> }
> -
> - fprintf(stdout, " %s auto-negotiation: ", an_prefix);
> - if (mask & ADVERTISED_Autoneg)
> - fprintf(stdout, "Yes\n");
> - else
> - fprintf(stdout, "No\n");
> }
>
> static int dump_ecmd(struct ethtool_cmd *ep)
[...]
> @@ -1116,6 +1120,36 @@ static int dump_rxfhash(int fhash, u64 val)
> return 0;
> }
>
> +static int dump_eeecmd(struct ethtool_eee *ep)
Is there any reason for this not to return void?
> +{
> +
> + fprintf(stdout, " EEE status: ");
> + if (!ep->supported) {
> + fprintf(stdout, "not supported\n");
> + return 0;
> + } else if (!ep->eee_enabled) {
> + fprintf(stdout, "disabled\n");
> + } else {
> + fprintf(stdout, "enabled - ");
> + if (ep->eee_active)
> + fprintf(stdout, "active\n");
> + else
> + fprintf(stdout, "inactive\n");
> + }
> +
> + fprintf(stdout, " Tx LPI:");
> + if (ep->tx_lpi_enabled)
> + fprintf(stdout, " %d (u)\n", ep->tx_lpi_timer);
"us" not "(u)"
> + else
> + fprintf(stdout, " disabled\n");
> +
> + dump_link_caps("Supported EEE", "", ep->supported, 0);
> + dump_link_caps("Advertised EEE", "", ep->advertised, 0);
> + dump_link_caps("Link partner advertised EEE", "", ep->lp_advertised, 0);
> +
> + return 0;
> +}
> +
[...]
> +static int do_seee(struct cmd_context *ctx)
> +{
> + int adv_c = -1, lpi_c = -1, lpi_time_c = -1, eee_c = -1;
> + int change = -1, change2 = -1;
> + struct ethtool_eee eeecmd;
> + struct cmdline_info cmdline_eee[] = {
> + { "advertise", CMDL_U32, &adv_c, &eeecmd.advertised },
> + { "tx-lpi", CMDL_BOOL, &lpi_c, &eeecmd.tx_lpi_enabled },
> + { "tx-timer", CMDL_U32, &lpi_time_c, &eeecmd.tx_lpi_timer},
> + { "eee", CMDL_BOOL, &eee_c, &eeecmd.eee_enabled},
> + };
> +
> + if (ctx->argc == 0)
> + exit_bad_args();
> +
> + parse_generic_cmdline(ctx, &change, cmdline_eee,
> + ARRAY_SIZE(cmdline_eee));
> +
> + eeecmd.cmd = ETHTOOL_GEEE;
> + if (send_ioctl(ctx, &eeecmd)) {
> + perror("Cannot get EEE settings");
> + return 1;
> + }
> +
> + do_generic_set(cmdline_eee, ARRAY_SIZE(cmdline_eee), &change2);
> +
> + if (change2) {
> +
> + eeecmd.cmd = ETHTOOL_SEEE;
> + if (send_ioctl(ctx, &eeecmd)) {
> + perror("Cannot set EEE settings");
> + return 1;
> + }
> + }
> +
> + return 1;
return 0, I think!
> +}
> +
> int send_ioctl(struct cmd_context *ctx, void *cmd)
> {
> #ifndef TEST_ETHTOOL
> @@ -3423,6 +3516,12 @@ static const struct option {
> " [ hex on|off ]\n"
> " [ offset N ]\n"
> " [ length N ]\n" },
> + { "--get-eee", 1, do_geee, "Get EEE settings"},
> + { "--set-eee", 1, do_seee, "Set EEE settings",
> + " [ eee on|off ]\n"
> + " [ advertise %x ]\n"
> + " [ tx-lpi on|off ]\n"
> + " [ tx-timer %x ]\n"},
The tx-timer value would normally be specified in decimal, so put "%d"
here.
> { "-h|--help", 0, show_usage, "Show this help" },
> { "--version", 0, do_version, "Show version number" },
> {}
You also need to add some test cases for the command line parsing in
test-cmdline.c.
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: Possible deadlock in ipv6?
From: Eric Dumazet @ 2012-06-06 15:53 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: netdev
In-Reply-To: <4FCF6DF4.2090304@parallels.com>
On Wed, 2012-06-06 at 18:49 +0400, Vladimir Davydov wrote:
> I'm not familiar with the linux net subsystem, so I would appreciate if
> someone could clarify if the following call chain is possible:
>
> addrconf_ifdown() calls neigh_ifdown(nd_tbl) which locks nd_tbl.lock for
> writing and calls
>
> pneigh_ifdown
> pndisc_destructor
> ipv6_dev_mc_dec
> __ipv6_dev_mc_dec
> igmp6_group_dropped
> igmp6_leave_group
> igmp6_send
> icmp6_dst_alloc
> ip6_neigh_lookup
> neigh_create
>
> and neigh_create() locks nd_tbl.lock for writing again resulting in a
> deadlock.
It seems a deadlock is possible indeed, good catch !
^ permalink raw reply
* Re: [PATCH 2/2] Ethtool: add EEE to ethtool's documentation
From: Ben Hutchings @ 2012-06-06 15:56 UTC (permalink / raw)
To: Yuval Mintz; +Cc: netdev, eilong, peppe.cavallaro
In-Reply-To: <1338878470-24784-3-git-send-email-yuvalmin@broadcom.com>
On Tue, 2012-06-05 at 09:41 +0300, Yuval Mintz wrote:
> under Synopsis:
> ethtool --get-eee devname
> ethtool --set-eee devname [eee on|off] [tx-lpi on|off] [tx-timer N] [advertise N]
>
> under Options:
> --get-eee
> Queries the specified network device for its support in Efficient Energy Ethernet (ac-
> cording to the IEEE 802.3az specifications)
> --set-eee
> Sets the device EEE behaviour.
> eee on|off
> Enables/Disables the device support in EEE.
> tx-lpi on|off
> Determines whether the device should assert its tx lpi.
> advertise N
> Sets the speeds for which the device would advertise EEE capabliities. Values are as
> for --change advertise
> tx-timer N
> Sets the amount of time the device should stay in idle mode prior to asserting its tx
> lpi (in microseconds). This has meaning only when tx lpi is on.
Please just fold this change into the patch that adds the options. Also
there is no need to repeat the content in the commit message.
> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
> ---
> ethtool.8.in | 32 ++++++++++++++++++++++++++++++++
> 1 files changed, 32 insertions(+), 0 deletions(-)
>
> diff --git a/ethtool.8.in b/ethtool.8.in
> index 523b737..b906d8e 100644
> --- a/ethtool.8.in
> +++ b/ethtool.8.in
> @@ -335,6 +335,16 @@ ethtool \- query or control network driver and hardware settings
> .I devname flag
> .A1 on off
> .RB ...
> +.HP
> +.B ethtool \-\-get\-eee
> +.I devname
> +.HP
> +.B ethtool \-\-set\-eee
> +.I devname
> +.B2 eee on off
> +.B2 tx-lpi on off
> +.BN tx-timer
> +.BN advertise
> .
> .\" Adjust lines (i.e. full justification) and hyphenate.
> .ad
> @@ -817,6 +827,28 @@ Sets the device's private flags as specified.
> .I flag
> .A1 on off
> Sets the state of the named private flag.
> +.TP
> +.B \-\-get\-eee
> +Queries the specified network device for its support in Efficient Energy
> +Ethernet (according to the IEEE 802.3az specifications)
'of', not 'in'
> +.TP
> +.B \-\-set\-eee
> +Sets the device EEE behaviour.
> +.TP
> +.A2 eee on off
> +Enables/Disables the device support in EEE.
Lower-case 'd' for 'disables'.
'of', not 'in'
> +.TP
> +.A2 tx-lpi on off
> +Determines whether the device should assert its tx lpi.
'TX LPI' should be in upper-case.
> +.TP
> +.BI advertise \ N
> +Sets the speeds for which the device would advertise EEE capabliities.
'would', not 'should'
'capabilities', not 'capabliities'
> +Values are as for
> +.B \-\-change advertise
> +.TP
> +.BI tx-timer \ N
> +Sets the amount of time the device should stay in idle mode prior to asserting
> +its tx lpi (in microseconds). This has meaning only when tx lpi is on.
Same here.
Ben.
> .SH BUGS
> Not supported (in part or whole) on all network drivers.
> .SH AUTHOR
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: Possible deadlock in ipv6?
From: Eric Dumazet @ 2012-06-06 15:58 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: netdev
In-Reply-To: <1338998019.26966.10.camel@edumazet-glaptop>
On Wed, 2012-06-06 at 17:53 +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 18:49 +0400, Vladimir Davydov wrote:
> > I'm not familiar with the linux net subsystem, so I would appreciate if
> > someone could clarify if the following call chain is possible:
> >
> > addrconf_ifdown() calls neigh_ifdown(nd_tbl) which locks nd_tbl.lock for
> > writing and calls
> >
> > pneigh_ifdown
> > pndisc_destructor
> > ipv6_dev_mc_dec
> > __ipv6_dev_mc_dec
> > igmp6_group_dropped
> > igmp6_leave_group
> > igmp6_send
> > icmp6_dst_alloc
> > ip6_neigh_lookup
> > neigh_create
> >
> > and neigh_create() locks nd_tbl.lock for writing again resulting in a
> > deadlock.
>
> It seems a deadlock is possible indeed, good catch !
>
>
And it seems this neigh_down() can be removed, its called later
(after dev->ip6_ptr is cleared)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 8f6411c..62c4c00 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2750,7 +2750,6 @@ static int addrconf_ifdown(struct net_device *dev, int how)
ASSERT_RTNL();
rt6_ifdown(net, dev);
- neigh_ifdown(&nd_tbl, dev);
idev = __in6_dev_get(dev);
if (idev == NULL)
^ permalink raw reply related
* Re: [net 0/3][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2012-06-06 16:00 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1338955735-8967-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 5 Jun 2012 21:08:52 -0700
> This series contains 2 fixes for ixgbe and a fix for e1000e.
>
> The following are changes since commit dc5cd894cace7bda4a743487a9f87d59a3f0a095:
> net/hyperv: Use wait_event on outstanding sends during device removal
> and are available in the git repository at:
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net master
Pulled, thanks Jeff.
^ permalink raw reply
* Re: Possible deadlock in ipv6?
From: Vladimir Davydov @ 2012-06-06 16:01 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev@vger.kernel.org
In-Reply-To: <1338998314.26966.12.camel@edumazet-glaptop>
On Jun 6, 2012, at 7:58 PM, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 17:53 +0200, Eric Dumazet wrote:
>> On Wed, 2012-06-06 at 18:49 +0400, Vladimir Davydov wrote:
>>> I'm not familiar with the linux net subsystem, so I would appreciate if
>>> someone could clarify if the following call chain is possible:
>>>
>>> addrconf_ifdown() calls neigh_ifdown(nd_tbl) which locks nd_tbl.lock for
>>> writing and calls
>>>
>>> pneigh_ifdown
>>> pndisc_destructor
>>> ipv6_dev_mc_dec
>>> __ipv6_dev_mc_dec
>>> igmp6_group_dropped
>>> igmp6_leave_group
>>> igmp6_send
>>> icmp6_dst_alloc
>>> ip6_neigh_lookup
>>> neigh_create
>>>
>>> and neigh_create() locks nd_tbl.lock for writing again resulting in a
>>> deadlock.
>>
>> It seems a deadlock is possible indeed, good catch !
>>
>>
>
> And it seems this neigh_down() can be removed, its called later
> (after dev->ip6_ptr is cleared)
>
BTW, commit d1ed113f1669390da9898da3beddcc058d938587 did exactly the same, but it was reverted along with a bundle of other commits by 73a8bd74e2618990dbb218c3d82f53e60acd9af0.
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 8f6411c..62c4c00 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2750,7 +2750,6 @@ static int addrconf_ifdown(struct net_device *dev, int how)
> ASSERT_RTNL();
>
> rt6_ifdown(net, dev);
> - neigh_ifdown(&nd_tbl, dev);
>
> idev = __in6_dev_get(dev);
> if (idev == NULL)
>
>
^ permalink raw reply
* Re: [PATCH net-next 1/7] be2net: don't call vid_config() when there's no vlan config
From: David Miller @ 2012-06-06 16:09 UTC (permalink / raw)
To: sathya.perla; +Cc: netdev
In-Reply-To: <9e61e8da-5c37-448c-8c07-71fa8b81fb5c@exht1.ad.emulex.com>
Series applied, thank you.
^ permalink raw reply
* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Michael S. Tsirkin @ 2012-06-06 16:17 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <1338995944.26966.6.camel@edumazet-glaptop>
On Wed, Jun 06, 2012 at 05:19:04PM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 17:49 +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote:
> > > On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:
> > >
> > > > We currently do all stats either on napi callback or from
> > > > start_xmit callback.
> > > > This makes them safe, yes?
> > >
> > > Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
> > > include/linux/u64_stats_sync.h section 6)
> > >
> > > * 6) If counter might be written by an interrupt, readers should block interrupts.
> > > * (On UP, there is no seqcount_t protection, a reader allowing interrupts could
> > > * read partial values)
> > >
> > > Yes, its tricky...
> >
> > Sounds good, but I have a question: this realies on counters
> > being atomic on 64 bit.
> > Would not it be better to always use a seqlock even on 64 bit?
> > This way counters would actually be correct and in sync.
> > As it is if we want e.g. average packet size,
> > we can not rely e.g. on it being bytes/packets.
>
> When this stuff was discussed, we chose to have a nop on 64bits.
>
> Your point has little to do with 64bit stats, it was already like that
> with 'long int' counters.
Yes, of course.
> Consider average driver doing :
>
> dev->stats.rx_bytes += skb->len;
> dev->stats.rx_packets++;
>
> A concurrent reader can read an updated rx_bytes and a 'previous'
> rx_packets one.
>
> 'fixing' this requires a lot of work and memory barriers (in all
> drivers), for a very litle gain (at most one packet error)
> u64_stats_sync was really meant to be 0-cost on 64bit arches.
>
>
I understand, and not arguing about that.
But why do you say at most 1 packet?
Consider get_stats doing:
u64_stats_update_begin(&stats->syncp);
stats->tx_bytes += skb->len;
on 64 bit at this point
tx_packets might get incremented any number of times, no?
stats->tx_packets++;
u64_stats_update_end(&stats->syncp);
now tx_bytes and tx_packets are out of sync by more than 1.
^ permalink raw reply
* Re: Strange latency spikes/TX network stalls on Sun Fire X4150(x86) and e1000e
From: Jesse Brandeburg @ 2012-06-06 16:26 UTC (permalink / raw)
To: Hiroaki SHIMODA
Cc: Eric Dumazet, Tom Herbert, Denys Fedoryshchenko, netdev,
e1000-devel, jeffrey.t.kirsher, davem
In-Reply-To: <20120606174303.0bfc9868.shimoda.hiroaki@gmail.com>
On Wed, 6 Jun 2012 Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> wrote:
> Sorry for long delay. I'll post.
> (I have no idea how to fix this problem as keeping TXDCTL.WTHRESH to 5)
I don't like changing WTHRESH wholesale because making the global change
to WTHRESH on e1000e just to fix this one bug (likely specific to a
particular chip/hardware) will adversely effect performance on many
models supported by e1000e not demonstrating any problem. We could
possibly check something in for just ESB2LAN (S5000 chipset).
Other people (tom herbert) with this same chipset have been unable to
reproduce this issue right?
^ permalink raw reply
* RE: [PATCH 1/2] Ethtool: Add EEE support
From: Yuval Mintz @ 2012-06-06 16:27 UTC (permalink / raw)
To: Ben Hutchings
Cc: netdev@vger.kernel.org, Eilon Greenstein, peppe.cavallaro@st.com
In-Reply-To: <1338997734.2836.17.camel@bwh-desktop.uk.solarflarecom.com>
Hi Ben,
I stand corrected; I'll supply a new version shortly.
I've got 2 questions though:
> For now, use int for booleans. At some point I would like to see a
> thorough cleanup of ethtool to use bool where appropriate - but that's
> independent of this
I've noticed that the 'do_generic_set' function assumes all fields are ints.
Is this a convention we should stick to (using __u32 in the ethtool structs)?
I'm asking because I'm "wasting" fields in the ethtool_eee struct as I use
__u32 for boolean fields, simply because what seems to be the conventional
method won't work with smaller fields (corrupts the following fields).
The seconds question - is there a dependency between your acceptance of
this patch series and Dave's acceptance of the kernel's ethtool modification?
I'm asking because changes in the ethtool header there should be applied in
this patch series as well (in ethtool-copy.h).
Thanks,
Yuval
^ permalink raw reply
* Re: [PATCH net-next 0/3] Remove casts to same type
From: David Miller @ 2012-06-06 16:33 UTC (permalink / raw)
To: joe; +Cc: netdev
In-Reply-To: <cover.1338849364.git.joe@perches.com>
From: Joe Perches <joe@perches.com>
Date: Mon, 4 Jun 2012 15:44:15 -0700
> Adding casts of objects to the same type is unnecessary
> and confusing for a human reader.
>
> Remove them via coccinelle script.
>
> No additional sparse warnings are emitted.
>
> Joe Perches (3):
> ethernet: Remove casts to same type
> wireless: Remove casts to same type
> drivers: net: Remove casts to same type
All applied, thanks Joe.
^ permalink raw reply
* Re: [PATCH (net.git) 0/3 (v3)] stmmac fixes for net.git
From: David Miller @ 2012-06-06 16:35 UTC (permalink / raw)
To: peppe.cavallaro; +Cc: netdev
In-Reply-To: <1338873777-5374-1-git-send-email-peppe.cavallaro@st.com>
From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Tue, 5 Jun 2012 07:22:54 +0200
> These patches fix a problem in the driver when built as dynamic
> module and fix the driver's documentation.
>
> v2: this patchset has the same patches I had sent before but
> I removed a patch that did a cleanup (now moved for net-next).
>
> v3: removed wrong reviewed-by from this patch:
> "stmmac: fix driver Kconfig when built as module"
>
> Giuseppe Cavallaro (3):
> stmmac: fix driver's doc when run kernel-doc script
> stmmac: update driver's doc
> stmmac: fix driver Kconfig when built as module
All applied, thanks.
^ permalink raw reply
* RE: [net-next PATCH v2 1/3] Added kernel support in EEE Ethtool commands
From: Yuval Mintz @ 2012-06-06 16:40 UTC (permalink / raw)
To: Ben Hutchings
Cc: davem@davemloft.net, netdev@vger.kernel.org, Eilon Greenstein,
peppe.cavallaro@st.com
In-Reply-To: <1338996011.2836.2.camel@bwh-desktop.uk.solarflarecom.com>
> > + * @supported: Link speeds for which there is eee support.
> > + * @advertised: Link speeds the interface advertises (AN) as eee capable.
> > + * @lp_advertised: Link speeds the link partner advertised as eee capable.
>
> And these are bitmasks of SUPPORTED_* & ADVERTISED_* flags, right?
Right.
> Maybe 'link modes' not 'link speeds'?
Not that it matters greatly, but there are SUPPORTED & ADVERTISED flags for
things other than link speeds, such as connection type and flow control,
so using exactly the same semantic in description might confuse someone.
Thank,
Yuval
^ permalink raw reply
* [PATCH] Net: mv643xx_eth: Fix compile error for architectures without clk.
From: Andrew Lunn @ 2012-06-06 16:40 UTC (permalink / raw)
To: jwboyer; +Cc: buytenh, olof, netdev, ben, Andrew Lunn
Commit 452503ebc (ARM: Orion: Eth: Add clk/clkdev support.) broke
the building of the driver on architectures which don't have clk
support. In particular PPC32 Pegasos which uses this driver.
Add #ifdef around the clk API usage.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
drivers/net/ethernet/marvell/mv643xx_eth.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index 04d901d..f0f06b2 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -436,7 +436,9 @@ struct mv643xx_eth_private {
/*
* Hardware-specific parameters.
*/
+#if defined(CONFIG_HAVE_CLK)
struct clk *clk;
+#endif
unsigned int t_clk;
};
@@ -2895,17 +2897,17 @@ static int mv643xx_eth_probe(struct platform_device *pdev)
mp->dev = dev;
/*
- * Get the clk rate, if there is one, otherwise use the default.
+ * Start with a default rate, and if there is a clock, allow
+ * it to override the default.
*/
+ mp->t_clk = 133000000;
+#if defined(CONFIG_HAVE_CLK)
mp->clk = clk_get(&pdev->dev, (pdev->id ? "1" : "0"));
if (!IS_ERR(mp->clk)) {
clk_prepare_enable(mp->clk);
mp->t_clk = clk_get_rate(mp->clk);
- } else {
- mp->t_clk = 133000000;
- printk(KERN_WARNING "Unable to get clock");
}
-
+#endif
set_params(mp, pd);
netif_set_real_num_tx_queues(dev, mp->txq_count);
netif_set_real_num_rx_queues(dev, mp->rxq_count);
@@ -2995,10 +2997,13 @@ static int mv643xx_eth_remove(struct platform_device *pdev)
phy_detach(mp->phy);
cancel_work_sync(&mp->tx_timeout_task);
+#if defined(CONFIG_HAVE_CLK)
if (!IS_ERR(mp->clk)) {
clk_disable_unprepare(mp->clk);
clk_put(mp->clk);
}
+#endif
+
free_netdev(mp->dev);
platform_set_drvdata(pdev, NULL);
--
1.7.10
^ permalink raw reply related
* Re: [PATCH v9] tilegx network driver: initial support
From: David Miller @ 2012-06-06 16:41 UTC (permalink / raw)
To: cmetcalf; +Cc: bhutchings, arnd, linux-kernel, netdev
In-Reply-To: <201206042023.q54KNEZp003834@farm-0002.internal.tilera.com>
From: Chris Metcalf <cmetcalf@tilera.com>
Date: Mon, 4 Jun 2012 16:12:03 -0400
> This change adds support for the tilegx network driver based on the
> GXIO IORPC support in the tilegx software stack, using the on-chip
> mPIPE packet processing engine.
>
> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
> ---
> This change fixes some bugs that were discovered during additional
> testing of the TSO refactoring. In addition, I added a comment
> explaining why we provide TSO support as essentially driver-side GSO.
>
> The previous v8 version of the patch (from 10 days ago) received no
> feedback; if anyone would care to provide feedback on this version of
> the driver, it would be much appreciated. Thanks!
Someone other than me please review this driver.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox