* Re: VPN traffic leaks in IPv6/IPv4 dual-stack networks/hosts
From: Fernando Gont @ 2012-11-27 16:07 UTC
To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1354032252.14302.37.camel@edumazet-glaptop>
Hi, Eric,
On 11/27/2012 01:04 PM, Eric Dumazet wrote:
>> P.S.: Not sure if this is the right list to send this note. Please
>> advise of a more appropriate one and/or feel free to forward this note
>> if deemed appropriate...
>
> This seems a user space issue to me.
>
> accept_ra on Linux defaults to 1, meaning that as soon as forwarding is
> enabled, RAs are ignored.
I don't follow. Why would RAs be ignored if accept_ra is set to 1??
Cheers,
--
Fernando Gont
e-mail: fernando@gont.com.ar || fgont@si6networks.com
PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1
* Re: VPN traffic leaks in IPv6/IPv4 dual-stack networks/hosts
From: Eric Dumazet @ 2012-11-27 16:04 UTC
To: Fernando Gont; +Cc: netdev
In-Reply-To: <50B4D43A.7030208@gont.com.ar>
On Tue, 2012-11-27 at 11:54 -0300, Fernando Gont wrote:
> Folks,
>
> FYI. This might affect Linux users employing e.g. OpenVPN:
> <http://tools.ietf.org/html/draft-gont-opsec-vpn-leakages>.
>
> For a project such as OpenVPN, a (portable) fix might be non-trivial.
> However, I guess Linux might hook some iptables rules when establishing
> the VPN tunnel, such that e.g. all v6 traffic is filtered (yes, this is
> certainly not the most desirable fix, but still probably better than
> having your supposedly-secured traffic being sent in the clear).
>
> P.S.: Not sure if this is the right list to send this note. Please
> advise of a more appropriate one and/or feel free to forward this note
> if deemed appropriate...
This seems a user space issue to me.
accept_ra on Linux defaults to 1, meaning that as soon as forwarding is
enabled, RAs are ignored.
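For readers puzzled by the accept_ra point: the documented values are 0 (never accept RAs), 1 (accept RAs only while forwarding is disabled) and 2 (accept RAs even when forwarding is enabled). The following is a minimal userspace sketch of that decision, not kernel code; ra_is_accepted() is a hypothetical helper introduced only for illustration.

```c
#include <stdbool.h>

/*
 * Userspace model of the documented accept_ra semantics:
 *   0 - never accept Router Advertisements
 *   1 - accept them only while forwarding is disabled
 *   2 - accept them even if forwarding is enabled
 * ra_is_accepted() is a hypothetical name, not a kernel function.
 */
static bool ra_is_accepted(int accept_ra, bool forwarding)
{
	switch (accept_ra) {
	case 1:
		return !forwarding;
	case 2:
		return true;
	default:
		return false;
	}
}
```

With accept_ra at 1, a host that does not forward packets still processes RAs; only a host with forwarding enabled drops them.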
* Re: [PATCH 2/2] smsc75xx: support PHY wakeup source
From: Joe Perches @ 2012-11-27 15:56 UTC
To: Steve Glendinning; +Cc: netdev
In-Reply-To: <1354026482-10443-3-git-send-email-steve.glendinning@shawell.net>
On Tue, 2012-11-27 at 14:28 +0000, Steve Glendinning wrote:
> This patch enables LAN7500 family devices to wake from suspend
> on either link up or link down events.
[]
> diff --git a/drivers/net/usb/smsc75xx.c b/drivers/net/usb/smsc75xx.c
[]
> +static int smsc75xx_enter_suspend1(struct usbnet *dev)
> +{
> + u32 val;
> + int ret;
> +
> + ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
> + check_warn_return(ret, "Error reading PMT_CTL");
Hi Steve, can you please add newlines to these new
check_warn_<foo> messages and the netdev_<level> ones too?
It helps avoid message interleaving.
> + if (!link_up) {
> + struct mii_if_info *mii = &dev->mii;
> + netdev_info(dev->net, "entering SUSPEND1 mode");
etc..
* Re: [PATCH v3 8/7] pppoatm: fix missing wakeup in pppoatm_send()
From: chas williams - CONTRACTOR @ 2012-11-27 15:23 UTC
To: David Woodhouse; +Cc: Krzysztof Mazur, netdev, linux-kernel, davem
In-Reply-To: <1354022867.26346.334.camel@shinybook.infradead.org>
On Tue, 27 Nov 2012 13:27:47 +0000
David Woodhouse <dwmw2@infradead.org> wrote:
> > i really would prefer not to use a strange name since it might confuse
> > larger group of people who are more familiar with the traditional meaning
> > of this function. vcc_release() isnt exported so we could rename it if
> > things get too confusing.
> >
> > i have to look at this a bit more but we might be able to use release_cb
> > to get rid of the null push to detach the underlying protocol. that would
> > be somewhat nice.
>
> In the meantime, should I resend this patch with the name 'release_cb'
> instead of 'unlock_cb'? I'll just put a comment in to make sure it isn't
> confused with vcc_release(), and if we need to change vcc_release()
> later we can.
>
yes, but don't call it 8/7 since that doesn't make sense.
* VPN traffic leaks in IPv6/IPv4 dual-stack networks/hosts
From: Fernando Gont @ 2012-11-27 14:54 UTC
To: netdev
Folks,
FYI. This might affect Linux users employing e.g. OpenVPN:
<http://tools.ietf.org/html/draft-gont-opsec-vpn-leakages>.
For a project such as OpenVPN, a (portable) fix might be non-trivial.
However, I guess Linux might hook some iptables rules when establishing
the VPN tunnel, such that e.g. all v6 traffic is filtered (yes, this is
certainly not the most desirable fix, but still probably better than
having your supposedly-secured traffic being sent in the clear).
P.S.: Not sure if this is the right list to send this note. Please
advise of a more appropriate one and/or feel free to forward this note
if deemed appropriate...
Thanks,
--
Fernando Gont
e-mail: fernando@gont.com.ar || fgont@si6networks.com
PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1
* [PATCH net-next] net: move inet_dport/inet_num in sock_common
From: Eric Dumazet @ 2012-11-27 15:06 UTC
To: David Miller; +Cc: netdev, Ling Ma
From: Eric Dumazet <edumazet@google.com>
commit 68835aba4d9b (net: optimize INET input path further)
moved some fields used for tcp/udp sockets lookup in the first cache
line of struct sock_common.
This patch moves inet_dport/inet_num as well, filling a 32bit hole
on 64 bit arches and reducing number of cache line misses.
Also change INET_MATCH()/INET_TW_MATCH() to perform the ports match
before the addresses match, as this check is more discriminating.
The namespace check can also be done last.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ling Ma <ling.ma.program@gmail.com>
---
include/linux/ipv6.h | 26 ++++++++++---------
include/net/inet_hashtables.h | 38 ++++++++++++++++-------------
include/net/inet_sock.h | 4 +--
include/net/inet_timewait_sock.h | 5 ++-
include/net/sock.h | 10 ++++++-
5 files changed, 48 insertions(+), 35 deletions(-)
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 5e11905..196ede4 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -365,19 +365,21 @@ static inline struct raw6_sock *raw6_sk(const struct sock *sk)
#endif /* IS_ENABLED(CONFIG_IPV6) */
#define INET6_MATCH(__sk, __net, __hash, __saddr, __daddr, __ports, __dif)\
- (((__sk)->sk_hash == (__hash)) && sock_net((__sk)) == (__net) && \
- ((*((__portpair *)&(inet_sk(__sk)->inet_dport))) == (__ports)) && \
- ((__sk)->sk_family == AF_INET6) && \
- ipv6_addr_equal(&inet6_sk(__sk)->daddr, (__saddr)) && \
- ipv6_addr_equal(&inet6_sk(__sk)->rcv_saddr, (__daddr)) && \
- (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
+ (((__sk)->sk_hash == (__hash)) && \
+ ((*((__portpair *)&(inet_sk(__sk)->inet_dport))) == (__ports)) && \
+ ((__sk)->sk_family == AF_INET6) && \
+ ipv6_addr_equal(&inet6_sk(__sk)->daddr, (__saddr)) && \
+ ipv6_addr_equal(&inet6_sk(__sk)->rcv_saddr, (__daddr)) && \
+ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))) && \
+ net_eq(sock_net(__sk), (__net)))
#define INET6_TW_MATCH(__sk, __net, __hash, __saddr, __daddr, __ports, __dif) \
- (((__sk)->sk_hash == (__hash)) && sock_net((__sk)) == (__net) && \
- (*((__portpair *)&(inet_twsk(__sk)->tw_dport)) == (__ports)) && \
- ((__sk)->sk_family == PF_INET6) && \
- (ipv6_addr_equal(&inet6_twsk(__sk)->tw_v6_daddr, (__saddr))) && \
- (ipv6_addr_equal(&inet6_twsk(__sk)->tw_v6_rcv_saddr, (__daddr))) && \
- (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
+ (((__sk)->sk_hash == (__hash)) && \
+ (*((__portpair *)&(inet_twsk(__sk)->tw_dport)) == (__ports)) && \
+ ((__sk)->sk_family == PF_INET6) && \
+ (ipv6_addr_equal(&inet6_twsk(__sk)->tw_v6_daddr, (__saddr))) && \
+ (ipv6_addr_equal(&inet6_twsk(__sk)->tw_v6_rcv_saddr, (__daddr))) && \
+ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))) && \
+ net_eq(sock_net(__sk), (__net)))
#endif /* _IPV6_H */
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 54be028..cd6766a 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -300,29 +300,33 @@ typedef __u64 __bitwise __addrpair;
((__force __u64)(__be32)(__saddr)));
#endif /* __BIG_ENDIAN */
#define INET_MATCH(__sk, __net, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
- (((__sk)->sk_hash == (__hash)) && net_eq(sock_net(__sk), (__net)) && \
- ((*((__addrpair *)&(inet_sk(__sk)->inet_daddr))) == (__cookie)) && \
- ((*((__portpair *)&(inet_sk(__sk)->inet_dport))) == (__ports)) && \
- (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
+ (((__sk)->sk_hash == (__hash)) && \
+ ((*((__portpair *)&(inet_sk(__sk)->inet_dport))) == (__ports)) && \
+ ((*((__addrpair *)&(inet_sk(__sk)->inet_daddr))) == (__cookie)) && \
+ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))) && \
+ net_eq(sock_net(__sk), (__net)))
#define INET_TW_MATCH(__sk, __net, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
- (((__sk)->sk_hash == (__hash)) && net_eq(sock_net(__sk), (__net)) && \
- ((*((__addrpair *)&(inet_twsk(__sk)->tw_daddr))) == (__cookie)) && \
+ (((__sk)->sk_hash == (__hash)) && \
((*((__portpair *)&(inet_twsk(__sk)->tw_dport))) == (__ports)) && \
- (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
+ ((*((__addrpair *)&(inet_twsk(__sk)->tw_daddr))) == (__cookie)) && \
+ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))) && \
+ net_eq(sock_net(__sk), (__net)))
#else /* 32-bit arch */
#define INET_ADDR_COOKIE(__name, __saddr, __daddr)
#define INET_MATCH(__sk, __net, __hash, __cookie, __saddr, __daddr, __ports, __dif) \
- (((__sk)->sk_hash == (__hash)) && net_eq(sock_net(__sk), (__net)) && \
- (inet_sk(__sk)->inet_daddr == (__saddr)) && \
- (inet_sk(__sk)->inet_rcv_saddr == (__daddr)) && \
- ((*((__portpair *)&(inet_sk(__sk)->inet_dport))) == (__ports)) && \
- (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
+ (((__sk)->sk_hash == (__hash)) && \
+ ((*((__portpair *)&(inet_sk(__sk)->inet_dport))) == (__ports)) && \
+ (inet_sk(__sk)->inet_daddr == (__saddr)) && \
+ (inet_sk(__sk)->inet_rcv_saddr == (__daddr)) && \
+ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))) && \
+ net_eq(sock_net(__sk), (__net)))
#define INET_TW_MATCH(__sk, __net, __hash,__cookie, __saddr, __daddr, __ports, __dif) \
- (((__sk)->sk_hash == (__hash)) && net_eq(sock_net(__sk), (__net)) && \
- (inet_twsk(__sk)->tw_daddr == (__saddr)) && \
- (inet_twsk(__sk)->tw_rcv_saddr == (__daddr)) && \
- ((*((__portpair *)&(inet_twsk(__sk)->tw_dport))) == (__ports)) && \
- (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
+ (((__sk)->sk_hash == (__hash)) && \
+ ((*((__portpair *)&(inet_twsk(__sk)->tw_dport))) == (__ports)) && \
+ (inet_twsk(__sk)->tw_daddr == (__saddr)) && \
+ (inet_twsk(__sk)->tw_rcv_saddr == (__daddr)) && \
+ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))) && \
+ net_eq(sock_net(__sk), (__net)))
#endif /* 64-bit arch */
/*
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 256c1ed..ae6cfa5 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -144,9 +144,9 @@ struct inet_sock {
/* Socket demultiplex comparisons on incoming packets. */
#define inet_daddr sk.__sk_common.skc_daddr
#define inet_rcv_saddr sk.__sk_common.skc_rcv_saddr
+#define inet_dport sk.__sk_common.skc_dport
+#define inet_num sk.__sk_common.skc_num
- __be16 inet_dport;
- __u16 inet_num;
__be32 inet_saddr;
__s16 uc_ttl;
__u16 cmsg_flags;
diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index ba52c83..671dbb7 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -112,6 +112,9 @@ struct inet_timewait_sock {
#define tw_net __tw_common.skc_net
#define tw_daddr __tw_common.skc_daddr
#define tw_rcv_saddr __tw_common.skc_rcv_saddr
+#define tw_dport __tw_common.skc_dport
+#define tw_num __tw_common.skc_num
+
int tw_timeout;
volatile unsigned char tw_substate;
unsigned char tw_rcv_wscale;
@@ -119,8 +122,6 @@ struct inet_timewait_sock {
/* Socket demultiplex comparisons on incoming packets. */
/* these three are in inet_sock */
__be16 tw_sport;
- __be16 tw_dport;
- __u16 tw_num;
kmemcheck_bitfield_begin(flags);
/* And these are ours. */
unsigned int tw_ipv6only : 1,
diff --git a/include/net/sock.h b/include/net/sock.h
index c945fba..e4bab2e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -132,6 +132,8 @@ struct net;
* @skc_rcv_saddr: Bound local IPv4 addr
* @skc_hash: hash value used with various protocol lookup tables
* @skc_u16hashes: two u16 hash values used by UDP lookup tables
+ * @skc_dport: placeholder for inet_dport/tw_dport
+ * @skc_num: placeholder for inet_num/tw_num
* @skc_family: network address family
* @skc_state: Connection state
* @skc_reuse: %SO_REUSEADDR setting
@@ -149,8 +151,8 @@ struct net;
* for struct sock and struct inet_timewait_sock.
*/
struct sock_common {
- /* skc_daddr and skc_rcv_saddr must be grouped :
- * cf INET_MATCH() and INET_TW_MATCH()
+ /* skc_daddr and skc_rcv_saddr must be grouped on a 8 bytes aligned
+ * address on 64bit arches : cf INET_MATCH() and INET_TW_MATCH()
*/
__be32 skc_daddr;
__be32 skc_rcv_saddr;
@@ -159,6 +161,10 @@ struct sock_common {
unsigned int skc_hash;
__u16 skc_u16hashes[2];
};
+ /* skc_dport && skc_num must be grouped as well */
+ __be16 skc_dport;
+ __u16 skc_num;
+
unsigned short skc_family;
volatile unsigned char skc_state;
unsigned char skc_reuse;
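The grouping requirement in the patch above can be illustrated with a simplified userspace sketch. It assumes a cut-down stand-in for struct sock_common (the real structure has more members), and the names sock_common_sketch, load_portpair(), make_portpair() and sketch_match() are hypothetical. The point is only that two adjacent 16-bit port fields can be compared as one 32-bit value, and that the port comparison runs before the address comparison.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-in for the start of struct sock_common. */
struct sock_common_sketch {
	uint32_t daddr;      /* plays the role of skc_daddr */
	uint32_t rcv_saddr;  /* plays the role of skc_rcv_saddr */
	uint32_t hash;       /* plays the role of skc_hash */
	uint16_t dport;      /* plays the role of skc_dport */
	uint16_t num;        /* plays the role of skc_num */
	unsigned short family;
};

/* dport and num must be adjacent so both can be read as one 32-bit value. */
_Static_assert(offsetof(struct sock_common_sketch, num) ==
	       offsetof(struct sock_common_sketch, dport) + sizeof(uint16_t),
	       "dport and num must be grouped");

/* Read both port fields with a single 32-bit load. */
static uint32_t load_portpair(const struct sock_common_sketch *sk)
{
	uint32_t pair;

	memcpy(&pair, (const unsigned char *)sk +
		      offsetof(struct sock_common_sketch, dport), sizeof(pair));
	return pair;
}

/* Build the lookup key in the same in-memory layout as the struct fields. */
static uint32_t make_portpair(uint16_t dport, uint16_t num)
{
	uint16_t parts[2] = { dport, num };
	uint32_t pair;

	memcpy(&pair, parts, sizeof(pair));
	return pair;
}

/* Hash first, then ports, then addresses: same order as the patch uses. */
static int sketch_match(const struct sock_common_sketch *sk, uint32_t hash,
			uint32_t ports, uint32_t saddr, uint32_t daddr)
{
	return sk->hash == hash &&
	       load_portpair(sk) == ports &&
	       sk->daddr == saddr &&
	       sk->rcv_saddr == daddr;
}
```

Building the key with memcpy keeps the sketch independent of byte order; the kernel macros handle endianness explicitly instead.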
* Re: [RFC net-next PATCH V1 7/9] net: frag queue locking per hash bucket
From: Jesper Dangaard Brouer @ 2012-11-27 15:00 UTC
To: Eric Dumazet
Cc: David S. Miller, Florian Westphal, netdev, Pablo Neira Ayuso,
Thomas Graf, Cong Wang, Patrick McHardy, Paul E. McKenney,
Herbert Xu
In-Reply-To: <20121123130836.18764.9297.stgit@dragon>
On Fri, 2012-11-23 at 14:08 +0100, Jesper Dangaard Brouer wrote:
> DO NOT apply - patch not finished, can cause an OOPS/PANIC during hash rebuild
>
> This patch implements per hash bucket locking for the frag queue
> hash. This removes two write locks, and the only remaining write
> lock is for protecting hash rebuild. This essentially reduces the
> readers-writer lock to a rebuild lock.
>
> NOT-Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
The last bug mentioned was not the only one... hopefully this fixes the last bug in this patch.
> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> index 1620a21..447423f 100644
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -35,20 +35,27 @@ static void inet_frag_secret_rebuild(unsigned long dummy)
> unsigned long now = jiffies;
> int i;
>
> + /* Per bucket lock NOT needed here, due to write lock protection */
> write_lock(&f->lock);
> +
> get_random_bytes(&f->rnd, sizeof(u32));
> for (i = 0; i < INETFRAGS_HASHSZ; i++) {
> + struct inet_frag_bucket *hb;
> struct inet_frag_queue *q;
> struct hlist_node *p, *n;
>
> - hlist_for_each_entry_safe(q, p, n, &f->hash[i], list) {
> + hb = &f->hash[i];
> + hlist_for_each_entry_safe(q, p, n, &hb->chain, list) {
> unsigned int hval = f->hashfn(q);
>
> if (hval != i) {
> + struct inet_frag_bucket *hb_dest;
> +
> hlist_del(&q->list);
>
> /* Relink to new hash chain. */
> - hlist_add_head(&q->list, &f->hash[hval]);
> + hb_dest = &f->hash[hval];
> + hlist_add_head(&q->list, &hb->chain);
The above line was wrong; it should have been:
hlist_add_head(&q->list, &hb_dest->chain);
> }
> }
> }
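The corrected relink step can be sketched in userspace as follows. This is an illustration under simplifying assumptions (a singly linked chain instead of hlist, no locking, a toy hash), and frag_node, frag_bucket, sketch_hash() and sketch_rebuild() are hypothetical names. It shows why the entry has to go onto the destination bucket's chain rather than back onto the bucket being walked.

```c
#include <stddef.h>

#define SKETCH_HASHSZ 4

struct frag_node {
	unsigned int key;
	struct frag_node *next;
};

struct frag_bucket {
	struct frag_node *chain;
};

/* Toy hash: the bucket depends on the key and the current random secret. */
static unsigned int sketch_hash(unsigned int key, unsigned int rnd)
{
	return (key ^ rnd) % SKETCH_HASHSZ;
}

/* Re-bucket every entry after the secret changed. */
static void sketch_rebuild(struct frag_bucket *table, unsigned int rnd)
{
	for (unsigned int i = 0; i < SKETCH_HASHSZ; i++) {
		struct frag_bucket *hb = &table[i];
		struct frag_node **link = &hb->chain;

		while (*link) {
			struct frag_node *q = *link;
			unsigned int hval = sketch_hash(q->key, rnd);

			if (hval != i) {
				struct frag_bucket *hb_dest = &table[hval];

				*link = q->next;          /* unlink from hb */
				q->next = hb_dest->chain; /* relink into hb_dest, */
				hb_dest->chain = q;       /* not back into hb */
			} else {
				link = &q->next;
			}
		}
	}
}
```

Relinking into hb instead of hb_dest would leave the entry in a bucket that no longer matches its hash, so later lookups would miss it.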
The patch seems quite stable now. My test is to adjust the rebuild
interval to 2 sec and then run 4x 10G with two fragments (packet size
1472*2) to create as many fragments as possible (approx 300
inet_frag_queue elements).
30 min test run:
3726+3896+3960+3608 = 15190 Mbit/s
(For reproducers: note that changing ipfrag_secret_interval (e.g.
sysctl -w net/ipv4/ipfrag_secret_interval=2) first takes effect after
the first interval/timer expires, which by default is 10 min.)
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
* Re: [PATCH 1/2] smsc75xx: refactor entering suspend modes
From: Bjørn Mork @ 2012-11-27 14:50 UTC
To: Steve Glendinning
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1354026482-10443-2-git-send-email-steve.glendinning-nksJyM/082jR7s880joybQ@public.gmane.org>
I believe the drivers/net/usb patches should be CCed to linux-usb for
review, because they often touch USB specific things. So I added that
CC and did not strip any of the quoted text.
Steve Glendinning <steve.glendinning-nksJyM/082jR7s880joybQ@public.gmane.org> writes:
> This patch splits out the logic for entering suspend modes
> to separate functions, to reduce the complexity of the
> smsc75xx_suspend function.
>
> Signed-off-by: Steve Glendinning <steve.glendinning-nksJyM/082jR7s880joybQ@public.gmane.org>
> ---
> drivers/net/usb/smsc75xx.c | 62 +++++++++++++++++++++++++++-----------------
> 1 file changed, 38 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/net/usb/smsc75xx.c b/drivers/net/usb/smsc75xx.c
> index 953c4f4..4655c01 100644
> --- a/drivers/net/usb/smsc75xx.c
> +++ b/drivers/net/usb/smsc75xx.c
> @@ -1213,6 +1213,42 @@ static int smsc75xx_write_wuff(struct usbnet *dev, int filter, u32 wuf_cfg,
> return 0;
> }
>
> +static int smsc75xx_enter_suspend0(struct usbnet *dev)
> +{
> + u32 val;
> + int ret;
> +
> + ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
> + check_warn_return(ret, "Error reading PMT_CTL\n");
> +
> + val &= (~(PMT_CTL_SUS_MODE | PMT_CTL_PHY_RST));
> + val |= PMT_CTL_SUS_MODE_0 | PMT_CTL_WOL_EN | PMT_CTL_WUPS;
> +
> + ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
> + check_warn_return(ret, "Error writing PMT_CTL\n");
> +
> + smsc75xx_set_feature(dev, USB_DEVICE_REMOTE_WAKEUP);
As mentioned in another comment to the smsc95xx driver: This is weird.
Do you really need to do that?
This is a USB interface driver. The USB device is handled by the
generic "usb" driver, which will do the right thing. See
drivers/usb/generic.c and drivers/usb/core/hub.c
generic_suspend() calls usb_port_suspend() which does:
/* enable remote wakeup when appropriate; this lets the device
* wake up the upstream hub (including maybe the root hub).
*
* NOTE: OTG devices may issue remote wakeup (or SRP) even when
* we don't explicitly enable it here.
*/
if (udev->do_remote_wakeup) {
if (!hub_is_superspeed(hub->hdev)) {
status = usb_control_msg(udev, usb_sndctrlpipe(udev, 0),
USB_REQ_SET_FEATURE, USB_RECIP_DEVICE,
USB_DEVICE_REMOTE_WAKEUP, 0,
NULL, 0,
USB_CTRL_SET_TIMEOUT);
} else {
/* Assume there's only one function on the USB 3.0
* device and enable remote wake for the first
* interface. FIXME if the interface association
* descriptor shows there's more than one function.
*/
status = usb_control_msg(udev, usb_sndctrlpipe(udev, 0),
USB_REQ_SET_FEATURE,
USB_RECIP_INTERFACE,
USB_INTRF_FUNC_SUSPEND,
USB_INTRF_FUNC_SUSPEND_RW |
USB_INTRF_FUNC_SUSPEND_LP,
NULL, 0,
USB_CTRL_SET_TIMEOUT);
}
So you should not need to touch the USB device feature directly from your
interface driver.
> +
> + return 0;
> +}
> +
> +static int smsc75xx_enter_suspend2(struct usbnet *dev)
> +{
> + u32 val;
> + int ret;
> +
> + ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
> + check_warn_return(ret, "Error reading PMT_CTL\n");
> +
> + val &= ~(PMT_CTL_SUS_MODE | PMT_CTL_WUPS | PMT_CTL_PHY_RST);
> + val |= PMT_CTL_SUS_MODE_2;
> +
> + ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
> + check_warn_return(ret, "Error writing PMT_CTL\n");
> +
> + return 0;
> +}
> +
> static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
> {
> struct usbnet *dev = usb_get_intfdata(intf);
> @@ -1244,17 +1280,7 @@ static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
> ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
> check_warn_return(ret, "Error writing PMT_CTL\n");
>
> - /* enter suspend2 mode */
> - ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
> - check_warn_return(ret, "Error reading PMT_CTL\n");
> -
> - val &= ~(PMT_CTL_SUS_MODE | PMT_CTL_WUPS | PMT_CTL_PHY_RST);
> - val |= PMT_CTL_SUS_MODE_2;
> -
> - ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
> - check_warn_return(ret, "Error writing PMT_CTL\n");
> -
> - return 0;
> + return smsc75xx_enter_suspend2(dev);
> }
>
> if (pdata->wolopts & (WAKE_MCAST | WAKE_ARP)) {
> @@ -1368,19 +1394,7 @@ static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
>
> /* some wol options are enabled, so enter SUSPEND0 */
> netdev_info(dev->net, "entering SUSPEND0 mode\n");
> -
> - ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
> - check_warn_return(ret, "Error reading PMT_CTL\n");
> -
> - val &= (~(PMT_CTL_SUS_MODE | PMT_CTL_PHY_RST));
> - val |= PMT_CTL_SUS_MODE_0 | PMT_CTL_WOL_EN | PMT_CTL_WUPS;
> -
> - ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
> - check_warn_return(ret, "Error writing PMT_CTL\n");
> -
> - smsc75xx_set_feature(dev, USB_DEVICE_REMOTE_WAKEUP);
> -
> - return 0;
> + return smsc75xx_enter_suspend0(dev);
> }
>
> static int smsc75xx_resume(struct usb_interface *intf)
* Re: [PATCH] sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails
From: Vlad Yasevich @ 2012-11-27 14:49 UTC
To: Tommi Rantala
Cc: linux-sctp, netdev, Neil Horman, Sridhar Samudrala,
David S. Miller, Dave Jones
In-Reply-To: <1354024906-1925-1-git-send-email-tt.rantala@gmail.com>
On 11/27/2012 09:01 AM, Tommi Rantala wrote:
> Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
> reproducible e.g. with the sendto() syscall by passing invalid
> user space pointer in the second argument:
>
> #include <string.h>
> #include <arpa/inet.h>
> #include <sys/socket.h>
>
> int main(void)
> {
> int fd;
> struct sockaddr_in sa;
>
> fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
> if (fd < 0)
> return 1;
>
> memset(&sa, 0, sizeof(sa));
> sa.sin_family = AF_INET;
> sa.sin_addr.s_addr = inet_addr("127.0.0.1");
> sa.sin_port = htons(11111);
>
> sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));
>
> return 0;
> }
>
> As far as I can tell, the leak has been around since ~2003.
>
> Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
-vlad
> ---
> net/sctp/chunk.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
> index 7c2df9c..f2aebdb 100644
> --- a/net/sctp/chunk.c
> +++ b/net/sctp/chunk.c
> @@ -284,7 +284,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
> goto errout;
> err = sctp_user_addto_chunk(chunk, offset, len, msgh->msg_iov);
> if (err < 0)
> - goto errout;
> + goto errout_chunk_free;
>
> offset += len;
>
> @@ -324,7 +324,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
> __skb_pull(chunk->skb, (__u8 *)chunk->chunk_hdr
> - (__u8 *)chunk->skb->data);
> if (err < 0)
> - goto errout;
> + goto errout_chunk_free;
>
> sctp_datamsg_assign(msg, chunk);
> list_add_tail(&chunk->frag_list, &msg->chunks);
> @@ -332,6 +332,9 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
>
> return msg;
>
> +errout_chunk_free:
> + sctp_chunk_free(chunk);
> +
> errout:
> list_for_each_safe(pos, temp, &msg->chunks) {
> list_del_init(pos);
>
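The shape of the fix quoted above can be shown with a small userspace sketch. It is an analogy under stated assumptions, not the SCTP code: chunk_alloc(), chunk_free(), msg_from_user() and the live_chunks counter are hypothetical, and the fail_at parameter merely simulates a failed copy from user space. A chunk that has been allocated but not yet linked into the message is invisible to the list cleanup, so it needs its own label.

```c
#include <stdlib.h>

static int live_chunks; /* allocations not yet freed, for illustration only */

struct chunk {
	struct chunk *next;
};

static struct chunk *chunk_alloc(void)
{
	struct chunk *c = calloc(1, sizeof(*c));

	if (c)
		live_chunks++;
	return c;
}

static void chunk_free(struct chunk *c)
{
	free(c);
	live_chunks--;
}

/*
 * Build a list of n chunks. fail_at simulates a copy failure for the
 * chunk with that index (-1 means no failure). Returns the list head,
 * or NULL on error with everything released.
 */
static struct chunk *msg_from_user(int n, int fail_at)
{
	struct chunk *head = NULL, *chunk;
	int i;

	for (i = 0; i < n; i++) {
		chunk = chunk_alloc();
		if (!chunk)
			goto errout;
		if (i == fail_at)
			goto errout_chunk_free; /* chunk is not on the list yet */
		chunk->next = head;
		head = chunk;
	}
	return head;

errout_chunk_free:
	chunk_free(chunk); /* the list walk below would never see this one */
errout:
	while (head) {
		chunk = head;
		head = head->next;
		chunk_free(chunk);
	}
	return NULL;
}
```

Jumping straight to errout on the simulated copy failure would leave live_chunks at 1, which is the userspace equivalent of the leak the patch closes.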
* [PATCH 2/2] net/davinci_emac: use clk_{prepare|unprepare}
From: Sekhar Nori @ 2012-11-27 14:47 UTC
To: David S. Miller
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/,
Mike Turquette
In-Reply-To: <1354027635-32627-1-git-send-email-nsekhar-l0cyMroinI0@public.gmane.org>
Use clk_prepare()/clk_unprepare() in the driver since common
clock framework needs these to be called before clock is enabled.
This is in preparation for common clock framework migration
for DaVinci.
Cc: Mike Turquette <mturquette-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
Signed-off-by: Sekhar Nori <nsekhar-l0cyMroinI0@public.gmane.org>
---
drivers/net/ethernet/ti/davinci_emac.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index 7be04dc..e7b3b94 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -352,6 +352,7 @@ struct emac_priv {
/*platform specific members*/
void (*int_enable) (void);
void (*int_disable) (void);
+ struct clk *clk;
};
/* EMAC TX Host Error description strings */
@@ -1870,19 +1871,29 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
dev_err(&pdev->dev, "failed to get EMAC clock\n");
return -EBUSY;
}
+
+ rc = clk_prepare(emac_clk);
+ if (rc) {
+ dev_err(&pdev->dev, "emac clock prepare failed.\n");
+ return rc;
+ }
+
emac_bus_frequency = clk_get_rate(emac_clk);
/* TODO: Probe PHY here if possible */
ndev = alloc_etherdev(sizeof(struct emac_priv));
- if (!ndev)
- return -ENOMEM;
+ if (!ndev) {
+ rc = -ENOMEM;
+ goto no_etherdev;
+ }
platform_set_drvdata(pdev, ndev);
priv = netdev_priv(ndev);
priv->pdev = pdev;
priv->ndev = ndev;
priv->msg_enable = netif_msg_init(debug_level, DAVINCI_EMAC_DEBUG);
+ priv->clk = emac_clk;
spin_lock_init(&priv->lock);
@@ -2020,6 +2031,8 @@ no_cpdma_chan:
cpdma_ctlr_destroy(priv->dma);
no_pdata:
free_netdev(ndev);
+no_etherdev:
+ clk_unprepare(emac_clk);
return rc;
}
@@ -2048,6 +2061,8 @@ static int __devexit davinci_emac_remove(struct platform_device *pdev)
unregister_netdev(ndev);
free_netdev(ndev);
+ clk_unprepare(priv->clk);
+
return 0;
}
--
1.7.10.1
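The balance this patch maintains can be sketched in userspace. The code below is only a model with hypothetical names (sketch_clk, sketch_clk_prepare(), sketch_clk_unprepare(), sketch_probe(), sketch_remove()); it does not use the real clock API. It shows that once the prepare step has succeeded, every later failure path in probe and the remove path must undo it.

```c
#include <errno.h>

/* Hypothetical stand-in for a clock that counts prepare calls. */
struct sketch_clk {
	int prepare_count;
};

static int sketch_clk_prepare(struct sketch_clk *clk)
{
	clk->prepare_count++;
	return 0;
}

static void sketch_clk_unprepare(struct sketch_clk *clk)
{
	clk->prepare_count--;
}

/* alloc_fails simulates alloc_etherdev() returning NULL. */
static int sketch_probe(struct sketch_clk *clk, int alloc_fails)
{
	int rc = sketch_clk_prepare(clk);

	if (rc)
		return rc; /* nothing to undo yet */

	if (alloc_fails) {
		rc = -ENOMEM;
		goto no_etherdev;
	}
	return 0;

no_etherdev:
	sketch_clk_unprepare(clk); /* undo the prepare on the failure path */
	return rc;
}

static void sketch_remove(struct sketch_clk *clk)
{
	sketch_clk_unprepare(clk); /* pairs with the prepare done in probe */
}
```

This is why the patch turns the plain "return -ENOMEM" into a jump to a new no_etherdev label.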
* [PATCH 1/2] net/davinci_emac: use devres APIs
From: Sekhar Nori @ 2012-11-27 14:47 UTC
To: David S. Miller
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/
Use devres APIs where possible to simplify error handling
in driver probe.
While at it, also rename the goto targets in error path to
introduce some consistency in how they are named.
Signed-off-by: Sekhar Nori <nsekhar-l0cyMroinI0@public.gmane.org>
---
drivers/net/ethernet/ti/davinci_emac.c | 46 +++++++++++---------------------
1 file changed, 16 insertions(+), 30 deletions(-)
diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index fce89a0..7be04dc 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -1865,21 +1865,18 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
/* obtain emac clock from kernel */
- emac_clk = clk_get(&pdev->dev, NULL);
+ emac_clk = devm_clk_get(&pdev->dev, NULL);
if (IS_ERR(emac_clk)) {
dev_err(&pdev->dev, "failed to get EMAC clock\n");
return -EBUSY;
}
emac_bus_frequency = clk_get_rate(emac_clk);
- clk_put(emac_clk);
/* TODO: Probe PHY here if possible */
ndev = alloc_etherdev(sizeof(struct emac_priv));
- if (!ndev) {
- rc = -ENOMEM;
- goto no_ndev;
- }
+ if (!ndev)
+ return -ENOMEM;
platform_set_drvdata(pdev, ndev);
priv = netdev_priv(ndev);
@@ -1893,7 +1890,7 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
if (!pdata) {
dev_err(&pdev->dev, "no platform data\n");
rc = -ENODEV;
- goto probe_quit;
+ goto no_pdata;
}
/* MAC addr and PHY mask , RMII enable info from platform_data */
@@ -1913,23 +1910,23 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
if (!res) {
dev_err(&pdev->dev,"error getting res\n");
rc = -ENOENT;
- goto probe_quit;
+ goto no_pdata;
}
priv->emac_base_phys = res->start + pdata->ctrl_reg_offset;
size = resource_size(res);
- if (!request_mem_region(res->start, size, ndev->name)) {
+ if (!devm_request_mem_region(&pdev->dev, res->start,
+ size, ndev->name)) {
dev_err(&pdev->dev, "failed request_mem_region() for regs\n");
rc = -ENXIO;
- goto probe_quit;
+ goto no_pdata;
}
- priv->remap_addr = ioremap(res->start, size);
+ priv->remap_addr = devm_ioremap(&pdev->dev, res->start, size);
if (!priv->remap_addr) {
dev_err(&pdev->dev, "unable to map IO\n");
rc = -ENOMEM;
- release_mem_region(res->start, size);
- goto probe_quit;
+ goto no_pdata;
}
priv->emac_base = priv->remap_addr + pdata->ctrl_reg_offset;
ndev->base_addr = (unsigned long)priv->remap_addr;
@@ -1962,7 +1959,7 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
if (!priv->dma) {
dev_err(&pdev->dev, "error initializing DMA\n");
rc = -ENOMEM;
- goto no_dma;
+ goto no_pdata;
}
priv->txchan = cpdma_chan_create(priv->dma, tx_chan_num(EMAC_DEF_TX_CH),
@@ -1971,14 +1968,14 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
emac_rx_handler);
if (WARN_ON(!priv->txchan || !priv->rxchan)) {
rc = -ENOMEM;
- goto no_irq_res;
+ goto no_cpdma_chan;
}
res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
if (!res) {
dev_err(&pdev->dev, "error getting irq res\n");
rc = -ENOENT;
- goto no_irq_res;
+ goto no_cpdma_chan;
}
ndev->irq = res->start;
@@ -2000,7 +1997,7 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
if (rc) {
dev_err(&pdev->dev, "error in register_netdev\n");
rc = -ENODEV;
- goto no_irq_res;
+ goto no_cpdma_chan;
}
@@ -2015,20 +2012,14 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
return 0;
-no_irq_res:
+no_cpdma_chan:
if (priv->txchan)
cpdma_chan_destroy(priv->txchan);
if (priv->rxchan)
cpdma_chan_destroy(priv->rxchan);
cpdma_ctlr_destroy(priv->dma);
-no_dma:
- res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
- release_mem_region(res->start, resource_size(res));
- iounmap(priv->remap_addr);
-
-probe_quit:
+no_pdata:
free_netdev(ndev);
-no_ndev:
return rc;
}
@@ -2041,14 +2032,12 @@ no_ndev:
*/
static int __devexit davinci_emac_remove(struct platform_device *pdev)
{
- struct resource *res;
struct net_device *ndev = platform_get_drvdata(pdev);
struct emac_priv *priv = netdev_priv(ndev);
dev_notice(&ndev->dev, "DaVinci EMAC: davinci_emac_remove()\n");
platform_set_drvdata(pdev, NULL);
- res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (priv->txchan)
cpdma_chan_destroy(priv->txchan);
@@ -2056,10 +2045,7 @@ static int __devexit davinci_emac_remove(struct platform_device *pdev)
cpdma_chan_destroy(priv->rxchan);
cpdma_ctlr_destroy(priv->dma);
- release_mem_region(res->start, resource_size(res));
-
unregister_netdev(ndev);
- iounmap(priv->remap_addr);
free_netdev(ndev);
return 0;
--
1.7.10.1
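Why the devres conversion lets the patch drop release_mem_region(), iounmap() and clk_put() from the error and remove paths can be illustrated with a toy userspace model. Everything here is hypothetical (sketch_device, sketch_devm_add(), sketch_devm_release_all(), sketch_log_release()); it only mimics the idea that each managed resource registers a release callback with the device, and that the callbacks run in reverse order of registration when the device goes away.

```c
#include <stddef.h>

#define SKETCH_MAX_RES 8

typedef void (*release_fn)(void *res);

/* Hypothetical device that remembers how to release what it acquired. */
struct sketch_device {
	struct {
		release_fn release;
		void *res;
	} entries[SKETCH_MAX_RES];
	size_t count;
};

/* Register a resource together with its release callback. */
static int sketch_devm_add(struct sketch_device *dev, release_fn release,
			   void *res)
{
	if (dev->count == SKETCH_MAX_RES)
		return -1;
	dev->entries[dev->count].release = release;
	dev->entries[dev->count].res = res;
	dev->count++;
	return 0;
}

/* Release everything, most recently acquired first. */
static void sketch_devm_release_all(struct sketch_device *dev)
{
	while (dev->count > 0) {
		dev->count--;
		dev->entries[dev->count].release(dev->entries[dev->count].res);
	}
}

/* Sample callback that records the order in which resources are released. */
static int sketch_release_log[SKETCH_MAX_RES];
static size_t sketch_release_len;

static void sketch_log_release(void *res)
{
	sketch_release_log[sketch_release_len++] = *(int *)res;
}
```

With the release logic attached to the device, a failing probe can simply return an error code, which is what allows several goto targets to collapse in the patch.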
* Re: [PATCH 1/1] net: cpts: fix for build break after ARM SoC integration
From: Paul Walmsley @ 2012-11-27 14:42 UTC
To: Mugunthan V N
Cc: netdev, davem, linux-arm-kernel, linux-omap, b-cousson,
richardcochran
In-Reply-To: <1354012034-31686-1-git-send-email-mugunthanvnm@ti.com>
On Tue, 27 Nov 2012, Mugunthan V N wrote:
> CC drivers/net/ethernet/ti/cpts.o
> drivers/net/ethernet/ti/cpts.c:30:24: fatal error: plat/clock.h: No such file or directory
> compilation terminated.
> make[4]: *** [drivers/net/ethernet/ti/cpts.o] Error 1
> make[3]: *** [drivers/net/ethernet/ti] Error 2
> make[2]: *** [drivers/net/ethernet] Error 2
> make[1]: *** [drivers/net] Error 2
>
> fix for build break as the header file is removed from plat-omap as part of
> the below patch
>
> commit a135eaae524acba1509a3b19c97fae556e4da7cd
> Author: Paul Walmsley <paul@pwsan.com>
> Date: Thu Sep 27 10:33:34 2012 -0600
>
> ARM: OMAP: remove plat/clock.h
>
> Remove arch/arm/plat-omap/include/plat/clock.h by merging it into
> arch/arm/mach-omap1/clock.h and arch/arm/mach-omap2/clock.h.
> The goal here is to facilitate ARM single image kernels by removing
> includes via the "plat/" symlink.
>
> Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Acked-by: Paul Walmsley <paul@pwsan.com>
- Paul
^ permalink raw reply
* Re: [PATCH] smsc95xx: fix suspend buffer overflow
From: Bjørn Mork @ 2012-11-27 14:34 UTC (permalink / raw)
To: Steve Glendinning; +Cc: netdev, dan.carpenter
In-Reply-To: <1354022623-7317-1-git-send-email-steve.glendinning@shawell.net>
Steve Glendinning <steve.glendinning@shawell.net> writes:
> This patch fixes a buffer overflow introduced by bbd9f9e, where
> the filter_mask array is accessed beyond its bounds.
>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
> ---
> drivers/net/usb/smsc95xx.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
> index 79d495d..6cdc504 100644
> --- a/drivers/net/usb/smsc95xx.c
> +++ b/drivers/net/usb/smsc95xx.c
> @@ -1281,7 +1281,7 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
> }
>
> if (pdata->wolopts & (WAKE_BCAST | WAKE_MCAST | WAKE_ARP | WAKE_UCAST)) {
> - u32 *filter_mask = kzalloc(32, GFP_KERNEL);
> + u32 *filter_mask = kzalloc(sizeof(u32) * 32, GFP_KERNEL);
> u32 command[2];
> u32 offset[2];
> u32 crc[4];
I wonder... all these magic constants (32, 2, 2, 4) obviously relate to
the maximum number of supported filters (8). It would be much easier to
avoid such bugs if the code documented this. Like
u32 *filter_mask = kzalloc(4 * sizeof(u32) * N, GFP_KERNEL);
u32 command[N/4];
u32 offset[N/4];
u32 crc[N/2];
And even better if you let the base types be "native" size so you could
avoid all the complicated indexing math:
u8 *filter_mask = kzalloc(4 * sizeof(u32) * N, GFP_KERNEL);
u8 command[N];
u8 offset[N];
u16 crc[N];
Yes, you will then have to do type conversions when writing to the chip,
but I believe the overall code will be much easier to follow with this
command[filter/4] |= 0x05UL << ((filter % 4) * 8);
offset[filter/4] |= 0x00 << ((filter % 4) * 8);
crc[filter/2] |= smsc_crc(bcast, 6, filter);
replaced by the IMHO more obvious
command[filter] = 0x05UL;
offset[filter] = 0x00;
crc[filter] = bitrev16(crc16(0xFFFF, bcast, 6));
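To illustrate the point about indexing, here is a small userspace sketch (not driver code). It assumes the packed layout implied by the shifts above, four filters per 32-bit word with filter 0 in the least significant byte. The helper name packed_word() is hypothetical. The idea is to keep one byte per filter in the driver logic and do the packing only at the point where the words are written out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 32-bit word number `word` of the packed form of src[0..n-1].
 * Assumed layout: four per-filter bytes per word, filter 0 in the
 * least significant byte, as implied by "<< ((filter % 4) * 8)".
 */
static uint32_t packed_word(const uint8_t *src, size_t n, size_t word)
{
	uint32_t v = 0;
	size_t i;

	assert(src != NULL);
	for (i = word * 4; i < word * 4 + 4 && i < n; i++)
		v |= (uint32_t)src[i] << ((i % 4) * 8);
	return v;
}
```

With per-filter bytes of 0x05 for filters 0 and 5, word 0 comes out as 0x00000005 and word 1 as 0x00000500, which is what the packed indexing would have produced.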
BTW, the smsc_crc() function cannot work. It returns a u16 which it
sometimes will attempt to shift 16 bits...
And you don't test the kzalloc() return value.
And if I am really going to be a nitpick (which comes naturally to me
:-), then I don't think you need to allocate anything at all. You never
set more than a few bits in the first byte of the filter mask. Why not
create a small helper function which writes a filter mask with these
bits and fill the rest with zeroes?
Something along the lines of
	int write_filter(struct usbnet *dev, u8 firstbyte)
	{
		int i, ret = 0;
		u32 v = (u32)firstbyte << 24;

		for (i = 0; i < 4 && !ret; i++) {
			ret = smsc95xx_write_reg_nopm(dev, WUFF, v);
			v = 0;
		}
		return ret;
	}

	u8 filter_mask_byte[N];
	..
	filter_mask_byte[filter] = 0x3F;
	..
	for (i = 0; i < wuff_filter_count; i++) {
		ret = write_filter(dev, filter_mask_byte[i]);
		check_warn_return(ret, "Error writing WUFF\n");
	}
Bjørn
^ permalink raw reply
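The smsc_crc() remark in the message above can be shown with a minimal userspace sketch. It assumes the function shifts its 16-bit result by (filter % 2) * 16 before returning it as a u16; this is inferred from the crc[filter/2] usage quoted there, since the function body does not appear in this thread. The names crc_slot_u16() and crc_slot_u32() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shape of the problem: the shift is carried out in a wider type,
 * but a 16-bit return type keeps only the low 16 bits.
 */
static uint16_t crc_slot_u16(uint16_t crc, int filter)
{
	uint32_t shifted = (uint32_t)crc << ((filter % 2) * 16);

	return (uint16_t)shifted;	/* upper half is dropped here */
}

/* Returning a 32-bit value preserves the shifted CRC. */
static uint32_t crc_slot_u32(uint16_t crc, int filter)
{
	assert(filter >= 0);
	return (uint32_t)crc << ((filter % 2) * 16);
}
```

For an odd filter number the 16-bit variant always yields 0, so the CRC never reaches the upper half of the word, while the 32-bit variant places it there as intended.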
* [PATCH 2/2] smsc75xx: support PHY wakeup source
From: Steve Glendinning @ 2012-11-27 14:28 UTC (permalink / raw)
To: netdev; +Cc: Steve Glendinning
In-Reply-To: <1354026482-10443-1-git-send-email-steve.glendinning@shawell.net>
This patch enables LAN7500 family devices to wake from suspend
on either link up or link down events.
It also adds _nopm versions of mdio access functions, so we can
safely call them from suspend and resume functions.
Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
---
drivers/net/usb/smsc75xx.c | 168 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 151 insertions(+), 17 deletions(-)
diff --git a/drivers/net/usb/smsc75xx.c b/drivers/net/usb/smsc75xx.c
index 4655c01..8f92d81 100644
--- a/drivers/net/usb/smsc75xx.c
+++ b/drivers/net/usb/smsc75xx.c
@@ -54,7 +54,7 @@
#define USB_PRODUCT_ID_LAN7500 (0x7500)
#define USB_PRODUCT_ID_LAN7505 (0x7505)
#define RXW_PADDING 2
-#define SUPPORTED_WAKE (WAKE_UCAST | WAKE_BCAST | \
+#define SUPPORTED_WAKE (WAKE_PHY | WAKE_UCAST | WAKE_BCAST | \
WAKE_MCAST | WAKE_ARP | WAKE_MAGIC)
#define check_warn(ret, fmt, args...) \
@@ -185,14 +185,15 @@ static int smsc75xx_clear_feature(struct usbnet *dev, u32 feature)
/* Loop until the read is completed with timeout
* called with phy_mutex held */
-static int smsc75xx_phy_wait_not_busy(struct usbnet *dev)
+static __must_check int __smsc75xx_phy_wait_not_busy(struct usbnet *dev,
+ int in_pm)
{
unsigned long start_time = jiffies;
u32 val;
int ret;
do {
- ret = smsc75xx_read_reg(dev, MII_ACCESS, &val);
+ ret = __smsc75xx_read_reg(dev, MII_ACCESS, &val, in_pm);
check_warn_return(ret, "Error reading MII_ACCESS\n");
if (!(val & MII_ACCESS_BUSY))
@@ -202,7 +203,8 @@ static int smsc75xx_phy_wait_not_busy(struct usbnet *dev)
return -EIO;
}
-static int smsc75xx_mdio_read(struct net_device *netdev, int phy_id, int idx)
+static int __smsc75xx_mdio_read(struct net_device *netdev, int phy_id, int idx,
+ int in_pm)
{
struct usbnet *dev = netdev_priv(netdev);
u32 val, addr;
@@ -211,7 +213,7 @@ static int smsc75xx_mdio_read(struct net_device *netdev, int phy_id, int idx)
mutex_lock(&dev->phy_mutex);
/* confirm MII not busy */
- ret = smsc75xx_phy_wait_not_busy(dev);
+ ret = __smsc75xx_phy_wait_not_busy(dev, in_pm);
check_warn_goto_done(ret, "MII is busy in smsc75xx_mdio_read\n");
/* set the address, index & direction (read from PHY) */
@@ -220,13 +222,13 @@ static int smsc75xx_mdio_read(struct net_device *netdev, int phy_id, int idx)
addr = ((phy_id << MII_ACCESS_PHY_ADDR_SHIFT) & MII_ACCESS_PHY_ADDR)
| ((idx << MII_ACCESS_REG_ADDR_SHIFT) & MII_ACCESS_REG_ADDR)
| MII_ACCESS_READ | MII_ACCESS_BUSY;
- ret = smsc75xx_write_reg(dev, MII_ACCESS, addr);
+ ret = __smsc75xx_write_reg(dev, MII_ACCESS, addr, in_pm);
check_warn_goto_done(ret, "Error writing MII_ACCESS\n");
- ret = smsc75xx_phy_wait_not_busy(dev);
+ ret = __smsc75xx_phy_wait_not_busy(dev, in_pm);
check_warn_goto_done(ret, "Timed out reading MII reg %02X\n", idx);
- ret = smsc75xx_read_reg(dev, MII_DATA, &val);
+ ret = __smsc75xx_read_reg(dev, MII_DATA, &val, in_pm);
check_warn_goto_done(ret, "Error reading MII_DATA\n");
ret = (u16)(val & 0xFFFF);
@@ -236,8 +238,8 @@ done:
return ret;
}
-static void smsc75xx_mdio_write(struct net_device *netdev, int phy_id, int idx,
- int regval)
+static void __smsc75xx_mdio_write(struct net_device *netdev, int phy_id,
+ int idx, int regval, int in_pm)
{
struct usbnet *dev = netdev_priv(netdev);
u32 val, addr;
@@ -246,11 +248,11 @@ static void smsc75xx_mdio_write(struct net_device *netdev, int phy_id, int idx,
mutex_lock(&dev->phy_mutex);
/* confirm MII not busy */
- ret = smsc75xx_phy_wait_not_busy(dev);
+ ret = __smsc75xx_phy_wait_not_busy(dev, in_pm);
check_warn_goto_done(ret, "MII is busy in smsc75xx_mdio_write\n");
val = regval;
- ret = smsc75xx_write_reg(dev, MII_DATA, val);
+ ret = __smsc75xx_write_reg(dev, MII_DATA, val, in_pm);
check_warn_goto_done(ret, "Error writing MII_DATA\n");
/* set the address, index & direction (write to PHY) */
@@ -259,16 +261,39 @@ static void smsc75xx_mdio_write(struct net_device *netdev, int phy_id, int idx,
addr = ((phy_id << MII_ACCESS_PHY_ADDR_SHIFT) & MII_ACCESS_PHY_ADDR)
| ((idx << MII_ACCESS_REG_ADDR_SHIFT) & MII_ACCESS_REG_ADDR)
| MII_ACCESS_WRITE | MII_ACCESS_BUSY;
- ret = smsc75xx_write_reg(dev, MII_ACCESS, addr);
+ ret = __smsc75xx_write_reg(dev, MII_ACCESS, addr, in_pm);
check_warn_goto_done(ret, "Error writing MII_ACCESS\n");
- ret = smsc75xx_phy_wait_not_busy(dev);
+ ret = __smsc75xx_phy_wait_not_busy(dev, in_pm);
check_warn_goto_done(ret, "Timed out writing MII reg %02X\n", idx);
done:
mutex_unlock(&dev->phy_mutex);
}
+static int smsc75xx_mdio_read_nopm(struct net_device *netdev, int phy_id,
+ int idx)
+{
+ return __smsc75xx_mdio_read(netdev, phy_id, idx, 1);
+}
+
+static void smsc75xx_mdio_write_nopm(struct net_device *netdev, int phy_id,
+ int idx, int regval)
+{
+ __smsc75xx_mdio_write(netdev, phy_id, idx, regval, 1);
+}
+
+static int smsc75xx_mdio_read(struct net_device *netdev, int phy_id, int idx)
+{
+ return __smsc75xx_mdio_read(netdev, phy_id, idx, 0);
+}
+
+static void smsc75xx_mdio_write(struct net_device *netdev, int phy_id, int idx,
+ int regval)
+{
+ __smsc75xx_mdio_write(netdev, phy_id, idx, regval, 0);
+}
+
static int smsc75xx_wait_eeprom(struct usbnet *dev)
{
unsigned long start_time = jiffies;
@@ -1232,6 +1257,32 @@ static int smsc75xx_enter_suspend0(struct usbnet *dev)
return 0;
}
+static int smsc75xx_enter_suspend1(struct usbnet *dev)
+{
+ u32 val;
+ int ret;
+
+ ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
+ check_warn_return(ret, "Error reading PMT_CTL");
+
+ val &= ~(PMT_CTL_SUS_MODE | PMT_CTL_WUPS | PMT_CTL_PHY_RST);
+ val |= PMT_CTL_SUS_MODE_1;
+
+ ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
+ check_warn_return(ret, "Error writing PMT_CTL");
+
+ /* clear wol status, enable energy detection */
+ val &= ~PMT_CTL_WUPS;
+ val |= (PMT_CTL_WUPS_ED | PMT_CTL_ED_EN);
+
+ ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
+ check_warn_return(ret, "Error writing PMT_CTL");
+
+ smsc75xx_set_feature(dev, USB_DEVICE_REMOTE_WAKEUP);
+
+ return 0;
+}
+
static int smsc75xx_enter_suspend2(struct usbnet *dev)
{
u32 val;
@@ -1249,18 +1300,61 @@ static int smsc75xx_enter_suspend2(struct usbnet *dev)
return 0;
}
+static int smsc75xx_enable_phy_wakeup_interrupts(struct usbnet *dev, u16 mask)
+{
+ struct mii_if_info *mii = &dev->mii;
+ int ret;
+
+ netdev_dbg(dev->net, "enabling PHY wakeup interrupts");
+
+ /* read to clear */
+ ret = smsc75xx_mdio_read_nopm(dev->net, mii->phy_id, PHY_INT_SRC);
+ check_warn_return(ret, "Error reading PHY_INT_SRC");
+
+ /* enable interrupt source */
+ ret = smsc75xx_mdio_read_nopm(dev->net, mii->phy_id, PHY_INT_MASK);
+ check_warn_return(ret, "Error reading PHY_INT_MASK");
+
+ ret |= mask;
+
+ smsc75xx_mdio_write_nopm(dev->net, mii->phy_id, PHY_INT_MASK, ret);
+
+ return 0;
+}
+
+static int smsc75xx_link_ok_nopm(struct usbnet *dev)
+{
+ struct mii_if_info *mii = &dev->mii;
+ int ret;
+
+ /* first, a dummy read, needed to latch some MII phys */
+ ret = smsc75xx_mdio_read_nopm(dev->net, mii->phy_id, MII_BMSR);
+ check_warn_return(ret, "Error reading MII_BMSR");
+
+ ret = smsc75xx_mdio_read_nopm(dev->net, mii->phy_id, MII_BMSR);
+ check_warn_return(ret, "Error reading MII_BMSR");
+
+ return !!(ret & BMSR_LSTATUS);
+}
+
static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
{
struct usbnet *dev = usb_get_intfdata(intf);
struct smsc75xx_priv *pdata = (struct smsc75xx_priv *)(dev->data[0]);
+ u32 val, link_up;
int ret;
- u32 val;
ret = usbnet_suspend(intf, message);
check_warn_return(ret, "usbnet_suspend error\n");
- /* if no wol options set, enter lowest power SUSPEND2 mode */
- if (!(pdata->wolopts & SUPPORTED_WAKE)) {
+ /* determine if link is up using only _nopm functions */
+ link_up = smsc75xx_link_ok_nopm(dev);
+
+ /* if no wol options set, or if link is down and we're not waking on
+ * PHY activity, enter lowest power SUSPEND2 mode
+ */
+ if (!(pdata->wolopts & SUPPORTED_WAKE) ||
+ !(link_up || (pdata->wolopts & WAKE_PHY))) {
netdev_info(dev->net, "entering SUSPEND2 mode\n");
/* disable energy detect (link up) & wake up events */
@@ -1283,6 +1377,33 @@ static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
return smsc75xx_enter_suspend2(dev);
}
+ if (pdata->wolopts & WAKE_PHY) {
+ ret = smsc75xx_enable_phy_wakeup_interrupts(dev,
+ (PHY_INT_MASK_ANEG_COMP | PHY_INT_MASK_LINK_DOWN));
+ check_warn_return(ret, "error enabling PHY wakeup ints");
+
+ /* if link is down then configure EDPD and enter SUSPEND1,
+ * otherwise enter SUSPEND0 below
+ */
+ if (!link_up) {
+ struct mii_if_info *mii = &dev->mii;
+ netdev_info(dev->net, "entering SUSPEND1 mode");
+
+ /* enable energy detect power-down mode */
+ ret = smsc75xx_mdio_read_nopm(dev->net, mii->phy_id,
+ PHY_MODE_CTRL_STS);
+ check_warn_return(ret, "Error reading PHY_MODE_CTRL_STS");
+
+ ret |= MODE_CTRL_STS_EDPWRDOWN;
+
+ smsc75xx_mdio_write_nopm(dev->net, mii->phy_id,
+ PHY_MODE_CTRL_STS, ret);
+
+ /* enter SUSPEND1 mode */
+ return smsc75xx_enter_suspend1(dev);
+ }
+ }
+
if (pdata->wolopts & (WAKE_MCAST | WAKE_ARP)) {
int i, filter = 0;
@@ -1349,6 +1470,19 @@ static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
ret = smsc75xx_write_reg_nopm(dev, WUCSR, val);
check_warn_return(ret, "Error writing WUCSR\n");
+ if (pdata->wolopts & WAKE_PHY) {
+ netdev_info(dev->net, "enabling PHY wakeup\n");
+ ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
+ check_warn_return(ret, "Error reading PMT_CTL");
+
+ /* clear wol status, enable energy detection */
+ val &= ~PMT_CTL_WUPS;
+ val |= (PMT_CTL_WUPS_ED | PMT_CTL_ED_EN);
+
+ ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
+ check_warn_return(ret, "Error writing PMT_CTL");
+ }
+
if (pdata->wolopts & WAKE_MAGIC) {
netdev_info(dev->net, "enabling magic packet wakeup\n");
ret = smsc75xx_read_reg_nopm(dev, WUCSR, &val);
--
1.7.10.4
^ permalink raw reply related
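The suspend mode selection in the patch above can be summarised as a pure function of the wake-on-LAN options and the link state. The following is a userspace sketch only, not the driver code: the enum and the function name pick_suspend_mode() are hypothetical, and the WAKE_* values are redefined locally to mirror the flags in <linux/ethtool.h>.

```c
#include <assert.h>

/* Local copies of the wake-on-LAN flags, for this sketch only */
#define WAKE_PHY	(1 << 0)
#define WAKE_UCAST	(1 << 1)
#define WAKE_MCAST	(1 << 2)
#define WAKE_BCAST	(1 << 3)
#define WAKE_ARP	(1 << 4)
#define WAKE_MAGIC	(1 << 5)
#define SUPPORTED_WAKE	(WAKE_PHY | WAKE_UCAST | WAKE_BCAST | \
			 WAKE_MCAST | WAKE_ARP | WAKE_MAGIC)

enum suspend_mode { SUSPEND0, SUSPEND1, SUSPEND2 };	/* hypothetical */

/* Same conditions, in the same order, as the patched suspend path. */
static enum suspend_mode pick_suspend_mode(unsigned int wolopts, int link_up)
{
	assert(link_up == 0 || link_up == 1);

	/* nothing to wake on, or link down without PHY wakeup */
	if (!(wolopts & SUPPORTED_WAKE) ||
	    !(link_up || (wolopts & WAKE_PHY)))
		return SUSPEND2;

	/* PHY wakeup requested and link down: energy detect mode */
	if ((wolopts & WAKE_PHY) && !link_up)
		return SUSPEND1;

	/* some wol options are enabled and the link is up */
	return SUSPEND0;
}
```

Read this way, SUSPEND1 is only reachable with WAKE_PHY set and the link down; every other combination with a usable wake source ends in SUSPEND0.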
* [PATCH 1/2] smsc75xx: refactor entering suspend modes
From: Steve Glendinning @ 2012-11-27 14:28 UTC (permalink / raw)
To: netdev; +Cc: Steve Glendinning
In-Reply-To: <1354026482-10443-1-git-send-email-steve.glendinning@shawell.net>
This patch splits out the logic for entering suspend modes
to separate functions, to reduce the complexity of the
smsc75xx_suspend function.
Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
---
drivers/net/usb/smsc75xx.c | 62 +++++++++++++++++++++++++++-----------------
1 file changed, 38 insertions(+), 24 deletions(-)
diff --git a/drivers/net/usb/smsc75xx.c b/drivers/net/usb/smsc75xx.c
index 953c4f4..4655c01 100644
--- a/drivers/net/usb/smsc75xx.c
+++ b/drivers/net/usb/smsc75xx.c
@@ -1213,6 +1213,42 @@ static int smsc75xx_write_wuff(struct usbnet *dev, int filter, u32 wuf_cfg,
return 0;
}
+static int smsc75xx_enter_suspend0(struct usbnet *dev)
+{
+ u32 val;
+ int ret;
+
+ ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
+ check_warn_return(ret, "Error reading PMT_CTL\n");
+
+ val &= (~(PMT_CTL_SUS_MODE | PMT_CTL_PHY_RST));
+ val |= PMT_CTL_SUS_MODE_0 | PMT_CTL_WOL_EN | PMT_CTL_WUPS;
+
+ ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
+ check_warn_return(ret, "Error writing PMT_CTL\n");
+
+ smsc75xx_set_feature(dev, USB_DEVICE_REMOTE_WAKEUP);
+
+ return 0;
+}
+
+static int smsc75xx_enter_suspend2(struct usbnet *dev)
+{
+ u32 val;
+ int ret;
+
+ ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
+ check_warn_return(ret, "Error reading PMT_CTL\n");
+
+ val &= ~(PMT_CTL_SUS_MODE | PMT_CTL_WUPS | PMT_CTL_PHY_RST);
+ val |= PMT_CTL_SUS_MODE_2;
+
+ ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
+ check_warn_return(ret, "Error writing PMT_CTL\n");
+
+ return 0;
+}
+
static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
{
struct usbnet *dev = usb_get_intfdata(intf);
@@ -1244,17 +1280,7 @@ static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
check_warn_return(ret, "Error writing PMT_CTL\n");
- /* enter suspend2 mode */
- ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
- check_warn_return(ret, "Error reading PMT_CTL\n");
-
- val &= ~(PMT_CTL_SUS_MODE | PMT_CTL_WUPS | PMT_CTL_PHY_RST);
- val |= PMT_CTL_SUS_MODE_2;
-
- ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
- check_warn_return(ret, "Error writing PMT_CTL\n");
-
- return 0;
+ return smsc75xx_enter_suspend2(dev);
}
if (pdata->wolopts & (WAKE_MCAST | WAKE_ARP)) {
@@ -1368,19 +1394,7 @@ static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
/* some wol options are enabled, so enter SUSPEND0 */
netdev_info(dev->net, "entering SUSPEND0 mode\n");
-
- ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
- check_warn_return(ret, "Error reading PMT_CTL\n");
-
- val &= (~(PMT_CTL_SUS_MODE | PMT_CTL_PHY_RST));
- val |= PMT_CTL_SUS_MODE_0 | PMT_CTL_WOL_EN | PMT_CTL_WUPS;
-
- ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
- check_warn_return(ret, "Error writing PMT_CTL\n");
-
- smsc75xx_set_feature(dev, USB_DEVICE_REMOTE_WAKEUP);
-
- return 0;
+ return smsc75xx_enter_suspend0(dev);
}
static int smsc75xx_resume(struct usb_interface *intf)
--
1.7.10.4
^ permalink raw reply related
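Both helpers in the patch above follow the same read-modify-write pattern on PMT_CTL: clear the mode field and a few control bits, then OR in the new mode. A minimal userspace sketch of just the bit manipulation follows. The bit values are placeholders chosen for the example and should not be taken as the real register layout; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit assignments, for illustration only */
#define PMT_CTL_WUPS		0x03u	/* wakeup status field */
#define PMT_CTL_WOL_EN		0x08u
#define PMT_CTL_PHY_RST		0x10u
#define PMT_CTL_SUS_MODE	0x60u	/* two-bit suspend mode field */
#define PMT_CTL_SUS_MODE_0	0x00u
#define PMT_CTL_SUS_MODE_2	0x40u

/* New PMT_CTL value for SUSPEND0, given the value just read back. */
static uint32_t pmt_ctl_suspend0(uint32_t val)
{
	val &= ~(PMT_CTL_SUS_MODE | PMT_CTL_PHY_RST);
	val |= PMT_CTL_SUS_MODE_0 | PMT_CTL_WOL_EN | PMT_CTL_WUPS;
	return val;
}

/* New PMT_CTL value for SUSPEND2, given the value just read back. */
static uint32_t pmt_ctl_suspend2(uint32_t val)
{
	val &= ~(PMT_CTL_SUS_MODE | PMT_CTL_WUPS | PMT_CTL_PHY_RST);
	val |= PMT_CTL_SUS_MODE_2;
	assert((val & PMT_CTL_SUS_MODE) == PMT_CTL_SUS_MODE_2);
	return val;
}
```

Keeping the computation separate from the register access makes it easy to check that bits outside the masks survive the update.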
* [PATCH 0/2] smsc75xx enhancements
From: Steve Glendinning @ 2012-11-27 14:28 UTC (permalink / raw)
To: netdev; +Cc: Steve Glendinning
This patchset implements wake on PHY (link up or link down) for
smsc75xx. Please consider it for net-next.
Steve Glendinning (2):
smsc75xx: refactor entering suspend modes
smsc75xx: support PHY wakeup source
drivers/net/usb/smsc75xx.c | 224 ++++++++++++++++++++++++++++++++++++--------
1 file changed, 186 insertions(+), 38 deletions(-)
--
1.7.10.4
^ permalink raw reply
* [PATCH] sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails
From: Tommi Rantala @ 2012-11-27 14:01 UTC (permalink / raw)
To: linux-sctp, netdev
Cc: Neil Horman, Vlad Yasevich, Sridhar Samudrala, David S. Miller,
Dave Jones, Tommi Rantala
In-Reply-To: <20121126.173429.323283427379416132.davem@davemloft.net>
Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
reproducible e.g. with the sendto() syscall by passing an invalid
user space pointer in the second argument:
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
int main(void)
{
int fd;
struct sockaddr_in sa;
fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
if (fd < 0)
return 1;
memset(&sa, 0, sizeof(sa));
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(11111);
sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));
return 0;
}
As far as I can tell, the leak has been around since ~2003.
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
---
net/sctp/chunk.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index 7c2df9c..f2aebdb 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -284,7 +284,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
goto errout;
err = sctp_user_addto_chunk(chunk, offset, len, msgh->msg_iov);
if (err < 0)
- goto errout;
+ goto errout_chunk_free;
offset += len;
@@ -324,7 +324,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
__skb_pull(chunk->skb, (__u8 *)chunk->chunk_hdr
- (__u8 *)chunk->skb->data);
if (err < 0)
- goto errout;
+ goto errout_chunk_free;
sctp_datamsg_assign(msg, chunk);
list_add_tail(&chunk->frag_list, &msg->chunks);
@@ -332,6 +332,9 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
return msg;
+errout_chunk_free:
+ sctp_chunk_free(chunk);
+
errout:
list_for_each_safe(pos, temp, &msg->chunks) {
list_del_init(pos);
--
1.7.9.5
^ permalink raw reply related
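The fix above relies on a common cleanup idiom: an object that has been allocated but not yet linked into the list needs its own error label, which then falls through to the label that tears down the list. A self-contained userspace sketch of that idiom follows. It is not SCTP code; struct chunk, build_chunks() and the live_chunks counter are hypothetical stand-ins used to make the leak observable.

```c
#include <assert.h>
#include <stdlib.h>

struct chunk {			/* stand-in for a message chunk */
	struct chunk *next;
};

static int live_chunks;		/* outstanding allocations */

static struct chunk *chunk_alloc(void)
{
	struct chunk *c = malloc(sizeof(*c));

	if (c) {
		c->next = NULL;
		live_chunks++;
	}
	return c;
}

static void chunk_free(struct chunk *c)
{
	assert(c != NULL);
	free(c);
	live_chunks--;
}

/*
 * Build a list of n chunks (n >= 1).  fail_at simulates a failed copy
 * for chunk number fail_at; pass -1 for no failure.  Returns the list
 * head, or NULL on error with every chunk released.
 */
static struct chunk *build_chunks(int n, int fail_at)
{
	struct chunk *head = NULL, *chunk, *pos;
	int i;

	for (i = 0; i < n; i++) {
		chunk = chunk_alloc();
		if (!chunk)
			goto errout;
		if (i == fail_at)	/* simulated copy failure */
			goto errout_chunk_free;
		chunk->next = head;	/* only now is it on the list */
		head = chunk;
	}
	return head;

errout_chunk_free:
	chunk_free(chunk);	/* not on the list yet, free it here */
errout:
	while ((pos = head) != NULL) {
		head = pos->next;
		chunk_free(pos);
	}
	return NULL;
}
```

If the failure branch jumped straight to errout instead, the current chunk would never be freed, and live_chunks would stay at 1 after the call.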
* Re: [PATCH RFC 3/5] printk: modify printk interface for syslog_namespace
From: Serge Hallyn @ 2012-11-27 13:58 UTC (permalink / raw)
To: Libo Chen
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Eric W. Biederman
In-Reply-To: <50B4BF64.6010707-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Quoting Libo Chen (chenlibo.3-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org):
> From: Libo Chen <clbchenlibo.chen-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>
> On 2012-11-25 12:28, Serge E. Hallyn wrote:
> > Quoting Libo Chen (chenlibo.3-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org):
> >> On 2012/11/22 1:49, Serge E. Hallyn wrote:
> >>
> >>> I notice that you haven't made any changes to the struct cont. I
> >>> suspect this means that to-be-continued msgs from one ns can be
> >>> erroneously mixed with another ns.
> >>>
> >> Yes, I confirmed this problem. There will be erroneously mixed with another ns.
> >> Thank you very much.
> >>
> >>> You said you don't mind putting the syslogns into the userns. If
> >>> there's no reason not to do that, then we should do so as it will
> >>> remove a bunch of code (plus the use of a new CLONE flag) from your
> >>> patch, and the new syslog(NEW_NS) command from mine.
> >>>
> >> I agree with you, both are removable.
> >>
> >>> Now IMO the ideal place for syslog_ns would be in the devices ns,
> >>> but that does not yet exist, and may never. The bonus to that would
> >>> be that the consoles sort of belong there. I avoid this by not
> >>> having consoles in child syslog namespaces. You put the console in
> >>> the ns. I haven't looked closely enough to see if what you do is
> >>> ok (will do so soon).
> >>>
> >>> WOuld you mind looking through my patch to see if it suffices for
> >>> your needs? Where it does not, patches would be greatly appreciated
> >>> if simple enough.
> >>
> >> follow your patch, I can see inject message by "dmesg call" in container, is right?
> >
> > If I understand you right, yes.
> >
> >> I am worry that I debug or see messages from serial ports console in some embedded system,
> >> since console belongs to init_syslog, so the message in container can`t be printed.
> >
> > Sorry, I don't understand which way you're going with that. Could you
> > rephrase? You want to prevent console messages from going to a
> > container? (That should definately not happen) Or something else?
> >
>
> I reviewed your patch, and found that console could only print messages
> belonging to init_syslog.
>
> So the message belongs to container syslog can not be printed from console,
> but only "dmesg call" in user space. Is that right?
>
> For example, the messages can not be outputed automatically from serial port
> as a kind of consoles on some embedded system.
Oh, I see. I basically thought this was a feature, not a problem :) But
that wasn't meant to be a core part of my patchset, rather I wasn't quite
sure how best to handle it, so I put it off for later. My main concern is
that if consoles in containers are supported, this must NOT lead to
kernel module loading from the container.
> And I am not sure if there are no other problems.
Ok, I will write a new patch which would (a) try to address the consoles,
(b) move the syslogns into the user_ns (making it no longer a syslog_ns),
and (c) add some users of ns_printk (borrowing the ones from your
set for starters).
thanks,
-serge
^ permalink raw reply
* Re: [PATCH RFC] [INET]: Get cirtical word in first 64bit of cache line
From: Eric Dumazet @ 2012-11-27 13:58 UTC (permalink / raw)
To: Ling Ma; +Cc: linux-kernel, netdev
In-Reply-To: <CAOGi=dPQWC8hgt4jhMEHcVPb6j+jMTguNAchiLjdvvHjarCW4Q@mail.gmail.com>
On Tue, 2012-11-27 at 21:48 +0800, Ling Ma wrote:
> Ling: in the looking-up routine, hash value is the most important key,
> if it is matched, the other values have most possibility to be
> satisfied, and CWF is limited by memory bandwidth (usually 64 bits), so
> we only move hash value as critical first word.
In practice, we have at most one TCP socket per hash slot.
99.9999 % of lookups need all fields to complete.
Your patch introduces a misalignment error. I am not sure all 64 bit
arches are able to cope with that gracefully.
It seems all CWF docs I could find are very old stuff, mostly academic,
without good performance data.
I was asking for up-to-date statements from Intel/AMD/... about current
cpus and current memory, because optimizing for 10-year-old cpus is
not worth the pain.
I am assuming cpus are implementing the CWF/ER automatically, and that
only prefetches could have a slight disadvantage if the needed word is
not the first word in the cache line. It's not clear why the prefetch()
hint could not also use CWF. It seems it also could be done by the
hardware.
So before random patches in linux kernel adding their possible bugs, we
need a good study.
Thanks
^ permalink raw reply
* Re: BQL support in gianfar causes network hickup
From: Eric Dumazet @ 2012-11-27 13:49 UTC (permalink / raw)
To: Keitel, Tino (ALC NetworX GmbH)
Cc: Tino Keitel, Paul Gortmaker, netdev@vger.kernel.org
In-Reply-To: <1354023162.7553.1708.camel@edumazet-glaptop>
On Tue, 2012-11-27 at 05:32 -0800, Eric Dumazet wrote:
> Can you reproduce the problem without PTP running, or disabled in the
> driver ?
>
> (comment the "priv->hwts_tx_en = 1;" line)
>
>
> This looks like we miss an interrupt ( or TXBD_INTERRUPT not correctly
> set)
>
> And it could be a bug occurring if we try to send one skb with fragments
> and skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP
>
>
By the way, are any errata flagged in gfar_detect_errata()?
^ permalink raw reply
* Re: [PATCH RFC] [INET]: Get cirtical word in first 64bit of cache line
From: Ling Ma @ 2012-11-27 13:48 UTC (permalink / raw)
To: Eric Dumazet; +Cc: linux-kernel, netdev
In-Reply-To: <1353912241.30446.1257.camel@edumazet-glaptop>
> networking patches should be sent to netdev.
>
> (I understand this patch is more a generic one, but at least CC netdev)
Ling: OK, this is my first inet patch, I will send to netdev later.
> You give no performance numbers for this change...
Ling: after I get machine, I will send out test result.
> I never heard of this CWF/ER, where are the official Intel documents
> about this, and what models really benefit from it ?
Ling:
Arm implemented it.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388f/Caccifbd.html
AMD also used it.
http://classes.soe.ucsc.edu/cmpe202/Fall04/papers/opteron.pdf
> Also, why not moving skc_net as well ?
>
> BTW, skc_daddr & skc_rcv_saddr are 'critical' as well, we use them in
> INET_MATCH()
Ling: in the looking-up routine, hash value is the most important key,
if it is matched, the other values have most possibility to be
satisfied, and CWF is limited by memory bandwidth (usually 64 bits), so
we only move hash value as critical first word.
Thanks
Ling
^ permalink raw reply
* Re: BQL support in gianfar causes network hickup
From: Eric Dumazet @ 2012-11-27 13:32 UTC (permalink / raw)
To: Keitel, Tino (ALC NetworX GmbH)
Cc: Tino Keitel, Paul Gortmaker, netdev@vger.kernel.org
In-Reply-To: <9AA65D849A88EB44B5D9B6A8BA098E23040A60D6EE71@Exchange1.lawo.de>
On Tue, 2012-11-27 at 13:42 +0100, Keitel, Tino (ALC NetworX GmbH)
wrote:
> On Di, 2012-11-27 at 04:36 -0800, Eric Dumazet wrote:
> >
> > Can you reproduce the problem using a single cpu ?
>
> Yes, it is a single-CPU system.
Can you reproduce the problem without PTP running, or disabled in the
driver ?
(comment the "priv->hwts_tx_en = 1;" line)
This looks like we miss an interrupt ( or TXBD_INTERRUPT not correctly
set)
And it could be a bug occurring if we try to send one skb with fragments
and skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP
^ permalink raw reply
* Ethernet deferred 'end of transmit' processing.
From: David Laight @ 2012-11-27 13:28 UTC (permalink / raw)
To: netdev
Eric and I have just had a private discussion about deferring
(or not) the ethernet 'end of tx' processing.
Below is Eric's last email.
> > Subject: RE: performance regression on HiperSockets depending on MTU size
> >
> > On Mon, 2012-11-26 at 16:38 +0000, David Laight wrote:
> > > > For example, I had to change mlx4 driver for the same problem : Make
> > > > sure a TX packet can be "TX completed" in a short amount of time.
> > >
> > > I'm intrigued that Linux is going that way.
> > > It (effectively) requires the hardware generate an interrupt
> > > for every transmit packet in order to get high throughput.
> > >
> > > I remember carefully designing ethernet drivers to avoid
> > > taking 'tx done' interrupts unless absolutely necessary
> > > in order to reduce system interrupt load.
> > > Some modern hardware probably allows finer control of 'tx done'
> > > interrupts, but it won't be universal.
> > >
> > > I realise that hardware TX segmentation offload can cause a
> > > single tx ring entry to take a significant amount of time to
> > > transmit - so allowing a lot of packets to sit in the tx
> > > ring causes latency issues.
> > >
> > > But there has to be a better solution than requiring every
> > > tx to complete very quickly - especially if the tx flow
> > > is actually a lot of small packets.
> > >
> > > David
> > >
> >
> > 20 years ago, interrupts were expensive so you had to batch packets.
> >
> > In 2012, we want low latencies, because hardware is fast and is able to
> > cope with the requirement.
> >
> > Instead of one cpu, we now have 24 cpus or more per host.
> >
> > And if there is enough load, NAPI will really avoid interrupts, and you
> > get full batch advantages (lowering the number of cpu cycles per packet)
>
> AFAICT some of the stuff being done to get 10G+ speeds is
> actually similar to what I was doing trying to saturate
> 10M ethernet. Network speeds have increased by a factor
> of (about) 800, cpu clock speeds only by 100 or so
> (we were doing quad cpu sparc systems with quite slow
> cache coherency operations).
> Somewhere in the last 20 years a lot of code has got very lazy!
>
> Using 'source allocated' byte counts for flow control
> (which is what I presume the socket send code does) so that
> each socket has a limited amount of live data in the kernel
> and can't allocate a new buffer (skb) until the amount of
> kernel memory allocated to the 'live' buffers decreases
> (ie a transmit completes) certainly works a lot better
> > than the target queue size flow control attempted by SYSV
> STREAMS (which doesn't work very well at all!).
>
> What might work is to allow the ethernet driver to reassign
> some bytes of the SKB from the socket (or other source) to
> the transmit interface - then it need not request end of tx
> immediately for those bytes.
It seems you understood how it currently works.
> The amount it could take can be quite small - possibly one
> or two maximal sized ring entries, or (say) 100us of network
> time.
>
> With multiple flows this will make little difference to the
> size of the burst that each socket gets to add into the
> interfaces tx queue (unlike increasing the socket tx buffer).
> But with a single flow it will let the socket get the next
> tx data queued even if the tx interrupts are deferred.
>
> The only time it doesn't help is when the next transmit
> can't be done until the reference count on the skb decreases.
> (We had some NFS code like that!)
If you read the code, you'll see current implementation is able to keep
a 20Gbe link busy with a single tcp flow, with 2 TSO packets posted on
the device.
A TSO packet is about 545040 bits on wire, or 27 us.
That's 36694 interrupts per second. Even my phone is able to sustain this
rate of interrupts.
But if the device holds the TX completion interrupt for 100 us,
performance of a single TCP flow is hurt. I don't think it's hard to
understand.
mlx4 driver handles 40Gbe links, 13 us is the needed value, not 100 us.
Please post these mails to netdev, there is no secret to protect.
^ permalink raw reply
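The figures in the message above can be reproduced with integer arithmetic. This is a small sketch for checking the numbers, with hypothetical helper names; the only inputs taken from the mail are the 545040 bits per TSO packet and the 20 and 40 Gbit/s link speeds. As an aside, 545040 equals 45 x 1514 x 8, which would be consistent with 45 full-size Ethernet frames, although the mail does not spell that out.

```c
#include <assert.h>
#include <stdint.h>

/* Time on the wire, in nanoseconds, for `bits` at `gbps` Gbit/s. */
static uint64_t wire_time_ns(uint64_t bits, uint64_t gbps)
{
	assert(gbps > 0);
	return bits / gbps;	/* 1 Gbit/s is one bit per nanosecond */
}

/* Completions per second if packets of `bits` are sent back to back. */
static uint64_t completions_per_sec(uint64_t bits, uint64_t gbps)
{
	assert(bits > 0);
	return gbps * UINT64_C(1000000000) / bits;
}
```

At 20 Gbit/s this gives 27252 ns per packet and 36694 completions per second, matching the "27 us" and "36694 interrupts per second" quoted above. At 40 Gbit/s the same packet takes 13626 ns, in line with the 13 us mentioned for mlx4.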
* Re: [PATCH v3 8/7] pppoatm: fix missing wakeup in pppoatm_send()
From: David Woodhouse @ 2012-11-27 13:27 UTC (permalink / raw)
To: Chas Williams (CONTRACTOR); +Cc: Krzysztof Mazur, netdev, linux-kernel, davem
In-Reply-To: <201211112257.qABMvhP4021769@thirdoffive.cmf.nrl.navy.mil>
[-- Attachment #1: Type: text/plain, Size: 1386 bytes --]
On Sun, 2012-11-11 at 17:57 -0500, Chas Williams (CONTRACTOR) wrote:
> In message <1352667081.9449.135.camel@shinybook.infradead.org>,David Woodhouse writes:
> >Acked-by: David Woodhouse <David.Woodhouse@intel.com> for your new
> >version of patch #6 (returning DROP_PACKET for !VF_READY), and your
> >followup to my patch #8, adding the 'need_wakeup' flag. Which we might
> >as well merge into (the pppoatm part of) my patch.
> >
> >Chas, are you happy with the generic ATM part of that? And the
> >nomenclature? I didn't want to call it 'release_cb' like the core socket
> >code does, because we use 'release' to mean something different in ATM.
> >So I called it 'unlock_cb' instead...
>
> i really would prefer not to use a strange name since it might confuse
> larger group of people who are more familiar with the traditional meaning
> of this function. vcc_release() isnt exported so we could rename it if
> things get too confusing.
>
> i have to look at this a bit more but we might be able to use release_cb
> to get rid of the null push to detach the underlying protocol. that would
> be somewhat nice.
In the meantime, should I resend this patch with the name 'release_cb'
instead of 'unlock_cb'? I'll just put a comment in to make sure it isn't
confused with vcc_release(), and if we need to change vcc_release()
later we can.
--
dwmw2
^ permalink raw reply
* Re: [PATCH RFC 3/5] printk: modify printk interface for syslog_namespace
From: Libo Chen @ 2012-11-27 13:25 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Eric W. Biederman
In-Reply-To: <20121125042802.GB4523-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
From: Libo Chen <clbchenlibo.chen-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
On 2012-11-25 12:28, Serge E. Hallyn wrote:
> Quoting Libo Chen (chenlibo.3-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org):
>> On 2012/11/22 1:49, Serge E. Hallyn wrote:
>>
>>> I notice that you haven't made any changes to the struct cont. I
>>> suspect this means that to-be-continued msgs from one ns can be
>>> erroneously mixed with another ns.
>>>
>> Yes, I confirmed this problem. Messages would be erroneously mixed with those of another ns.
>> Thank you very much.
>>
>>> You said you don't mind putting the syslogns into the userns. If
>>> there's no reason not to do that, then we should do so as it will
>>> remove a bunch of code (plus the use of a new CLONE flag) from your
>>> patch, and the new syslog(NEW_NS) command from mine.
>>>
>> I agree with you, both are removable.
>>
>>> Now IMO the ideal place for syslog_ns would be in the devices ns,
>>> but that does not yet exist, and may never. The bonus to that would
>>> be that the consoles sort of belong there. I avoid this by not
>>> having consoles in child syslog namespaces. You put the console in
>>> the ns. I haven't looked closely enough to see if what you do is
>>> ok (will do so soon).
>>>
>>> Would you mind looking through my patch to see if it suffices for
>>> your needs? Where it does not, patches would be greatly appreciated
>>> if simple enough.
>>
>> follow your patch, I can see inject message by "dmesg call" in container, is right?
>
> If I understand you right, yes.
>
>> I am worry that I debug or see messages from serial ports console in some embedded system,
>> since console belongs to init_syslog, so the message in container can`t be printed.
>
> Sorry, I don't understand which way you're going with that. Could you
> rephrase? You want to prevent console messages from going to a
> container? (That should definitely not happen) Or something else?
>
I reviewed your patch and found that the console can only print messages
belonging to init_syslog.
So messages belonging to a container's syslog cannot be printed on the console;
they can only be read with a "dmesg" call from user space. Is that right?
For example, on some embedded systems where a serial port is used as a console,
those messages would not be output automatically on the serial port.
I am also not sure whether there are other problems.
Thanks!
>>> Note I'm not at all wedded to my patchset. I'm happy to go with
>>> something else entirely. My set was just a proof of concept.
>
> thanks,
> -serge
^ permalink raw reply