Netdev List


* Re: [PATCH 1/1] net: cpts: fix for build break after ARM SoC integration
From: Richard Cochran @ 2012-11-27 12:47 UTC (permalink / raw)
  To: Mugunthan V N
  Cc: netdev, davem, linux-arm-kernel, linux-omap, b-cousson, paul
In-Reply-To: <1354012034-31686-1-git-send-email-mugunthanvnm@ti.com>

On Tue, Nov 27, 2012 at 03:57:14PM +0530, Mugunthan V N wrote:
>   CC      drivers/net/ethernet/ti/cpts.o
> drivers/net/ethernet/ti/cpts.c:30:24: fatal error: plat/clock.h: No such file or directory
> compilation terminated.
> make[4]: *** [drivers/net/ethernet/ti/cpts.o] Error 1
> make[3]: *** [drivers/net/ethernet/ti] Error 2
> make[2]: *** [drivers/net/ethernet] Error 2
> make[1]: *** [drivers/net] Error 2
> 
> fix for build break as the header file is removed from plat-omap as part of
> the below patch

Acked-by: Richard Cochran <richardcochran@gmail.com>

^ permalink raw reply

* [PATCH] ping,tracepath doc: Fix missing end tags.
From: Jan Synacek @ 2012-11-27 13:01 UTC (permalink / raw)
  To: yoshfuji; +Cc: netdev, Jan Synacek

Signed-off-by: Jan Synacek <jsynacek@redhat.com>
---
 doc/ping.sgml      | 1 +
 doc/tracepath.sgml | 1 +
 2 files changed, 2 insertions(+)

diff --git a/doc/ping.sgml b/doc/ping.sgml
index f77276b..fb3c3ac 100644
--- a/doc/ping.sgml
+++ b/doc/ping.sgml
@@ -130,6 +130,7 @@ If value is zero, kernel allocates random flow label.
   <listitem><para>
 Show help.
   </para></listitem>
+ </varlistentry>
  <varlistentry>
   <term><option>-i <replaceable/interval/</option></term>
   <listitem><para>
diff --git a/doc/tracepath.sgml b/doc/tracepath.sgml
index 8da7cc0..19c3903 100644
--- a/doc/tracepath.sgml
+++ b/doc/tracepath.sgml
@@ -72,6 +72,7 @@ Sets the initial packet length to <replaceable/pktlen/ instead of
   <listitem><para>
 Sets the initial destination port to use.
   </para></listitem>
+ </varlistentry>
 </variablelist>
 </refsect1>
 
-- 
1.7.11.7

^ permalink raw reply related

* Re: [net-next RFC v2] net_cls: traffic counter based on classification control cgroup
From: Daniel Wagner @ 2012-11-27 13:02 UTC (permalink / raw)
  To: Alexey Perevalov
  Cc: Glauber Costa, netdev-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Kyungmin Park
In-Reply-To: <50B49DEA.7010000-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>

Hi Alexey,

On 27.11.2012 12:03, Glauber Costa wrote:
> On 11/27/2012 02:56 PM, Alexey Perevalov wrote:
>> Hello.
>>
>> It's second version of patch I already sent to netdev.
>>
>> The main goal of this patch it's counting traffic for process placed to
>> net_cls cgroup (ingress and egress).
>> It's based on res_counters and holds counter per network interfaces.
>>
>> Description of patch.
>> It handles packets in net/core/dev.c for egress and in
>> /net/ipv4/tcp.c|udp.c for ingress.
>> These places were chosen because we need to know also network interface.
>>
>> Cgroup fs interface provides following files additional to existing
>> net_cls files:
>> net_cls.ifacename.usage_in_bytes
>> Containing rcv/snd lines.
>> Also this patch adds to net_cls ability to handle a network device
>> registration.
>>
>> It could be included or excluded in compile time.
>> I moved the menu entry for "Control group classifier" from network/QoS to
>> General Option/Control Group.
>>
>> I'm waiting for you comments.
>>
> 
> Daniel Wagner is working on something a lot similar.

Yes, basically what I try to do is explained by this excellent article

https://lwn.net/Articles/523058/

The short version: Per application routing and statistics. 

I have two PoC implementations doing this. Both implementations have the same key
idea, which is to set SO_MARK per application. The routing and statistics would
then be done by a bunch of iptables rules.

The first implementation extends net_cls to set SO_MARK:

void sock_update_classid(struct sock *sk, struct task_struct *task)
 {
        u32 classid;
+       u32 mark;
 
        classid = task_cls_classid(task);
        if (classid != sk->sk_classid)
                sk->sk_classid = classid;
+
+       mark = task_cls_mark(task);
+       if (mark != sk->sk_mark)
+               sk->sk_mark = mark;
 }

The second implementation is adding a new iptables matcher which matches
on LSM contexts. Then you can do something like this:

iptables -t mangle -A OUTPUT -m secmark --secctx unconfined_u:unconfined_r:foo_t:s0-s0:c0.c1023 -j MARK --set-mark 200

> Maybe you should be in contact, in case you are not yet.
> 
> A few general comments:
> 1) res_counters are incredibly expensive. If you are more interested in
> counting than you are in limiting, they may not be your best choice.
> 
> 2) When Daniel exposed his use case to me, it gave me the impression
> that "counting traffic" is something that is totally doable by having a
> dedicated interface in a separate namespace. Basically, we already count
> traffic (rx and tx) for all interfaces anyway, so it suggests that it
> could be an interesting way to see the problem.

Moving applications into separate net namespaces is for sure a valid solution.
Though there is one drawback in this approach. The namespaces need to be
attached to a bridge and then NATed. That means every application
would get its own IP address. This might be okay for certain use
cases but I am still trying to work around this. Glauber and I had some
discussion about this and he suggested allowing the physical networking
device to be attached to several namespaces (e.g. via macvlan). Every
namespace would get the same IP address. Unfortunately, this would result in
the same mess as when several physical devices on a network get the same
IP address assigned.

> AFAIK, Daniel is still measuring this. But it would be great to know if
> that could work for your use case as well.

I have not started to measure :(

cheers,
daniel

^ permalink raw reply

* Re: [PATCH] ping,tracepath doc: Fix missing end tags.
From: YOSHIFUJI Hideaki @ 2012-11-27 13:09 UTC (permalink / raw)
  To: Jan Synacek; +Cc: netdev, YOSHIFUJI Hideaki
In-Reply-To: <1354021311-16760-1-git-send-email-jsynacek@redhat.com>

Hello.

Jan Synacek wrote:
> Signed-off-by: Jan Synacek <jsynacek@redhat.com>
> ---
>  doc/ping.sgml      | 1 +
>  doc/tracepath.sgml | 1 +
>  2 files changed, 2 insertions(+)
> 

Applied, thank you.

--yoshfuji

^ permalink raw reply

* Re: smsc95xx: detect chip revision specific features
From: Steve Glendinning @ 2012-11-27 13:21 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: netdev
In-Reply-To: <20121127123957.GT6186@mwanda>

>> > drivers/net/usb/smsc95xx.c
>> >   1283          if (pdata->wolopts & (WAKE_BCAST | WAKE_MCAST | WAKE_ARP | WAKE_UCAST)) {
>> >   1284                  u32 *filter_mask = kzalloc(32, GFP_KERNEL);
>> >                                                    ^^
>> > We allocate 8 unsigned 32 bit values.  I think this is the mistake here
>> > actually.  It is a typo and should say:

<snip>

On re-reading the datasheet we *do* need 32 u32's here so you were
right the first time!  Patch on its way shortly.
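
To spell out the size arithmetic: kzalloc() takes a byte count, so
kzalloc(32, ...) only has room for 8 u32 values. Below is a small user-space
sketch of the same calculation, using calloc() as a stand-in for kzalloc();
FILTER_MASK_WORDS, u32_capacity() and alloc_filter_mask() are illustrative
names, not identifiers from the driver.

```c
#include <stdint.h>
#include <stdlib.h>

#define FILTER_MASK_WORDS 32	/* illustrative name, not from the driver */

/* Number of 32-bit words that fit in an allocation of `bytes` bytes. */
static size_t u32_capacity(size_t bytes)
{
	return bytes / sizeof(uint32_t);
}

/*
 * User-space stand-in for the kernel allocation: a zeroed array sized by
 * element count times element size, so the byte count cannot be confused
 * with the word count.
 */
static uint32_t *alloc_filter_mask(void)
{
	return calloc(FILTER_MASK_WORDS, sizeof(uint32_t));
}
```

With these helpers, u32_capacity(32) is 8 and
u32_capacity(sizeof(uint32_t) * 32) is 32, which is the difference between
the old and the new allocation.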

Steve

^ permalink raw reply

* [PATCH] smsc95xx: fix suspend buffer overflow
From: Steve Glendinning @ 2012-11-27 13:23 UTC (permalink / raw)
  To: netdev; +Cc: dan.carpenter, Steve Glendinning

This patch fixes a buffer overflow introduced by bbd9f9e, where
the filter_mask array is accessed beyond its bounds.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
---
 drivers/net/usb/smsc95xx.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index 79d495d..6cdc504 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1281,7 +1281,7 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
 	}
 
 	if (pdata->wolopts & (WAKE_BCAST | WAKE_MCAST | WAKE_ARP | WAKE_UCAST)) {
-		u32 *filter_mask = kzalloc(32, GFP_KERNEL);
+		u32 *filter_mask = kzalloc(sizeof(u32) * 32, GFP_KERNEL);
 		u32 command[2];
 		u32 offset[2];
 		u32 crc[4];
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH RFC 3/5] printk: modify printk interface for syslog_namespace
From: Libo Chen @ 2012-11-27 13:25 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman
In-Reply-To: <20121125042802.GB4523-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>

From: Libo Chen <clbchenlibo.chen-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

On 2012-11-25 12:28, Serge E. Hallyn wrote:
> Quoting Libo Chen (chenlibo.3-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org):
>> On 2012/11/22 1:49, Serge E. Hallyn wrote:
>>
>>> I notice that you haven't made any changes to the struct cont.  I
>>> suspect this means that to-be-continued msgs from one ns can be
>>> erroneously mixed with another ns.
>>>
>> Yes, I confirmed this problem. There will be erroneously mixed with another ns.
>> Thank you very much.
>>
>>> You said you don't mind putting the syslogns into the userns.  If
>>> there's no reason not to do that, then we should do so as it will
>>> remove a bunch of code (plus the use of a new CLONE flag) from your
>>> patch, and the new syslog(NEW_NS) command from mine.
>>>
>> I agree with you, both are removable.
>>
>>> Now IMO the ideal place for syslog_ns would be in the devices ns,
>>> but that does not yet exist, and may never.  The bonus to that would
>>> be that the consoles sort of belong there.  I avoid this by not
>>> having consoles in child syslog namespaces.  You put the console in
>>> the ns.  I haven't looked closely enough to see if what you do is
>>> ok (will do so soon).
>>>
>>> WOuld you mind looking through my patch to see if it suffices for
>>> your needs?  Where it does not, patches would be greatly appreciated
>>> if simple enough.
>>
>> follow your patch, I can see inject message by "dmesg call" in container, is right?
> 
> If I understand you right, yes.
> 
>> I am worry that I debug  or see messages from serial ports console in some embedded system,
>> since console belongs to init_syslog,  so the message in container can`t be printed. 
> 
> Sorry, I don't understand which way you're going with that.  Could you
> rephrase?  You want to prevent console messages from going to a
> container?  (That should definately not happen)  Or something else?
> 

I reviewed your patch, and found that the console can only print messages
belonging to init_syslog.

So messages belonging to a container's syslog cannot be printed on the console,
but only via "dmesg call" in user space.  Is that right?

For example, the messages cannot be output automatically from a serial port
acting as a console on some embedded systems.

And I am not sure whether there are other problems.

thanks!


>>> Note I'm not at all wedded to my patchset.  I'm happy to go with
>>> something else entirely.  My set was just a proof of concept.
> 
> thanks,
> -serge
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers
> 
> 

^ permalink raw reply

* Re: [PATCH v3 8/7] pppoatm: fix missing wakeup in pppoatm_send()
From: David Woodhouse @ 2012-11-27 13:27 UTC (permalink / raw)
  To: Chas Williams (CONTRACTOR); +Cc: Krzysztof Mazur, netdev, linux-kernel, davem
In-Reply-To: <201211112257.qABMvhP4021769@thirdoffive.cmf.nrl.navy.mil>

On Sun, 2012-11-11 at 17:57 -0500, Chas Williams (CONTRACTOR) wrote:
> In message <1352667081.9449.135.camel@shinybook.infradead.org>,David Woodhouse writes:
> >Acked-by: David Woodhouse <David.Woodhouse@intel.com> for your new
> >version of patch #6 (returning DROP_PACKET for !VF_READY), and your
> >followup to my patch #8, adding the 'need_wakeup' flag. Which we might
> >as well merge into (the pppoatm part of) my patch.
> >
> >Chas, are you happy with the generic ATM part of that? And the
> >nomenclature? I didn't want to call it 'release_cb' like the core socket
> >code does, because we use 'release' to mean something different in ATM.
> >So I called it 'unlock_cb' instead...
> 
> i really would prefer not to use a strange name since it might confuse
> larger group of people who are more familiar with the traditional meaning
> of this function.  vcc_release() isnt exported so we could rename it if
> things get too confusing.
> 
> i have to look at this a bit more but we might be able to use release_cb
> to get rid of the null push to detach the underlying protocol.  that would
> be somewhat nice.

In the meantime, should I resend this patch with the name 'release_cb'
instead of 'unlock_cb'? I'll just put a comment in to make sure it isn't
confused with vcc_release(), and if we need to change vcc_release()
later we can.

-- 
dwmw2


^ permalink raw reply

* Ethernet deferred 'end of transmit' processing.
From: David Laight @ 2012-11-27 13:28 UTC (permalink / raw)
  To: netdev

Eric and I have just had a private discussion about deferring
(or not) the ethernet 'end of tx' processing.
Below is Eric's last email.

> > Subject: RE: performance regression on HiperSockets depending on MTU size
> > 
> > On Mon, 2012-11-26 at 16:38 +0000, David Laight wrote:
> > > > For example, I had to change mlx4 driver for the same problem : Make
> > > > sure a TX packet can be "TX completed" in a short amount of time.
> > >
> > > I'm intrigued that Linux is going that way.
> > > It (effectively) requires the hardware generate an interrupt
> > > for every transmit packet in order to get high throughput.
> > >
> > > I remember carefully designing ethernet drivers to avoid
> > > taking 'tx done' interrupts unless absolutely necessary
> > > in order to reduce system interrupt load.
> > > Some modern hardware probably allows finer control of 'tx done'
> > > interrupts, but it won't be universal.
> > >
> > > I realise that hardware TX segmentation offload can cause a
> > > single tx ring entry to take a significant amount of time to
> > > transmit - so allowing a lot of packets to sit in the tx
> > > ring causes latency issues.
> > >
> > > But there has to be a better solution than requiring every
> > > tx to complete very quickly - especially if the tx flow
> > > is actually a lot of small packets.
> > >
> > > 	David
> > >
> > 
> > 20 years ago, interrupts were expensive so you had to batch packets.
> > 
> > In 2012, we want low latencies, because hardware is fast and is able to
> > cope with the requirement.
> > 
> > Instead of one cpu, we now have 24 cpus or more per host.
> > 
> > And if there is enough load, NAPI will really avoid interrupts, and you
> > get full batch advantages (lowering the number of cpu cycles per packet)
>
> AFAICT some of the stuff being done to get 10G+ speeds is
> actually similar to what I was doing trying to saturate
> 10M ethernet. Network speeds have increased by a factor
> of (about) 800, cpu clock speeds only by 100 or so
> (we were doing quad cpu sparc systems with quite slow
> cache coherency operations).
> Somewhere in the last 20 years a lot of code has got very lazy!
> 
> Using 'source allocated' byte counts for flow control
> (which is what I presume the socket send code does) so that
> each socket has a limited amount of live data in the kernel
> and can't allocate a new buffer (skb) until the amount of
> kernel memory allocated to the 'live' buffers decreases
> (ie a transmit completes) certainly works a lot better
> that the target queue size flow control attempted by SYSV
> STREAMS (which doesn't work very well at all!).
> 
> What might work is to allow the ethernet driver to reassign
> some bytes of the SKB from the socket (or other source) to
> the transmit interface - then it need not request end of tx
> immediately for those bytes.

It seems you understood how it currently works.

> The amount it could take can be quite small - possibly one
> or two maximal sized ring entries, or (say) 100us of network
> time.
> 
> With multiple flows this will make little difference to the
> size of the burst that each socket gets to add into the
> interfaces tx queue (unlike increasing the socket tx buffer).
> But with a single flow it will let the socket get the next
> tx data queued even if the tx interrupts are deferred.
> 
> The only time it doesn't help is when the next transmit
> can't be done until the reference count on the skb decreases.
> (We had some NFS code like that!)

If you read the code, you'll see the current implementation is able to keep
a 20GbE link busy with a single TCP flow, with 2 TSO packets posted on
the device.

A TSO packet is about 545040 bits on wire, or 27 us.

That's 36694 interrupts per second. Even my phone is able to sustain this
rate of interrupts.

But if the device holds the TX completion interrupt for 100 us,
performance of a single TCP flow is hurt. I don't think it's hard to
understand.

mlx4 driver handles 40Gbe links, 13 us is the needed value, not 100 us.
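
The numbers above can be re-derived with a few lines of arithmetic. This is
only a sketch; the function names are made up for illustration, and it assumes
one TX completion per TSO packet with the link fully busy.

```c
#include <stdint.h>

/*
 * Time on the wire, in nanoseconds, for `bits` bits at `gbit_per_s`
 * Gbit/s. A 1 Gbit/s link moves exactly one bit per nanosecond.
 */
static double wire_time_ns(uint64_t bits, double gbit_per_s)
{
	return (double)bits / gbit_per_s;
}

/* TX completions per second if one fires after every such packet. */
static double completions_per_sec(uint64_t bits, double gbit_per_s)
{
	return 1e9 / wire_time_ns(bits, gbit_per_s);
}
```

For 545040 bits this gives 27252 ns at 20 Gbit/s (about 36694 completions per
second) and 13626 ns at 40 Gbit/s, which is where the 27 us and 13 us figures
come from.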

Please post these mails to netdev, there is no secret to protect.




^ permalink raw reply

* Re: BQL support in gianfar causes network hickup
From: Eric Dumazet @ 2012-11-27 13:32 UTC (permalink / raw)
  To: Keitel, Tino (ALC NetworX GmbH)
  Cc: Tino Keitel, Paul Gortmaker, netdev@vger.kernel.org
In-Reply-To: <9AA65D849A88EB44B5D9B6A8BA098E23040A60D6EE71@Exchange1.lawo.de>

On Tue, 2012-11-27 at 13:42 +0100, Keitel, Tino (ALC NetworX GmbH)
wrote:
> On Di, 2012-11-27 at 04:36 -0800, Eric Dumazet wrote:
> > 
> > Can you reproduce the problem using a single cpu ?
> 
> Yes, it is a single-CPU system.

Can you reproduce the problem without PTP running, or disabled in the
driver ?

(comment the "priv->hwts_tx_en = 1;" line)


This looks like we miss an interrupt (or TXBD_INTERRUPT is not correctly
set).

And it could be a bug occurring if we try to send one skb with fragments
and skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP 

^ permalink raw reply

* Re: [PATCH RFC] [INET]: Get cirtical word in first 64bit of cache line
From: Ling Ma @ 2012-11-27 13:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, netdev
In-Reply-To: <1353912241.30446.1257.camel@edumazet-glaptop>

> networking patches should be sent to netdev.
>
> (I understand this patch is more a generic one, but at least CC netdev)
Ling: OK, this is my first inet patch, I will send to netdev later.

> You give no performance numbers for this change...
Ling: once I get a machine, I will send out test results.

> I never heard of this CWF/ER, where are the official Intel documents
> about this, and what models really benefit from it ?
Ling:
Arm implemented it.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388f/Caccifbd.html
AMD also used it.
http://classes.soe.ucsc.edu/cmpe202/Fall04/papers/opteron.pdf

> Also, why not moving skc_net as well ?
>
> BTW, skc_daddr & skc_rcv_saddr are 'critical' as well, we use them in
> INET_MATCH()
Ling: in the lookup routine, the hash value is the most important key;
if it matches, the other values are very likely to match as well, and
CWF is limited by memory bandwidth (usually 64 bits), so we only move
the hash value to be the critical first word.

Thanks
Ling

^ permalink raw reply

* Re: BQL support in gianfar causes network hickup
From: Eric Dumazet @ 2012-11-27 13:49 UTC (permalink / raw)
  To: Keitel, Tino (ALC NetworX GmbH)
  Cc: Tino Keitel, Paul Gortmaker, netdev@vger.kernel.org
In-Reply-To: <1354023162.7553.1708.camel@edumazet-glaptop>

On Tue, 2012-11-27 at 05:32 -0800, Eric Dumazet wrote:

> Can you reproduce the problem without PTP running, or disabled in the
> driver ?
> 
> (comment the "priv->hwts_tx_en = 1;" line)
> 
> 
> This looks like we miss an interrupt ( or TXBD_INTERRUPT not correctly
> set)
> 
> And it could be a bug occurring if we try to send one skb with fragments
> and skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP 
> 
> 

By the way are any errata flagged in gfar_detect_errata() ?

^ permalink raw reply

* Re: [PATCH RFC] [INET]: Get cirtical word in first 64bit of cache line
From: Eric Dumazet @ 2012-11-27 13:58 UTC (permalink / raw)
  To: Ling Ma; +Cc: linux-kernel, netdev
In-Reply-To: <CAOGi=dPQWC8hgt4jhMEHcVPb6j+jMTguNAchiLjdvvHjarCW4Q@mail.gmail.com>

On Tue, 2012-11-27 at 21:48 +0800, Ling Ma wrote:

> Ling: in the looking-up routine, hash value is the most important key,
> if it is matched,  the other values have most possibility to be
> satisfied, and CFW is limited by memory bandwidth(64bit usually), so
> we only move hash value as critical first word.

In practice, we have at most one TCP socket per hash slot.
99.9999 % of lookups need all fields to complete.

Your patch introduces a misalignment error. I am not sure all 64 bit
arches are able to cope with that gracefully.
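
One way such a misalignment can arise, sketched with hypothetical structs
(these are NOT the real struct sock_common, and the patch itself is not quoted
in this mail): if the two 32-bit addresses are compared with a single 64-bit
load, putting a 32-bit hash in front of them moves the pair from offset 0 to
offset 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration only. */
struct layout_addr_first {
	uint32_t daddr;		/* offset 0 */
	uint32_t rcv_saddr;	/* offset 4 */
	uint32_t hash;		/* offset 8 */
};

struct layout_hash_first {
	uint32_t hash;		/* offset 0 */
	uint32_t daddr;		/* offset 4 */
	uint32_t rcv_saddr;	/* offset 8 */
};

/*
 * Nonzero if a 64-bit load covering daddr + rcv_saddr would be 8-byte
 * aligned, assuming the struct itself starts on an 8-byte boundary.
 */
#define ADDRPAIR_ALIGNED(type) (offsetof(type, daddr) % 8 == 0)
```

ADDRPAIR_ALIGNED(struct layout_addr_first) holds, while
ADDRPAIR_ALIGNED(struct layout_hash_first) does not, so a combined 64-bit
compare on the second layout would be an unaligned access.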

It seems all CWF docs I could find are very old stuff, mostly academic,
without good performance data.

I was asking for up-to-date statements from Intel/AMD/... about current
cpus and current memory, because optimizing for 10-year-old cpus is
not worth the pain.

I am assuming cpus are implementing the CWF/ER automatically, and that
only prefetches could have a slight disadvantage if the needed word is
not the first word in the cache line. It's not clear why the prefetch()
hint could not also use CWF. It seems it also could be done by the
hardware.

So before random patches in linux kernel adding their possible bugs, we
need a good study.

Thanks

^ permalink raw reply

* Re: [PATCH RFC 3/5] printk: modify printk interface for syslog_namespace
From: Serge Hallyn @ 2012-11-27 13:58 UTC (permalink / raw)
  To: Libo Chen
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman
In-Reply-To: <50B4BF64.6010707-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Quoting Libo Chen (chenlibo.3-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org):
> From: Libo Chen <clbchenlibo.chen-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> 
> On 2012-11-25 12:28, Serge E. Hallyn wrote:
> > Quoting Libo Chen (chenlibo.3-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org):
> >> On 2012/11/22 1:49, Serge E. Hallyn wrote:
> >>
> >>> I notice that you haven't made any changes to the struct cont.  I
> >>> suspect this means that to-be-continued msgs from one ns can be
> >>> erroneously mixed with another ns.
> >>>
> >> Yes, I confirmed this problem. There will be erroneously mixed with another ns.
> >> Thank you very much.
> >>
> >>> You said you don't mind putting the syslogns into the userns.  If
> >>> there's no reason not to do that, then we should do so as it will
> >>> remove a bunch of code (plus the use of a new CLONE flag) from your
> >>> patch, and the new syslog(NEW_NS) command from mine.
> >>>
> >> I agree with you, both are removable.
> >>
> >>> Now IMO the ideal place for syslog_ns would be in the devices ns,
> >>> but that does not yet exist, and may never.  The bonus to that would
> >>> be that the consoles sort of belong there.  I avoid this by not
> >>> having consoles in child syslog namespaces.  You put the console in
> >>> the ns.  I haven't looked closely enough to see if what you do is
> >>> ok (will do so soon).
> >>>
> >>> WOuld you mind looking through my patch to see if it suffices for
> >>> your needs?  Where it does not, patches would be greatly appreciated
> >>> if simple enough.
> >>
> >> follow your patch, I can see inject message by "dmesg call" in container, is right?
> > 
> > If I understand you right, yes.
> > 
> >> I am worry that I debug  or see messages from serial ports console in some embedded system,
> >> since console belongs to init_syslog,  so the message in container can`t be printed. 
> > 
> > Sorry, I don't understand which way you're going with that.  Could you
> > rephrase?  You want to prevent console messages from going to a
> > container?  (That should definately not happen)  Or something else?
> > 
> 
> I reviewed your patch, and found that the console can only print messages
> belonging to init_syslog.
> 
> So messages belonging to a container's syslog cannot be printed on the console,
> but only via "dmesg call" in user space.  Is that right?
> 
> For example, the messages cannot be output automatically from a serial port
> acting as a console on some embedded systems.

Oh, I see.  I basically thought this was a feature, not a problem :)  But
that wasn't meant to be a core part of my patchset, rather I wasn't quite
sure how best to handle it, so I put it off for later.  My main concern is
that if consoles in containers are supported, this must NOT lead to
kernel module loading from the container.

> And I am not sure whether there are other problems.

Ok, I will write a new patch which would (a) try to address the consoles,
(b) move the syslogns into the user_ns (making it no longer a syslog_ns),
and (c) add some users of ns_printk (borrowing the ones from your
set for starters).

thanks,
-serge

^ permalink raw reply

* [PATCH] sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails
From: Tommi Rantala @ 2012-11-27 14:01 UTC (permalink / raw)
  To: linux-sctp, netdev
  Cc: Neil Horman, Vlad Yasevich, Sridhar Samudrala, David S. Miller,
	Dave Jones, Tommi Rantala
In-Reply-To: <20121126.173429.323283427379416132.davem@davemloft.net>

Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
reproducible e.g. with the sendto() syscall by passing an invalid
user space pointer in the second argument:

 #include <string.h>
 #include <arpa/inet.h>
 #include <sys/socket.h>

 int main(void)
 {
         int fd;
         struct sockaddr_in sa;

         fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
         if (fd < 0)
                 return 1;

         memset(&sa, 0, sizeof(sa));
         sa.sin_family = AF_INET;
         sa.sin_addr.s_addr = inet_addr("127.0.0.1");
         sa.sin_port = htons(11111);

         sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));

         return 0;
 }

As far as I can tell, the leak has been around since ~2003.

Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
---
 net/sctp/chunk.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index 7c2df9c..f2aebdb 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -284,7 +284,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
 			goto errout;
 		err = sctp_user_addto_chunk(chunk, offset, len, msgh->msg_iov);
 		if (err < 0)
-			goto errout;
+			goto errout_chunk_free;
 
 		offset += len;
 
@@ -324,7 +324,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
 		__skb_pull(chunk->skb, (__u8 *)chunk->chunk_hdr
 			   - (__u8 *)chunk->skb->data);
 		if (err < 0)
-			goto errout;
+			goto errout_chunk_free;
 
 		sctp_datamsg_assign(msg, chunk);
 		list_add_tail(&chunk->frag_list, &msg->chunks);
@@ -332,6 +332,9 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
 
 	return msg;
 
+errout_chunk_free:
+	sctp_chunk_free(chunk);
+
 errout:
 	list_for_each_safe(pos, temp, &msg->chunks) {
 		list_del_init(pos);
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 0/2] smsc75xx enhancements
From: Steve Glendinning @ 2012-11-27 14:28 UTC (permalink / raw)
  To: netdev; +Cc: Steve Glendinning

This patchset implements wake on PHY (link up or link down) for
smsc75xx, please consider for net-next.

Steve Glendinning (2):
  smsc75xx: refactor entering suspend modes
  smsc75xx: support PHY wakeup source

 drivers/net/usb/smsc75xx.c |  224 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 186 insertions(+), 38 deletions(-)

-- 
1.7.10.4

^ permalink raw reply

* [PATCH 1/2] smsc75xx: refactor entering suspend modes
From: Steve Glendinning @ 2012-11-27 14:28 UTC (permalink / raw)
  To: netdev; +Cc: Steve Glendinning
In-Reply-To: <1354026482-10443-1-git-send-email-steve.glendinning@shawell.net>

This patch splits out the logic for entering suspend modes
to separate functions, to reduce the complexity of the
smsc75xx_suspend function.

Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
---
 drivers/net/usb/smsc75xx.c |   62 +++++++++++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 24 deletions(-)

diff --git a/drivers/net/usb/smsc75xx.c b/drivers/net/usb/smsc75xx.c
index 953c4f4..4655c01 100644
--- a/drivers/net/usb/smsc75xx.c
+++ b/drivers/net/usb/smsc75xx.c
@@ -1213,6 +1213,42 @@ static int smsc75xx_write_wuff(struct usbnet *dev, int filter, u32 wuf_cfg,
 	return 0;
 }
 
+static int smsc75xx_enter_suspend0(struct usbnet *dev)
+{
+	u32 val;
+	int ret;
+
+	ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
+	check_warn_return(ret, "Error reading PMT_CTL\n");
+
+	val &= (~(PMT_CTL_SUS_MODE | PMT_CTL_PHY_RST));
+	val |= PMT_CTL_SUS_MODE_0 | PMT_CTL_WOL_EN | PMT_CTL_WUPS;
+
+	ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
+	check_warn_return(ret, "Error writing PMT_CTL\n");
+
+	smsc75xx_set_feature(dev, USB_DEVICE_REMOTE_WAKEUP);
+
+	return 0;
+}
+
+static int smsc75xx_enter_suspend2(struct usbnet *dev)
+{
+	u32 val;
+	int ret;
+
+	ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
+	check_warn_return(ret, "Error reading PMT_CTL\n");
+
+	val &= ~(PMT_CTL_SUS_MODE | PMT_CTL_WUPS | PMT_CTL_PHY_RST);
+	val |= PMT_CTL_SUS_MODE_2;
+
+	ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
+	check_warn_return(ret, "Error writing PMT_CTL\n");
+
+	return 0;
+}
+
 static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
 {
 	struct usbnet *dev = usb_get_intfdata(intf);
@@ -1244,17 +1280,7 @@ static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
 		ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
 		check_warn_return(ret, "Error writing PMT_CTL\n");
 
-		/* enter suspend2 mode */
-		ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
-		check_warn_return(ret, "Error reading PMT_CTL\n");
-
-		val &= ~(PMT_CTL_SUS_MODE | PMT_CTL_WUPS | PMT_CTL_PHY_RST);
-		val |= PMT_CTL_SUS_MODE_2;
-
-		ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
-		check_warn_return(ret, "Error writing PMT_CTL\n");
-
-		return 0;
+		return smsc75xx_enter_suspend2(dev);
 	}
 
 	if (pdata->wolopts & (WAKE_MCAST | WAKE_ARP)) {
@@ -1368,19 +1394,7 @@ static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
 
 	/* some wol options are enabled, so enter SUSPEND0 */
 	netdev_info(dev->net, "entering SUSPEND0 mode\n");
-
-	ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
-	check_warn_return(ret, "Error reading PMT_CTL\n");
-
-	val &= (~(PMT_CTL_SUS_MODE | PMT_CTL_PHY_RST));
-	val |= PMT_CTL_SUS_MODE_0 | PMT_CTL_WOL_EN | PMT_CTL_WUPS;
-
-	ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
-	check_warn_return(ret, "Error writing PMT_CTL\n");
-
-	smsc75xx_set_feature(dev, USB_DEVICE_REMOTE_WAKEUP);
-
-	return 0;
+	return smsc75xx_enter_suspend0(dev);
 }
 
 static int smsc75xx_resume(struct usb_interface *intf)
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 2/2] smsc75xx: support PHY wakeup source
From: Steve Glendinning @ 2012-11-27 14:28 UTC (permalink / raw)
  To: netdev; +Cc: Steve Glendinning
In-Reply-To: <1354026482-10443-1-git-send-email-steve.glendinning@shawell.net>

This patch enables LAN7500 family devices to wake from suspend
on either link up or link down events.

It also adds _nopm versions of mdio access functions, so we can
safely call them from suspend and resume functions.

Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
---
 drivers/net/usb/smsc75xx.c |  168 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 151 insertions(+), 17 deletions(-)

diff --git a/drivers/net/usb/smsc75xx.c b/drivers/net/usb/smsc75xx.c
index 4655c01..8f92d81 100644
--- a/drivers/net/usb/smsc75xx.c
+++ b/drivers/net/usb/smsc75xx.c
@@ -54,7 +54,7 @@
 #define USB_PRODUCT_ID_LAN7500		(0x7500)
 #define USB_PRODUCT_ID_LAN7505		(0x7505)
 #define RXW_PADDING			2
-#define SUPPORTED_WAKE			(WAKE_UCAST | WAKE_BCAST | \
+#define SUPPORTED_WAKE			(WAKE_PHY | WAKE_UCAST | WAKE_BCAST | \
 					 WAKE_MCAST | WAKE_ARP | WAKE_MAGIC)
 
 #define check_warn(ret, fmt, args...) \
@@ -185,14 +185,15 @@ static int smsc75xx_clear_feature(struct usbnet *dev, u32 feature)
 
 /* Loop until the read is completed with timeout
  * called with phy_mutex held */
-static int smsc75xx_phy_wait_not_busy(struct usbnet *dev)
+static __must_check int __smsc75xx_phy_wait_not_busy(struct usbnet *dev,
+						     int in_pm)
 {
 	unsigned long start_time = jiffies;
 	u32 val;
 	int ret;
 
 	do {
-		ret = smsc75xx_read_reg(dev, MII_ACCESS, &val);
+		ret = __smsc75xx_read_reg(dev, MII_ACCESS, &val, in_pm);
 		check_warn_return(ret, "Error reading MII_ACCESS\n");
 
 		if (!(val & MII_ACCESS_BUSY))
@@ -202,7 +203,8 @@ static int smsc75xx_phy_wait_not_busy(struct usbnet *dev)
 	return -EIO;
 }
 
-static int smsc75xx_mdio_read(struct net_device *netdev, int phy_id, int idx)
+static int __smsc75xx_mdio_read(struct net_device *netdev, int phy_id, int idx,
+				int in_pm)
 {
 	struct usbnet *dev = netdev_priv(netdev);
 	u32 val, addr;
@@ -211,7 +213,7 @@ static int smsc75xx_mdio_read(struct net_device *netdev, int phy_id, int idx)
 	mutex_lock(&dev->phy_mutex);
 
 	/* confirm MII not busy */
-	ret = smsc75xx_phy_wait_not_busy(dev);
+	ret = __smsc75xx_phy_wait_not_busy(dev, in_pm);
 	check_warn_goto_done(ret, "MII is busy in smsc75xx_mdio_read\n");
 
 	/* set the address, index & direction (read from PHY) */
@@ -220,13 +222,13 @@ static int smsc75xx_mdio_read(struct net_device *netdev, int phy_id, int idx)
 	addr = ((phy_id << MII_ACCESS_PHY_ADDR_SHIFT) & MII_ACCESS_PHY_ADDR)
 		| ((idx << MII_ACCESS_REG_ADDR_SHIFT) & MII_ACCESS_REG_ADDR)
 		| MII_ACCESS_READ | MII_ACCESS_BUSY;
-	ret = smsc75xx_write_reg(dev, MII_ACCESS, addr);
+	ret = __smsc75xx_write_reg(dev, MII_ACCESS, addr, in_pm);
 	check_warn_goto_done(ret, "Error writing MII_ACCESS\n");
 
-	ret = smsc75xx_phy_wait_not_busy(dev);
+	ret = __smsc75xx_phy_wait_not_busy(dev, in_pm);
 	check_warn_goto_done(ret, "Timed out reading MII reg %02X\n", idx);
 
-	ret = smsc75xx_read_reg(dev, MII_DATA, &val);
+	ret = __smsc75xx_read_reg(dev, MII_DATA, &val, in_pm);
 	check_warn_goto_done(ret, "Error reading MII_DATA\n");
 
 	ret = (u16)(val & 0xFFFF);
@@ -236,8 +238,8 @@ done:
 	return ret;
 }
 
-static void smsc75xx_mdio_write(struct net_device *netdev, int phy_id, int idx,
-				int regval)
+static void __smsc75xx_mdio_write(struct net_device *netdev, int phy_id,
+				  int idx, int regval, int in_pm)
 {
 	struct usbnet *dev = netdev_priv(netdev);
 	u32 val, addr;
@@ -246,11 +248,11 @@ static void smsc75xx_mdio_write(struct net_device *netdev, int phy_id, int idx,
 	mutex_lock(&dev->phy_mutex);
 
 	/* confirm MII not busy */
-	ret = smsc75xx_phy_wait_not_busy(dev);
+	ret = __smsc75xx_phy_wait_not_busy(dev, in_pm);
 	check_warn_goto_done(ret, "MII is busy in smsc75xx_mdio_write\n");
 
 	val = regval;
-	ret = smsc75xx_write_reg(dev, MII_DATA, val);
+	ret = __smsc75xx_write_reg(dev, MII_DATA, val, in_pm);
 	check_warn_goto_done(ret, "Error writing MII_DATA\n");
 
 	/* set the address, index & direction (write to PHY) */
@@ -259,16 +261,39 @@ static void smsc75xx_mdio_write(struct net_device *netdev, int phy_id, int idx,
 	addr = ((phy_id << MII_ACCESS_PHY_ADDR_SHIFT) & MII_ACCESS_PHY_ADDR)
 		| ((idx << MII_ACCESS_REG_ADDR_SHIFT) & MII_ACCESS_REG_ADDR)
 		| MII_ACCESS_WRITE | MII_ACCESS_BUSY;
-	ret = smsc75xx_write_reg(dev, MII_ACCESS, addr);
+	ret = __smsc75xx_write_reg(dev, MII_ACCESS, addr, in_pm);
 	check_warn_goto_done(ret, "Error writing MII_ACCESS\n");
 
-	ret = smsc75xx_phy_wait_not_busy(dev);
+	ret = __smsc75xx_phy_wait_not_busy(dev, in_pm);
 	check_warn_goto_done(ret, "Timed out writing MII reg %02X\n", idx);
 
 done:
 	mutex_unlock(&dev->phy_mutex);
 }
 
+static int smsc75xx_mdio_read_nopm(struct net_device *netdev, int phy_id,
+				   int idx)
+{
+	return __smsc75xx_mdio_read(netdev, phy_id, idx, 1);
+}
+
+static void smsc75xx_mdio_write_nopm(struct net_device *netdev, int phy_id,
+				     int idx, int regval)
+{
+	__smsc75xx_mdio_write(netdev, phy_id, idx, regval, 1);
+}
+
+static int smsc75xx_mdio_read(struct net_device *netdev, int phy_id, int idx)
+{
+	return __smsc75xx_mdio_read(netdev, phy_id, idx, 0);
+}
+
+static void smsc75xx_mdio_write(struct net_device *netdev, int phy_id, int idx,
+				int regval)
+{
+	__smsc75xx_mdio_write(netdev, phy_id, idx, regval, 0);
+}
+
 static int smsc75xx_wait_eeprom(struct usbnet *dev)
 {
 	unsigned long start_time = jiffies;
@@ -1232,6 +1257,32 @@ static int smsc75xx_enter_suspend0(struct usbnet *dev)
 	return 0;
 }
 
+static int smsc75xx_enter_suspend1(struct usbnet *dev)
+{
+	u32 val;
+	int ret;
+
+	ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
+	check_warn_return(ret, "Error reading PMT_CTL");
+
+	val &= ~(PMT_CTL_SUS_MODE | PMT_CTL_WUPS | PMT_CTL_PHY_RST);
+	val |= PMT_CTL_SUS_MODE_1;
+
+	ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
+	check_warn_return(ret, "Error writing PMT_CTL");
+
+	/* clear wol status, enable energy detection */
+	val &= ~PMT_CTL_WUPS;
+	val |= (PMT_CTL_WUPS_ED | PMT_CTL_ED_EN);
+
+	ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
+	check_warn_return(ret, "Error writing PMT_CTL");
+
+	smsc75xx_set_feature(dev, USB_DEVICE_REMOTE_WAKEUP);
+
+	return 0;
+}
+
 static int smsc75xx_enter_suspend2(struct usbnet *dev)
 {
 	u32 val;
@@ -1249,18 +1300,61 @@ static int smsc75xx_enter_suspend2(struct usbnet *dev)
 	return 0;
 }
 
+static int smsc75xx_enable_phy_wakeup_interrupts(struct usbnet *dev, u16 mask)
+{
+	struct mii_if_info *mii = &dev->mii;
+	int ret;
+
+	netdev_dbg(dev->net, "enabling PHY wakeup interrupts");
+
+	/* read to clear */
+	ret = smsc75xx_mdio_read_nopm(dev->net, mii->phy_id, PHY_INT_SRC);
+	check_warn_return(ret, "Error reading PHY_INT_SRC");
+
+	/* enable interrupt source */
+	ret = smsc75xx_mdio_read_nopm(dev->net, mii->phy_id, PHY_INT_MASK);
+	check_warn_return(ret, "Error reading PHY_INT_MASK");
+
+	ret |= mask;
+
+	smsc75xx_mdio_write_nopm(dev->net, mii->phy_id, PHY_INT_MASK, ret);
+
+	return 0;
+}
+
+static int smsc75xx_link_ok_nopm(struct usbnet *dev)
+{
+	struct mii_if_info *mii = &dev->mii;
+	int ret;
+
+	/* first, a dummy read, needed to latch some MII phys */
+	ret = smsc75xx_mdio_read_nopm(dev->net, mii->phy_id, MII_BMSR);
+	check_warn_return(ret, "Error reading MII_BMSR");
+
+	ret = smsc75xx_mdio_read_nopm(dev->net, mii->phy_id, MII_BMSR);
+	check_warn_return(ret, "Error reading MII_BMSR");
+
+	return !!(ret & BMSR_LSTATUS);
+}
+
 static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
 {
 	struct usbnet *dev = usb_get_intfdata(intf);
 	struct smsc75xx_priv *pdata = (struct smsc75xx_priv *)(dev->data[0]);
+	u32 val, link_up;
 	int ret;
-	u32 val;
 
 	ret = usbnet_suspend(intf, message);
 	check_warn_return(ret, "usbnet_suspend error\n");
 
-	/* if no wol options set, enter lowest power SUSPEND2 mode */
-	if (!(pdata->wolopts & SUPPORTED_WAKE)) {
+	/* determine if link is up using only _nopm functions */
+	link_up = smsc75xx_link_ok_nopm(dev);
+
+	/* if no wol options set, or if link is down and we're not waking on
+	 * PHY activity, enter lowest power SUSPEND2 mode
+	 */
+	if (!(pdata->wolopts & SUPPORTED_WAKE) ||
+		!(link_up || (pdata->wolopts & WAKE_PHY))) {
 		netdev_info(dev->net, "entering SUSPEND2 mode\n");
 
 		/* disable energy detect (link up) & wake up events */
@@ -1283,6 +1377,33 @@ static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
 		return smsc75xx_enter_suspend2(dev);
 	}
 
+	if (pdata->wolopts & WAKE_PHY) {
+		ret = smsc75xx_enable_phy_wakeup_interrupts(dev,
+			(PHY_INT_MASK_ANEG_COMP | PHY_INT_MASK_LINK_DOWN));
+		check_warn_return(ret, "error enabling PHY wakeup ints");
+
+		/* if link is down then configure EDPD and enter SUSPEND1,
+		 * otherwise enter SUSPEND0 below
+		 */
+		if (!link_up) {
+			struct mii_if_info *mii = &dev->mii;
+			netdev_info(dev->net, "entering SUSPEND1 mode");
+
+			/* enable energy detect power-down mode */
+			ret = smsc75xx_mdio_read_nopm(dev->net, mii->phy_id,
+				PHY_MODE_CTRL_STS);
+			check_warn_return(ret, "Error reading PHY_MODE_CTRL_STS");
+
+			ret |= MODE_CTRL_STS_EDPWRDOWN;
+
+			smsc75xx_mdio_write_nopm(dev->net, mii->phy_id,
+				PHY_MODE_CTRL_STS, ret);
+
+			/* enter SUSPEND1 mode */
+			return smsc75xx_enter_suspend1(dev);
+		}
+	}
+
 	if (pdata->wolopts & (WAKE_MCAST | WAKE_ARP)) {
 		int i, filter = 0;
 
@@ -1349,6 +1470,19 @@ static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
 	ret = smsc75xx_write_reg_nopm(dev, WUCSR, val);
 	check_warn_return(ret, "Error writing WUCSR\n");
 
+	if (pdata->wolopts & WAKE_PHY) {
+		netdev_info(dev->net, "enabling PHY wakeup\n");
+		ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
+		check_warn_return(ret, "Error reading PMT_CTL");
+
+		/* clear wol status, enable energy detection */
+		val &= ~PMT_CTL_WUPS;
+		val |= (PMT_CTL_WUPS_ED | PMT_CTL_ED_EN);
+
+		ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
+		check_warn_return(ret, "Error writing PMT_CTL");
+	}
+
 	if (pdata->wolopts & WAKE_MAGIC) {
 		netdev_info(dev->net, "enabling magic packet wakeup\n");
 		ret = smsc75xx_read_reg_nopm(dev, WUCSR, &val);
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH] smsc95xx: fix suspend buffer overflow
From: Bjørn Mork @ 2012-11-27 14:34 UTC (permalink / raw)
  To: Steve Glendinning; +Cc: netdev, dan.carpenter
In-Reply-To: <1354022623-7317-1-git-send-email-steve.glendinning@shawell.net>

Steve Glendinning <steve.glendinning@shawell.net> writes:

> This patch fixes a buffer overflow introduced by bbd9f9e, where
> the filter_mask array is accessed beyond its bounds.
>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
> ---
>  drivers/net/usb/smsc95xx.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
> index 79d495d..6cdc504 100644
> --- a/drivers/net/usb/smsc95xx.c
> +++ b/drivers/net/usb/smsc95xx.c
> @@ -1281,7 +1281,7 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
>  	}
>  
>  	if (pdata->wolopts & (WAKE_BCAST | WAKE_MCAST | WAKE_ARP | WAKE_UCAST)) {
> -		u32 *filter_mask = kzalloc(32, GFP_KERNEL);
> +		u32 *filter_mask = kzalloc(sizeof(u32) * 32, GFP_KERNEL);
>  		u32 command[2];
>  		u32 offset[2];
>  		u32 crc[4];

I wonder... all these magic constants (32, 2, 2, 4) obviously relate to
the maximum number of supported filters (8).  It would be much easier to
avoid such bugs if the code documented this.  Like

		u32 *filter_mask = kzalloc(4 * sizeof(u32) * N, GFP_KERNEL);
		u32 command[N/4];
		u32 offset[N/4];
		u32 crc[N/2];


And even better if you let the base types be "native" size so you could
avoid all the complicated indexing math:

		u8 *filter_mask = kzalloc(4 * sizeof(u32) * N, GFP_KERNEL);
		u8 command[N];
		u8 offset[N];
		u16 crc[N];

Yes, you will then have to do type conversions when writing to the chip,
but I believe the overall code will be much easier to follow with this

			command[filter/4] |= 0x05UL << ((filter % 4) * 8);
			offset[filter/4] |= 0x00 << ((filter % 4) * 8);
			crc[filter/2] |= smsc_crc(bcast, 6, filter);

replaced by the IMHO more obvious

			command[filter] = 0x05UL;
			offset[filter] = 0x00;
			crc[filter] = bitrev16(crc16(0xFFFF, bcast, 6));
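
As a rough illustration of the type conversion mentioned above, here is
a minimal sketch of packing per-filter bytes back into 32-bit register
words.  The helper name pack_filter_bytes() is made up for this example
and is not part of the driver; it only assumes the byte layout implied
by the "(filter % 4) * 8" shifts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the smsc95xx driver: pack four
 * consecutive per-filter bytes into one 32-bit register value.
 * Filter 4*word + j lands in bits [8*j, 8*j + 7], matching the
 * "<< ((filter % 4) * 8)" indexing in the existing code.
 */
static uint32_t pack_filter_bytes(const uint8_t *bytes, size_t word)
{
	uint32_t v = 0;
	size_t j;

	for (j = 0; j < 4; j++)
		v |= (uint32_t)bytes[4 * word + j] << (j * 8);

	return v;
}
```

With something like this, the per-filter arrays stay trivially indexed
and the packing happens in one place, right before the register write.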


BTW, the smsc_crc() function cannot work.  It returns a u16 which it
will sometimes attempt to shift by 16 bits...
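
To illustrate the truncation, here is a small sketch.  It assumes the
helper shifts its 16-bit result left by (filter % 2) * 16, which is not
shown in the quoted hunk; both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for a CRC helper declared to return a 16-bit
 * value.  The shift is done in 32 bits, but the return type truncates
 * the result, so any bits moved into the upper half are lost.
 */
static uint16_t shifted_crc_narrow(uint16_t crc, int filter)
{
	return (uint16_t)((uint32_t)crc << ((filter % 2) * 16));
}

/* Same computation with a 32-bit return type keeps the upper half. */
static uint32_t shifted_crc_wide(uint16_t crc, int filter)
{
	return (uint32_t)crc << ((filter % 2) * 16);
}
```

For an odd filter index the narrow variant always returns 0, so the
CRC for every second filter would never reach the register value.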

And you don't test the kzalloc() return value.

And if I am really going to be a nitpicker (which comes naturally to me
:-), then I don't think you need to allocate anything at all.  You never
set more than a few bits in the first byte of the filter mask.  Why not
create a small helper function which writes a filter mask with these
bits and fills the rest with zeroes?

Something along the lines of

int write_filter(struct usbnet *dev, u8 firstbyte)
{
        int i, ret = 0;
        u32 v = (u32)firstbyte << 24;

        for (i = 0; i < 4 && !ret; i++) {
                ret = smsc95xx_write_reg_nopm(dev, WUFF, v);
                v = 0;
        }
        return ret;
}

    u8 filter_mask_byte[N];
..
    filter_mask_byte[filter] = 0x3F;
..
    for (i = 0; i < wuff_filter_count; i++) {
	ret = write_filter(dev, filter_mask_byte[i]);
	check_warn_return(ret, "Error writing WUFF\n");
    }
   





Bjørn

^ permalink raw reply

* Re: [PATCH 1/1] net: cpts: fix for build break after ARM SoC integration
From: Paul Walmsley @ 2012-11-27 14:42 UTC (permalink / raw)
  To: Mugunthan V N
  Cc: netdev, davem, linux-arm-kernel, linux-omap, b-cousson,
	richardcochran
In-Reply-To: <1354012034-31686-1-git-send-email-mugunthanvnm@ti.com>

On Tue, 27 Nov 2012, Mugunthan V N wrote:

>   CC      drivers/net/ethernet/ti/cpts.o
> drivers/net/ethernet/ti/cpts.c:30:24: fatal error: plat/clock.h: No such file or directory
> compilation terminated.
> make[4]: *** [drivers/net/ethernet/ti/cpts.o] Error 1
> make[3]: *** [drivers/net/ethernet/ti] Error 2
> make[2]: *** [drivers/net/ethernet] Error 2
> make[1]: *** [drivers/net] Error 2
> 
> fix for build break as the header file is removed from plat-omap as part of
> the below patch
> 
> commit a135eaae524acba1509a3b19c97fae556e4da7cd
> Author: Paul Walmsley <paul@pwsan.com>
> Date:   Thu Sep 27 10:33:34 2012 -0600
> 
>     ARM: OMAP: remove plat/clock.h
> 
>     Remove arch/arm/plat-omap/include/plat/clock.h by merging it into
>     arch/arm/mach-omap1/clock.h and arch/arm/mach-omap2/clock.h.
>     The goal here is to facilitate ARM single image kernels by removing
>     includes via the "plat/" symlink.
> 
> Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>

Acked-by: Paul Walmsley <paul@pwsan.com>


- Paul

^ permalink raw reply

* [PATCH 1/2] net/davinci_emac: use devres APIs
From: Sekhar Nori @ 2012-11-27 14:47 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/

Use devres APIs where possible to simplify error handling
in driver probe.

While at it, also rename the goto targets in error path to
introduce some consistency in how they are named.

Signed-off-by: Sekhar Nori <nsekhar-l0cyMroinI0@public.gmane.org>
---
 drivers/net/ethernet/ti/davinci_emac.c |   46 +++++++++++---------------------
 1 file changed, 16 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index fce89a0..7be04dc 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -1865,21 +1865,18 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 
 
 	/* obtain emac clock from kernel */
-	emac_clk = clk_get(&pdev->dev, NULL);
+	emac_clk = devm_clk_get(&pdev->dev, NULL);
 	if (IS_ERR(emac_clk)) {
 		dev_err(&pdev->dev, "failed to get EMAC clock\n");
 		return -EBUSY;
 	}
 	emac_bus_frequency = clk_get_rate(emac_clk);
-	clk_put(emac_clk);
 
 	/* TODO: Probe PHY here if possible */
 
 	ndev = alloc_etherdev(sizeof(struct emac_priv));
-	if (!ndev) {
-		rc = -ENOMEM;
-		goto no_ndev;
-	}
+	if (!ndev)
+		return -ENOMEM;
 
 	platform_set_drvdata(pdev, ndev);
 	priv = netdev_priv(ndev);
@@ -1893,7 +1890,7 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 	if (!pdata) {
 		dev_err(&pdev->dev, "no platform data\n");
 		rc = -ENODEV;
-		goto probe_quit;
+		goto no_pdata;
 	}
 
 	/* MAC addr and PHY mask , RMII enable info from platform_data */
@@ -1913,23 +1910,23 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 	if (!res) {
 		dev_err(&pdev->dev,"error getting res\n");
 		rc = -ENOENT;
-		goto probe_quit;
+		goto no_pdata;
 	}
 
 	priv->emac_base_phys = res->start + pdata->ctrl_reg_offset;
 	size = resource_size(res);
-	if (!request_mem_region(res->start, size, ndev->name)) {
+	if (!devm_request_mem_region(&pdev->dev, res->start,
+				     size, ndev->name)) {
 		dev_err(&pdev->dev, "failed request_mem_region() for regs\n");
 		rc = -ENXIO;
-		goto probe_quit;
+		goto no_pdata;
 	}
 
-	priv->remap_addr = ioremap(res->start, size);
+	priv->remap_addr = devm_ioremap(&pdev->dev, res->start, size);
 	if (!priv->remap_addr) {
 		dev_err(&pdev->dev, "unable to map IO\n");
 		rc = -ENOMEM;
-		release_mem_region(res->start, size);
-		goto probe_quit;
+		goto no_pdata;
 	}
 	priv->emac_base = priv->remap_addr + pdata->ctrl_reg_offset;
 	ndev->base_addr = (unsigned long)priv->remap_addr;
@@ -1962,7 +1959,7 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 	if (!priv->dma) {
 		dev_err(&pdev->dev, "error initializing DMA\n");
 		rc = -ENOMEM;
-		goto no_dma;
+		goto no_pdata;
 	}
 
 	priv->txchan = cpdma_chan_create(priv->dma, tx_chan_num(EMAC_DEF_TX_CH),
@@ -1971,14 +1968,14 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 				       emac_rx_handler);
 	if (WARN_ON(!priv->txchan || !priv->rxchan)) {
 		rc = -ENOMEM;
-		goto no_irq_res;
+		goto no_cpdma_chan;
 	}
 
 	res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
 	if (!res) {
 		dev_err(&pdev->dev, "error getting irq res\n");
 		rc = -ENOENT;
-		goto no_irq_res;
+		goto no_cpdma_chan;
 	}
 	ndev->irq = res->start;
 
@@ -2000,7 +1997,7 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 	if (rc) {
 		dev_err(&pdev->dev, "error in register_netdev\n");
 		rc = -ENODEV;
-		goto no_irq_res;
+		goto no_cpdma_chan;
 	}
 
 
@@ -2015,20 +2012,14 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 
 	return 0;
 
-no_irq_res:
+no_cpdma_chan:
 	if (priv->txchan)
 		cpdma_chan_destroy(priv->txchan);
 	if (priv->rxchan)
 		cpdma_chan_destroy(priv->rxchan);
 	cpdma_ctlr_destroy(priv->dma);
-no_dma:
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	release_mem_region(res->start, resource_size(res));
-	iounmap(priv->remap_addr);
-
-probe_quit:
+no_pdata:
 	free_netdev(ndev);
-no_ndev:
 	return rc;
 }
 
@@ -2041,14 +2032,12 @@ no_ndev:
  */
 static int __devexit davinci_emac_remove(struct platform_device *pdev)
 {
-	struct resource *res;
 	struct net_device *ndev = platform_get_drvdata(pdev);
 	struct emac_priv *priv = netdev_priv(ndev);
 
 	dev_notice(&ndev->dev, "DaVinci EMAC: davinci_emac_remove()\n");
 
 	platform_set_drvdata(pdev, NULL);
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 
 	if (priv->txchan)
 		cpdma_chan_destroy(priv->txchan);
@@ -2056,10 +2045,7 @@ static int __devexit davinci_emac_remove(struct platform_device *pdev)
 		cpdma_chan_destroy(priv->rxchan);
 	cpdma_ctlr_destroy(priv->dma);
 
-	release_mem_region(res->start, resource_size(res));
-
 	unregister_netdev(ndev);
-	iounmap(priv->remap_addr);
 	free_netdev(ndev);
 
 	return 0;
-- 
1.7.10.1

^ permalink raw reply related

* [PATCH 2/2] net/davinci_emac: use clk_{prepare|unprepare}
From: Sekhar Nori @ 2012-11-27 14:47 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/,
	Mike Turquette
In-Reply-To: <1354027635-32627-1-git-send-email-nsekhar-l0cyMroinI0@public.gmane.org>

Use clk_prepare()/clk_unprepare() in the driver since common
clock framework needs these to be called before clock is enabled.

This is in preparation of common clock framework migration
for DaVinci.

Cc: Mike Turquette <mturquette-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
Signed-off-by: Sekhar Nori <nsekhar-l0cyMroinI0@public.gmane.org>
---
 drivers/net/ethernet/ti/davinci_emac.c |   19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index 7be04dc..e7b3b94 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -352,6 +352,7 @@ struct emac_priv {
 	/*platform specific members*/
 	void (*int_enable) (void);
 	void (*int_disable) (void);
+	struct clk *clk;
 };
 
 /* EMAC TX Host Error description strings */
@@ -1870,19 +1871,29 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 		dev_err(&pdev->dev, "failed to get EMAC clock\n");
 		return -EBUSY;
 	}
+
+	rc = clk_prepare(emac_clk);
+	if (rc) {
+		dev_err(&pdev->dev, "emac clock prepare failed.\n");
+		return rc;
+	}
+
 	emac_bus_frequency = clk_get_rate(emac_clk);
 
 	/* TODO: Probe PHY here if possible */
 
 	ndev = alloc_etherdev(sizeof(struct emac_priv));
-	if (!ndev)
-		return -ENOMEM;
+	if (!ndev) {
+		rc = -ENOMEM;
+		goto no_etherdev;
+	}
 
 	platform_set_drvdata(pdev, ndev);
 	priv = netdev_priv(ndev);
 	priv->pdev = pdev;
 	priv->ndev = ndev;
 	priv->msg_enable = netif_msg_init(debug_level, DAVINCI_EMAC_DEBUG);
+	priv->clk = emac_clk;
 
 	spin_lock_init(&priv->lock);
 
@@ -2020,6 +2031,8 @@ no_cpdma_chan:
 	cpdma_ctlr_destroy(priv->dma);
 no_pdata:
 	free_netdev(ndev);
+no_etherdev:
+	clk_unprepare(emac_clk);
 	return rc;
 }
 
@@ -2048,6 +2061,8 @@ static int __devexit davinci_emac_remove(struct platform_device *pdev)
 	unregister_netdev(ndev);
 	free_netdev(ndev);
 
+	clk_unprepare(priv->clk);
+
 	return 0;
 }
 
-- 
1.7.10.1

^ permalink raw reply related

* Re: [PATCH] sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails
From: Vlad Yasevich @ 2012-11-27 14:49 UTC (permalink / raw)
  To: Tommi Rantala
  Cc: linux-sctp, netdev, Neil Horman, Sridhar Samudrala,
	David S. Miller, Dave Jones
In-Reply-To: <1354024906-1925-1-git-send-email-tt.rantala@gmail.com>

On 11/27/2012 09:01 AM, Tommi Rantala wrote:
> Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
> reproducible e.g. with the sendto() syscall by passing invalid
> user space pointer in the second argument:
>
>   #include <string.h>
>   #include <arpa/inet.h>
>   #include <sys/socket.h>
>
>   int main(void)
>   {
>           int fd;
>           struct sockaddr_in sa;
>
>           fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
>           if (fd < 0)
>                   return 1;
>
>           memset(&sa, 0, sizeof(sa));
>           sa.sin_family = AF_INET;
>           sa.sin_addr.s_addr = inet_addr("127.0.0.1");
>           sa.sin_port = htons(11111);
>
>           sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));
>
>           return 0;
>   }
>
> As far as I can tell, the leak has been around since ~2003.
>
> Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>

Acked-by: Vlad Yasevich <vyasevich@gmail.com>

-vlad

> ---
>   net/sctp/chunk.c |    7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
> index 7c2df9c..f2aebdb 100644
> --- a/net/sctp/chunk.c
> +++ b/net/sctp/chunk.c
> @@ -284,7 +284,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
>   			goto errout;
>   		err = sctp_user_addto_chunk(chunk, offset, len, msgh->msg_iov);
>   		if (err < 0)
> -			goto errout;
> +			goto errout_chunk_free;
>
>   		offset += len;
>
> @@ -324,7 +324,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
>   		__skb_pull(chunk->skb, (__u8 *)chunk->chunk_hdr
>   			   - (__u8 *)chunk->skb->data);
>   		if (err < 0)
> -			goto errout;
> +			goto errout_chunk_free;
>
>   		sctp_datamsg_assign(msg, chunk);
>   		list_add_tail(&chunk->frag_list, &msg->chunks);
> @@ -332,6 +332,9 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
>
>   	return msg;
>
> +errout_chunk_free:
> +	sctp_chunk_free(chunk);
> +
>   errout:
>   	list_for_each_safe(pos, temp, &msg->chunks) {
>   		list_del_init(pos);
>

^ permalink raw reply

* Re: [PATCH 1/2] smsc75xx: refactor entering suspend modes
From: Bjørn Mork @ 2012-11-27 14:50 UTC (permalink / raw)
  To: Steve Glendinning
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1354026482-10443-2-git-send-email-steve.glendinning-nksJyM/082jR7s880joybQ@public.gmane.org>

I believe the drivers/net/usb patches should be CCed to linux-usb for
review, because they often touch USB specific things.  So I added that
CC and did not strip any of the quoted text.


Steve Glendinning <steve.glendinning-nksJyM/082jR7s880joybQ@public.gmane.org> writes:

> This patch splits out the logic for entering suspend modes
> to separate functions, to reduce the complexity of the
> smsc75xx_suspend function.
>
> Signed-off-by: Steve Glendinning <steve.glendinning-nksJyM/082jR7s880joybQ@public.gmane.org>
> ---
>  drivers/net/usb/smsc75xx.c |   62 +++++++++++++++++++++++++++-----------------
>  1 file changed, 38 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/net/usb/smsc75xx.c b/drivers/net/usb/smsc75xx.c
> index 953c4f4..4655c01 100644
> --- a/drivers/net/usb/smsc75xx.c
> +++ b/drivers/net/usb/smsc75xx.c
> @@ -1213,6 +1213,42 @@ static int smsc75xx_write_wuff(struct usbnet *dev, int filter, u32 wuf_cfg,
>  	return 0;
>  }
>  
> +static int smsc75xx_enter_suspend0(struct usbnet *dev)
> +{
> +	u32 val;
> +	int ret;
> +
> +	ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
> +	check_warn_return(ret, "Error reading PMT_CTL\n");
> +
> +	val &= (~(PMT_CTL_SUS_MODE | PMT_CTL_PHY_RST));
> +	val |= PMT_CTL_SUS_MODE_0 | PMT_CTL_WOL_EN | PMT_CTL_WUPS;
> +
> +	ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
> +	check_warn_return(ret, "Error writing PMT_CTL\n");
> +
> +	smsc75xx_set_feature(dev, USB_DEVICE_REMOTE_WAKEUP);

As mentioned in another comment to the smsc95xx driver: This is weird.
Do you really need to do that?

This is a USB interface driver.  The USB device is handled by the
generic "usb" driver, which will do the right thing.  See 
drivers/usb/generic.c and drivers/usb/core/hub.c


generic_suspend() calls usb_port_suspend() which does:

        /* enable remote wakeup when appropriate; this lets the device
         * wake up the upstream hub (including maybe the root hub).
         *
         * NOTE:  OTG devices may issue remote wakeup (or SRP) even when
         * we don't explicitly enable it here.
         */
        if (udev->do_remote_wakeup) {
                if (!hub_is_superspeed(hub->hdev)) {
                        status = usb_control_msg(udev, usb_sndctrlpipe(udev, 0),
                                        USB_REQ_SET_FEATURE, USB_RECIP_DEVICE,
                                        USB_DEVICE_REMOTE_WAKEUP, 0,
                                        NULL, 0,
                                        USB_CTRL_SET_TIMEOUT);
                } else {
                        /* Assume there's only one function on the USB 3.0
                         * device and enable remote wake for the first
                         * interface. FIXME if the interface association
                         * descriptor shows there's more than one function.
                         */
                        status = usb_control_msg(udev, usb_sndctrlpipe(udev, 0),
                                        USB_REQ_SET_FEATURE,
                                        USB_RECIP_INTERFACE,
                                        USB_INTRF_FUNC_SUSPEND,
                                        USB_INTRF_FUNC_SUSPEND_RW |
                                        USB_INTRF_FUNC_SUSPEND_LP,
                                        NULL, 0,
                                        USB_CTRL_SET_TIMEOUT);
                }




So you should not need to touch the USB device feature directly from your
interface driver.


> +
> +	return 0;
> +}
> +
> +static int smsc75xx_enter_suspend2(struct usbnet *dev)
> +{
> +	u32 val;
> +	int ret;
> +
> +	ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
> +	check_warn_return(ret, "Error reading PMT_CTL\n");
> +
> +	val &= ~(PMT_CTL_SUS_MODE | PMT_CTL_WUPS | PMT_CTL_PHY_RST);
> +	val |= PMT_CTL_SUS_MODE_2;
> +
> +	ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
> +	check_warn_return(ret, "Error writing PMT_CTL\n");
> +
> +	return 0;
> +}
> +
>  static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
>  {
>  	struct usbnet *dev = usb_get_intfdata(intf);
> @@ -1244,17 +1280,7 @@ static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
>  		ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
>  		check_warn_return(ret, "Error writing PMT_CTL\n");
>  
> -		/* enter suspend2 mode */
> -		ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
> -		check_warn_return(ret, "Error reading PMT_CTL\n");
> -
> -		val &= ~(PMT_CTL_SUS_MODE | PMT_CTL_WUPS | PMT_CTL_PHY_RST);
> -		val |= PMT_CTL_SUS_MODE_2;
> -
> -		ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
> -		check_warn_return(ret, "Error writing PMT_CTL\n");
> -
> -		return 0;
> +		return smsc75xx_enter_suspend2(dev);
>  	}
>  
>  	if (pdata->wolopts & (WAKE_MCAST | WAKE_ARP)) {
> @@ -1368,19 +1394,7 @@ static int smsc75xx_suspend(struct usb_interface *intf, pm_message_t message)
>  
>  	/* some wol options are enabled, so enter SUSPEND0 */
>  	netdev_info(dev->net, "entering SUSPEND0 mode\n");
> -
> -	ret = smsc75xx_read_reg_nopm(dev, PMT_CTL, &val);
> -	check_warn_return(ret, "Error reading PMT_CTL\n");
> -
> -	val &= (~(PMT_CTL_SUS_MODE | PMT_CTL_PHY_RST));
> -	val |= PMT_CTL_SUS_MODE_0 | PMT_CTL_WOL_EN | PMT_CTL_WUPS;
> -
> -	ret = smsc75xx_write_reg_nopm(dev, PMT_CTL, val);
> -	check_warn_return(ret, "Error writing PMT_CTL\n");
> -
> -	smsc75xx_set_feature(dev, USB_DEVICE_REMOTE_WAKEUP);
> -
> -	return 0;
> +	return smsc75xx_enter_suspend0(dev);
>  }
>  
>  static int smsc75xx_resume(struct usb_interface *intf)

^ permalink raw reply

* Re: [RFC net-next PATCH V1 7/9] net: frag queue locking per hash bucket
From: Jesper Dangaard Brouer @ 2012-11-27 15:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Florian Westphal, netdev, Pablo Neira Ayuso,
	Thomas Graf, Cong Wang, Patrick McHardy, Paul E. McKenney,
	Herbert Xu
In-Reply-To: <20121123130836.18764.9297.stgit@dragon>

On Fri, 2012-11-23 at 14:08 +0100, Jesper Dangaard Brouer wrote:
> DO NOT apply - patch not finished, can cause on OOPS/PANIC during hash rebuild
> 
> This patch implements per hash bucket locking for the frag queue
> hash.  This removes two write locks, and the only remaining write
> lock is for protecting hash rebuild.  This essentially reduce the
> readers-writer lock to a rebuild lock.
> 
> NOT-Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

The bug mentioned last was not the only one... hopefully this fixes the last bug in this patch.


> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> index 1620a21..447423f 100644
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -35,20 +35,27 @@ static void inet_frag_secret_rebuild(unsigned long dummy)
>  	unsigned long now = jiffies;
>  	int i;
>  
> +	/* Per bucket lock NOT needed here, due to write lock protection */
>  	write_lock(&f->lock);
> +
>  	get_random_bytes(&f->rnd, sizeof(u32));
>  	for (i = 0; i < INETFRAGS_HASHSZ; i++) {
> +		struct inet_frag_bucket *hb;
>  		struct inet_frag_queue *q;
>  		struct hlist_node *p, *n;
>  
> -		hlist_for_each_entry_safe(q, p, n, &f->hash[i], list) {
> +		hb = &f->hash[i];
> +		hlist_for_each_entry_safe(q, p, n, &hb->chain, list) {
>  			unsigned int hval = f->hashfn(q);
>  
>  			if (hval != i) {
> +				struct inet_frag_bucket *hb_dest;
> +
>  				hlist_del(&q->list);
>  
>  				/* Relink to new hash chain. */
> -				hlist_add_head(&q->list, &f->hash[hval]);
> +				hb_dest = &f->hash[hval];
> +				hlist_add_head(&q->list, &hb->chain);

The above line was wrong; it should have been:
   hlist_add_head(&q->list, &hb_dest->chain);

>  			}
>  		}
>  	}
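
For illustration only, the corrected relinking can be sketched on a toy
singly linked hash table.  Everything here (NBUCKETS, struct node,
toy_hash(), rehash()) is hypothetical and much simpler than the
inet_frag code; the point is just that a moved entry is linked into the
destination bucket's chain.

```c
#include <stddef.h>

#define NBUCKETS 4	/* toy table size, not INETFRAGS_HASHSZ */

struct node {
	unsigned int key;
	struct node *next;
};

struct bucket {
	struct node *chain;
};

/* Toy hash: mixes the key with a per-rebuild random value. */
static unsigned int toy_hash(unsigned int key, unsigned int rnd)
{
	return (key ^ rnd) % NBUCKETS;
}

/* Walk every bucket and move each node whose new hash differs from
 * its current bucket index.  The node is linked into the destination
 * bucket (hb_dest), not back into the source bucket (hb).
 */
static void rehash(struct bucket *hash, unsigned int rnd)
{
	unsigned int i;

	for (i = 0; i < NBUCKETS; i++) {
		struct bucket *hb = &hash[i];
		struct node **pp = &hb->chain;

		while (*pp) {
			struct node *q = *pp;
			unsigned int hval = toy_hash(q->key, rnd);

			if (hval != i) {
				struct bucket *hb_dest = &hash[hval];

				*pp = q->next;			/* unlink from source */
				q->next = hb_dest->chain;	/* relink at head */
				hb_dest->chain = q;
			} else {
				pp = &q->next;
			}
		}
	}
}
```

Linking back into hb instead of hb_dest would leave the entry in a
chain that no longer matches its hash, so later lookups would miss it.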

The patch seems quite stable now.  My test is to adjust the rebuild
interval to 2 sec and then run 4x 10G with two fragments (packet size
1472*2) to create as many fragments as possible (approx 300
inet_frag_queue elements).

30 min test run:
 3726+3896+3960+3608 = 15190 Mbit/s

(Note for anyone reproducing this: a change to ipfrag_secret_interval
(e.g. sysctl -w net/ipv4/ipfrag_secret_interval=2) only takes effect
after the first interval/timer expires, which by default is 10 min.)


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply
