Netdev List


* Re: Strange latency spikes/TX network stalls on Sun Fire X4150(x86) and e1000e
From: Eric Dumazet @ 2012-05-22 17:24 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: Tom Herbert, netdev
In-Reply-To: <eb8cdd693530010d6736baede0cfebd8@visp.net.lb>

On Tue, 2012-05-22 at 20:11 +0300, Denys Fedoryshchenko wrote:

> By the way, if the BQL limit goes lower than the MTU, is that considered
> a bug?
> If yes, I can try to upload 3.4 to some servers and add a WARN_ON
> condition for limit < 1500.

There is no problem with BQL limit going lower than the max packet size.

(With TSO it can be 64K)

Remember BQL allows one packet to be sent to device, regardless of its
size.

The next packet might be blocked and stay in the Qdisc.

If your workload is mostly idle but sends bursts of 3 packets, then
only one is sent immediately.

The next packets must wait for the TX completion of the first packet.
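
As an illustration of the behaviour described above, here is a minimal
user-space sketch of a byte-limited TX queue. The struct toy_bql type and
the toy_bql_* functions are hypothetical names made up for this example;
this is not the kernel's dynamic queue limit code, only a simplified model
of the rule that one packet is always admitted while the queue is open.

```c
#include <stdbool.h>

/* Toy model of a byte-limited TX queue (NOT the kernel's struct dql). */
struct toy_bql {
    unsigned int limit;     /* current byte limit, may be below packet size */
    unsigned int in_flight; /* bytes given to the device, not yet completed */
    bool stopped;           /* true once in_flight exceeds the limit */
};

/* Try to hand one packet to the device; false means it stays in the qdisc. */
bool toy_bql_xmit(struct toy_bql *q, unsigned int len)
{
    if (q->stopped)
        return false;
    /* An open queue always accepts one packet, whatever its size. */
    q->in_flight += len;
    if (q->in_flight > q->limit)
        q->stopped = true;
    return true;
}

/* TX completion for len bytes; reopens the queue once under the limit. */
void toy_bql_complete(struct toy_bql *q, unsigned int len)
{
    if (len > q->in_flight)
        len = q->in_flight;
    q->in_flight -= len;
    if (q->in_flight <= q->limit)
        q->stopped = false;
}
```

With limit = 1000 and a burst of three 1500-byte packets, the first
toy_bql_xmit() call succeeds and the next two fail until
toy_bql_complete() runs for the first packet.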


* Re: [RFC] net: skb_head_is_locked() should use skb_header_cloned()
From: Alexander Duyck @ 2012-05-22 17:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1337666034.3361.50.camel@edumazet-glaptop>

On 05/21/2012 10:53 PM, Eric Dumazet wrote:
> Hi David and Alexander
>
> There is no hurry since net-next is closed, but I hit the following
> problem :
>
> When IPv6 conntracking is enabled, code from
> net/ipv6/netfilter/nf_conntrack_reasm.c does a cloning of all skbs to
> build a shadow.
>
> Then we run : (skb here is the head of the 'shadow skb' )
>
> void nf_ct_frag6_output(unsigned int hooknum, struct sk_buff *skb,
>                         struct net_device *in, struct net_device *out,
>                         int (*okfn)(struct sk_buff *))
> {
>         struct sk_buff *s, *s2;
>
>         for (s = NFCT_FRAG6_CB(skb)->orig; s;) {
>                 nf_conntrack_put_reasm(s->nfct_reasm);
>                 nf_conntrack_get_reasm(skb);
>                 s->nfct_reasm = skb;
>
>                 s2 = s->next;
>                 s->next = NULL;
>
>                 NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, s, in, out, okfn,
>                                NF_IP6_PRI_CONNTRACK_DEFRAG + 1);
>                 s = s2;
>         }
>         nf_conntrack_put_reasm(skb);
> }
>
> So when all original skbs are fed to real IPv6 reassembly code, their
> clones are still alive and we hit the condition in skb_try_coalesce() :
>
> if (skb_head_is_locked(from))
> 	return false;
>
> I was wondering if skb_head_is_locked() should be changed to :
>
> if (!skb->head_frag || skb_header_cloned(skb))
> 	return false;
>
> Then we could add skb_header_release() calls on the clones of course in
> net/ipv6/netfilter/nf_conntrack_reasm.c 
>
> Not-Yet-Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  include/linux/skbuff.h |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 0e50171..6509ee1 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -2587,7 +2587,7 @@ static inline bool skb_is_recycleable(const struct sk_buff *skb, int skb_size)
>   */
>  static inline bool skb_head_is_locked(const struct sk_buff *skb)
>  {
> -	return !skb->head_frag || skb_cloned(skb);
> +	return !skb->head_frag || skb_header_cloned(skb);
>  }
>  #endif	/* __KERNEL__ */
>  #endif	/* _LINUX_SKBUFF_H */
>
>
The problem is that the whole reason for checking skb_cloned was to
avoid reference count issues between the skb and the page.  We should
only be using the reference count in one or the other and not both. 
Otherwise we open up the possibility of a data corruption if someone
misinterprets a skb_shinfo()->dataref == 1, or skb_header_cloned
returning false when we have the buffer shared between both the sk_buff
and a page.

The skb_header_cloned check only verifies that the portion between
skb->head and skb->data is currently unused by the other clones.
It doesn't guarantee that skb->head is not being used by any other
sk_buff.  As such we run the same risk of messing up the dataref
counting if we were to use it.

The way I see it there are 2 solutions.  The first would be to just
split the reference counts and make it so that calls like skb_cloned
have to check both dataref and page count if skb->head_frag is set.  The
second option would be to look at something like pskb_expand_head where
we could generate a new head fragment and then memcpy the data over to
that frag in order to "unlock" the head.
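
To make the first option concrete, here is a rough user-space sketch of
what a "cloned" test that consults both counts might look like. The
struct toy_skb type, its fields, and toy_skb_head_shared() are
hypothetical and exist only for this illustration; real sk_buff handling
is considerably more involved.

```c
#include <stdbool.h>

/* Toy stand-in for the few sk_buff properties discussed here. */
struct toy_skb {
    bool head_frag;      /* head lives in a page fragment */
    int dataref;         /* models skb_shinfo()->dataref */
    int head_page_count; /* models the refcount of the page backing head */
};

/*
 * Option 1 from the mail: treat the head as shared if either count
 * says so. The page count only matters when head_frag is set.
 */
bool toy_skb_head_shared(const struct toy_skb *skb)
{
    if (skb->dataref != 1)
        return true;
    if (skb->head_frag && skb->head_page_count != 1)
        return true;
    return false;
}
```

The point of the sketch is only that neither count alone is sufficient
once the head can be referenced both as skb data and as a page.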

Thanks,

Alex


* Using jiffies for tcp_time_stamp?
From: Srećko Jurić-Kavelj @ 2012-05-22 17:21 UTC (permalink / raw)
  To: netdev
In-Reply-To: <CAACrLC39Xdm3vTKUsjz43ZPyEq_vHxR-_Uf56SjSm+kUqxOqZg@mail.gmail.com>

Hi,

Recently I tackled round trip time estimation of a TCP connection.
After implementing a straight-forward approach (time stamping sending
and receiving of data using clock_gettime) I found this article:
http://linuxgazette.net/136/pfeiffer.html (using getsockopt() to get
struct tcp_info). The tcp_info structure conveniently has a rtt field.

Using the first method I get 1-3 ms RTT, and by using the second I get
>=10 ms RTT.
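
For reference, the second method can be sketched roughly as follows on
Linux. The helper name read_tcp_rtt_us() is made up for this example;
getsockopt() with TCP_INFO and the tcpi_rtt field come from
<netinet/tcp.h>, and to my understanding tcpi_rtt is the smoothed RTT
in microseconds.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Read the kernel's smoothed RTT estimate for a TCP socket.
 * Returns 0 and stores the value in *rtt_us on success, -1 on error
 * (errno is left as set by getsockopt()).
 */
int read_tcp_rtt_us(int fd, unsigned int *rtt_us)
{
    struct tcp_info info;
    socklen_t len = sizeof(info);

    memset(&info, 0, sizeof(info));
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) != 0)
        return -1;

    *rtt_us = info.tcpi_rtt;
    return 0;
}
```

Calling it on a connected socket after some data has been exchanged
should give a value comparable to the application-level measurement,
subject to the timer granularity discussed below.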

By looking at the code it's clear that the time stamping is done with
jiffies, and my kernel has CONFIG_HZ=100.

I understand that this is for performance reasons (and the RTT
smoothing filter is implemented with bit shifting operations), but
would using a more precise time stamp have a significant impact on
performance? Since RTT is used to compute RTO, wouldn't there be a
benefit to having a more accurate estimate of this value?
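
A small sketch of why the reported value cannot drop below 10 ms with
CONFIG_HZ=100. The function names are invented for this example, and it
assumes two simplifications: the RTT sample is counted in whole ticks,
and a zero-tick sample is rounded up to one tick. The real estimator
depends on where the tick boundaries fall.

```c
/* Microseconds per timer tick for a given CONFIG_HZ value. */
unsigned long tick_us(unsigned int hz)
{
    return 1000000UL / hz;
}

/*
 * RTT as seen by a clock that only advances once per tick.
 * Assumption: a sample of zero ticks is treated as one tick.
 */
unsigned long jiffies_rtt_us(unsigned long true_rtt_us, unsigned int hz)
{
    unsigned long ticks = true_rtt_us / tick_us(hz);

    if (ticks == 0)
        ticks = 1;
    return ticks * tick_us(hz);
}
```

Under these assumptions a true RTT of 2 ms is reported as 10 ms at
HZ=100 but as 2 ms at HZ=1000.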

Best regards,

Srećko Jurić-Kavelj, dipl.ing. (Ms.E.E)
Research and Teaching Assistant at University of Zagreb
(Faculty of Electrical Engineering and Computing, Department of
Control and Computer Engineering)

E-mail: srecko.juric-kavelj@fer.hr
URL: http://www.fer.hr/srecko.juric-kavelj

Sanctus Hieronymus: "Parce mihi, Domine, quia dalmata sum!"


* Re: Strange latency spikes/TX network stalls on Sun Fire X4150(x86) and e1000e
From: Denys Fedoryshchenko @ 2012-05-22 17:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Herbert, netdev
In-Reply-To: <1337589620.3361.23.camel@edumazet-glaptop>

On 2012-05-21 11:40, Eric Dumazet wrote:
> On Mon, 2012-05-21 at 10:30 +0200, Eric Dumazet wrote:
>> On Mon, 2012-05-21 at 11:06 +0300, Denys Fedoryshchenko wrote:
>>
>> > Not sure it is a lot of time; after all it is a 2 x quad-core
>> > machine, which should be fast enough for pings.
>> > It seems it will cause stalls on small packets even more.
>> >
>> > Tested latest git, net-next, still the same, stalls.
>> > The hardware latency detector is silent by the way, so there is no
>> > significant SMI.
>> >
>>
>> I am trying to reproduce your problem here with no luck yet.
>>
>> I wonder if softirqs are correctly scheduled on your machine
>>
>
> By the way, the fact that you have 8 cpus is irrelevant.
>
> Only one cpu has queued the NET_TX_SOFTIRQ softirq (serviced by
> net_tx_action())
>
>
> If this cpu is busy servicing other stuff, no other cpu will help.
>
By the way, if the BQL limit goes lower than the MTU, is that considered
a bug?
If yes, I can try to upload 3.4 to some servers and add a WARN_ON
condition for limit < 1500.

---
Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L.


* Re: tcp timestamp issues with google servers
From: Rick Jones @ 2012-05-22 16:58 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Eric Dumazet, netdev, linux-kernel
In-Reply-To: <87mx50rz34.fsf@tucsk.pomaz.szeredi.hu>

On 05/22/2012 08:54 AM, Miklos Szeredi wrote:
> Eric Dumazet<eric.dumazet@gmail.com>  writes:
>
>> On Tue, 2012-05-22 at 17:25 +0200, Miklos Szeredi wrote:
>>
>>> So it appears.  The IP address is certainly registered to Google.
>>
>> Good, but you could have a middlebox doing transparent proxying.
>>
>> The SYNACK could be sent by this box.
>
> Okay.  Is there a way to find out whether there is a middlebox or not?

The source IP in the trace was a 192.168 IP - is it possible/desirable 
to reproduce the problem without the device doing NAT in the path?

What is your "public" IP address?  Given that, and the IP address to
which you are connecting, it should be possible to validate the RTT you
are seeing.  If the geographic/topological location of the destination
Google IP address is far enough from your public source IP, that would
show whether the RTT you are seeing is even physically possible and so
could suggest there is a middlebox (other than your NAT), though it
couldn't show there was not a middlebox.
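
The plausibility check can be sketched as simple arithmetic. The
constant and function names below are made up for this example, and the
propagation speed of roughly 200 km per millisecond (about two thirds of
the speed of light, a common rule of thumb for fibre) is an
approximation that ignores routing detours and queueing.

```c
/* Approximate signal speed in fibre: ~200,000 km/s, i.e. 200 km per ms. */
#define FIBER_KM_PER_MS 200.0

/* Smallest RTT (ms) physically possible over a given one-way distance. */
double min_rtt_ms(double distance_km)
{
    return 2.0 * distance_km / FIBER_KM_PER_MS;
}

/* Largest one-way distance (km) compatible with an observed RTT (ms). */
double max_distance_km(double rtt_ms)
{
    return rtt_ms * FIBER_KM_PER_MS / 2.0;
}
```

Under this approximation, the 2.73 ms RTT from the trace bounds the
responder to within about 273 km of the client, so a destination much
farther away than that would point to something answering on its behalf.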

rick jones


* Re: tcp timestamp issues with google servers
From: Eric Dumazet @ 2012-05-22 16:48 UTC (permalink / raw)
  To: Vijay Subramanian; +Cc: Miklos Szeredi, netdev, linux-kernel
In-Reply-To: <CAGK4HS9cGnOcoLAL1pggWvMG1B40rWyqc3BHsddUvnii4EvTcQ@mail.gmail.com>

On Tue, 2012-05-22 at 09:38 -0700, Vijay Subramanian wrote:
> > Okay.  Is there a way to find out whether there is a middlebox or not?
> >
> 
> Miklos,
> Maybe tcptraceroute[1] can help you figure this out.
> 
> Hope this helps.
> Vijay
> 
> [1] http://michael.toren.net/code/tcptraceroute/

The transparent proxy can intercept TCP connections to port 80/443 and
let ICMP be NATed by the box.

So it's better to check whether the delay between SYN and SYNACK is
roughly independent of the HTTP server.

If you see a very large range of delays, you can conclude it's not a
transparent proxy.


* Re: TCPBacklogDrops during aggressive bursts of traffic
From: Eric Dumazet @ 2012-05-22 16:45 UTC (permalink / raw)
  To: Kieran Mansley; +Cc: Ben Hutchings, netdev
In-Reply-To: <1337704382.1698.53.camel@kjm-desktop.uk.level5networks.com>

On Tue, 2012-05-22 at 17:32 +0100, Kieran Mansley wrote:
> On Tue, 2012-05-22 at 18:12 +0200, Eric Dumazet wrote:
> > 
> > __tcp_select_window() (more precisely tcp_space()) takes into account
> > memory used in receive/ofo queue, but not frames in backlog queue.
> > 
> > So if you send bursts, it might explain why the TCP stack continues to
> > advertise too big a window, instead of anticipating the problem.
> > 
> > Please try the following patch :
> > 
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index e79aa48..82382cb 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -1042,8 +1042,9 @@ static inline int tcp_win_from_space(int space)
> >  /* Note: caller must be prepared to deal with negative returns */ 
> >  static inline int tcp_space(const struct sock *sk)
> >  {
> > -       return tcp_win_from_space(sk->sk_rcvbuf -
> > -                                 atomic_read(&sk->sk_rmem_alloc));
> > +       int used = atomic_read(&sk->sk_rmem_alloc) + sk->sk_backlog.len;
> > +
> > +       return tcp_win_from_space(sk->sk_rcvbuf - used);
> >  } 
> >  
> >  static inline int tcp_full_space(const struct sock *sk)
> 
> 
> I can give this a try (not sure when - probably later this week) but I
> think this is back to front.  The patch above will reduce the
> advertised window by sk_backlog.len, but at the time that the window was
> advertised that allowed the dropped packets to be sent the backlog was
> empty.  It is later, when the kernel is waking the application and takes
> the socket lock that the backlog starts to be used and the drop happens.
> But reducing the window advertised at this point is futile - the packets
> that will be dropped are already in flight.
> 

Not really. If we receive these packets while backlog is empty, then the
sender violates TCP rules.

We advertise tcp window directly from memory we are allowed to consume.

(On the premise sender behaves correctly, not sending bytes in small
packets)


> The problem exists because the backlog has a tighter limit on it than
> the receive window does; I think the backlog should be able to accept
> sk_rcvbuf bytes in addition to what is already in the receive buffer (or
> up to the advertised receive window if that's smaller).  At the moment
> it will only accept sk_rcvbuf bytes including what is already in the
> receive buffer.  The logic being that in this case we're using the
> backlog because it's in the process of emptying the receive buffer into
> the application, and so the receive buffer will very soon be empty, and
> so we will very soon be able to accept sk_rcvbuf bytes.  This is evident
> from the packet capture as the kernel stack is quite happy to accept the
> significant quantity of data that arrives as part of the same burst
> immediately after it has dropped a couple of packets.
> 

This is not evident from the capture, you are mistaken.

tcpdump captures packets before the TCP stack; it doesn't say if they are:

1) queued in the receive or ofo queue
2) queued in the socket backlog
3) dropped because we hit the socket rcvbuf limit

If the socket lock is held by the user, packets are queued to the
backlog, or dropped.

Then, when socket lock is about to be released, we process the backlog.


* Re: tcp timestamp issues with google servers
From: Vijay Subramanian @ 2012-05-22 16:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Eric Dumazet, netdev, linux-kernel
In-Reply-To: <87mx50rz34.fsf@tucsk.pomaz.szeredi.hu>

> Okay.  Is there a way to find out whether there is a middlebox or not?
>

Miklos,
Maybe tcptraceroute[1] can help you figure this out.

Hope this helps.
Vijay

[1] http://michael.toren.net/code/tcptraceroute/


* Re: TCPBacklogDrops during aggressive bursts of traffic
From: Kieran Mansley @ 2012-05-22 16:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ben Hutchings, netdev
In-Reply-To: <1337703170.3361.217.camel@edumazet-glaptop>

On Tue, 2012-05-22 at 18:12 +0200, Eric Dumazet wrote:
> 
> __tcp_select_window() (more precisely tcp_space()) takes into account
> memory used in receive/ofo queue, but not frames in backlog queue.
> 
> So if you send bursts, it might explain why the TCP stack continues to
> advertise too big a window, instead of anticipating the problem.
> 
> Please try the following patch :
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index e79aa48..82382cb 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1042,8 +1042,9 @@ static inline int tcp_win_from_space(int space)
>  /* Note: caller must be prepared to deal with negative returns */ 
>  static inline int tcp_space(const struct sock *sk)
>  {
> -       return tcp_win_from_space(sk->sk_rcvbuf -
> -                                 atomic_read(&sk->sk_rmem_alloc));
> +       int used = atomic_read(&sk->sk_rmem_alloc) + sk->sk_backlog.len;
> +
> +       return tcp_win_from_space(sk->sk_rcvbuf - used);
>  } 
>  
>  static inline int tcp_full_space(const struct sock *sk)

I can give this a try (not sure when - probably later this week) but I
think this is back to front.  The patch above will reduce the
advertised window by sk_backlog.len, but at the time that the window was
advertised that allowed the dropped packets to be sent the backlog was
empty.  It is later, when the kernel is waking the application and takes
the socket lock that the backlog starts to be used and the drop happens.
But reducing the window advertised at this point is futile - the packets
that will be dropped are already in flight.

The problem exists because the backlog has a tighter limit on it than
the receive window does; I think the backlog should be able to accept
sk_rcvbuf bytes in addition to what is already in the receive buffer (or
up to the advertised receive window if that's smaller).  At the moment
it will only accept sk_rcvbuf bytes including what is already in the
receive buffer.  The logic being that in this case we're using the
backlog because it's in the process of emptying the receive buffer into
the application, and so the receive buffer will very soon be empty, and
so we will very soon be able to accept sk_rcvbuf bytes.  This is evident
from the packet capture as the kernel stack is quite happy to accept the
significant quantity of data that arrives as part of the same burst
immediately after it has dropped a couple of packets.

Perhaps it would be easier for me to write a patch to show this
suggested solution?
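
In the meantime, the difference between the two limits can be sketched
in user space as below. struct toy_sock and both function names are
hypothetical; the "current" rule is written from the description in this
mail rather than copied from the kernel source, and the "proposed" rule
is only one reading of the suggestion (it leaves out the "advertised
window if smaller" refinement).

```c
#include <stdbool.h>

/* Toy stand-in for the socket accounting discussed in this thread. */
struct toy_sock {
    int rcvbuf;      /* models sk_rcvbuf */
    int rmem_alloc;  /* bytes already in the receive/ofo queues */
    int backlog_len; /* bytes already queued in the backlog */
};

/* Current behaviour as described: receive buffer counts against backlog. */
bool backlog_accepts_current(const struct toy_sock *sk, int truesize)
{
    return sk->backlog_len + sk->rmem_alloc + truesize <= sk->rcvbuf;
}

/* Suggested behaviour: backlog may hold sk_rcvbuf bytes on its own. */
bool backlog_accepts_proposed(const struct toy_sock *sk, int truesize)
{
    return sk->backlog_len + truesize <= sk->rcvbuf;
}
```

With rcvbuf = 1000 and 900 bytes sitting in the receive queue, a
200-byte packet arriving while the socket is locked is dropped under the
first rule and queued under the second.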

Kieran


* Re: TCPBacklogDrops during aggressive bursts of traffic
From: Eric Dumazet @ 2012-05-22 16:12 UTC (permalink / raw)
  To: Kieran Mansley; +Cc: Ben Hutchings, netdev
In-Reply-To: <1337699379.1698.30.camel@kjm-desktop.uk.level5networks.com>

On Tue, 2012-05-22 at 16:09 +0100, Kieran Mansley wrote:
> On Tue, 2012-05-22 at 11:30 +0200, Eric Dumazet wrote:
> > Also can you post a pcap capture of problematic flow ?
> 
> I'll email this to you directly. The capture is generated with netserver
> on the system under test, and NetPerf sending from a similar server.
> I've only included the first 1000 frames to keep the capture size down.
> There are 7 retransmissions in that capture, and the TCPBacklogDrops
> counter incremented by 7 during the test, so I'm happy to say they are
> the cause of the drops.
> 
> The system under test was running net-next.
> 
> I've not tried with another NIC (e.g. tg3) but will see if I can find
> one to test.

Or you could change sfc to allow its frames to be coalesced.

> 
> I've got a feeling that the drops might be easier to reproduce if I
> taskset the netserver process to a different package than the one that
> is handling the network interrupt for that NIC.  This fits with my
> earlier theory in that it is likely to increase the overhead of waking
> the user-level process to satisfy the read and so increase the time
> during which received packets could overflow the backlog.  Having a
> relatively aggressive sending TCP also helps, e.g. one that is
> configured to open its congestion window quickly, as this will produce
> more intensive bursts.

__tcp_select_window() (more precisely tcp_space()) takes into account
memory used in receive/ofo queue, but not frames in backlog queue.

So if you send bursts, it might explain why the TCP stack continues to
advertise too big a window, instead of anticipating the problem.

Please try the following patch :

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e79aa48..82382cb 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1042,8 +1042,9 @@ static inline int tcp_win_from_space(int space)
 /* Note: caller must be prepared to deal with negative returns */ 
 static inline int tcp_space(const struct sock *sk)
 {
-	return tcp_win_from_space(sk->sk_rcvbuf -
-				  atomic_read(&sk->sk_rmem_alloc));
+	int used = atomic_read(&sk->sk_rmem_alloc) + sk->sk_backlog.len;
+
+	return tcp_win_from_space(sk->sk_rcvbuf - used);
 } 
 
 static inline int tcp_full_space(const struct sock *sk)
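
For readers who want to see the effect of the patch numerically, here is
a user-space model of the two variants. All toy_* names are invented for
this sketch; toy_win_from_space() follows my reading of
tcp_win_from_space(), with the tcp_adv_win_scale sysctl passed in as a
plain parameter, so treat it as an approximation rather than the
kernel's exact code.

```c
/* Model of tcp_win_from_space(): reserve part of the space for overhead. */
int toy_win_from_space(int space, int adv_win_scale)
{
    if (adv_win_scale <= 0)
        return space >> (-adv_win_scale);
    return space - (space >> adv_win_scale);
}

/* Before the patch: only memory in the receive/ofo queues is counted. */
int toy_tcp_space_old(int rcvbuf, int rmem_alloc, int adv_win_scale)
{
    return toy_win_from_space(rcvbuf - rmem_alloc, adv_win_scale);
}

/* After the patch: bytes sitting in the backlog are counted as used too. */
int toy_tcp_space_patched(int rcvbuf, int rmem_alloc, int backlog_len,
                          int adv_win_scale)
{
    int used = rmem_alloc + backlog_len;

    return toy_win_from_space(rcvbuf - used, adv_win_scale);
}
```

With rcvbuf = 1000, rmem_alloc = 200 and a scale of 2, the old variant
yields 600; adding 400 bytes of backlog brings the patched variant down
to 300. As in the original, callers must be prepared for negative
results when more memory is in use than rcvbuf allows.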


* Re: [V2 PATCH 9/9] vhost: zerocopy: poll vq in zerocopy callback
From: Shirley Ma @ 2012-05-22 15:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, eric.dumazet, netdev, linux-kernel, ebiederm,
	davem
In-Reply-To: <4FBB64F7.5090801@redhat.com>

On Tue, 2012-05-22 at 18:05 +0800, Jason Wang wrote:
> On 05/21/2012 11:42 PM, Shirley Ma wrote:
> > On Mon, 2012-05-21 at 14:05 +0800, Jason Wang wrote:
> >>>> - tx polling depends on skb_orphan() which is often called by device
> >>>> driver when it place the packet into the queue of the devices instead
> >>>> of when the packets were sent. So it was too early for vhost to be
> >>>> notified.
> >>> Then do you think it's better to replace with vhost_poll_queue here
> >>> instead?
> >> Just like what does this patch do - calling vhost_poll_queue() in
> >> vhost_zerocopy_callback().
> >>>> - it only works when the pending DMAs exceeds VHOST_MAX_PEND, it's
> >>>> highly possible that guest needs to be notified when the pending
> >>>> packets isn't so much.
> >>> In which situation the guest needs to be notified when there is no TX
> >>> besides buffers run out?
> >> Consider guest call virtqueue_enable_cb_delayed() which means it only
> >> need to be notified when 3/4 of pending buffers ( about 178 buffers
> >> (256-MAX_SKB_FRAGS-2)*3/4 ) were sent by host. So vhost_net would
> >> notify guest when about 60 buffers were pending. Since tx polling is
> >> only enabled when pending packets exceeds VHOST_MAX_PEND 128, so tx
> >> work would not be notified to run and guest would never get the
> >> interrupt it expected to re-enable the queue.
> > So it seems we still need vhost_enable_notify() in handle_tx when there
> > is no tx in zerocopy case.
> >
> > Do you know which one is more expensive: the cost of vhost_poll_queue()
> > in each zerocopy callback or calling vhost_enable_notify()?
> 
> Didn't follow here, do you mean vhost_signal() here? 

I meant removing the code in handle_tx for zerocopy as below:

+	if (zcopy) {
                        /* If more outstanding DMAs, queue the work.
                         * Handle upend_idx wrap around
                         */
                        num_pends = likely(vq->upend_idx >= vq->done_idx) ?
                                    (vq->upend_idx - vq->done_idx) :
                                    (vq->upend_idx + UIO_MAXIOV - vq->done_idx);
+			/* zerocopy vhost_enable_notify is under zerocopy callback
+			 * since it could be too early to notify here */
+			break;
+	}
-                       if (unlikely(num_pends > VHOST_MAX_PEND)) {
-                                tx_poll_start(net, sock);
-                                set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
-                                break;
-                        }
                        if (unlikely(vhost_enable_notify(&net->dev, vq))) {
                                vhost_disable_notify(&net->dev, vq);
                                continue;
                        }
                        break;
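
The wrap-around handling of upend_idx in the snippet above can be
isolated into a small helper for clarity. The function name
pending_dmas() and the TOY_UIO_MAXIOV constant are made up for this
sketch; the value 1024 is my assumption for UIO_MAXIOV.

```c
/* Assumed ring size; stands in for UIO_MAXIOV in this sketch. */
#define TOY_UIO_MAXIOV 1024u

/*
 * Number of outstanding DMAs between done_idx and upend_idx in a ring
 * of TOY_UIO_MAXIOV slots, allowing upend_idx to have wrapped around.
 */
unsigned int pending_dmas(unsigned int upend_idx, unsigned int done_idx)
{
    if (upend_idx >= done_idx)
        return upend_idx - done_idx;
    return upend_idx + TOY_UIO_MAXIOV - done_idx;
}
```

For example, upend_idx = 2 with done_idx = 1020 gives 6 pending entries,
the same as upend_idx = 10 with done_idx = 4.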

Thanks
Shirley


* Re: tcp timestamp issues with google servers
From: Miklos Szeredi @ 2012-05-22 15:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-kernel
In-Reply-To: <1337701259.3361.208.camel@edumazet-glaptop>

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On Tue, 2012-05-22 at 17:25 +0200, Miklos Szeredi wrote:
>
>> So it appears.  The IP address is certainly registered to Google.
>
> Good, but you could have a middlebox doing transparent proxying.
>
> The SYNACK could be sent by this box.

Okay.  Is there a way to find out whether there is a middlebox or not?

Thanks,
Miklos


* Re: tcp timestamp issues with google servers
From: Eric Dumazet @ 2012-05-22 15:40 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: netdev, linux-kernel
In-Reply-To: <87zk90s0em.fsf@tucsk.pomaz.szeredi.hu>

On Tue, 2012-05-22 at 17:25 +0200, Miklos Szeredi wrote:

> So it appears.  The IP address is certainly registered to Google.

Good, but you could have a middlebox doing transparent proxying.

The SYNACK could be sent by this box.


* Re: tcp timestamp issues with google servers
From: Miklos Szeredi @ 2012-05-22 15:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-kernel
In-Reply-To: <1337278363.3403.39.camel@edumazet-glaptop>

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On Thu, 2012-05-17 at 11:39 +0200, Miklos Szeredi wrote:
>> Sometimes connection to google.com, gmail.com and other google servers
>> doesn't work or takes ages to connect.  When this hits it hits all
>> google servers at the same time and it's persistent.  It never happens
>> to anything other than google.  Rebooting helps.  Rarely it goes away
>> spontaneously.
>> 
>> Apparently google is sometimes replying with an invalid TSecr timestamp
>> value (smaller than the one sent in the last packet) and this confuses
>> the Linux TCP stack which either discards the packet or sends a Reset.
>> 
>> Network dump attached.
>> 
>> I found only a couple of references to this issue:
>> 
>> http://gotchas.livejournal.com/3028.html
>> 
>> http://groups.google.com/group/comp.os.linux.networking/browse_thread/thread/29f56feded11b42a
>> 
>> Turning off tcp timestamps fixes the issue:
>> 
>>   sysctl -w net.ipv4.tcp_timestamps=0
>> 
>> Not sure why this happens only to me and a very few others.
>> 
>> It appears to be an issue with google TCP stack (is it a modified
>> stack?) but I thought about issues in my network switch (restarting it
>> doesn't help) or something in the ISP, but those look unlikely.
>> 
>> Any ideas?
>> 
>> Thanks,
>> Miklos
>> 
>> 
>> 
>>   1   0.000000 192.168.28.100 -> 74.125.232.226 TCP 51303 > http [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSV=35355050 TSER=0 WS=5
>>   2   0.002730 74.125.232.226 -> 192.168.28.100 TCP http > 51303 [SYN, ACK] Seq=0 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184565067 TSER=35325344 WS=6
>
>
> Do you really have 2730 usec RTT between you and this (Google ?)
> server ?

So it appears.  The IP address is certainly registered to Google.

Thanks,
Miklos
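
To show why the quoted SYN,ACK looks invalid, here is a small sketch of
a wrap-safe range check on 32-bit timestamps. The function name
ts_between() is made up for this example, and the assumption that the
receiver requires the echoed value to lie between the timestamp it sent
and its current clock is my reading of the behaviour described in the
report, not something confirmed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if v lies in [low, high] on the 32-bit timestamp circle.
 * Unsigned subtraction makes the comparison safe across wrap-around.
 */
bool ts_between(uint32_t v, uint32_t low, uint32_t high)
{
    return (uint32_t)(high - low) >= (uint32_t)(v - low);
}
```

In the trace, the SYN carries TSV=35355050 while the SYN,ACK echoes
TSER=35325344. Checking ts_between(35325344, 35355050, now) with now
shortly after 35355050 returns false, which matches the packet being
discarded or answered with a reset.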


* Re: TCPBacklogDrops during aggressive bursts of traffic
From: Kieran Mansley @ 2012-05-22 15:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ben Hutchings, netdev
In-Reply-To: <1337679045.3361.154.camel@edumazet-glaptop>

On Tue, 2012-05-22 at 11:30 +0200, Eric Dumazet wrote:
> Also can you post a pcap capture of problematic flow ?

I'll email this to you directly. The capture is generated with netserver
on the system under test, and NetPerf sending from a similar server.
I've only included the first 1000 frames to keep the capture size down.
There are 7 retransmissions in that capture, and the TCPBacklogDrops
counter incremented by 7 during the test, so I'm happy to say they are
the cause of the drops.

The system under test was running net-next.

I've not tried with another NIC (e.g. tg3) but will see if I can find
one to test.

I've got a feeling that the drops might be easier to reproduce if I
taskset the netserver process to a different package than the one that
is handling the network interrupt for that NIC.  This fits with my
earlier theory in that it is likely to increase the overhead of waking
the user-level process to satisfy the read and so increase the time
during which received packets could overflow the backlog.  Having a
relatively aggressive sending TCP also helps, e.g. one that is
configured to open its congestion window quickly, as this will produce
more intensive bursts.

Kieran


* Re: [PATCH] net: Surpress kmemleak messages on sysctl paths
From: Steven Rostedt @ 2012-05-22 14:51 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: David Miller, LKML, netdev, viro, tixxdz
In-Reply-To: <878vgk1do2.fsf@xmission.com>

On Tue, 2012-05-22 at 08:41 -0600, Eric W. Biederman wrote:
> Steven Rostedt <rostedt@goodmis.org> writes:
> 
> > The network code allocates ctl_table_headers that are used for the life
> > of the kernel. These headers are registered and never unregistered. The
> > head pointer is allocated and not referenced, as it never needs to be
> > unregistered, and the kmemleak detector triggers these as false
> > positives:
> 
> The fix for this should already be merged into Linus's tree from the
> net-next tree for 3.5.

Ah, I didn't look at net-next. I just looked at 3.4 and didn't see
anything. If that's the case, simply ignore :-)

-- Steve


* Re: WARNING: at net/ipv4/tcp.c:1301 tcp_cleanup_rbuf+0x4f/0x110()
From: Sergio Correia @ 2012-05-22 14:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1337697203.3361.190.camel@edumazet-glaptop>

Hi Eric,

On Tue, May 22, 2012 at 11:33 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2012-05-22 at 10:57 -0300, Sergio Correia wrote:
>> On Tue, May 22, 2012 at 2:43 AM, Sergio Correia <lists@uece.net> wrote:
>> > So far it has happened only once.
>> > Last commit is 471368557a734c6c486ee757952c902b36e7fd01.
>> >
>> >
>> > [ 3726.624387] ------------[ cut here ]------------
>> > [ 3726.624398] WARNING: at net/ipv4/tcp.c:1301 tcp_cleanup_rbuf+0x4f/0x110()
>> > [ 3726.624400] Hardware name: N53SV
>> > [ 3726.624402] cleanup rbuf bug: copied A4D1F126 seq A4D1F126 rcvnxt A4D1F126
>> > [ 3726.624404] Modules linked in:
>> > [ 3726.624407] Pid: 1416, comm: transmission-gt Not tainted 3.4.0-git+ #52
>> > [ 3726.624409] Call Trace:
>> > [ 3726.624415]  [<ffffffff81035eba>] warn_slowpath_common+0x7a/0xb0
>> > [ 3726.624419]  [<ffffffff81035f91>] warn_slowpath_fmt+0x41/0x50
>> > [ 3726.624507]  [<ffffffff81aa57a5>] ? sub_preempt_count+0x65/0xc0
>> > [ 3726.624510]  [<ffffffff819101cf>] tcp_cleanup_rbuf+0x4f/0x110
>> > [ 3726.624514]  [<ffffffff819112b7>] tcp_recvmsg+0x637/0xa60
>> > [ 3726.624518]  [<ffffffff81849310>] ? release_sock+0xe0/0x110
>> > [ 3726.624522]  [<ffffffff81934a34>] inet_recvmsg+0x94/0xc0
>> > [ 3726.624534]  [<ffffffff81844792>] sock_aio_read.part.8+0x142/0x170
>> > [ 3726.624537]  [<ffffffff818447c0>] ? sock_aio_read.part.8+0x170/0x170
>> > [ 3726.624540]  [<ffffffff818447e1>] sock_aio_read+0x21/0x30
>> > [ 3726.624544]  [<ffffffff81124b0a>] do_sync_readv_writev+0xca/0x110
>> > [ 3726.624548]  [<ffffffff8140f582>] ? security_file_permission+0x92/0xb0
>> > [ 3726.624552]  [<ffffffff8112425c>] ? rw_verify_area+0x5c/0xe0
>> > [ 3726.624555]  [<ffffffff81124de6>] do_readv_writev+0xd6/0x1e0
>> > [ 3726.624558]  [<ffffffff8184336b>] ? sock_do_ioctl+0x2b/0x70
>> > [ 3726.624562]  [<ffffffff81135abf>] ? do_vfs_ioctl+0x8f/0x530
>> > [ 3726.624566]  [<ffffffff8141278f>] ? file_has_perm+0x8f/0xa0
>> > [ 3726.624569]  [<ffffffff81124f7d>] vfs_readv+0x2d/0x50
>> > [ 3726.624572]  [<ffffffff81124fe5>] sys_readv+0x45/0xb0
>> > [ 3726.624575]  [<ffffffff81aa9062>] system_call_fastpath+0x16/0x1b
>> > [ 3726.624578] ---[ end trace 6dc5d813929e5e6f ]---
>>
>> Checked this morning, and my dmesg now is basically composed of this
>> warning over and over and over.
>
> Is it wifi adapter ?
>

Yes, it's an Atheros AR9285 adapter.
This morning I did a make mrproper before rebuilding the kernel
(should I always do that?), but the warning has just appeared again.

> Please send "netstat -s" output
>

netstat -s
Ip:
    49826 total packets received
    3 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    49771 incoming packets delivered
    36344 requests sent out
Icmp:
    129 ICMP messages received
    1 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 129
    156 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 156
IcmpMsg:
        InType3: 129
        OutType3: 156
Tcp:
    1150 active connections openings
    42 passive connection openings
    63 failed connection attempts
    45 connection resets received
    29 connections established
    25457 segments received
    23088 segments send out
    689 segments retransmited
    0 bad segments received.
    347 resets sent
Udp:
    23927 packets received
    155 packets to unknown port received.
    0 packet receive errors
    12411 packets sent
    0 receive buffer errors
    0 send buffer errors
UdpLite:
TcpExt:
    19 invalid SYN cookies received
    1 resets received for embryonic SYN_RECV sockets
    122 TCP sockets finished time wait in fast timer
    668 delayed acks sent
    Quick ack mode was activated 228 times
    2 packets directly queued to recvmsg prequeue.
    2 bytes directly received in process context from prequeue
    10825 packet headers predicted
    2914 acknowledgments not containing data payload received
    1355 predicted acknowledgments
    3 times recovered from packet loss by selective acknowledgements
    1 congestion windows recovered without slow start by DSACK
    36 congestion windows recovered without slow start after partial ack
    3 fast retransmits
    229 other TCP timeouts
    1 SACK retransmits failed
    441 DSACKs sent for old packets
    11 DSACKs sent for out of order packets
    31 DSACKs received
    35 connections reset due to unexpected data
    32 connections reset due to early user close
    3 connections aborted due to timeout
    TCPDSACKIgnoredNoUndo: 6
    TCPSackShiftFallback: 11
    TCPRcvCoalesce: 8467
IpExt:
    OutMcastPkts: 13
    InBcastPkts: 271
    OutBcastPkts: 184
    InOctets: 53951318
    OutOctets: 5022932
    OutMcastOctets: 2091
    InBcastOctets: 40466
    OutBcastOctets: 32946

^ permalink raw reply

* Re: [PATCH] net: Surpress kmemleak messages on sysctl paths
From: Eric W. Biederman @ 2012-05-22 14:41 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: David Miller, LKML, netdev, viro, tixxdz
In-Reply-To: <1337647392.13348.14.camel@gandalf.stny.rr.com>

Steven Rostedt <rostedt@goodmis.org> writes:

> The network code allocates ctl_table_headers that are used for the life
> of the kernel. These headers are registered and never unregistered. The
> head pointer is allocated and not referenced, as it never needs to be
> unregistered, and the kmemleak detector triggers these as false
> positives:

The fix for this should already be merged into Linus's tree from the
net-next tree for 3.5.

Eric

^ permalink raw reply

* Re: WARNING: at net/ipv4/tcp.c:1301 tcp_cleanup_rbuf+0x4f/0x110()
From: Eric Dumazet @ 2012-05-22 14:33 UTC (permalink / raw)
  To: Sergio Correia; +Cc: netdev
In-Reply-To: <CAJyhjX1VKQpAbqkxnWFNrRSVBuSRYaNQSUXYWqYfoE0GnmVokQ@mail.gmail.com>

On Tue, 2012-05-22 at 10:57 -0300, Sergio Correia wrote:
> On Tue, May 22, 2012 at 2:43 AM, Sergio Correia <lists@uece.net> wrote:
> > So far it has happened only once.
> > Last commit is 471368557a734c6c486ee757952c902b36e7fd01.
> >
> >
> > [ 3726.624387] ------------[ cut here ]------------
> > [ 3726.624398] WARNING: at net/ipv4/tcp.c:1301 tcp_cleanup_rbuf+0x4f/0x110()
> > [ 3726.624400] Hardware name: N53SV
> > [ 3726.624402] cleanup rbuf bug: copied A4D1F126 seq A4D1F126 rcvnxt A4D1F126
> > [ 3726.624404] Modules linked in:
> > [ 3726.624407] Pid: 1416, comm: transmission-gt Not tainted 3.4.0-git+ #52
> > [ 3726.624409] Call Trace:
> > [ 3726.624415]  [<ffffffff81035eba>] warn_slowpath_common+0x7a/0xb0
> > [ 3726.624419]  [<ffffffff81035f91>] warn_slowpath_fmt+0x41/0x50
> > [ 3726.624507]  [<ffffffff81aa57a5>] ? sub_preempt_count+0x65/0xc0
> > [ 3726.624510]  [<ffffffff819101cf>] tcp_cleanup_rbuf+0x4f/0x110
> > [ 3726.624514]  [<ffffffff819112b7>] tcp_recvmsg+0x637/0xa60
> > [ 3726.624518]  [<ffffffff81849310>] ? release_sock+0xe0/0x110
> > [ 3726.624522]  [<ffffffff81934a34>] inet_recvmsg+0x94/0xc0
> > [ 3726.624534]  [<ffffffff81844792>] sock_aio_read.part.8+0x142/0x170
> > [ 3726.624537]  [<ffffffff818447c0>] ? sock_aio_read.part.8+0x170/0x170
> > [ 3726.624540]  [<ffffffff818447e1>] sock_aio_read+0x21/0x30
> > [ 3726.624544]  [<ffffffff81124b0a>] do_sync_readv_writev+0xca/0x110
> > [ 3726.624548]  [<ffffffff8140f582>] ? security_file_permission+0x92/0xb0
> > [ 3726.624552]  [<ffffffff8112425c>] ? rw_verify_area+0x5c/0xe0
> > [ 3726.624555]  [<ffffffff81124de6>] do_readv_writev+0xd6/0x1e0
> > [ 3726.624558]  [<ffffffff8184336b>] ? sock_do_ioctl+0x2b/0x70
> > [ 3726.624562]  [<ffffffff81135abf>] ? do_vfs_ioctl+0x8f/0x530
> > [ 3726.624566]  [<ffffffff8141278f>] ? file_has_perm+0x8f/0xa0
> > [ 3726.624569]  [<ffffffff81124f7d>] vfs_readv+0x2d/0x50
> > [ 3726.624572]  [<ffffffff81124fe5>] sys_readv+0x45/0xb0
> > [ 3726.624575]  [<ffffffff81aa9062>] system_call_fastpath+0x16/0x1b
> > [ 3726.624578] ---[ end trace 6dc5d813929e5e6f ]---
> 
> Checked this morning, and my dmesg now is basically composed of this
> warning over and over and over.

Is it a wifi adapter?

Please send "netstat -s" output

^ permalink raw reply

* tc filter u32 match
From: Nieścierowicz Adam @ 2012-05-22 13:45 UTC (permalink / raw)
  To: netdev

Hello,

I'm in the process of building a new shaper. While adding support for
802.1q VLANs, I noticed that u32 matches the network traffic without my
adding a 4-byte offset for the VLAN tag. How is this possible?

My environment:

eth2 - network card
eth2.200 - vlan

/sbin/tc filter add dev eth2 parent 1:0 prio 5 handle 35: protocol ip \
    u32 divisor 256
/sbin/tc filter add dev eth2 protocol ip parent 1:0 prio 5 u32 ht 800:: \
    match ip dst 31.41.208.32/27 hashkey mask 0x000000ff at 16 link 35:
/sbin/tc filter add dev eth2 protocol ip parent 1: prio 1 u32 ht 35:24: \
    match ip dst 31.41.208.36 flowid 1:2e5

Here you can see the hits on the rule:
filter parent 1: protocol ip pref 5 u32 fh 35:24:800 order 2048 key ht 35 bkt 24 flowid 1:2e5  (rule hit 44037 success 44037)
   match 1f29d024/ffffffff at 16 (success 44037 )


I found a similar question here 
http://serverfault.com/questions/370795/tc-u32-how-to-match-l2-protocols-in-recent-kernels

Thanks
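
For reference, the bucket shown in the filter dump can be checked by hand. This is a minimal sketch, assuming the u32 classifier folds the hash key by masking the 32-bit word at the given offset (read most significant byte first), shifting it down to the lowest set bit of the mask, and reducing it to the table's divisor. The helper names are made up for this illustration and are not kernel or iproute2 APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four octets into one 32-bit word, most significant octet first,
 * i.e. the value tc prints for "match ip dst" (hypothetical helper). */
static uint32_t ipv4_word(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/* Simplified model of the "hashkey mask M" fold: keep the masked bits,
 * shift them down to bit 0 and reduce to the number of buckets.
 * This sketch requires divisor to be a power of two. */
static unsigned int u32_bucket(uint32_t word, uint32_t mask,
			       unsigned int divisor)
{
	unsigned int shift = 0;

	assert(mask != 0);
	assert(divisor != 0 && (divisor & (divisor - 1)) == 0);

	/* Find the position of the lowest set bit in the mask. */
	while (!((mask >> shift) & 1))
		shift++;

	return ((word & mask) >> shift) & (divisor - 1);
}
```

For the destination 31.41.208.36, ipv4_word(31, 41, 208, 36) is 0x1f29d024, the value in the match line, and u32_bucket(0x1f29d024, 0x000000ff, 256) gives 0x24, which agrees with "bkt 24" and with the "ht 35:24:" bucket used in the third command.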

^ permalink raw reply

* Re: [PATCH v3] drop_monitor: convert to modular building
From: Neil Horman @ 2012-05-22 13:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, bhutchings
In-Reply-To: <1337691919.3361.189.camel@edumazet-glaptop>

On Tue, May 22, 2012 at 03:05:19PM +0200, Eric Dumazet wrote:
> On Thu, 2012-05-17 at 16:21 -0400, Neil Horman wrote:
> > On Thu, May 17, 2012 at 04:09:37PM -0400, David Miller wrote:
> 
> > > 
> > > Applied, although it didn't apply cleanly to net-next.
> > > 
> > 
> > Apologies Dave, should have told you that I was carrying Joe P.'s cleanup patch
> > in my net-next tree as well:
> > http://marc.info/?l=linux-netdev&m=133727344816140&w=2
> > 
> > Since you noted that you had applied it, I applied it myself here.
> > Neil
> > 
> 
> Any plan to autoload the drop_monitor module from dropwatch,
> or to issue some advice?
> 
> # dropwatch -l kas
> Unable to find NET_DM family, dropwatch can't work
> Cleanuing up on socket creation error
> 
> Thanks
> 
I'm looking into that currently, although I was starting to wonder if it's
possible to do with a generic netlink socket.  I can't seem to find any
examples, and I can't use the net-pf-* module alias mechanism that formal
protocols implement, since I don't have a defined address family.  I suppose I
could augment that format to support a net-pf-16-<name> alias, where <name> is
the name of the genl family that gets registered by the module you're looking
for.

Does that seem like a reasonable idea?
Neil
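
To make the proposal concrete, here is a rough user-space sketch of the alias string such a scheme might produce. It assumes the net-pf-16-<name> format suggested above (16 being the numeric value of PF_NETLINK) and uses the NET_DM family name from the dropwatch error. genl_family_alias() is a hypothetical helper written for this example, not an existing kernel or libnl function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NETLINK_FAMILY_NUM 16	/* numeric value of PF_NETLINK on Linux */

/* Build "net-pf-16-<name>" into buf.
 * Returns 0 on success, -1 if the name is empty or the result
 * does not fit into len bytes (including the terminating NUL). */
static int genl_family_alias(char *buf, size_t len, const char *name)
{
	int n;

	assert(buf != NULL && name != NULL);

	if (strlen(name) == 0)
		return -1;

	n = snprintf(buf, len, "net-pf-%d-%s", NETLINK_FAMILY_NUM, name);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

Calling genl_family_alias(buf, sizeof(buf), "NET_DM") yields "net-pf-16-NET_DM". Under the assumption above, that is the string a module load request would need to carry, with drop_monitor declaring a matching MODULE_ALIAS.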

^ permalink raw reply

* Re: WARNING: at net/ipv4/tcp.c:1301 tcp_cleanup_rbuf+0x4f/0x110()
From: Sergio Correia @ 2012-05-22 13:57 UTC (permalink / raw)
  To: netdev
In-Reply-To: <CAJyhjX22Vna=TwuhD-FqyoXEd9D8wq_VdHbAAWwmqO1hZ34FRA@mail.gmail.com>

On Tue, May 22, 2012 at 2:43 AM, Sergio Correia <lists@uece.net> wrote:
> So far it has happened only once.
> Last commit is 471368557a734c6c486ee757952c902b36e7fd01.
>
>
> [ 3726.624387] ------------[ cut here ]------------
> [ 3726.624398] WARNING: at net/ipv4/tcp.c:1301 tcp_cleanup_rbuf+0x4f/0x110()
> [ 3726.624400] Hardware name: N53SV
> [ 3726.624402] cleanup rbuf bug: copied A4D1F126 seq A4D1F126 rcvnxt A4D1F126
> [ 3726.624404] Modules linked in:
> [ 3726.624407] Pid: 1416, comm: transmission-gt Not tainted 3.4.0-git+ #52
> [ 3726.624409] Call Trace:
> [ 3726.624415]  [<ffffffff81035eba>] warn_slowpath_common+0x7a/0xb0
> [ 3726.624419]  [<ffffffff81035f91>] warn_slowpath_fmt+0x41/0x50
> [ 3726.624507]  [<ffffffff81aa57a5>] ? sub_preempt_count+0x65/0xc0
> [ 3726.624510]  [<ffffffff819101cf>] tcp_cleanup_rbuf+0x4f/0x110
> [ 3726.624514]  [<ffffffff819112b7>] tcp_recvmsg+0x637/0xa60
> [ 3726.624518]  [<ffffffff81849310>] ? release_sock+0xe0/0x110
> [ 3726.624522]  [<ffffffff81934a34>] inet_recvmsg+0x94/0xc0
> [ 3726.624534]  [<ffffffff81844792>] sock_aio_read.part.8+0x142/0x170
> [ 3726.624537]  [<ffffffff818447c0>] ? sock_aio_read.part.8+0x170/0x170
> [ 3726.624540]  [<ffffffff818447e1>] sock_aio_read+0x21/0x30
> [ 3726.624544]  [<ffffffff81124b0a>] do_sync_readv_writev+0xca/0x110
> [ 3726.624548]  [<ffffffff8140f582>] ? security_file_permission+0x92/0xb0
> [ 3726.624552]  [<ffffffff8112425c>] ? rw_verify_area+0x5c/0xe0
> [ 3726.624555]  [<ffffffff81124de6>] do_readv_writev+0xd6/0x1e0
> [ 3726.624558]  [<ffffffff8184336b>] ? sock_do_ioctl+0x2b/0x70
> [ 3726.624562]  [<ffffffff81135abf>] ? do_vfs_ioctl+0x8f/0x530
> [ 3726.624566]  [<ffffffff8141278f>] ? file_has_perm+0x8f/0xa0
> [ 3726.624569]  [<ffffffff81124f7d>] vfs_readv+0x2d/0x50
> [ 3726.624572]  [<ffffffff81124fe5>] sys_readv+0x45/0xb0
> [ 3726.624575]  [<ffffffff81aa9062>] system_call_fastpath+0x16/0x1b
> [ 3726.624578] ---[ end trace 6dc5d813929e5e6f ]---

Checked this morning, and my dmesg now is basically composed of this
warning over and over and over.
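
For readers decoding the "cleanup rbuf bug" line, here is a small user-space sketch of the comparison that appears to be involved. It assumes the warning fires when copied_seq is not strictly before the end sequence of the skb at the head of the receive queue, compared with wraparound-safe 32-bit sequence arithmetic. seq_before() and rbuf_state_is_bad() are local stand-ins written for this example, not the kernel functions themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is earlier than b" for 32-bit TCP sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Model of the suspected check: the head skb should still hold unread
 * data, i.e. copied_seq must be strictly before its end sequence. */
static bool rbuf_state_is_bad(uint32_t copied_seq, uint32_t skb_end_seq)
{
	return !seq_before(copied_seq, skb_end_seq);
}

/* Values from the log line: copied == seq, so the check trips.
 * One byte earlier would have been fine. */
static void check_logged_values(void)
{
	assert(rbuf_state_is_bad(0xA4D1F126u, 0xA4D1F126u));
	assert(!rbuf_state_is_bad(0xA4D1F125u, 0xA4D1F126u));
}
```

Under that assumption, the logged values (copied, seq and rcvnxt all equal to A4D1F126) describe a fully consumed skb still sitting at the head of the receive queue.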

^ permalink raw reply

* [PATCH v4 4/7] ARM: davinci: net: davinci_emac: add OF support
From: Heiko Schocher @ 2012-05-22 13:55 UTC (permalink / raw)
  To: davinci-linux-open-source
  Cc: Heiko Schocher, linux-arm-kernel, devicetree-discuss, netdev,
	Grant Likely, Sekhar Nori, Wolfgang Denk, Anatoly Sivov
In-Reply-To: <1337694920-8925-1-git-send-email-hs@denx.de>

Add OF (device tree) support for the davinci_emac driver.

Signed-off-by: Heiko Schocher <hs@denx.de>
Cc: davinci-linux-open-source@linux.davincidsp.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: devicetree-discuss@lists.ozlabs.org
Cc: netdev@vger.kernel.org
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Sekhar Nori <nsekhar@ti.com>
Cc: Wolfgang Denk <wd@denx.de>
Cc: Anatoly Sivov <mm05@mail.ru>

---
- changes for v2:
  - add comment from Anatoly Sivov
    - fix typo in davinci_emac.txt
  - add comment from Grant Likely:
    - add prefix "ti,davinci-" to davinci specific property names
    - remove version property
    - use compatible name "ti,davinci-dm6460-emac"
    - use devm_kzalloc()
    - use of_match_ptr()
    - document all new properties
    - remove of_address_to_resource() and do not overwrite
      resource table
    - whitespace fixes
    - remove hw_ram_addr as it is not used in current
      board code
- no changes for v3
- changes for v4:
  add comments from Nori Sekhar:
  - move devicetree documentation to:
    Documentation/devicetree/bindings/net/davinci_emac.txt
  - fix typo in it
  - rename compatible property to "ti,davinci-dm6467-emac"
  - remove pinmux-handle
  - set version directly in pdata->version
---
 .../devicetree/bindings/net/davinci_emac.txt       |   41 +++++++++
 drivers/net/ethernet/ti/davinci_emac.c             |   87 +++++++++++++++++++-
 2 files changed, 127 insertions(+), 1 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/davinci_emac.txt

diff --git a/Documentation/devicetree/bindings/net/davinci_emac.txt b/Documentation/devicetree/bindings/net/davinci_emac.txt
new file mode 100644
index 0000000..48b259e
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/davinci_emac.txt
@@ -0,0 +1,41 @@
+* Texas Instruments Davinci EMAC
+
+This file describes the contents of the device node
+for the davinci_emac interface.
+
+Required properties:
+- compatible: "ti,davinci-dm6467-emac";
+- reg: Offset and length of the register set for the device
+- ti,davinci-ctrl-reg-offset: offset to control register
+- ti,davinci-ctrl-mod-reg-offset: offset to control module register
+- ti,davinci-ctrl-ram-offset: offset to control module ram
+- ti,davinci-ctrl-ram-size: size of control module ram
+- ti,davinci-rmii-en: use RMII
+- ti,davinci-no-bd-ram: has the emac controller BD RAM
+- phy-handle: Contains a phandle to an Ethernet PHY.
+              If absent, the davinci_emac driver defaults to 100/FULL.
+- interrupts: interrupt mapping for the davinci emac interrupts sources:
+              4 sources: <Receive Threshold Interrupt
+			  Receive Interrupt
+			  Transmit Interrupt
+			  Miscellaneous Interrupt>
+
+Optional properties:
+- local-mac-address : 6 bytes, mac address
+
+Example (enbw_cmc board):
+	eth0: emac@1e20000 {
+		compatible = "ti,davinci-dm6467-emac";
+		reg = <0x220000 0x4000>;
+		ti,davinci-ctrl-reg-offset = <0x3000>;
+		ti,davinci-ctrl-mod-reg-offset = <0x2000>;
+		ti,davinci-ctrl-ram-offset = <0>;
+		ti,davinci-ctrl-ram-size = <0x2000>;
+		local-mac-address = [ 00 00 00 00 00 00 ];
+		interrupts = <33
+				34
+				35
+				36
+				>;
+		interrupt-parent = <&intc>;
+	};
diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index 4da93a5..645618d 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -58,6 +58,12 @@
 #include <linux/io.h>
 #include <linux/uaccess.h>
 #include <linux/davinci_emac.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+#include <linux/of_net.h>
+
+#include <mach/mux.h>
 
 #include <asm/irq.h>
 #include <asm/page.h>
@@ -339,6 +345,9 @@ struct emac_priv {
 	u32 rx_addr_type;
 	atomic_t cur_tx;
 	const char *phy_id;
+#ifdef CONFIG_OF
+	struct device_node *phy_node;
+#endif
 	struct phy_device *phydev;
 	spinlock_t lock;
 	/*platform specific members*/
@@ -1762,6 +1771,75 @@ static const struct net_device_ops emac_netdev_ops = {
 #endif
 };
 
+#ifdef CONFIG_OF
+static struct emac_platform_data
+	*davinci_emac_of_get_pdata(struct platform_device *pdev,
+	struct emac_priv *priv)
+{
+	struct device_node *np;
+	struct emac_platform_data *pdata = NULL;
+	const u8 *mac_addr;
+	u32 data;
+	int ret;
+
+	pdata = pdev->dev.platform_data;
+	if (!pdata) {
+		pdata = devm_kzalloc(&pdev->dev, sizeof(*pdata), GFP_KERNEL);
+		if (!pdata)
+			goto nodata;
+	}
+
+	np = pdev->dev.of_node;
+	if (!np)
+		goto nodata;
+	else
+		pdata->version = EMAC_VERSION_2;
+
+	mac_addr = of_get_mac_address(np);
+	if (mac_addr)
+		memcpy(pdata->mac_addr, mac_addr, ETH_ALEN);
+
+	ret = of_property_read_u32(np, "ti,davinci-ctrl-reg-offset", &data);
+	if (!ret)
+		pdata->ctrl_reg_offset = data;
+
+	ret = of_property_read_u32(np, "ti,davinci-ctrl-mod-reg-offset",
+		&data);
+	if (!ret)
+		pdata->ctrl_mod_reg_offset = data;
+
+	ret = of_property_read_u32(np, "ti,davinci-ctrl-ram-offset", &data);
+	if (!ret)
+		pdata->ctrl_ram_offset = data;
+
+	ret = of_property_read_u32(np, "ti,davinci-ctrl-ram-size", &data);
+	if (!ret)
+		pdata->ctrl_ram_size = data;
+
+	ret = of_property_read_u32(np, "ti,davinci-rmii-en", &data);
+	if (!ret)
+		pdata->rmii_en = data;
+
+	ret = of_property_read_u32(np, "ti,davinci-no-bd-ram", &data);
+	if (!ret)
+		pdata->no_bd_ram = data;
+
+	priv->phy_node = of_parse_phandle(np, "phy-handle", 0);
+	if (!priv->phy_node)
+		pdata->phy_id = "";
+
+	pdev->dev.platform_data = pdata;
+nodata:
+	return  pdata;
+}
+#else
+static struct emac_platform_data
+	*davinci_emac_of_get_pdata(struct platform_device *pdev,
+	struct emac_priv *priv)
+{
+	return  pdev->dev.platform_data;
+}
+#endif
 /**
  * davinci_emac_probe: EMAC device probe
  * @pdev: The DaVinci EMAC device that we are removing
@@ -1804,7 +1882,7 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 
 	spin_lock_init(&priv->lock);
 
-	pdata = pdev->dev.platform_data;
+	pdata = davinci_emac_of_get_pdata(pdev, priv);
 	if (!pdata) {
 		dev_err(&pdev->dev, "no platform data\n");
 		rc = -ENODEV;
@@ -2015,6 +2093,12 @@ static const struct dev_pm_ops davinci_emac_pm_ops = {
 	.resume		= davinci_emac_resume,
 };
 
+static const struct of_device_id davinci_emac_of_match[] = {
+	{.compatible = "ti,davinci-dm6467-emac", },
+	{},
+};
+MODULE_DEVICE_TABLE(of, davinci_emac_of_match);
+
 /**
  * davinci_emac_driver: EMAC platform driver structure
  */
@@ -2023,6 +2107,7 @@ static struct platform_driver davinci_emac_driver = {
 		.name	 = "davinci_emac",
 		.owner	 = THIS_MODULE,
 		.pm	 = &davinci_emac_pm_ops,
+		.of_match_table = of_match_ptr(davinci_emac_of_match),
 	},
 	.probe = davinci_emac_probe,
 	.remove = __devexit_p(davinci_emac_remove),
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH v4 7/7] ARM: davinci: add support for the am1808 based enbw_cmc board
From: Heiko Schocher @ 2012-05-22 13:55 UTC (permalink / raw)
  To: davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/
  Cc: Heiko Schocher, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-i2c-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	David Woodhouse, Ben Dooks, Wolfram Sang, Sekhar Nori,
	Kevin Hilman, Wolfgang Denk, Scott Wood, Sylwester Nawrocki
In-Reply-To: <1337694920-8925-1-git-send-email-hs-ynQEQJNshbs@public.gmane.org>

- AM1808 based board
- 64 MiB DDR RAM
- 2 MiB NOR flash
- 128 MiB NAND flash
- use internal RTC
- I2C support
- hwmon lm75 support
- UBI/UBIFS support
- MMC support
- USB OTG support

Signed-off-by: Heiko Schocher <hs-ynQEQJNshbs@public.gmane.org>
Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
Cc: devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org
Cc: davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/@public.gmane.org
Cc: linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
Cc: linux-i2c-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: David Woodhouse <dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Cc: Ben Dooks <ben-linux-elnMNo+KYs3YtjvyW6yDsg@public.gmane.org>
Cc: Wolfram Sang <w.sang-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Cc: Sekhar Nori <nsekhar-l0cyMroinI0@public.gmane.org>
Cc: Kevin Hilman <khilman-l0cyMroinI0@public.gmane.org>
Cc: Wolfgang Denk <wd-ynQEQJNshbs@public.gmane.org>
Cc: Scott Wood <scottwood-KZfg59tc24xl57MIdRCFDg@public.gmane.org>
Cc: Sylwester Nawrocki <s.nawrocki-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>

---
- post this board support with USB support, even though
  USB only works with the 10 ms "workaround" posted here:
  http://comments.gmane.org/gmane.linux.usb.general/54505
  I also see this issue on the AM1808 TMDXEXP1808L evalboard.
- MMC and USB do not use OF support yet; ideas on how to port
  them are welcome. USB and MMC need board-specific
  callbacks. How can this be solved with OF support?

- changes for v2:
  - changes in the nand node due to comments from Scott Wood:
    - add "ti,davinci-" prefix
    - Dashes are preferred to underscores
    - rename "nandflash" to "nand"
    - introduce new "ti,davinci" specific properties for setting
      up ecc_mode, ecc_bits, options and bbt options, instead
      of using Linux defines
  - changes for i2c due to comments from Sylwester Nawrocki:
    - use "cell-index" instead of "id"
    - use OF_DEV_AUXDATA in the machine code instead of
      pre-defining the platform device name
  - add comment from Grant Likely for i2c:
    - removed "id" and "cell-index" completely
    - fixed documentation
    - use of_match_ptr()
    - use devm_kzalloc() for allocating platform data memory
    - fixed a whitespace issue
  - add net comments from Grant Likely:
    - add prefix "ti,davinci-" to davinci specific property names
    - remove version property
    - use compatible name "ti,davinci-dm6460-emac"
  - add comment from Grant Likely:
    - rename compatible node
    - do not use cell-index
    - CONFIG_OF required for this board
    TODO:
    - create a generic board support file; as I got no
      answer to my ping to Grant, maybe this could be done
      in a second step?
- changes for v3:
  - add comments from Sergei Shtylyov:
    - rename "compatible" prop to "ti,cp_intc"
    - cp_intc_init now used for Interrupt controller init
- changes for v4:
  add comment from Nori Sekhar:
  - rename davinci emac compatible property to "ti,davinci-dm6467-emac"
  - remove "pinmux-handle" property as discussed here:
    http://www.spinics.net/lists/arm-kernel/msg175701.html
    with Nori Sekhar
---
 arch/arm/boot/dts/enbw_cmc.dts                  |  172 +++++++++++
 arch/arm/configs/enbw_cmc_defconfig             |  123 ++++++++
 arch/arm/mach-davinci/Kconfig                   |    9 +
 arch/arm/mach-davinci/Makefile                  |    1 +
 arch/arm/mach-davinci/board-enbw-cmc.c          |  374 +++++++++++++++++++++++
 arch/arm/mach-davinci/include/mach/uncompress.h |    1 +
 6 files changed, 680 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/boot/dts/enbw_cmc.dts
 create mode 100644 arch/arm/configs/enbw_cmc_defconfig
 create mode 100644 arch/arm/mach-davinci/board-enbw-cmc.c

diff --git a/arch/arm/boot/dts/enbw_cmc.dts b/arch/arm/boot/dts/enbw_cmc.dts
new file mode 100644
index 0000000..2d5dea9
--- /dev/null
+++ b/arch/arm/boot/dts/enbw_cmc.dts
@@ -0,0 +1,172 @@
+/*
+ * Device Tree for the EnBW CMC platform
+ *
+ * Copyright 2011 DENX Software Engineering GmbH
+ * Heiko Schocher <hs-ynQEQJNshbs@public.gmane.org>
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+/dts-v1/;
+/include/ "skeleton.dtsi"
+
+/ {
+	model = "EnBW CMC";
+	compatible = "enbw,cmc";
+
+	aliases {
+		ethernet0 = &eth0;
+	};
+
+	arm {
+		#address-cells = <1>;
+		#size-cells = <1>;
+		ranges = <0 0xfffee000 0x00020000>;
+		intc: interrupt-controller@1 {
+			compatible = "ti,cp_intc";
+			interrupt-controller;
+			#interrupt-cells = <1>;
+			ti,intc-size = <101>;
+			reg = <0x0 0x2000>;
+		};
+	};
+	soc@1c00000 {
+		compatible = "ti,da850";
+		#address-cells = <1>;
+		#size-cells = <1>;
+		ranges = <0x0 0x01c00000 0x400000>;
+
+		serial0: serial@1c42000 {
+			compatible = "ti,da850", "ns16550a";
+			reg = <0x42000 0x100>;
+			clock-frequency = <150000000>;
+			reg-shift = <2>;
+			interrupts = <25>;
+			interrupt-parent = <&intc>;
+		};
+		serial1: serial@1d0c000 {
+			compatible = "ti,da850", "ns16550a";
+			reg = <0x10c000 0x100>;
+			clock-frequency = <150000000>;
+			reg-shift = <2>;
+			interrupts = <53>;
+			interrupt-parent = <&intc>;
+		};
+		serial2: serial@1d0d000 {
+			compatible = "ti,da850", "ns16550a";
+			reg = <0x10d000 0x100>;
+			clock-frequency = <150000000>;
+			reg-shift = <2>;
+			interrupts = <61>;
+			interrupt-parent = <&intc>;
+		};
+
+		eth0: emac@1e20000 {
+			compatible = "ti,davinci-dm6467-emac";
+			reg = <0x220000 0x4000>;
+			ti,davinci-ctrl-reg-offset = <0x3000>;
+			ti,davinci-ctrl-mod-reg-offset = <0x2000>;
+			ti,davinci-ctrl-ram-offset = <0>;
+			ti,davinci-ctrl-ram-size = <0x2000>;
+			local-mac-address = [ 00 00 00 00 00 00 ];
+			interrupts = <33
+					34
+					35
+					36
+					>;
+			interrupt-parent = <&intc>;
+		};
+
+		i2c@1c22000 {
+			compatible = "ti,davinci-i2c";
+			reg = <0x22000 0x1000>;
+			clock-frequency = <100000>;
+			interrupts = <15>;
+			interrupt-parent = <&intc>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			dtt@48 {
+				compatible = "national,lm75";
+				reg = <0x48>;
+			};
+		};
+	};
+	onchipram@80000000 {
+		compatible = "ti,davinci-onchipram";
+		#address-cells = <1>;
+		#size-cells = <1>;
+		ranges = <0x0 0x80000000 0x20000>;
+	};
+	aemif@60000000 {
+		compatible = "ti,davinci-aemif";
+		#address-cells = <2>;
+		#size-cells = <1>;
+		reg = <0x68000000 0x80000>;
+		ranges = <2 0 0x60000000 0x02000000
+			  3 0 0x62000000 0x02000000
+			  4 0 0x64000000 0x02000000
+			  5 0 0x66000000 0x02000000
+			  6 0 0x68000000 0x02000000>;
+		cs2@68000000 {
+			compatible = "ti,davinci-cs";
+			#address-cells = <1>;
+			#size-cells = <1>;
+			/* all timings in nanoseconds */
+			cs = <2>;
+			asize = <1>;
+			ta = <0>;
+			rhold = <7>;
+			rstrobe = <42>;
+			rsetup = <14>;
+			whold = <7>;
+			wstrobe = <42>;
+			wsetup = <14>;
+			ew = <0>;
+			ss = <0>;
+		};
+		flash@2,0 {
+			compatible = "cfi-flash";
+			reg = <2 0x0 0x400000>;
+			#address-cells = <1>;
+			#size-cells = <1>;
+			bank-width = <2>;
+			device-width = <2>;
+		};
+		nand_cs: cs3@68000000 {
+			compatible = "ti,davinci-cs";
+			#address-cells = <1>;
+			#size-cells = <1>;
+			/* all timings in nanoseconds */
+			cs = <3>;
+			asize = <0>;
+			ta = <0>;
+			rhold = <7>;
+			rstrobe = <42>;
+			rsetup = <7>;
+			whold = <7>;
+			wstrobe = <14>;
+			wsetup = <7>;
+			ew = <0>;
+			ss = <0>;
+		};
+		nand@3,0 {
+			compatible = "ti,davinci-nand";
+			reg = <3 0x0 0x807ff
+				6 0x0 0x8000>;
+			#address-cells = <1>;
+			#size-cells = <1>;
+			ti,davinci-chipselect = <1>;
+			ti,davinci-mask-ale = <0>;
+			ti,davinci-mask-cle = <0>;
+			ti,davinci-mask-chipsel = <0>;
+			ti,davinci-ecc-mode = "hw";
+			ti,davinci-ecc-bits = <4>;
+			ti,davinci-nand-use-bbt;
+			timing-handle = <&nand_cs>;
+		};
+
+	};
+};
diff --git a/arch/arm/configs/enbw_cmc_defconfig b/arch/arm/configs/enbw_cmc_defconfig
new file mode 100644
index 0000000..9d98e7f
--- /dev/null
+++ b/arch/arm/configs/enbw_cmc_defconfig
@@ -0,0 +1,123 @@
+CONFIG_EXPERIMENTAL=y
+# CONFIG_SWAP is not set
+CONFIG_SYSVIPC=y
+CONFIG_POSIX_MQUEUE=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_LOG_BUF_SHIFT=14
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_EXPERT=y
+CONFIG_MODULES=y
+CONFIG_MODULE_UNLOAD=y
+CONFIG_MODULE_FORCE_UNLOAD=y
+CONFIG_MODVERSIONS=y
+# CONFIG_BLK_DEV_BSG is not set
+CONFIG_PARTITION_ADVANCED=y
+# CONFIG_IOSCHED_DEADLINE is not set
+# CONFIG_IOSCHED_CFQ is not set
+CONFIG_ARCH_DAVINCI=y
+CONFIG_ARCH_DAVINCI_DA850=y
+# CONFIG_MACH_DAVINCI_DA850_EVM is not set
+CONFIG_GPIO_PCA953X=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_PREEMPT=y
+CONFIG_AEABI=y
+# CONFIG_OABI_COMPAT is not set
+CONFIG_USE_OF=y
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_INET=y
+CONFIG_IP_PNP=y
+CONFIG_IP_PNP_DHCP=y
+# CONFIG_INET_LRO is not set
+CONFIG_IPV6=y
+CONFIG_NETFILTER=y
+# CONFIG_WIRELESS is not set
+CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
+# CONFIG_FW_LOADER is not set
+CONFIG_MTD=y
+CONFIG_MTD_CMDLINE_PARTS=y
+CONFIG_MTD_CHAR=y
+CONFIG_MTD_BLKDEVS=y
+CONFIG_MTD_CFI=y
+CONFIG_MTD_CFI_INTELEXT=y
+CONFIG_MTD_CFI_AMDSTD=y
+CONFIG_MTD_PHYSMAP=y
+CONFIG_MTD_PHYSMAP_OF=y
+CONFIG_MTD_NAND=y
+CONFIG_MTD_NAND_DAVINCI=y
+CONFIG_MTD_UBI=y
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_BLK_DEV_RAM=y
+CONFIG_BLK_DEV_RAM_COUNT=1
+CONFIG_BLK_DEV_RAM_SIZE=32768
+CONFIG_EEPROM_AT24=y
+CONFIG_SCSI=y
+CONFIG_BLK_DEV_SD=y
+CONFIG_NETDEVICES=y
+CONFIG_MII=y
+CONFIG_TI_DAVINCI_EMAC=y
+# CONFIG_WLAN is not set
+CONFIG_INPUT_POLLDEV=y
+# CONFIG_INPUT_MOUSEDEV is not set
+CONFIG_INPUT_EVDEV=y
+CONFIG_INPUT_EVBUG=y
+# CONFIG_INPUT_KEYBOARD is not set
+# CONFIG_INPUT_MOUSE is not set
+# CONFIG_SERIO is not set
+# CONFIG_VT is not set
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_SERIAL_8250_NR_UARTS=3
+CONFIG_SERIAL_8250_RUNTIME_UARTS=3
+CONFIG_SERIAL_OF_PLATFORM=y
+CONFIG_HW_RANDOM=y
+CONFIG_I2C=y
+CONFIG_I2C_CHARDEV=y
+CONFIG_I2C_DAVINCI=y
+CONFIG_GPIO_SYSFS=y
+CONFIG_GPIO_PCF857X=y
+CONFIG_SENSORS_LM75=y
+CONFIG_WATCHDOG=y
+CONFIG_WATCHDOG_CORE=y
+CONFIG_DAVINCI_WATCHDOG=y
+# CONFIG_HID_SUPPORT is not set
+CONFIG_USB=y
+CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
+CONFIG_USB_MUSB_HDRC=y
+CONFIG_USB_MUSB_DA8XX=y
+CONFIG_USB_STORAGE=y
+CONFIG_USB_UAS=y
+CONFIG_USB_LIBUSUAL=y
+CONFIG_USB_GADGET=y
+CONFIG_USB_FUSB300=y
+CONFIG_USB_ETH=y
+CONFIG_MMC=y
+CONFIG_MMC_DAVINCI=y
+CONFIG_RTC_CLASS=y
+CONFIG_RTC_DRV_OMAP=y
+CONFIG_EXT2_FS=y
+CONFIG_EXT3_FS=y
+CONFIG_AUTOFS4_FS=y
+CONFIG_MSDOS_FS=y
+CONFIG_VFAT_FS=y
+CONFIG_TMPFS=y
+CONFIG_UBIFS_FS=y
+CONFIG_CRAMFS=y
+CONFIG_NFS_FS=y
+CONFIG_NFS_V3=y
+CONFIG_ROOT_NFS=y
+CONFIG_NLS_CODEPAGE_437=y
+CONFIG_NLS_ASCII=y
+CONFIG_NLS_ISO8859_1=y
+CONFIG_NLS_UTF8=y
+CONFIG_DEBUG_FS=y
+CONFIG_TIMER_STATS=y
+CONFIG_DEBUG_RT_MUTEXES=y
+CONFIG_DEBUG_MUTEXES=y
+# CONFIG_CRYPTO_ANSI_CPRNG is not set
+# CONFIG_CRYPTO_HW is not set
+CONFIG_CRC_CCITT=m
+CONFIG_CRC_T10DIF=m
diff --git a/arch/arm/mach-davinci/Kconfig b/arch/arm/mach-davinci/Kconfig
index 32d837d..4cb0469 100644
--- a/arch/arm/mach-davinci/Kconfig
+++ b/arch/arm/mach-davinci/Kconfig
@@ -202,6 +202,15 @@ config DA850_WL12XX
 	  Say Y if you want to use a wl1271 expansion card connected to the
 	  AM18x EVM.
 
+config MACH_ENBW_CMC
+	bool "EnBW Communication Module Compact"
+	default ARCH_DAVINCI_DA850
+	depends on ARCH_DAVINCI_DA850
+	select OF
+	help
+	  Say Y here to select the EnBW Communication Module Compact
+	  board.
+
 config GPIO_PCA953X
 	default MACH_DAVINCI_DA850_EVM
 
diff --git a/arch/arm/mach-davinci/Makefile b/arch/arm/mach-davinci/Makefile
index 2db78bd..12f3166 100644
--- a/arch/arm/mach-davinci/Makefile
+++ b/arch/arm/mach-davinci/Makefile
@@ -34,6 +34,7 @@ obj-$(CONFIG_MACH_DAVINCI_DA850_EVM)	+= board-da850-evm.o
 obj-$(CONFIG_MACH_TNETV107X)		+= board-tnetv107x-evm.o
 obj-$(CONFIG_MACH_MITYOMAPL138)		+= board-mityomapl138.o
 obj-$(CONFIG_MACH_OMAPL138_HAWKBOARD)	+= board-omapl138-hawk.o
+obj-$(CONFIG_MACH_ENBW_CMC)		+= board-enbw-cmc.o
 
 # Power Management
 obj-$(CONFIG_CPU_FREQ)			+= cpufreq.o
diff --git a/arch/arm/mach-davinci/board-enbw-cmc.c b/arch/arm/mach-davinci/board-enbw-cmc.c
new file mode 100644
index 0000000..fcec14f
--- /dev/null
+++ b/arch/arm/mach-davinci/board-enbw-cmc.c
@@ -0,0 +1,374 @@
+/*
+ * EnBW Communication Module Compact board
+ * Copyright 2011 DENX Software Engineering GmbH
+ * Author: Heiko Schocher <hs-ynQEQJNshbs@public.gmane.org>
+ *
+ * based on:
+ * TI DA850/OMAP-L138 EVM board
+ *
+ * Copyright (C) 2009 Texas Instruments Incorporated - http://www.ti.com/
+ *
+ * Derived from: arch/arm/mach-davinci/board-da850-evm.c
+ * Original Copyrights follow:
+ *
+ * 2007, 2009 (c) MontaVista Software, Inc. This file is licensed under
+ * the terms of the GNU General Public License version 2. This program
+ * is licensed "as is" without any warranty of any kind, whether express
+ * or implied.
+ */
+#include <linux/console.h>
+#include <linux/gpio.h>
+#include <linux/gpio_keys.h>
+#include <linux/i2c.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mtd/mtd.h>
+#include <linux/mtd/nand.h>
+#include <linux/mtd/partitions.h>
+#include <linux/mtd/physmap.h>
+#include <linux/of.h>
+#include <linux/of_net.h>
+#include <linux/of_address.h>
+#include <linux/of_platform.h>
+#include <linux/phy.h>
+#include <linux/phy_fixed.h>
+#include <linux/platform_device.h>
+#include <linux/spi/spi.h>
+#include <linux/spi/flash.h>
+#include <asm/mach-types.h>
+#include <asm/mach/arch.h>
+#include <mach/aemif.h>
+#include <mach/cp_intc.h>
+#include <mach/da8xx.h>
+#include <mach/mux.h>
+#include <mach/nand.h>
+#include <mach/spi.h>
+
+#define ENBW_CMC_MMCSD_CD_PIN          GPIO_TO_PIN(3, 13)
+
+/*
+ * USB1 VBUS is controlled by GPIO7[12], over-current is reported on GPIO7[8].
+ */
+#define DA850_USB_VBUS_PIN	GPIO_TO_PIN(7, 12)
+#define ON_BD_USB_OVC		GPIO_TO_PIN(7, 8)
+
+#if defined(CONFIG_USB_OHCI_HCD)
+static irqreturn_t enbw_cmc_usb_ocic_irq(int irq, void *dev_id);
+static da8xx_ocic_handler_t enbw_cmc_usb_ocic_handler;
+
+static int enbw_cmc_usb_set_power(unsigned port, int on)
+{
+	gpio_set_value(DA850_USB_VBUS_PIN, on);
+	return 0;
+}
+
+static int enbw_cmc_usb_get_power(unsigned port)
+{
+	return gpio_get_value(DA850_USB_VBUS_PIN);
+}
+
+static int enbw_cmc_usb_get_oci(unsigned port)
+{
+	return !gpio_get_value(ON_BD_USB_OVC);
+}
+
+static irqreturn_t enbw_cmc_usb_ocic_irq(int, void *);
+
+static int enbw_cmc_usb_ocic_notify(da8xx_ocic_handler_t handler)
+{
+	int irq         = gpio_to_irq(ON_BD_USB_OVC);
+	int error       = 0;
+
+	if (handler != NULL) {
+		enbw_cmc_usb_ocic_handler = handler;
+
+		error = request_irq(irq, enbw_cmc_usb_ocic_irq,
+					IRQF_DISABLED | IRQF_TRIGGER_RISING |
+					IRQF_TRIGGER_FALLING,
+					"OHCI over-current indicator", NULL);
+		if (error)
+			pr_err("%s: could not request IRQ to watch "
+				"over-current indicator changes\n", __func__);
+	} else {
+		free_irq(irq, NULL);
+	}
+	return error;
+}
+
+static struct da8xx_ohci_root_hub enbw_cmc_usb11_pdata = {
+	.set_power      = enbw_cmc_usb_set_power,
+	.get_power      = enbw_cmc_usb_get_power,
+	.get_oci        = enbw_cmc_usb_get_oci,
+	.ocic_notify    = enbw_cmc_usb_ocic_notify,
+	.potpgt         = (10 + 1) / 2,  /* 10 ms max */
+};
+
+static irqreturn_t enbw_cmc_usb_ocic_irq(int irq, void *dev_id)
+{
+	enbw_cmc_usb_ocic_handler(&enbw_cmc_usb11_pdata, 1);
+	return IRQ_HANDLED;
+}
+#endif
+
+static __init void enbw_cmc_usb_init(void)
+{
+	int ret;
+	u32 cfgchip2;
+
+	/* Set up USB clock/mode in the CFGCHIP2 register. */
+	cfgchip2 = __raw_readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
+
+	/* USB2.0 PHY reference clock is AUXCLK with 24MHz */
+	cfgchip2 &= ~CFGCHIP2_REFFREQ;
+	cfgchip2 |=  CFGCHIP2_REFFREQ_24MHZ;
+
+	/*
+	 * Select internal reference clock for USB 2.0 PHY
+	 * and use it as a clock source for USB 1.1 PHY
+	 * (this is the default setting anyway).
+	 */
+	cfgchip2 &= ~CFGCHIP2_USB1PHYCLKMUX;
+	cfgchip2 |=  CFGCHIP2_USB2PHYCLKMUX;
+
+	cfgchip2 &= ~CFGCHIP2_OTGMODE;
+	cfgchip2 |=  CFGCHIP2_SESENDEN | CFGCHIP2_VBDTCTEN;
+
+	__raw_writel(cfgchip2, DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
+
+	/*
+	 * SP2525A @ 5V supplies 500mA,
+	 * with the power on to power good time of 10 ms.
+	 */
+	ret = da8xx_register_usb20(500, 10);
+	if (ret)
+		pr_warning("%s: USB 2.0 registration failed: %d\n",
+			   __func__, ret);
+
+#if defined(CONFIG_USB_OHCI_HCD)
+	ret = gpio_request_one(DA850_USB_VBUS_PIN,
+			GPIOF_DIR_OUT, "USB 1.1 VBUS");
+	if (ret < 0) {
+		pr_err("%s: failed to request GPIO for USB 1.1 port "
+			"power control: %d\n", __func__, ret);
+		return;
+	}
+	gpio_direction_output(DA850_USB_VBUS_PIN, 0);
+
+	ret = gpio_request(ON_BD_USB_OVC, "ON_BD_USB_OVC");
+	if (ret) {
+		pr_err("%s: failed to request GPIO for USB 1.1 port "
+			"over-current indicator: %d\n", __func__, ret);
+		gpio_free(DA850_USB_VBUS_PIN);
+		return;
+	}
+	gpio_direction_input(ON_BD_USB_OVC);
+
+	ret = da8xx_register_usb11(&enbw_cmc_usb11_pdata);
+	if (ret) {
+		pr_warning("%s: USB 1.1 registration failed: %d\n",
+			   __func__, ret);
+		gpio_free(ON_BD_USB_OVC);
+		gpio_free(DA850_USB_VBUS_PIN);
+	}
+#endif
+
+	return;
+}
+
+static int enbw_cmc_mmc_get_ro(int index)
+{
+	return 0;
+}
+
+static int enbw_cmc_mmc_get_cd(int index)
+{
+	return gpio_get_value(ENBW_CMC_MMCSD_CD_PIN) ? 1 : 0;
+}
+
+static struct davinci_mmc_config enbw_cmc_mmc_config = {
+	.get_ro		= enbw_cmc_mmc_get_ro,
+	.get_cd		= enbw_cmc_mmc_get_cd,
+	.wires		= 4,
+	.max_freq	= 50000000,
+	.caps		= MMC_CAP_MMC_HIGHSPEED | MMC_CAP_SD_HIGHSPEED,
+	.version	= MMC_CTLR_VERSION_2,
+};
+
+static int __init enbw_cmc_config_emac(void)
+{
+	void __iomem *cfg_chip3_base;
+	u32 val;
+	struct davinci_soc_info *soc_info = &davinci_soc_info;
+
+	if (!machine_is_enbw_cmc())
+		return 0;
+
+	cfg_chip3_base = DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP3_REG);
+	val = __raw_readl(cfg_chip3_base);
+	val &= ~BIT(8);
+	pr_info("EMAC: MII PHY configured, RMII PHY will not be"
+						" functional\n");
+
+	/* configure the CFGCHIP3 register for MII */
+	__raw_writel(val, cfg_chip3_base);
+
+	/* use complete info from OF */
+	soc_info->emac_pdata = NULL;
+
+	return 0;
+}
+device_initcall(enbw_cmc_config_emac);
+
+static const s16 da850_dma0_rsv_chans[][2] = {
+	/* (offset, number) */
+	{-1, -1}
+};
+
+static const s16 da850_dma0_rsv_slots[][2] = {
+	/* (offset, number) */
+	{-1, -1}
+};
+
+static const s16 da850_dma1_rsv_chans[][2] = {
+	/* (offset, number) */
+	{-1, -1}
+};
+
+static const s16 da850_dma1_rsv_slots[][2] = {
+	/* (offset, number) */
+	{-1, -1}
+};
+
+static struct edma_rsv_info da850_edma_cc0_rsv = {
+	.rsv_chans	= da850_dma0_rsv_chans,
+	.rsv_slots	= da850_dma0_rsv_slots,
+};
+
+static struct edma_rsv_info da850_edma_cc1_rsv = {
+	.rsv_chans	= da850_dma1_rsv_chans,
+	.rsv_slots	= da850_dma1_rsv_slots,
+};
+
+static struct edma_rsv_info *da850_edma_rsv[2] = {
+	&da850_edma_cc0_rsv,
+	&da850_edma_cc1_rsv,
+};
+
+#ifdef CONFIG_CPU_FREQ
+static __init int da850_evm_init_cpufreq(void)
+{
+	switch (system_rev & 0xF) {
+	case 3:
+		da850_max_speed = 456000;
+		break;
+	case 2:
+		da850_max_speed = 408000;
+		break;
+	case 1:
+		da850_max_speed = 372000;
+		break;
+	}
+
+	return da850_register_cpufreq("pll0_sysclk3");
+}
+#else
+static __init int da850_evm_init_cpufreq(void) { return 0; }
+#endif
+
+struct of_dev_auxdata enbw_cmc_auxdata_lookup[] __initdata = {
+	OF_DEV_AUXDATA("ti,davinci-wdt", 0x01c21000, "ti,davinci-wdt", NULL),
+	OF_DEV_AUXDATA("ti,davinci-i2c", 0x01c22000, "i2c_davinci.1", NULL),
+	OF_DEV_AUXDATA("ti,davinci-i2c", 0x01e28000, "i2c_davinci.2", NULL),
+	OF_DEV_AUXDATA("ti,davinci-dm6467-emac", 0x01e20000, "davinci_emac.1",
+			NULL),
+	{}
+};
+
+const struct of_device_id enbw_cmc_bus_match_table[] = {
+	{ .compatible = "simple-bus", },
+	{ .compatible = "ti,da850", },
+	{ .compatible = "ti,davinci-onchipram", },
+	{ .compatible = "ti,davinci-aemif", },
+	{} /* Empty terminated list */
+};
+
+static __init void enbw_cmc_init(void)
+{
+	int ret;
+
+	of_platform_populate(NULL, enbw_cmc_bus_match_table,
+		enbw_cmc_auxdata_lookup, NULL);
+
+	ret = da8xx_register_watchdog();
+	if (ret)
+		pr_warning("enbw_cmc_init: watchdog registration failed: %d\n",
+				ret);
+
+	ret = da850_register_edma(da850_edma_rsv);
+	if (ret)
+		pr_warning("enbw_cmc_init: edma registration failed: %d\n",
+				ret);
+
+	/*
+	 * Shut down UART 0; this port is not used on the board.
+	 */
+	__raw_writel(0, IO_ADDRESS(DA8XX_UART0_BASE) + 0x30);
+
+	ret = da8xx_register_rtc();
+	if (ret)
+		pr_warning("enbw_cmc_init: rtc setup failed: %d\n", ret);
+
+	ret = da850_evm_init_cpufreq();
+	if (ret)
+		pr_warning("enbw_cmc_init: cpufreq registration failed: %d\n",
+				ret);
+
+	ret = da8xx_register_cpuidle();
+	if (ret)
+		pr_warning("enbw_cmc_init: cpuidle registration failed: %d\n",
+				ret);
+
+	ret = gpio_request(ENBW_CMC_MMCSD_CD_PIN, "MMC CD");
+	if (ret)
+		pr_warning("enbw_cmc_init: cannot request GPIO %d\n",
+				ENBW_CMC_MMCSD_CD_PIN);
+	gpio_direction_input(ENBW_CMC_MMCSD_CD_PIN);
+
+	ret = da850_register_mmcsd1(&enbw_cmc_mmc_config);
+	if (ret)
+		pr_warning("enbw_cmc_init: mmcsd1 registration failed:"
+				" %d\n", ret);
+
+	enbw_cmc_usb_init();
+}
+
+#ifdef CONFIG_SERIAL_8250_CONSOLE
+static int __init enbw_cmc_console_init(void)
+{
+	if (!machine_is_enbw_cmc())
+		return 0;
+
+	return add_preferred_console("ttyS", 2, "115200");
+}
+console_initcall(enbw_cmc_console_init);
+#endif
+
+static void __init enbw_cmc_map_io(void)
+{
+	da850_init();
+}
+
+static const char *enbw_cmc_board_compat[] __initconst = {
+	"enbw,cmc",
+	NULL
+};
+
+MACHINE_START(ENBW_CMC, "EnBW CMC")
+	.map_io		= enbw_cmc_map_io,
+	.init_irq	= cp_intc_init,
+	.timer		= &davinci_timer,
+	.init_machine	= enbw_cmc_init,
+	.dt_compat	= enbw_cmc_board_compat,
+	.dma_zone_size	= SZ_128M,
+	.restart	= da8xx_restart,
+MACHINE_END
diff --git a/arch/arm/mach-davinci/include/mach/uncompress.h b/arch/arm/mach-davinci/include/mach/uncompress.h
index da2fb2c..6119543 100644
--- a/arch/arm/mach-davinci/include/mach/uncompress.h
+++ b/arch/arm/mach-davinci/include/mach/uncompress.h
@@ -98,6 +98,7 @@ static inline void __arch_decomp_setup(unsigned long arch_id)
 		DEBUG_LL_DA8XX(davinci_da850_evm,	2);
 		DEBUG_LL_DA8XX(mityomapl138,		1);
 		DEBUG_LL_DA8XX(omapl138_hawkboard,	2);
+		DEBUG_LL_DA8XX(enbw_cmc,		2);
 
 		/* TNETV107x boards */
 		DEBUG_LL_TNETV107X(tnetv107x,		1);
-- 
1.7.7.6
