Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] mac802154: correct a typo in ieee802154_alloc_device() prototype
From: Alexandre Belloni @ 2013-10-21 17:09 UTC (permalink / raw)
  To: David S. Miller
  Cc: alex.bluesman.smirnov, netdev, linux-kernel, Alexandre Belloni

This has no other impact than a cosmetic one.

Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
---
 include/net/mac802154.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/mac802154.h b/include/net/mac802154.h
index d0d11df..807d6b7 100644
--- a/include/net/mac802154.h
+++ b/include/net/mac802154.h
@@ -133,7 +133,7 @@ struct ieee802154_ops {
 
 /* Basic interface to register ieee802154 device */
 struct ieee802154_dev *
-ieee802154_alloc_device(size_t priv_data_lex, struct ieee802154_ops *ops);
+ieee802154_alloc_device(size_t priv_data_len, struct ieee802154_ops *ops);
 void ieee802154_free_device(struct ieee802154_dev *dev);
 int ieee802154_register_device(struct ieee802154_dev *dev);
 void ieee802154_unregister_device(struct ieee802154_dev *dev);
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH] davinci_emac.c: Fix IFF_ALLMULTI setup
From: Mariusz Ceier @ 2013-10-21 16:59 UTC (permalink / raw)
  To: David S. Miller, Mugunthan V N, Lad Prabhakar, Jingoo Han,
	Jiri Pirko
  Cc: netdev, linux-kernel, Mariusz Ceier

When IFF_ALLMULTI flag is set on interface and IFF_PROMISC isn't,
emac_dev_mcast_set should only enable RX of multicasts and reset
MACHASH registers.

It does this, but afterwards it either sets up multicast MACs
filtering or disables RX of multicasts and resets MACHASH registers
again, rendering IFF_ALLMULTI flag useless.

This patch fixes emac_dev_mcast_set, so that multicast MACs filtering and
disabling of RX of multicasts are skipped when IFF_ALLMULTI flag is set.

Tested with kernel 2.6.37.

Signed-off-by: Mariusz Ceier <mceier+kernel@gmail.com>
---
 drivers/net/ethernet/ti/davinci_emac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index 67df09e..ff3bf0e 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -876,7 +876,7 @@ static void emac_dev_mcast_set(struct net_device *ndev)
 		    netdev_mc_count(ndev) > EMAC_DEF_MAX_MULTICAST_ADDRESSES) {
 			mbp_enable = (mbp_enable | EMAC_MBP_RXMCAST);
 			emac_add_mcast(priv, EMAC_ALL_MULTI_SET, NULL);
-		}
+		} else
 		if (!netdev_mc_empty(ndev)) {
 			struct netdev_hw_addr *ha;

-- 
1.8.4

^ permalink raw reply related

* Re: [PATCH RFC 5/5] net:stmmac: asynchronous tx_clean
From: Eric Dumazet @ 2013-10-21 16:30 UTC (permalink / raw)
  To: Giuseppe CAVALLARO; +Cc: Jimmy Perchet, netdev
In-Reply-To: <5265318C.5050307@st.com>

On Mon, 2013-10-21 at 15:52 +0200, Giuseppe CAVALLARO wrote:
> Hello Jimmy
> 
> On 10/16/2013 5:24 PM, Jimmy Perchet wrote:
> > Tx descriptor's cleanup and preparation are serialized, which is not necessary
> > and decrease performance.
> > In addition TX descriptor's cleanup is performed on NET_RX softirq, this is
> > confusing.
> 
> hmm, here you are changing the logic behind the tx/rx processes.
> 
> As done in many drivers, the stmmac cleans the tx resources in
> NAPI context and this is not a confuse approach ;-).
> 
> It gave me some performance improvements especially on TCP benchmarks.
> 
> > This patch unserialize tx descriptor's cleanup and preparation
> > and defer cleanup in workqueue.
> 
> So you decide to use workqueue and I kindly ask you to give me more
> details about the performance improvements (UDP/TCP) and cpu usage.
> 
> I can try to do some tests on my side too. This could take a while
> unfortunately.

Anyway this patch is buggy.

1) Removing tx_lock spinlock in TX completion adds a race in
stmmac_xmit()

2) Generally speaking, we should not rely on a work queue to perform TX
completions.

Think about being flooded by incoming frames.

Work queue could never be scheduled.

^ permalink raw reply

* Re: [PATCH RFC 4/5] net:stmmac: fix jumbo frame handling.
From: Jimmy PERCHET @ 2013-10-21 16:28 UTC (permalink / raw)
  To: Giuseppe CAVALLARO; +Cc: netdev, jimmy.perchet
In-Reply-To: <52652EE7.2060500@st.com>

On 21/10/2013 15:40, Giuseppe CAVALLARO wrote:
> On 10/16/2013 5:24 PM, Jimmy Perchet wrote:
>> This patch addresses several issues which prevent jumbo frames from working properly :
>> .jumbo frames' last descriptor was not closed
>> .several confusion regarding descriptor's max buffer size
>> .frags could not be jumbo
>>
>> Signed-off-by: Jimmy Perchet <jimmy.perchet@parrot.com>
> 
> 
> Jimmy, thx for thi patch. BElow some my first notes.

Thanks a lot for this first review.

> I'll continue to look at the patch to verify if I missed
> soemthing. I kindly ask you, for the next version, to add
> more comments especially in the function to prepare the
> tx desc in order to help me on reviewing.

Sure ;)

I hope do v2 by next week.

I'm OK with most of your comments. Some additional
notes below:

>>   }
>> @@ -81,7 +81,7 @@ static inline void ndesc_end_tx_desc_on_ring(struct dma_desc *p, int ter)
>>
>>   static inline void norm_set_tx_desc_len_on_ring(struct dma_desc *p, int len)
>>   {
>> -    if (unlikely(len > BUF_SIZE_2KiB)) {
>> +    if (unlikely(len >= BUF_SIZE_2KiB)) {
> 
> we cannot manage a size of 2048 on normal desc
> 
> Pls you should verify to not break the back-compatibility.

IMHO, this actually fix the problem you think I create.
In current code, if len is equal to 2048, buffer1_size is set to 2048,
this is wrong because the max size is actually 2047...

> 
>>           p->des01.etx.buffer1_size = BUF_SIZE_2KiB - 1;
>>           p->des01.etx.buffer2_size = len - p->des01.etx.buffer1_size;
>>       } else



>>
>>   static void stmmac_refill_desc3(void *priv_ptr, struct dma_desc *p)
>>   {
>> @@ -103,13 +90,13 @@ static void stmmac_refill_desc3(void *priv_ptr, struct dma_desc *p)
>>       if (unlikely(priv->plat->has_gmac))
>>           /* Fill DES3 in case of RING mode */
>>           if (priv->dma_buf_sz >= BUF_SIZE_8KiB)
>> -            p->des3 = p->des2 + BUF_SIZE_8KiB;
>> +            p->des3 = p->des2 + BUF_SIZE_8KiB - 1;
> 
> is it correct? can you check?

The actual buffer's max size is 8191, so, in ring mode,
the second buffer must start at p->des2 + 8191.

>> -    priv->cur_tx++;
>> +    priv->cur_tx += nb_desc;
> 
> can we avoid to use the nb_desc?
Actually, it is a preparation for my 5th patch : I want to write cur_tx only once.
I can split this.



Best Regards,
Jimmy

^ permalink raw reply

* Stale IPv6 address accumulation on linux 3.2.17
From: Templin, Fred L @ 2013-10-21 15:50 UTC (permalink / raw)
  To: netdev@vger.kernel.org
In-Reply-To: <loom.20131021T165114-650@post.gmane.org>

Hi,

On linux 3.2.17, I have a host that configures IPv6 addresses on
an eth0 interface based on Router Advertisements received from an
on-link linux box configured as an IPv6 router and running radvd.
When the host gets an RA, it configures both an EUI-64-based IPv6
address and an IPv6 privacy address, so it has two IPv6 addresses.
But, if I leave the host up for long periods of time, it seems to
accumulate additional IPv6 addresses - perhaps these are stale
IPv6 privacy addresses?

Is this known behavior, and if so is there a way to turn it off?
Or, perhaps this was a known bug that has been corrected in more
recent linux kernel versions?

Thanks - Fred
fred.l.templin@boeing.com

^ permalink raw reply

* Re: [BUG] 3.12.0-rcX IPv6 panic
From: Hannes Frederic Sowa @ 2013-10-21 15:52 UTC (permalink / raw)
  To: Bob Tracy; +Cc: linux-kernel, netdev
In-Reply-To: <20131021131846.GA5769@gherkin.frus.com>

Hi!

On Mon, Oct 21, 2013 at 08:18:46AM -0500, Bob Tracy wrote:
> Actually, a regression: the 3.11 kernel is rock-solid stable on my
> Alpha.
> 
> Beginning with 3.12.0-rc1, I can reliably trigger a kernel panic by
> executing the gogo6.net "gw6c" IPv6 client program.  If the networking
> layer is active, an "Oops" will eventually (within a day) occur regardless
> of whether I attempt to run "gw6c".  3.12.0-rcX is stable as long as I
> leave networking completely disabled.  The error has persisted up through
> -rc6.  Apologies for not mentioning this earlier, but the state of my
> PWS-433au has been questionable, and I wanted to make sure I had a
> legitimate bug sighting.
> 
> I'll have to transcribe the panic backtrace by hand: nothing makes it
> into any of the system logs :-(.  I *can* recall that every backtrace
> I've seen thus far has included one of the skb_copy() variants near the
> top of the list (first or second function).

Try to capture the panic via serial console. Otherwise a picture
would give us a first hint. Please watch out for lines like
skb_(over|under)_panic.

gw6c is a tunnel client? Can you post ip -6 tunnel ls?

(Also please send networking bugs to netdev@).

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching
From: Daniel Borkmann @ 2013-10-21 15:48 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Eric W. Biederman, pablo-Cap9r6Oaw4JrovVCs/uTlw,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <526543A2.2040901-kQCPcA+X3s7YtjvyW6yDsg@public.gmane.org>

On 10/21/2013 05:09 PM, Daniel Wagner wrote:
> On 10/19/2013 08:16 AM, Daniel Borkmann wrote:
>> On 10/19/2013 01:21 AM, Eric W. Biederman wrote:
>>
>>> I am coming to this late.  But two concrete suggestions.
>>>
>>> 1) process groups and sessions don't change as frequently as pids.
>>>
>>> 2) It is possible to put a set of processes in their own network
>>>     namespace and pipe just the packets you want those processes to
>>>     use into that network namespace.  Using an ingress queueing filter
>>>     makes that process very efficient even if you have to filter by port.
>>
>> Actually in our case we're filtering outgoing traffic, based on which
>> local socket that originated from; so you wouldn't need all of that
>> construct. Also, you wouldn't even need to have an a-prio knowledge of
>> the application internals regarding their use of particular use of ports
>> or protocols. I don't think that such a setup will have the same
>> efficiency, ease of use, and power to distinguish the application the
>> traffic came from in such a lightweight, protocol independent and easy way.
>
> Sorry for beeing late as well (and also stupid question)
>
> Couldn't you use something from the LSM? I mean you allow the
> application to create the socket etc and then block later
> the traffic originated from that socket. Wouldn't it make
> more sense to block early?

I gave one simple example for blocking in the commit message,
that's true, but it is not limited to that, meaning we can have
much different scenarios/policies that netfilter allows us than
just blocking, e.g. fine grained settings where applications are
allowed to connect/send traffic to, application traffic marking/
conntracking, application-specific packet mangling, and so on,
just think of the whole netfilter universe.

^ permalink raw reply

* DEVELLOP YOUR INTEREST PLEASE
From: MRS MUSA SAMORA @ 2013-10-21 15:18 UTC (permalink / raw)









FROM MRS MUSA SAMORA




Hello




My name is Mrs Samora Musa,I have been suffering from Ovarian cancer disease and the doctor says that i have just few days to leave.I am from Libya, but based in Burkina Faso,Africa since ten years ago as a business woman dealing with cocoa exportation,now that i am about to end the race like this,without any family members and no child.I have $1.1 Million US DOLLARS in Bank of Africa(BOA)Burkina Faso which i instructed the bank to give African union leaders to help sick people around Africa.But my mind is not at rest because i am writing this letter now through the help of my nurse beside me here in my hospital room.
 
I also have $2.1 Million US Dollars in another bank in Burkina Faso which i want you to claim from the bank and use help less privileged people in your country,but you must assure me that you will  take only 40% of the total money and give the rest 60% to the orphanage home in your country for my heart to rest.
 
Upon the receipt of your email that you are willing and capable to execute my plan, i will instruct the bank management to make the immediate transfer into your account.
 
Sincerely,
 
Mrs Samora Musa.

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching
From: Daniel Wagner @ 2013-10-21 15:09 UTC (permalink / raw)
  To: Daniel Borkmann, Eric W. Biederman
  Cc: pablo, netfilter-devel, netdev, Tejun Heo, cgroups
In-Reply-To: <526231E0.6060903@redhat.com>

Hi Daniel

On 10/19/2013 08:16 AM, Daniel Borkmann wrote:
> On 10/19/2013 01:21 AM, Eric W. Biederman wrote:
>
>> I am coming to this late.  But two concrete suggestions.
>>
>> 1) process groups and sessions don't change as frequently as pids.
>>
>> 2) It is possible to put a set of processes in their own network
>>     namespace and pipe just the packets you want those processes to
>>     use into that network namespace.  Using an ingress queueing filter
>>     makes that process very efficient even if you have to filter by port.
>
> Actually in our case we're filtering outgoing traffic, based on which
> local socket that originated from; so you wouldn't need all of that
> construct. Also, you wouldn't even need to have an a-prio knowledge of
> the application internals regarding their use of particular use of ports
> or protocols. I don't think that such a setup will have the same
> efficiency, ease of use, and power to distinguish the application the
> traffic came from in such a lightweight, protocol independent and easy way.

Sorry for beeing late as well (and also stupid question)

Couldn't you use something from the LSM? I mean you allow the
application to create the socket etc and then block later
the traffic originated from that socket. Wouldn't it make
more sense to block early?

cheers,
daniel

^ permalink raw reply

* Re: unmanaged L2TPv3 ethernet pseudowire Cisco <=> Linux
From: Pierre Desvaux @ 2013-10-21 15:08 UTC (permalink / raw)
  To: netdev
In-Reply-To: <51535D16.4080207@katalix.com>

James Chapman <jchapman <at> katalix.com> writes:

> 
> On 27/03/13 20:08, Tomas Agartz wrote:
> > On Tue, 26 Mar 2013, James Chapman wrote:
> > 
> >> The issue is that Linux and Cisco use a different default for the
> >> L2SpecificSublayer header setting and neither implementation provides
> >> a config option to change its setting.  The Linux default is to use
> >> the Default L2SpecificSublayer as defined in the RFC. Unfortunately
> >> the Cisco default is to use no L2SpecificSublayer.
> >>
> >> The kernel already has an API to allow the L2SpecificSublayer setting
> >> to be configured. The missing piece is an iproute2 l2tp config option
> >> to configure it. I'll work on an iproute2 patch now to allow this
> >> setting to be configured.
> > 
> > I patched my iproute2 with your patch and now my tunnel is working.
> > Thank you! :)
> 
> Great. Thanks for reporting back.
> 
> >> For unmanaged tunnels, these parameters must be manually configured
> >> consistently at each side. Both Cisco and Linux default to use no
> >> cookies and both already have config parameters to set cookie
> >> parameters, if needed. However, for L2SpecificSublayer this isn't the
> >> case. We need to add a config option on the Linux side to force the
> >> same setting as Cisco is using.
> > 
> > Does the API in the kernel allow you to set the cookie? In that case it
> > seems like a good idea to add that to iproute2 as well?
> 
> It is already supported. See the cookie and peer_cookie parameters of ip
> l2tp add session.
> 
> ip l2tp help
> or
> man ip-l2tp
> 
> James
> 
> 

Hi,

I have tried an other solution to bypass this issue.
I put a 4 bytes cookie in the paquets sent by the Cisco. It looks like this:
[IPv4][L2TPv3][Cookie][payload]

With value 0, the cookie is seen by the Linux as a L2SpecificSublayer with
Sbit at 0. Wich means ignore the value of the sequence number in
L2SpecificSublayer so Linux accepts it. Linux replies automaticaly with Sbit
0 to Cisco.
Cisco is as well configured to accept a 4 bytes cookie, the
L2SpecificSublayer is now accepted as a cookie.

To configure Cisco:
xconnect 192.168.0.1 200 encapsulation l2tpv3 manual pw-class tlund
   l2tp id 200 200
   l2tp cookie local 4 0
   l2tp cookie remote 4 0

Pierre

^ permalink raw reply

* Re: [PATCH net] bridge: clean the nf_bridge status when forwarding the skb
From: Antonio Quartulli @ 2013-10-21 15:06 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: Pablo Neira Ayuso, Antonio Quartulli, David S. Miller,
	netdev@vger.kernel.org, Stephen Hemminger
In-Reply-To: <526541A0.4000504@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1052 bytes --]

On Mon, Oct 21, 2013 at 11:00:48AM -0400, Vlad Yasevich wrote:
> >>
> >> Looking at other encapsulators (PPP, iptunnel, VXLAN), they do
> >> nf_reset() on input.  Would that be appropriate for batman as well?
> >
> > I thought that too.
> >
> > But at this point, wouldn't it be better to do a reset here and remove the other
> > resets from any other encapsulation module?
> >
> > Maybe this operation is supposed to not happen if no encapsulation is involved?
> 
> This is exactly right.  The reset happens much later if there is no 
> encapsulation.  However, if there is an encapsuation that changes the 
> hader values that are used to filter, then nf_reset has to happen.
> That is why nf_reset happens input to the encapsulation layer instead
> of always on output from bridge.

I see.

Then I think that this patch can be dropped.
I will provide another patch to batman-adv so that it can reset the nf state
before extracting the payload packet.


Thanks a lot for your feedback!

Regards,

-- 
Antonio Quartulli

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH net] bridge: clean the nf_bridge status when forwarding the skb
From: Vlad Yasevich @ 2013-10-21 15:00 UTC (permalink / raw)
  To: Antonio Quartulli
  Cc: Pablo Neira Ayuso, Antonio Quartulli, David S. Miller,
	netdev@vger.kernel.org, Stephen Hemminger
In-Reply-To: <20131018154123.GM2596@neomailbox.net>

On 10/18/2013 11:41 AM, Antonio Quartulli wrote:
> On Fri, Oct 18, 2013 at 11:33:06AM -0400, Vlad Yasevich wrote:
>> On 10/18/2013 10:46 AM, Antonio Quartulli wrote:
>>> On Fri, Oct 18, 2013 at 10:32:09AM -0400, Vlad Yasevich wrote:
>>>> On 10/18/2013 07:35 AM, Antonio Quartulli wrote:
>>>>> On Fri, Oct 18, 2013 at 01:10:41PM +0200, Pablo Neira Ayuso wrote:
>>>>>> On Thu, Oct 17, 2013 at 01:37:35PM +0200, Antonio Quartulli wrote:
>>>>>>> On Thu, Oct 17, 2013 at 04:28:57AM -0700, Pablo Neira Ayuso wrote:
>>>
>>> [...]
>>>
>>>>>
>>>>> The problem I was having was due to an skb entering br0 first and br1 later.
>>>>> When reaching br1 skb->nf_bridge was != NULL because of the previous processing
>>>>> in br0.
>>>>>
>>>>
>>>> Doesn't br_nf_pre_routing already take care of this for you?  It will
>>>> drop the ref on the current nf_bridge and allocate a new one.  Is that
>>>> not sufficient?
>>>
>>> In my case that line is not reached because
>>>
>>>    700         if (!IS_IP(skb) && !IS_VLAN_IP(skb) && !IS_PPPOE_IP(skb))
>>>
>>> is always true: the packet getting analysed is a batman-adv encapsulated packet,
>>> which does not match any of the three above.
>>>
>>> Cheers,
>>>
>>
>> Looking at other encapsulators (PPP, iptunnel, VXLAN), they do
>> nf_reset() on input.  Would that be appropriate for batman as well?
>
> I thought that too.
>
> But at this point, wouldn't it be better to do a reset here and remove the other
> resets from any other encapsulation module?
>
> Maybe this operation is supposed to not happen if no encapsulation is involved?

This is exactly right.  The reset happens much later if there is no 
encapsulation.  However, if there is an encapsuation that changes the 
hader values that are used to filter, then nf_reset has to happen.
That is why nf_reset happens input to the encapsulation layer instead
of always on output from bridge.

-vlad

> I thought that polishing the nf state when exiting the nf related path was a
> clean and easy solution.
>
> Moreover we avoid that any newly implemented tunneling module hit this problem again.
>
>
> Cheers,
>

^ permalink raw reply

* Re: [PATCH RFC] xfrm: Don't queue retransmitted packets if the original is still on the host
From: Steffen Klassert @ 2013-10-21 14:51 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev
In-Reply-To: <20131018.163412.487683352492277510.davem@davemloft.net>

On Fri, Oct 18, 2013 at 04:34:12PM -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 18 Oct 2013 13:23:45 -0700
> 
> > On Fri, 2013-10-18 at 16:19 -0400, David Miller wrote:
> >> From: Steffen Klassert <steffen.klassert@secunet.com>
> >> Date: Wed, 16 Oct 2013 13:42:47 +0200
> >> 
> >> > It does not make sense to queue retransmitted packets if the
> >> > original packet is still in some queue of this host. So add
> >> > a check to xdst_queue_output() and drop the packet if the
> >> > original packet is not yet sent.
> >> > 
> >> > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> >> 
> >> I have no problems with this, what about you Eric?
> > 
> > It looks fine to me.
> > 
> > Acked-by: Eric Dumazet <edumazet@google.com>
> 
> Great, Steffen I assume you'll merge this in via your ipsec tree.

Now applied to the ipsec-next tree. Thanks everyone for the review!

^ permalink raw reply

* Re: [PATCH ipsec-next] xfrm: use vmalloc_node() for percpu scratches
From: Steffen Klassert @ 2013-10-21 14:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Herbert Xu, David S. Miller, netdev
In-Reply-To: <1382093656.3284.14.camel@edumazet-glaptop.roam.corp.google.com>

On Fri, Oct 18, 2013 at 03:54:16AM -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> scratches are per cpu, we can use vmalloc_node() for proper
> NUMA affinity.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied to ipsec-next, thanks a lot Eric!

^ permalink raw reply

* Re: [PATCH 15/15] net: enic: remove unnecessary pci_set_drvdata()
From: Sujith Sankar (ssujith) @ 2013-10-21 14:02 UTC (permalink / raw)
  To: Jingoo Han, 'David S. Miller'
  Cc: netdev@vger.kernel.org, Christian Benvenuti (benve),
	'Govindarajulu Varadarajan', Neel Patel (neepatel)
In-Reply-To: <00be01cecb98$8cabcd20$a6036760$%han@samsung.com>

This looks good to me.

Regards,
-Sujith

On 18/10/13 5:55 AM, "Jingoo Han" <jg1.han@samsung.com> wrote:

>The driver core clears the driver data to NULL after device_release
>or on probe failure. Thus, it is not needed to manually clear the
>device driver data to NULL.
>
>Signed-off-by: Jingoo Han <jg1.han@samsung.com>
>---
> drivers/net/ethernet/cisco/enic/enic_main.c |    2 --
> 1 file changed, 2 deletions(-)
>
>diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c
>b/drivers/net/ethernet/cisco/enic/enic_main.c
>index 7b756cf9..ff78dfa 100644
>--- a/drivers/net/ethernet/cisco/enic/enic_main.c
>+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
>@@ -2309,7 +2309,6 @@ err_out_release_regions:
> err_out_disable_device:
> 	pci_disable_device(pdev);
> err_out_free_netdev:
>-	pci_set_drvdata(pdev, NULL);
> 	free_netdev(netdev);
> 
> 	return err;
>@@ -2338,7 +2337,6 @@ static void enic_remove(struct pci_dev *pdev)
> 		enic_iounmap(enic);
> 		pci_release_regions(pdev);
> 		pci_disable_device(pdev);
>-		pci_set_drvdata(pdev, NULL);
> 		free_netdev(netdev);
> 	}
> }
>-- 
>1.7.10.4
>
>

^ permalink raw reply

* Re: [PATCH RFC 5/5] net:stmmac: asynchronous tx_clean
From: Giuseppe CAVALLARO @ 2013-10-21 13:52 UTC (permalink / raw)
  To: Jimmy Perchet; +Cc: netdev
In-Reply-To: <1381937052-8999-6-git-send-email-jimmy.perchet@parrot.com>

Hello Jimmy

On 10/16/2013 5:24 PM, Jimmy Perchet wrote:
> Tx descriptor's cleanup and preparation are serialized, which is not necessary
> and decrease performance.
> In addition TX descriptor's cleanup is performed on NET_RX softirq, this is
> confusing.

hmm, here you are changing the logic behind the tx/rx processes.

As done in many drivers, the stmmac cleans the tx resources in
NAPI context and this is not a confuse approach ;-).

It gave me some performance improvements especially on TCP benchmarks.

> This patch unserialize tx descriptor's cleanup and preparation
> and defer cleanup in workqueue.

So you decide to use workqueue and I kindly ask you to give me more
details about the performance improvements (UDP/TCP) and cpu usage.

I can try to do some tests on my side too. This could take a while
unfortunately.

peppe

>
> Signed-off-by: Jimmy Perchet <jimmy.perchet@parrot.com>
> ---
>   drivers/net/ethernet/stmicro/stmmac/chain_mode.c  |  6 +--
>   drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  6 ++-
>   drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 52 +++++++++++++----------
>   3 files changed, 37 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/chain_mode.c b/drivers/net/ethernet/stmicro/stmmac/chain_mode.c
> index d6ed0ce..8ab83cb 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/chain_mode.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/chain_mode.c
> @@ -120,7 +120,7 @@ static void stmmac_refill_desc3(void *priv_ptr, struct dma_desc *p)
>   		 * to keep explicit chaining in the descriptor.
>   		 */
>   		p->des3 = (unsigned int)(priv->dma_rx_phy +
> -					 (((priv->dirty_rx) + 1) %
> +					 ((priv->dirty_rx + 1) %
>   					  priv->dma_rx_size) *
>   					 sizeof(struct dma_desc));
>   }
> @@ -135,9 +135,9 @@ static void stmmac_clean_desc3(void *priv_ptr, struct dma_desc *p)
>   		 * to keep explicit chaining in the descriptor.
>   		 */
>   		p->des3 = (unsigned int)(priv->dma_tx_phy +
> -					 (((priv->dirty_tx + 1) %
> +					 ((atomic_read(&priv->dirty_tx) + 1) %
>   					   priv->dma_tx_size) *
> -					  sizeof(struct dma_desc)));
> +					  sizeof(struct dma_desc));
>   }
>
>   const struct stmmac_chain_mode_ops chain_mode_ops = {
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> index f16a9bd..d5b8906 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> @@ -38,8 +38,8 @@ struct stmmac_priv {
>   	struct dma_extended_desc *dma_etx ____cacheline_aligned_in_smp;
>   	struct dma_desc *dma_tx;
>   	struct sk_buff **tx_skbuff;
> -	unsigned int cur_tx;
> -	unsigned int dirty_tx;
> +	atomic_t cur_tx;
> +	atomic_t dirty_tx;
>   	unsigned int dma_tx_size;
>   	u32 tx_count_frames;
>   	u32 tx_coal_frames;
> @@ -106,6 +106,8 @@ struct stmmac_priv {
>   	u32 adv_ts;
>   	int use_riwt;
>   	spinlock_t ptp_lock;
> +	struct work_struct txclean_work;
> +	struct workqueue_struct *wq;
>   };
>
>   extern int phyaddr;
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 5873246..de614ad 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -48,6 +48,7 @@
>   #include <linux/seq_file.h>
>   #endif /* CONFIG_STMMAC_DEBUG_FS */
>   #include <linux/net_tstamp.h>
> +#include <linux/workqueue.h>
>   #include "stmmac_ptp.h"
>   #include "stmmac.h"
>
> @@ -205,7 +206,8 @@ static void print_pkt(unsigned char *buf, int len)
>
>   static inline u32 stmmac_tx_avail(struct stmmac_priv *priv)
>   {
> -	return priv->dirty_tx + priv->dma_tx_size - priv->cur_tx - 1;
> +	return atomic_read(&priv->dirty_tx) + priv->dma_tx_size
> +					 - atomic_read(&priv->cur_tx) - 1;
>   }
>
>   /**
> @@ -230,7 +232,7 @@ static inline void stmmac_hw_fix_mac_speed(struct stmmac_priv *priv)
>   static void stmmac_enable_eee_mode(struct stmmac_priv *priv)
>   {
>   	/* Check and enter in LPI mode */
> -	if ((priv->dirty_tx == priv->cur_tx) &&
> +	if ((atomic_read(&priv->dirty_tx) == atomic_read(&priv->cur_tx)) &&
>   	    (priv->tx_path_in_lpi_mode == false))
>   		priv->hw->mac->set_eee_mode(priv->ioaddr);
>   }
> @@ -1118,8 +1120,8 @@ static int init_dma_desc_rings(struct net_device *dev)
>   		priv->tx_skbuff[i] = NULL;
>   	}
>
> -	priv->dirty_tx = 0;
> -	priv->cur_tx = 0;
> +	atomic_set(&priv->dirty_tx, 0);
> +	atomic_set(&priv->cur_tx, 0);
>
>   	stmmac_clear_descriptors(priv);
>
> @@ -1264,17 +1266,17 @@ static void stmmac_dma_operation_mode(int mtu, struct stmmac_priv *priv)
>    * @priv: driver private structure
>    * Description: it reclaims resources after transmission completes.
>    */
> -static void stmmac_tx_clean(struct stmmac_priv *priv)
> +static void stmmac_tx_clean(struct work_struct *work)
>   {
> +	struct stmmac_priv *priv =
> +			container_of(work, struct stmmac_priv, txclean_work);
>   	unsigned int txsize = priv->dma_tx_size;
>
> -	spin_lock(&priv->tx_lock);
> -
>   	priv->xstats.tx_clean++;
>
> -	while (priv->dirty_tx != priv->cur_tx) {
> +	while (atomic_read(&priv->dirty_tx) != atomic_read(&priv->cur_tx)) {
>   		int last;
> -		unsigned int entry = priv->dirty_tx % txsize;
> +		unsigned int entry = atomic_read(&priv->dirty_tx) % txsize;
>   		struct sk_buff *skb = priv->tx_skbuff[entry];
>   		struct dma_desc *p;
>
> @@ -1308,7 +1310,8 @@ static void stmmac_tx_clean(struct stmmac_priv *priv)
>   		}
>   		if (netif_msg_tx_done(priv))
>   			pr_debug("%s: curr %d, dirty %d\n", __func__,
> -				 priv->cur_tx, priv->dirty_tx);
> +				 atomic_read(&priv->cur_tx),
> +				 atomic_read(&priv->dirty_tx));
>
>   		if (likely(priv->tx_skbuff_dma[entry])) {
>   			dma_unmap_single(priv->device,
> @@ -1326,7 +1329,8 @@ static void stmmac_tx_clean(struct stmmac_priv *priv)
>
>   		priv->hw->desc->release_tx_desc(p, priv->mode);
>
> -		priv->dirty_tx++;
> +		wmb();
> +		atomic_inc(&priv->dirty_tx);
>   	}
>   	if (unlikely(netif_queue_stopped(priv->dev) &&
>   		     stmmac_tx_avail(priv) > STMMAC_TX_THRESH(priv))) {
> @@ -1344,7 +1348,6 @@ static void stmmac_tx_clean(struct stmmac_priv *priv)
>   		stmmac_enable_eee_mode(priv);
>   		mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_T(eee_timer));
>   	}
> -	spin_unlock(&priv->tx_lock);
>   }
>
>   static inline void stmmac_enable_dma_irq(struct stmmac_priv *priv)
> @@ -1380,8 +1383,8 @@ static void stmmac_tx_err(struct stmmac_priv *priv)
>   			priv->hw->desc->init_tx_desc(&priv->dma_tx[i],
>   						     priv->mode,
>   						     (i == txsize - 1));
> -	priv->dirty_tx = 0;
> -	priv->cur_tx = 0;
> +	atomic_set(&priv->dirty_tx, 0);
> +	atomic_set(&priv->cur_tx, 0);
>   	priv->hw->dma->start_tx(priv->ioaddr);
>
>   	priv->dev->stats.tx_errors++;
> @@ -1401,12 +1404,16 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv)
>   	int status;
>
>   	status = priv->hw->dma->dma_interrupt(priv->ioaddr, &priv->xstats);
> -	if (likely((status & handle_rx)) || (status & handle_tx)) {
> +	if (likely(status & handle_rx)) {
>   		if (likely(napi_schedule_prep(&priv->napi))) {
>   			stmmac_disable_dma_irq(priv);
>   			__napi_schedule(&priv->napi);
>   		}
>   	}
> +	if (likely(status & handle_tx))
> +		queue_work(priv->wq, &priv->txclean_work);
> +
> +
>   	if (unlikely(status & tx_hard_error_bump_tc)) {
>   		/* Try to bump up the dma threshold on this failure */
>   		if (unlikely(tc != SF_DMA_MODE) && (tc <= 256)) {
> @@ -1596,8 +1603,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
>   static void stmmac_tx_timer(unsigned long data)
>   {
>   	struct stmmac_priv *priv = (struct stmmac_priv *)data;
> -
> -	stmmac_tx_clean(priv);
> +	queue_work(priv->wq, &priv->txclean_work);
>   }
>
>   /**
> @@ -1876,7 +1882,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>   	if (priv->tx_path_in_lpi_mode)
>   		stmmac_disable_eee_mode(priv);
>
> -	first_entry = entry = priv->cur_tx % txsize;
> +	first_entry = entry = atomic_read(&priv->cur_tx) % txsize;
>
>   	csum_insertion = (skb->ip_summed == CHECKSUM_PARTIAL);
>   	/* To program the descriptors according to the size of the frame */
> @@ -1944,12 +1950,13 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>   	priv->hw->desc->set_tx_owner(first);
>   	wmb();
>
> -	priv->cur_tx += nb_desc;
> +	atomic_add(nb_desc, &priv->cur_tx);
>
>   	if (netif_msg_pktdata(priv)) {
>   		pr_debug("%s: curr %d dirty=%d entry=%d, first=%p, nfrags=%d",
> -			__func__, (priv->cur_tx % txsize),
> -			(priv->dirty_tx % txsize), entry, first, nfrags);
> +			__func__, (atomic_read(&priv->cur_tx) % txsize),
> +			(atomic_read(&priv->dirty_tx) % txsize), entry,
> +			first, nfrags);
>
>   		if (priv->extend_desc)
>   			stmmac_display_ring((void *)priv->dma_etx, txsize, 1);
> @@ -2171,7 +2178,6 @@ static int stmmac_poll(struct napi_struct *napi, int budget)
>   	int work_done = 0;
>
>   	priv->xstats.napi_poll++;
> -	stmmac_tx_clean(priv);
>
>   	work_done = stmmac_rx(priv, budget);
>   	if (work_done < budget) {
> @@ -2747,6 +2753,8 @@ struct stmmac_priv *stmmac_dvr_probe(struct device *device,
>
>   	spin_lock_init(&priv->lock);
>   	spin_lock_init(&priv->tx_lock);
> +	priv->wq = create_singlethread_workqueue("stmmac_wq");
> +	INIT_WORK(&priv->txclean_work, stmmac_tx_clean);
>
>   	ret = register_netdev(ndev);
>   	if (ret) {
>

^ permalink raw reply

* Re: [PATCH RFC 1/5] net:stmmac: set threshold/store and forward mode according to mtu size.
From: Giuseppe CAVALLARO @ 2013-10-21 13:49 UTC (permalink / raw)
  To: Rayagond K; +Cc: Jimmy Perchet, netdev
In-Reply-To: <CAJ3bTp62KZVK22b3M5n03Bj20auaUFYDJk-_-U=cDxb2x9dFVQ@mail.gmail.com>

On 10/21/2013 11:58 AM, Rayagond K wrote:
> On Mon, Oct 21, 2013 at 2:17 PM, Giuseppe CAVALLARO
> <peppe.cavallaro@st.com>  wrote:
>> >On 10/16/2013 5:24 PM, Jimmy Perchet wrote:
>>> >>
>>> >>Threshold mode dma is needed on rx path if jumbo frames are expected.
>> >
>> >
>> >I think we should not force the threshold mode depending on the mtu.
> I remember SNPS HW team suggesting me to program the rx path in
> Threshold mode for jumbo frames, as it takes care of low RX FIFO size.
>

ok. no problem to keep it as default in the stmmac.

Cheers
Peppe

^ permalink raw reply

* Re: [PATCH RFC 4/5] net:stmmac: fix jumbo frame handling.
From: Giuseppe CAVALLARO @ 2013-10-21 13:40 UTC (permalink / raw)
  To: Jimmy Perchet; +Cc: netdev
In-Reply-To: <1381937052-8999-5-git-send-email-jimmy.perchet@parrot.com>

On 10/16/2013 5:24 PM, Jimmy Perchet wrote:
> This patch addresses several issues which prevent jumbo frames from working properly :
> .jumbo frames' last descriptor was not closed
> .several confusion regarding descriptor's max buffer size
> .frags could not be jumbo
>
> Signed-off-by: Jimmy Perchet <jimmy.perchet@parrot.com>


Jimmy, thx for thi patch. BElow some my first notes.
I'll continue to look at the patch to verify if I missed
soemthing. I kindly ask you, for the next version, to add
more comments especially in the function to prepare the
tx desc in order to help me on reviewing.

I hope to do some tests asap.

peppe

> ---
>   drivers/net/ethernet/stmicro/stmmac/chain_mode.c  |  95 ++++++++----------
>   drivers/net/ethernet/stmicro/stmmac/common.h      |   6 ++
>   drivers/net/ethernet/stmicro/stmmac/descs_com.h   |   8 +-
>   drivers/net/ethernet/stmicro/stmmac/enh_desc.c    |   6 ++
>   drivers/net/ethernet/stmicro/stmmac/norm_desc.c   |   6 ++
>   drivers/net/ethernet/stmicro/stmmac/ring_mode.c   |  90 ++++++++---------
>   drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 112 +++++++++++-----------
>   7 files changed, 158 insertions(+), 165 deletions(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/chain_mode.c b/drivers/net/ethernet/stmicro/stmmac/chain_mode.c
> index d234ab5..d6ed0ce 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/chain_mode.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/chain_mode.c
> @@ -28,70 +28,58 @@
>
>   #include "stmmac.h"
>
> -static unsigned int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
> +static unsigned int stmmac_prepare_frm(void *p, void *data, unsigned int len,
> +						 int csum, unsigned int entry)
>   {
>   	struct stmmac_priv *priv = (struct stmmac_priv *)p;
> -	unsigned int txsize = priv->dma_tx_size;
> -	unsigned int entry = priv->cur_tx % txsize;
> -	struct dma_desc *desc = priv->dma_tx + entry;
> -	unsigned int nopaged_len = skb_headlen(skb);
> +	unsigned int entry_count = 0;
> +	struct dma_desc *desc;
>   	unsigned int bmax;
> -	unsigned int i = 1, len;
>
> -	if (priv->plat->enh_desc)
> -		bmax = BUF_SIZE_8KiB;
> -	else
> -		bmax = BUF_SIZE_2KiB;
> -
> -	len = nopaged_len - bmax;
> -
> -	desc->des2 = dma_map_single(priv->device, skb->data,
> -				    bmax, DMA_TO_DEVICE);
> -	priv->tx_skbuff_dma[entry] = desc->des2;
> -	priv->hw->desc->prepare_tx_desc(desc, 1, bmax, csum, STMMAC_CHAIN_MODE);
> -
> -	while (len != 0) {
> -		entry = (++priv->cur_tx) % txsize;
> +	if (priv->plat->enh_desc) {
> +		desc = (struct dma_desc *)(priv->dma_etx + entry);
> +		bmax = BUF_SIZE_8KiB - 1;
> +	} else{
>   		desc = priv->dma_tx + entry;
> -
> -		if (len > bmax) {
> -			desc->des2 = dma_map_single(priv->device,
> -						    (skb->data + bmax * i),
> -						    bmax, DMA_TO_DEVICE);
> -			priv->tx_skbuff_dma[entry] = desc->des2;
> -			priv->hw->desc->prepare_tx_desc(desc, 0, bmax, csum,
> -							STMMAC_CHAIN_MODE);
> -			priv->hw->desc->set_tx_owner(desc);
> -			priv->tx_skbuff[entry] = NULL;
> -			len -= bmax;
> -			i++;
> -		} else {
> -			desc->des2 = dma_map_single(priv->device,
> -						    (skb->data + bmax * i), len,
> -						    DMA_TO_DEVICE);
> -			priv->tx_skbuff_dma[entry] = desc->des2;
> -			priv->hw->desc->prepare_tx_desc(desc, 0, len, csum,
> -							STMMAC_CHAIN_MODE);
> -			priv->hw->desc->set_tx_owner(desc);
> -			priv->tx_skbuff[entry] = NULL;
> -			len = 0;
> -		}
> +		bmax = BUF_SIZE_2KiB - 1;
>   	}
> -	return entry;
> -}
>
> -static unsigned int stmmac_is_jumbo_frm(int len, int enh_desc)
> -{
> -	unsigned int ret = 0;
> +	while (len > bmax) {
> +		desc->des2 = dma_map_single(priv->device,
> +					    data + bmax*entry_count,
> +					    bmax, DMA_TO_DEVICE);
> +		priv->tx_skbuff_dma[entry] = desc->des2;
> +		priv->tx_skbuff[entry] = NULL;
> +		priv->hw->desc->prepare_tx_desc(desc, 0, bmax, csum,
> +						STMMAC_CHAIN_MODE);
> +
> +		len -= bmax;
> +		entry++;
> +		entry %= priv->dma_tx_size;
> +		entry_count++;
> +
> +		if (priv->plat->enh_desc)
> +			desc = (struct dma_desc *)
> +					(((struct dma_extended_desc *)desc)+1);
> +		else
> +			desc++;
> +	}
>
> -	if ((enh_desc && (len > BUF_SIZE_8KiB)) ||
> -	    (!enh_desc && (len > BUF_SIZE_2KiB))) {
> -		ret = 1;
> +	if (len)	{
> +		desc->des2 = dma_map_single(priv->device,
> +					    data + bmax*entry_count,
> +					    len, DMA_TO_DEVICE);
> +		priv->tx_skbuff_dma[entry] = desc->des2;
> +		priv->tx_skbuff[entry] = NULL;
> +		priv->hw->desc->prepare_tx_desc(desc, 0, len, csum,
> +						STMMAC_CHAIN_MODE);
> +		entry_count++;
>   	}
>
> -	return ret;
> +	return entry_count;
>   }
>
> +
>   static void stmmac_init_dma_chain(void *des, dma_addr_t phy_addr,
>   				  unsigned int size, unsigned int extend_desc)
>   {
> @@ -154,8 +142,7 @@ static void stmmac_clean_desc3(void *priv_ptr, struct dma_desc *p)
>
>   const struct stmmac_chain_mode_ops chain_mode_ops = {
>   	.init = stmmac_init_dma_chain,
> -	.is_jumbo_frm = stmmac_is_jumbo_frm,
> -	.jumbo_frm = stmmac_jumbo_frm,
> +	.prepare_frm = stmmac_prepare_frm,
>   	.refill_desc3 = stmmac_refill_desc3,
>   	.clean_desc3 = stmmac_clean_desc3,
>   };
> diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
> index 7eb8bab..5d3f734 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/common.h
> +++ b/drivers/net/ethernet/stmicro/stmmac/common.h
> @@ -323,6 +323,8 @@ struct stmmac_desc_ops {
>   	/* Handle extra events on specific interrupts hw dependent */
>   	int (*get_rx_owner) (struct dma_desc *p);
>   	void (*set_rx_owner) (struct dma_desc *p);
> +
> +	void (*set_tx_first) (struct dma_desc *p);

pls can you use set_tx_fs to be aligned with the doc?

add it just below the ls and w/o empty line.

  ... see another my comment below for prepare_tx_desc.

>   	/* Get the receive frame size */
>   	int (*get_rx_frame_len) (struct dma_desc *p, int rx_coe_type);
>   	/* Return the reception status looking at the RDES1 */
> @@ -421,6 +423,8 @@ struct mii_regs {
>   struct stmmac_ring_mode_ops {
>   	unsigned int (*is_jumbo_frm) (int len, int ehn_desc);
>   	unsigned int (*jumbo_frm) (void *priv, struct sk_buff *skb, int csum);

you are replacing two callbacks that will be never used. So you
can remove them from the header.

> +	unsigned int (*prepare_frm) (void *p, void *data, unsigned int len,
> +						int csum, unsigned int entry);

ok I like it. pls review indentation.

>   	void (*refill_desc3) (void *priv, struct dma_desc *p);
>   	void (*init_desc3) (struct dma_desc *p);
>   	void (*clean_desc3) (void *priv, struct dma_desc *p);
> @@ -432,6 +436,8 @@ struct stmmac_chain_mode_ops {
>   		      unsigned int extend_desc);
>   	unsigned int (*is_jumbo_frm) (int len, int ehn_desc);
>   	unsigned int (*jumbo_frm) (void *priv, struct sk_buff *skb, int csum);
> +	unsigned int (*prepare_frm) (void *p, void *data, unsigned int len,
> +					 int csum, unsigned int entry);
>   	void (*refill_desc3) (void *priv, struct dma_desc *p);
>   	void (*clean_desc3) (void *priv, struct dma_desc *p);
>   };
> diff --git a/drivers/net/ethernet/stmicro/stmmac/descs_com.h b/drivers/net/ethernet/stmicro/stmmac/descs_com.h
> index 6f2cc78..cf199d6 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/descs_com.h
> +++ b/drivers/net/ethernet/stmicro/stmmac/descs_com.h
> @@ -53,9 +53,9 @@ static inline void enh_desc_end_tx_desc_on_ring(struct dma_desc *p, int ter)
>
>   static inline void enh_set_tx_desc_len_on_ring(struct dma_desc *p, int len)
>   {
> -	if (unlikely(len > BUF_SIZE_4KiB)) {
> -		p->des01.etx.buffer1_size = BUF_SIZE_4KiB;
> -		p->des01.etx.buffer2_size = len - BUF_SIZE_4KiB;
> +	if (unlikely(len >= BUF_SIZE_8KiB)) {
> +		p->des01.etx.buffer1_size = BUF_SIZE_8KiB - 1;
> +		p->des01.etx.buffer2_size = len - p->des01.etx.buffer1_size;

When I wrote this code, the default sizes were 8KiB for tx buffers and
16KiB for rx buffers (from our RTL configuration)

For this reason, in ring mode a buffer bigger than 4KiB but lesser than
8KiB as managed in a single descriptor.

Indeed for enhanced descriptors, TBS1 and TBS2 are fine for sizes of
8KiB so for a frame of 16KiB.

This is true for old gmac (I verified the databook starting from the 
3.30a to 3.70a).

Anyway I accept this change, on HW with limitations this will be
managed in other way.

>   	} else
>   		p->des01.etx.buffer1_size = len;
>   }
> @@ -81,7 +81,7 @@ static inline void ndesc_end_tx_desc_on_ring(struct dma_desc *p, int ter)
>
>   static inline void norm_set_tx_desc_len_on_ring(struct dma_desc *p, int len)
>   {
> -	if (unlikely(len > BUF_SIZE_2KiB)) {
> +	if (unlikely(len >= BUF_SIZE_2KiB)) {

we cannot manage a size of 2048 on normal desc

Pls you should verify to not break the back-compatibility.

>   		p->des01.etx.buffer1_size = BUF_SIZE_2KiB - 1;
>   		p->des01.etx.buffer2_size = len - p->des01.etx.buffer1_size;
>   	} else
> diff --git a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
> index 7e6628a..915a7ab 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
> @@ -297,6 +297,11 @@ static void enh_desc_release_tx_desc(struct dma_desc *p, int mode)
>   		enh_desc_end_tx_desc_on_ring(p, ter);
>   }
>
> +static void enh_desc_set_tx_first(struct dma_desc *p)
> +{
> +	p->des01.etx.first_segment = 1;
> +}
> +
>   static void enh_desc_prepare_tx_desc(struct dma_desc *p, int is_fs, int len,
>   				     int csum_flag, int mode)
>   {
> @@ -393,6 +398,7 @@ const struct stmmac_desc_ops enh_desc_ops = {
>   	.get_tx_ls = enh_desc_get_tx_ls,
>   	.set_tx_owner = enh_desc_set_tx_owner,
>   	.set_rx_owner = enh_desc_set_rx_owner,
> +	.set_tx_first = enh_desc_set_tx_first,
>   	.get_rx_frame_len = enh_desc_get_rx_frame_len,
>   	.rx_extended_status = enh_desc_get_ext_status,
>   	.enable_tx_timestamp = enh_desc_enable_tx_timestamp,
> diff --git a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
> index 35ad4f4..6363776 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
> @@ -180,6 +180,11 @@ static void ndesc_release_tx_desc(struct dma_desc *p, int mode)
>   		ndesc_end_tx_desc_on_ring(p, ter);
>   }
>
> +static void ndesc_set_tx_first(struct dma_desc *p)
> +{
> +	p->des01.etx.first_segment = 1;
> +}
> +
>   static void ndesc_prepare_tx_desc(struct dma_desc *p, int is_fs, int len,
>   				  int csum_flag, int mode)
>   {
> @@ -265,6 +270,7 @@ const struct stmmac_desc_ops ndesc_ops = {
>   	.get_tx_ls = ndesc_get_tx_ls,
>   	.set_tx_owner = ndesc_set_tx_owner,
>   	.set_rx_owner = ndesc_set_rx_owner,
> +	.set_tx_first = ndesc_set_tx_first,
>   	.get_rx_frame_len = ndesc_get_rx_frame_len,
>   	.enable_tx_timestamp = ndesc_enable_tx_timestamp,
>   	.get_tx_timestamp_status = ndesc_get_tx_timestamp_status,
> diff --git a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
> index 1ef9d8a..7faa42a 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
> @@ -28,73 +28,60 @@
>
>   #include "stmmac.h"
>
> -static unsigned int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
> +
> +static unsigned int stmmac_prepare_frm(void *p, void *data, unsigned int len,
> +						 int csum, unsigned int entry)
>   {
>   	struct stmmac_priv *priv = (struct stmmac_priv *)p;
> -	unsigned int txsize = priv->dma_tx_size;
> -	unsigned int entry = priv->cur_tx % txsize;
> +	unsigned int entry_count = 0;
>   	struct dma_desc *desc;
> -	unsigned int nopaged_len = skb_headlen(skb);
> -	unsigned int bmax, len;
> +	unsigned int bmax;
>
> -	if (priv->extend_desc)
> +	if (priv->plat->enh_desc) {
>   		desc = (struct dma_desc *)(priv->dma_etx + entry);
> -	else
> +		bmax = BUF_SIZE_8KiB - 1;
> +	} else{
>   		desc = priv->dma_tx + entry;
> +		bmax = BUF_SIZE_2KiB - 1;
> +	}
>
> -	if (priv->plat->enh_desc)
> -		bmax = BUF_SIZE_8KiB;
> -	else
> -		bmax = BUF_SIZE_2KiB;
> -
> -	len = nopaged_len - bmax;
> -
> -	if (nopaged_len > BUF_SIZE_8KiB) {
> -
> -		desc->des2 = dma_map_single(priv->device, skb->data,
> -					    bmax, DMA_TO_DEVICE);
> +	while (len > 2*bmax) {
> +		desc->des2 = dma_map_single(priv->device,
> +						data + 2*bmax*entry_count,
> +						2*bmax, DMA_TO_DEVICE);
>   		priv->tx_skbuff_dma[entry] = desc->des2;
> -		desc->des3 = desc->des2 + BUF_SIZE_4KiB;
> -		priv->hw->desc->prepare_tx_desc(desc, 1, bmax, csum,
> +		priv->tx_skbuff[entry] = NULL;
> +		desc->des3 = desc->des2 + bmax;
> +		priv->hw->desc->prepare_tx_desc(desc, 0, 2*bmax, csum,
>   						STMMAC_RING_MODE);

hmm, not sure but is it now useful to pass the is_fs as parameter
in the prepare_tx_desc? This could be removed because we have a new
callback.

> -		wmb();
> -		entry = (++priv->cur_tx) % txsize;
>
> -		if (priv->extend_desc)
> -			desc = (struct dma_desc *)(priv->dma_etx + entry);
> +		len -= 2*bmax;

pls use a variable for 2*bmax that is used often. Or bmax should
directly be 2*bmax ;-)


> +		entry++;
> +		entry %= priv->dma_tx_size;
> +		entry_count++;
> +
> +		if (priv->plat->enh_desc)
> +			desc = (struct dma_desc *)
> +					(((struct dma_extended_desc *)desc)+1);
>   		else
> -			desc = priv->dma_tx + entry;
> +			desc++;
> +	}
> +	if (len) {
>
> -		desc->des2 = dma_map_single(priv->device, skb->data + bmax,
> -					    len, DMA_TO_DEVICE);
> +		desc->des2 = dma_map_single(priv->device,
> +						data + 2*bmax*entry_count,
> +						len, DMA_TO_DEVICE);
>   		priv->tx_skbuff_dma[entry] = desc->des2;
> -		desc->des3 = desc->des2 + BUF_SIZE_4KiB;
> -		priv->hw->desc->prepare_tx_desc(desc, 0, len, csum,
> -						STMMAC_RING_MODE);
> -		wmb();
> -		priv->hw->desc->set_tx_owner(desc);
>   		priv->tx_skbuff[entry] = NULL;
> -	} else {
> -		desc->des2 = dma_map_single(priv->device, skb->data,
> -					    nopaged_len, DMA_TO_DEVICE);
> -		priv->tx_skbuff_dma[entry] = desc->des2;
> -		desc->des3 = desc->des2 + BUF_SIZE_4KiB;
> -		priv->hw->desc->prepare_tx_desc(desc, 1, nopaged_len, csum,
> +		desc->des3 = desc->des2 + bmax;
> +		priv->hw->desc->prepare_tx_desc(desc, 0, len, csum,
>   						STMMAC_RING_MODE);
> +		entry_count++;
>   	}
>
> -	return entry;
> +	return entry_count;
>   }
>
> -static unsigned int stmmac_is_jumbo_frm(int len, int enh_desc)
> -{
> -	unsigned int ret = 0;
> -
> -	if (len >= BUF_SIZE_4KiB)
> -		ret = 1;
> -
> -	return ret;
> -}
>
>   static void stmmac_refill_desc3(void *priv_ptr, struct dma_desc *p)
>   {
> @@ -103,13 +90,13 @@ static void stmmac_refill_desc3(void *priv_ptr, struct dma_desc *p)
>   	if (unlikely(priv->plat->has_gmac))
>   		/* Fill DES3 in case of RING mode */
>   		if (priv->dma_buf_sz >= BUF_SIZE_8KiB)
> -			p->des3 = p->des2 + BUF_SIZE_8KiB;
> +			p->des3 = p->des2 + BUF_SIZE_8KiB - 1;

is it correct? can you check?

>   }
>
>   /* In ring mode we need to fill the desc3 because it is used as buffer */
>   static void stmmac_init_desc3(struct dma_desc *p)
>   {
> -	p->des3 = p->des2 + BUF_SIZE_8KiB;
> +	p->des3 = p->des2 + BUF_SIZE_8KiB - 1;
>   }
>
>   static void stmmac_clean_desc3(void *priv_ptr, struct dma_desc *p)
> @@ -127,8 +114,7 @@ static int stmmac_set_16kib_bfsize(int mtu)
>   }
>
>   const struct stmmac_ring_mode_ops ring_mode_ops = {
> -	.is_jumbo_frm = stmmac_is_jumbo_frm,
> -	.jumbo_frm = stmmac_jumbo_frm,
> +	.prepare_frm = stmmac_prepare_frm,
>   	.refill_desc3 = stmmac_refill_desc3,
>   	.init_desc3 = stmmac_init_desc3,
>   	.clean_desc3 = stmmac_clean_desc3,
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index af04b5d..5873246 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -1832,6 +1832,20 @@ static int stmmac_release(struct net_device *dev)
>   	return 0;
>   }
>
> +
> +static struct dma_desc *get_desc(struct stmmac_priv *priv, unsigned int entry)
> +{
> +	struct dma_desc *desc;
> +
> +	if (priv->plat->enh_desc)
> +		desc = (struct dma_desc *)(priv->dma_etx + entry);
> +	else
> +		desc = priv->dma_tx + entry;
> +
> +	return  desc;
> +}

this could stay in another patch and it could be also used
for other cases inside the driver to get the description
and distinguish between enhanced and "not enh" descriptors.

I didn't add it before becasue this was just done in some place but if 
you need it (and it is used in many parts) I agree that we can have a 
dedicated function (maybe inline and in an header if shared).

> +
>   /**
>    *  stmmac_xmit: Tx entry point of the driver
>    *  @skb : the socket buffer
> @@ -1844,11 +1858,10 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>   {
>   	struct stmmac_priv *priv = netdev_priv(dev);
>   	unsigned int txsize = priv->dma_tx_size;
> -	unsigned int entry;
> -	int i, csum_insertion = 0, is_jumbo = 0;
> +	unsigned int entry, first_entry, nb_desc = 0;
> +	int i, csum_insertion = 0;
>   	int nfrags = skb_shinfo(skb)->nr_frags;
> -	struct dma_desc *desc, *first;
> -	unsigned int nopaged_len = skb_headlen(skb);
> +	struct dma_desc *desc = NULL, *first;
>
>   	if (unlikely(stmmac_tx_avail(priv) < nfrags + 1)) {
>   		if (!netif_queue_stopped(dev)) {
> @@ -1858,73 +1871,53 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>   		}
>   		return NETDEV_TX_BUSY;
>   	}
> -
>   	spin_lock(&priv->tx_lock);
>
>   	if (priv->tx_path_in_lpi_mode)
>   		stmmac_disable_eee_mode(priv);
>
> -	entry = priv->cur_tx % txsize;
> +	first_entry = entry = priv->cur_tx % txsize;

entry = priv->cur_tx % txsize;
first_entry = entry;

>
>   	csum_insertion = (skb->ip_summed == CHECKSUM_PARTIAL);
> -
> -	if (priv->extend_desc)
> -		desc = (struct dma_desc *)(priv->dma_etx + entry);
> +	/* To program the descriptors according to the size of the frame */
> +	if (priv->mode == STMMAC_RING_MODE)
> +		entry += priv->hw->ring->prepare_frm(priv, skb->data,
> +				skb_headlen(skb), csum_insertion, entry);
>   	else
> -		desc = priv->dma_tx + entry;
> +		entry += priv->hw->chain->prepare_frm(priv, skb->data,
> +				 skb_headlen(skb), csum_insertion, entry);
>
> -	first = desc;
> -
> -	priv->tx_skbuff[entry] = skb;
> -
> -	/* To program the descriptors according to the size of the frame */
> -	if (priv->mode == STMMAC_RING_MODE) {
> -		is_jumbo = priv->hw->ring->is_jumbo_frm(skb->len,
> -							priv->plat->enh_desc);
> -		if (unlikely(is_jumbo))
> -			entry = priv->hw->ring->jumbo_frm(priv, skb,
> -							  csum_insertion);
> -	} else {
> -		is_jumbo = priv->hw->chain->is_jumbo_frm(skb->len,
> -							 priv->plat->enh_desc);
> -		if (unlikely(is_jumbo))
> -			entry = priv->hw->chain->jumbo_frm(priv, skb,
> -							   csum_insertion);
> -	}
> -	if (likely(!is_jumbo)) {
> -		desc->des2 = dma_map_single(priv->device, skb->data,
> -					    nopaged_len, DMA_TO_DEVICE);
> -		priv->tx_skbuff_dma[entry] = desc->des2;
> -		priv->hw->desc->prepare_tx_desc(desc, 1, nopaged_len,
> -						csum_insertion, priv->mode);
> -	} else
> -		desc = first;
> +	entry %= txsize;
>
>   	for (i = 0; i < nfrags; i++) {
>   		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
> -		int len = skb_frag_size(frag);
>
> -		entry = (++priv->cur_tx) % txsize;
> -		if (priv->extend_desc)
> -			desc = (struct dma_desc *)(priv->dma_etx + entry);
> +		if (priv->mode == STMMAC_RING_MODE)
> +			entry += priv->hw->ring->prepare_frm(priv,
> +						 skb_frag_address(frag),
> +						 skb_frag_size(frag),
> +						 csum_insertion, entry);
>   		else
> -			desc = priv->dma_tx + entry;
> -
> -		desc->des2 = skb_frag_dma_map(priv->device, frag, 0, len,
> -					      DMA_TO_DEVICE);
> -		priv->tx_skbuff_dma[entry] = desc->des2;
> -		priv->tx_skbuff[entry] = NULL;
> -		priv->hw->desc->prepare_tx_desc(desc, 0, len, csum_insertion,
> -						priv->mode);
> -		wmb();
> -		priv->hw->desc->set_tx_owner(desc);
> -		wmb();
> -	}
> -
> +			entry += priv->hw->chain->prepare_frm(priv,
> +						 skb_frag_address(frag),
> +						 skb_frag_size(frag),
> +						 csum_insertion, entry);
> +
> +		entry %= txsize;
> +	}
> +	/*Set owner for all segment but the first one */
> +	for (i = first_entry; i != entry;) {
> +		desc = get_desc(priv, i);
> +		nb_desc++;
> +		if (i != first_entry)
> +			priv->hw->desc->set_tx_owner(desc);
> +		i++;
> +		i %= txsize;
> +	}

this sounds to be quite ineffective. We do another loop in a
critical path.

Also you didn't add barriers that were needed and fixed problems
in some platforms in the past

Maybe we could improve this operation and prepare the descr
setting the own bit unless for the first one.

> +	BUG_ON(desc == NULL);
>   	/* Finalize the latest segment. */
>   	priv->hw->desc->close_tx_desc(desc);
>
> -	wmb();
>   	/* According to the coalesce parameter the IC bit for the latest
>   	 * segment could be reset and the timer re-started to invoke the
>   	 * stmmac_tx function. This approach takes care about the fragments.
> @@ -1938,11 +1931,20 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>   	} else
>   		priv->tx_count_frames = 0;
>
> +	/*Prepare first segment */
> +	priv->tx_skbuff[first_entry] = skb;
> +
> +	first = get_desc(priv, first_entry);
> +
> +	priv->hw->desc->set_tx_first(first);
> +
> +	wmb();
> +
>   	/* To avoid raise condition */
>   	priv->hw->desc->set_tx_owner(first);
>   	wmb();
>
> -	priv->cur_tx++;
> +	priv->cur_tx += nb_desc;

can we avoid to use the nb_desc?

>
>   	if (netif_msg_pktdata(priv)) {
>   		pr_debug("%s: curr %d dirty=%d entry=%d, first=%p, nfrags=%d",
>

^ permalink raw reply

* Re: [PATCH net-next v3 02/07] r8169: Support for byte queue limits
From: Sergei Shtylyov @ 2013-10-21 13:38 UTC (permalink / raw)
  To: Tino Reichardt, netdev, Realtek linux nic maintainers,
	Igor Maravic, Francois Romieu
In-Reply-To: <1382296991-20289-3-git-send-email-milky-kernel@mcmilk.de>

Hello.

On 10/20/2013 11:23 PM, Tino Reichardt wrote:

> Changes to r8169 driver to use byte queue limits.


> Signed-off-by: Tino Reichardt <milky-kernel@mcmilk.de>

> ---
>   drivers/net/ethernet/realtek/r8169.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)

> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 3397cee..8f12145 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
[...]
> @@ -818,6 +819,9 @@ MODULE_AUTHOR("Realtek and the Linux r8169 crew <netdev@vger.kernel.org>");
>   MODULE_DESCRIPTION("RealTek RTL-8169 Gigabit Ethernet driver");
>   module_param(use_dac, int, 0);
>   MODULE_PARM_DESC(use_dac, "Enable PCI DAC. Unsafe on 32 bit PCI slot.");
> +module_param(bql_disable, bool, 0);
> +MODULE_PARM_DESC(bql_disable,
> +	"Disable Byte Queue Limits functionality (default: false)");
>   module_param_named(debug, debug.msg_enable, int, 0);
>   MODULE_PARM_DESC(debug, "Debug verbosity level (0=none, ..., 16=all)");
>   MODULE_LICENSE("GPL");
> @@ -5841,6 +5845,8 @@ static void rtl8169_tx_clear(struct rtl8169_private *tp)
>   {
>   	rtl8169_tx_clear_range(tp, tp->dirty_tx, NUM_TX_DESC);
>   	tp->cur_tx = tp->dirty_tx = 0;
> +	if (unlikely(bql_disable))
> +		netdev_reset_queue(tp->dev);

    Joe has inverted the logic in his request and you blindly followed it. He 
must have meant !bql_disable. Also, unlikely() does not really fit here, so in 
fact Joe's request was completely invalid.

WBR, Sergei

^ permalink raw reply

* Re: [PATCH net-next 0/5] bonding: patchset for rcu use in bonding
From: Veaceslav Falico @ 2013-10-21 13:31 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <20131021132136.GE692@redhat.com>

On Mon, Oct 21, 2013 at 03:21:36PM +0200, Veaceslav Falico wrote:
>On Mon, Oct 21, 2013 at 02:41:44PM +0200, Veaceslav Falico wrote:
>>On Mon, Oct 21, 2013 at 08:32:11PM +0800, Ding Tianhong wrote:
>>>On 2013/10/21 17:35, Veaceslav Falico wrote:
>>>>On Mon, Oct 21, 2013 at 05:27:51PM +0800, Ding Tianhong wrote:
>>>>>On 2013/10/21 17:13, Veaceslav Falico wrote:
>>>>>>On Mon, Oct 21, 2013 at 04:58:36PM +0800, Ding Tianhong wrote:
>>>>>>>Hi:
>>>>>>>
>>>>>>>The Patch Set will remove the invalid lock for bond work queue and replace it
>>>>>>>with rtnl lock, as read lock for bond could not protect slave list any more.
>>>>>>
>>>>>>rtnl lock is a lot more expensive than bond lock, and not only for bond,
>>>>>>but for all the networking stack.
>>>>>>
>>>>>>Why is the bond->lock invalid? It correctly protects slaves from being
>>>>>>modified concurrently.
>>>>>>
>>>>>>I don't see the point in this patchset.
>>>>>>
>>>>>
>>>>>yes, rtnl lock is a big lock, but I think bond->lock could not protect
>>>>>bond_for_each_slave any more, am I miss something?
>>>>
>>>>Why can't it protect bond_for_each_slave()?
>>>>
>>>
>>>bond_master_upper_dev_link() and bond_upper_dev_unlink() was only in rtnl lock,
>>>bond_for_each_slave may changed while loop in bond read lock, but it sees that
>>>nothing serious will happen yet.
>>>Maybe I miss something.
>>
>>Even if it is unsafe to use bond_for_each_slave() while holding bond->lock
>>- it means that we must protect the list by locking the
>>bond_upper_dev_(un)link() via bond->lock, but not by removing bond->lock
>>from everywhere where it is now. And I'm not that sure if it's safe or not.
>
>I've quickly looked over the code - yes, theoretically we could race
>between bond_for_each_slave() that is not rtnl-protected and
>bond_upper_dev_(un)link().

For this race, btw, it's enough to apply the following patch, and we're
good (we don't care if we add a slave whilst bond_for_each_slave()) -
untested patch:

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d90734f..b3923e1 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1737,10 +1737,10 @@ static int __bond_release_one(struct net_device *bond_dev,
  		unblock_netpoll_tx();
  		return -EINVAL;
  	}
+	bond_upper_dev_unlink(bond_dev, slave_dev);
  
  	write_unlock_bh(&bond->lock);
  
-	bond_upper_dev_unlink(bond_dev, slave_dev);
  	/* unregister rx_handler early so bond_handle_frame wouldn't be called
  	 * for this slave anymore.
  	 */

^ permalink raw reply related

* Re: [PATCH net-next v2 05/07] via-velocity: Support for byte queue limits
From: Sergei Shtylyov @ 2013-10-21 13:28 UTC (permalink / raw)
  To: Tino Reichardt, netdev, Francois Romieu
In-Reply-To: <1382292803-18875-6-git-send-email-milky-kernel@mcmilk.de>

Hello.

On 10/20/2013 10:13 PM, Tino Reichardt wrote:

> Changes to via-velocity driver to use byte queue limits.


> Signed-off-by: Tino Reichardt <milky-kernel@mcmilk.de>

> ---
>   drivers/net/ethernet/via/via-velocity.c | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)

> diff --git a/drivers/net/ethernet/via/via-velocity.c b/drivers/net/ethernet/via/via-velocity.c
> index d022bf9..b084404 100644
> --- a/drivers/net/ethernet/via/via-velocity.c
> +++ b/drivers/net/ethernet/via/via-velocity.c
> @@ -1,3 +1,4 @@
> +

    Spurious newline, I guess.

>   /*
>    * This code is derived from the VIA reference driver (copyright message
>    * below) provided to Red Hat by VIA Networking Technologies, Inc. for

WBR, Sergei

^ permalink raw reply

* Re: [PATCH net-next 0/5] bonding: patchset for rcu use in bonding
From: Veaceslav Falico @ 2013-10-21 13:21 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <20131021124144.GD692@redhat.com>

On Mon, Oct 21, 2013 at 02:41:44PM +0200, Veaceslav Falico wrote:
>On Mon, Oct 21, 2013 at 08:32:11PM +0800, Ding Tianhong wrote:
>>On 2013/10/21 17:35, Veaceslav Falico wrote:
>>>On Mon, Oct 21, 2013 at 05:27:51PM +0800, Ding Tianhong wrote:
>>>>On 2013/10/21 17:13, Veaceslav Falico wrote:
>>>>>On Mon, Oct 21, 2013 at 04:58:36PM +0800, Ding Tianhong wrote:
>>>>>>Hi:
>>>>>>
>>>>>>The Patch Set will remove the invalid lock for bond work queue and replace it
>>>>>>with rtnl lock, as read lock for bond could not protect slave list any more.
>>>>>
>>>>>rtnl lock is a lot more expensive than bond lock, and not only for bond,
>>>>>but for all the networking stack.
>>>>>
>>>>>Why is the bond->lock invalid? It correctly protects slaves from being
>>>>>modified concurrently.
>>>>>
>>>>>I don't see the point in this patchset.
>>>>>
>>>>
>>>>yes, rtnl lock is a big lock, but I think bond->lock could not protect
>>>>bond_for_each_slave any more, am I miss something?
>>>
>>>Why can't it protect bond_for_each_slave()?
>>>
>>
>>bond_master_upper_dev_link() and bond_upper_dev_unlink() was only in rtnl lock,
>>bond_for_each_slave may changed while loop in bond read lock, but it sees that
>>nothing serious will happen yet.
>>Maybe I miss something.
>
>Even if it is unsafe to use bond_for_each_slave() while holding bond->lock
>- it means that we must protect the list by locking the
>bond_upper_dev_(un)link() via bond->lock, but not by removing bond->lock
>from everywhere where it is now. And I'm not that sure if it's safe or not.

I've quickly looked over the code - yes, theoretically we could race
between bond_for_each_slave() that is not rtnl-protected and
bond_upper_dev_(un)link().

Your patchset, also, doesn't 'replace' bond->lock with rtnl_lock(), cause
everywhere the rtnl_lock() is already present, it just moves it around,
while removing the bond->lock.

The commit message is wrong and says actually nothing why is it done, how
is it done, why it's safe to do so and what do we get in the end. For every
patch, and the cover letter is also not an exception.

I'd suggest you either:

1) add bond->lock around bond_upper_dev_(un)link() (GFP_ATOMIC might be needed).

or

2) add ASSERT_RTNL() to bond_for_each_slave() macro, catch all the offenders
and remove the bond->lock from them. Also, I'm not that sure that it's safe
to do so - cause one of the slaves (not the slave list) might change, and
we might have race conditions there.

or

3) move bond_for_each_slave() to bond_for_each_slave_rcu() where appropriate.

And in any case write specific commit messages - bonding's code is really
old and full of locks that were placed for some reason (and the reason
might have gone away long ago, too), so it's really hard to say if the
change is safe or not.

I'd personally go for either 3) (preferred), or 1).

Sorry, I'm a bit tired of going in-depth on your patches. Start either
doing patches with commit messages that *prove* me that you're right (I'll,
obviously, verify it - but at least I'll know what you're doing, and won't
have to figure it out from code), or I'll start explicitly NAKing them.

Sorry again, but I don't really have time for that. I didn't have time to
review your last patchset (RCUifying the remaining transmit path), and now
I can understand nothing from their commit description, except that you've
changed bond_for_each_slave() to bond_for_each_slave_rcu() and bond->lock
to rcu_read_lock(). I'm not saying that they're wrong, just that they're
really hard to understand.

So, please, start writing commit messages.

>
>>
>>Ding
>>
>>
>>>>
>>>>Ding
>>>>
>>>>>>
>>>>>>Ding Tianhong (5):
>>>>>> bonding: remove bond read lock for bond_mii_monitor()
>>>>>> bonding: remove bond read lock for bond_alb_monitor()
>>>>>> bonding: remove bond read lock for bond_loadbalance_arp_mon()
>>>>>> bonding: remove bond read lock for bond_activebackup_arp_mon()
>>>>>> bonding: remove bond read lock for bond_3ad_state_machine_handler()
>>>>>>
>>>>>>drivers/net/bonding/bond_3ad.c  |   9 ++--
>>>>>>drivers/net/bonding/bond_alb.c  |  20 ++------
>>>>>>drivers/net/bonding/bond_main.c | 100 +++++++++++++---------------------------
>>>>>>3 files changed, 40 insertions(+), 89 deletions(-)
>>>>>>
>>>>>>--
>>>>>>1.8.2.1
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>.
>>>
>>
>>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH RFC 3/5] net:stmmac: ensure we reclaim all dirty descriptors.
From: Jimmy PERCHET @ 2013-10-21 13:10 UTC (permalink / raw)
  To: Giuseppe CAVALLARO; +Cc: netdev
In-Reply-To: <5264EEE9.8070607@st.com>

Hello Peppe,

On 21/10/2013 11:07, Giuseppe CAVALLARO wrote:
> Hello Jimmy
> 
> On 10/16/2013 5:24 PM, Jimmy Perchet wrote:
>> On low speed link (10MBit/s), some TX descriptors can remain dirty
>> if the tx coalescence timer expires before they were treated. Re-arm timer
>> in this case.
>>
>> Signed-off-by: Jimmy Perchet <jimmy.perchet@parrot.com>
>> ---
>>   drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> index 0015175..af04b5d 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> @@ -1284,8 +1284,12 @@ static void stmmac_tx_clean(struct stmmac_priv *priv)
>>               p = priv->dma_tx + entry;
>>
>>           /* Check if the descriptor is owned by the DMA. */
>> -        if (priv->hw->desc->get_tx_owner(p))
>> +        if (priv->hw->desc->get_tx_owner(p)) {
>> +            /* Be sure to harvest remaining descriptor. */
>> +            mod_timer(&priv->txtimer,
>> +              STMMAC_COAL_TIMER(priv->tx_coal_timer));
>>               break;
>> +        }
> 
> 
> why should we reload the timer when clean the tx resource?
> This is done in the xmit where as soon as a frame has to be
> transmitted it makes sense to reload the timer.
> 
> Also I have not clear why the problem happens on 10MBit/s speed
>  Maybe there is an hidden problem (lock protection)
> that should be fixed.
> 
> How did you get this problem? Just on low speed and stress the net?
> I have never seen it.

I can reproduce this problem by issuing 9KiB jumbo frames on 10MBit/s link.
If socket's wmemory size is about 500kiB (or less), the transfer stall.
(I guess it is reproducible with 1500o frames by decreasing
socket's wmemory to 90KB)
Re-arming the timer fix this behaviour.

Here my understanding of this issue : 
With 9KiB frames and 500kiB of wmemory, only 60 frames can be
prepared in a row. It is below the tx coalescence threshold,
so there will be no interrupt. When the tx coalescence timer 
expires (40ms after), only five descriptors have to be
freed (9000*5 @ 10Mbit/s = 34ms), it is not enough to reach
the socket's wake-up threshold. We get into a deadlock :
*Socket is waiting for free buffers before performing new transfer.
*Driver is waiting for new transfer before performing cleanup.

Maybe, it is not a real life use-case, and is not worth
a patch. What do you think ?

Best Regards,
Jimmy

> 
> peppe
> 
>>
>>           /* Verify tx error by looking at the last segment. */
>>           last = priv->hw->desc->get_tx_ls(p);
>>
> 

^ permalink raw reply

* Re: [PATCH net-next 0/5] bonding: patchset for rcu use in bonding
From: Veaceslav Falico @ 2013-10-21 12:41 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <52651ECB.1080901@huawei.com>

On Mon, Oct 21, 2013 at 08:32:11PM +0800, Ding Tianhong wrote:
>On 2013/10/21 17:35, Veaceslav Falico wrote:
>> On Mon, Oct 21, 2013 at 05:27:51PM +0800, Ding Tianhong wrote:
>>> On 2013/10/21 17:13, Veaceslav Falico wrote:
>>>> On Mon, Oct 21, 2013 at 04:58:36PM +0800, Ding Tianhong wrote:
>>>>> Hi:
>>>>>
>>>>> The Patch Set will remove the invalid lock for bond work queue and replace it
>>>>> with rtnl lock, as read lock for bond could not protect slave list any more.
>>>>
>>>> rtnl lock is a lot more expensive than bond lock, and not only for bond,
>>>> but for all the networking stack.
>>>>
>>>> Why is the bond->lock invalid? It correctly protects slaves from being
>>>> modified concurrently.
>>>>
>>>> I don't see the point in this patchset.
>>>>
>>>
>>> yes, rtnl lock is a big lock, but I think bond->lock could not protect
>>> bond_for_each_slave any more, am I miss something?
>>
>> Why can't it protect bond_for_each_slave()?
>>
>
>bond_master_upper_dev_link() and bond_upper_dev_unlink() was only in rtnl lock,
>bond_for_each_slave may changed while loop in bond read lock, but it sees that
>nothing serious will happen yet.
>Maybe I miss something.

Even if it is unsafe to use bond_for_each_slave() while holding bond->lock
- it means that we must protect the list by locking the
bond_upper_dev_(un)link() via bond->lock, but not by removing bond->lock
from everywhere where it is now. And I'm not that sure if it's safe or not.

>
>Ding
>
>
>>>
>>> Ding
>>>
>>>>>
>>>>> Ding Tianhong (5):
>>>>>  bonding: remove bond read lock for bond_mii_monitor()
>>>>>  bonding: remove bond read lock for bond_alb_monitor()
>>>>>  bonding: remove bond read lock for bond_loadbalance_arp_mon()
>>>>>  bonding: remove bond read lock for bond_activebackup_arp_mon()
>>>>>  bonding: remove bond read lock for bond_3ad_state_machine_handler()
>>>>>
>>>>> drivers/net/bonding/bond_3ad.c  |   9 ++--
>>>>> drivers/net/bonding/bond_alb.c  |  20 ++------
>>>>> drivers/net/bonding/bond_main.c | 100 +++++++++++++---------------------------
>>>>> 3 files changed, 40 insertions(+), 89 deletions(-)
>>>>>
>>>>> --
>>>>> 1.8.2.1
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>> .
>>
>
>

^ permalink raw reply

* Re: [PATCH net-next 0/5] bonding: patchset for rcu use in bonding
From: Ding Tianhong @ 2013-10-21 12:32 UTC (permalink / raw)
  To: Veaceslav Falico
  Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <20131021093549.GC692@redhat.com>

On 2013/10/21 17:35, Veaceslav Falico wrote:
> On Mon, Oct 21, 2013 at 05:27:51PM +0800, Ding Tianhong wrote:
>> On 2013/10/21 17:13, Veaceslav Falico wrote:
>>> On Mon, Oct 21, 2013 at 04:58:36PM +0800, Ding Tianhong wrote:
>>>> Hi:
>>>>
>>>> The Patch Set will remove the invalid lock for bond work queue and replace it
>>>> with rtnl lock, as read lock for bond could not protect slave list any more.
>>>
>>> rtnl lock is a lot more expensive than bond lock, and not only for bond,
>>> but for all the networking stack.
>>>
>>> Why is the bond->lock invalid? It correctly protects slaves from being
>>> modified concurrently.
>>>
>>> I don't see the point in this patchset.
>>>
>>
>> yes, rtnl lock is a big lock, but I think bond->lock could not protect
>> bond_for_each_slave any more, am I miss something?
> 
> Why can't it protect bond_for_each_slave()?
> 

bond_master_upper_dev_link() and bond_upper_dev_unlink() was only in rtnl lock,
bond_for_each_slave may changed while loop in bond read lock, but it sees that
nothing serious will happen yet.
Maybe I miss something.

Ding


>>
>> Ding
>>
>>>>
>>>> Ding Tianhong (5):
>>>>  bonding: remove bond read lock for bond_mii_monitor()
>>>>  bonding: remove bond read lock for bond_alb_monitor()
>>>>  bonding: remove bond read lock for bond_loadbalance_arp_mon()
>>>>  bonding: remove bond read lock for bond_activebackup_arp_mon()
>>>>  bonding: remove bond read lock for bond_3ad_state_machine_handler()
>>>>
>>>> drivers/net/bonding/bond_3ad.c  |   9 ++--
>>>> drivers/net/bonding/bond_alb.c  |  20 ++------
>>>> drivers/net/bonding/bond_main.c | 100 +++++++++++++---------------------------
>>>> 3 files changed, 40 insertions(+), 89 deletions(-)
>>>>
>>>> -- 
>>>> 1.8.2.1
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
> 
> .
> 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox