Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next v2 0/3] bonding: modify the current and add new hash functions
From: Nikolay Aleksandrov @ 2013-09-25 23:56 UTC (permalink / raw)
  To: netdev; +Cc: davem, andy, fubar, eric.dumazet, vfalico
In-Reply-To: <1380133912-1240-1-git-send-email-nikolay@redhat.com>

Self-nack on this patchset as the flow_dissector patch will make nhoff
different from before, that is the port offset will not be added to it, and
thus will make flow_keys' th_off different in skb_flow_dissect.
Again, I will wait until tomorrow to see if there're any other comments and
will re-post a v3 with this bug fixed.

Nik

^ permalink raw reply

* Re: [PATCH] moxa: drop free_irq of devm_request_irq allocated irq
From: Jingoo Han @ 2013-09-26  0:47 UTC (permalink / raw)
  To: 'Wei Yongjun', 'Jonas Jensen'
  Cc: davem, grant.likely, rob.herring, yongjun_wei, netdev,
	'Jingoo Han', 'Sachin Kamat'
In-Reply-To: <CAPgLHd8YRP7kL_jNHi3J9jFn86f=Fv-AWWUY_aRGZQOgVTNPgg@mail.gmail.com>

On Wednesday, September 25, 2013 4:33 PM, Wei Yongjun wrote:
> 
> From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
> 
> irq allocated with devm_request_irq should not be freed using
> free_irq, because doing so causes a dangling pointer, and a
> subsequent double free.
> 
> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
> ---
>  drivers/net/ethernet/moxa/moxart_ether.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/moxa/moxart_ether.c b/drivers/net/ethernet/moxa/moxart_ether.c
> index 83c2091..9a7fcb5 100644
> --- a/drivers/net/ethernet/moxa/moxart_ether.c
> +++ b/drivers/net/ethernet/moxa/moxart_ether.c
> @@ -531,7 +531,6 @@ static int moxart_remove(struct platform_device *pdev)
>  	struct net_device *ndev = platform_get_drvdata(pdev);
> 
>  	unregister_netdev(ndev);
> -	free_irq(ndev->irq, ndev);
>  	moxart_mac_free_memory(ndev);
>  	free_netdev(ndev);

CC'ed Sachin Kamat,

In this case, the free_irq() will be called, after calling
free_netdev(). 'ndev' is freed by free_netdev(). Then, 'ndev->irq'
is used by free_irq(). Is it right?

In my humble opinion, it seems to make the problem.

Best regards,
Jingoo Han

^ permalink raw reply

* +157.8% netperf throughput by "ipv4: raise IP_MAX_MTU to theoretical limit"
From: Fengguang Wu @ 2013-09-26  1:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexey Kuznetsov, Willem de Bruijn, lkp, netdev@vger.kernel.org,
	LKML

Hi Eric,

We are glad to find that your below commit brings large increase in
lo netperf throughput:

    35596b2796713c6a9dc0      734d2725db879f3f6fcd  
------------------------  ------------------------  
                  761.80      +534.6%      4834.60  lkp-ib03/micro/netperf/120s-200%-UDP_STREAM
                  168.10     +1317.4%      2382.70  lkp-nex04/micro/netperf/120s-200%-UDP_STREAM
                  169.60      +979.4%      1830.70  lkp-nex05/micro/netperf/120s-200%-UDP_STREAM
                 2154.20      +135.7%      5077.50  lkp-sb03/micro/netperf/120s-200%-UDP_STREAM
                 3559.00        -3.5%      3435.20  lkp-t410/micro/netperf/120s-200%-TCP_STREAM
                 6812.70      +157.8%     17560.70  TOTAL netperf.Throughput_Mbps

The side effects are some increased/decreased lock contentions:

    35596b2796713c6a9dc0      734d2725db879f3f6fcd  
------------------------  ------------------------  
                 7695.93    +16534.7%   1280196.60  lkp-ib03/micro/netperf/120s-200%-UDP_STREAM
                 1917.74   +159631.5%   3063227.91  lkp-nex04/micro/netperf/120s-200%-UDP_STREAM
                 2293.73   +118436.8%   2718918.57  lkp-nex05/micro/netperf/120s-200%-UDP_STREAM
                50823.71     +1881.3%   1006978.18  lkp-sb03/micro/netperf/120s-200%-UDP_STREAM
                62731.11    +12763.3%   8069321.26  TOTAL lock_stat.slock-AF_INET.waittime-total

    35596b2796713c6a9dc0      734d2725db879f3f6fcd  
------------------------  ------------------------  
                 7007.20    +17377.5%   1224681.80  lkp-ib03/micro/netperf/120s-200%-UDP_STREAM
                  960.80    +74382.1%    715623.90  lkp-nex04/micro/netperf/120s-200%-UDP_STREAM
                  951.40    +55033.2%    524536.90  lkp-nex05/micro/netperf/120s-200%-UDP_STREAM
                49582.00     +2141.1%   1111185.60  lkp-sb03/micro/netperf/120s-200%-UDP_STREAM
                58501.40     +6012.7%   3576028.20  TOTAL lock_stat.slock-AF_INET.contentions

    35596b2796713c6a9dc0      734d2725db879f3f6fcd  
------------------------  ------------------------  
                 7289.40    +16702.1%   1224771.60  lkp-ib03/micro/netperf/120s-200%-UDP_STREAM
                  922.40    +77764.1%    718218.60  lkp-nex04/micro/netperf/120s-200%-UDP_STREAM
                  910.10    +58394.1%    532355.20  lkp-nex05/micro/netperf/120s-200%-UDP_STREAM
                51325.70     +2062.2%   1109739.10  lkp-sb03/micro/netperf/120s-200%-UDP_STREAM
                60447.60     +5830.9%   3585084.50  TOTAL lock_stat.slock-AF_INET.contentions.udp_queue_rcv_skb

    35596b2796713c6a9dc0      734d2725db879f3f6fcd  
------------------------  ------------------------  
                 4266.20    +13915.1%    597912.60  lkp-ib03/micro/netperf/120s-200%-UDP_STREAM
                  560.50    +57667.0%    323783.90  lkp-nex04/micro/netperf/120s-200%-UDP_STREAM
                  518.80    +46369.9%    241086.00  lkp-nex05/micro/netperf/120s-200%-UDP_STREAM
                23391.50     +1891.7%    465887.80  lkp-sb03/micro/netperf/120s-200%-UDP_STREAM
                28737.00     +5567.5%   1628670.30  TOTAL lock_stat.&(&list->lock)->rlock#2.contentions.__skb_recv_datagram

    35596b2796713c6a9dc0      734d2725db879f3f6fcd  
------------------------  ------------------------  
                 4245.80    +13836.7%    591726.20  lkp-ib03/micro/netperf/120s-200%-UDP_STREAM
                  563.30    +57100.8%    322211.90  lkp-nex04/micro/netperf/120s-200%-UDP_STREAM
                  520.40    +45650.2%    238084.00  lkp-nex05/micro/netperf/120s-200%-UDP_STREAM
                23157.10     +1898.6%    462812.40  lkp-sb03/micro/netperf/120s-200%-UDP_STREAM
                28486.60     +5568.8%   1614834.50  TOTAL lock_stat.&(&list->lock)->rlock#2.contentions.sock_queue_rcv_skb

    35596b2796713c6a9dc0      734d2725db879f3f6fcd  
------------------------  ------------------------  
                  466.24    +20666.8%     96823.64  lkp-ib03/micro/netperf/120s-200%-UDP_STREAM
                  227.96    +39775.4%     90899.22  lkp-nex04/micro/netperf/120s-200%-UDP_STREAM
                  165.76    +63787.9%    105903.10  lkp-nex05/micro/netperf/120s-200%-UDP_STREAM
                 3031.79     +2325.1%     73523.25  lkp-sb03/micro/netperf/120s-200%-UDP_STREAM
                 3891.75     +9334.0%    367149.20  TOTAL lock_stat.&wq->wait.waittime-total

    35596b2796713c6a9dc0      734d2725db879f3f6fcd  
------------------------  ------------------------  
                  891.80    +21266.3%    190545.00  lkp-ib03/micro/netperf/120s-200%-UDP_STREAM
                  213.30    +50036.7%    106941.50  lkp-nex04/micro/netperf/120s-200%-UDP_STREAM
                  152.60    +58410.9%     89287.60  lkp-nex05/micro/netperf/120s-200%-UDP_STREAM
                 6143.00     +2623.3%    167291.40  lkp-sb03/micro/netperf/120s-200%-UDP_STREAM
                 7400.70     +7386.7%    554065.50  TOTAL lock_stat.&wq->wait.contentions.__wake_up_sync_key

    35596b2796713c6a9dc0      734d2725db879f3f6fcd  
------------------------  ------------------------  
              4174266.80     +3137.3% 135134002.00  lkp-ib03/micro/netperf/120s-200%-UDP_STREAM
              1541864.80     +3808.9%  60270231.50  lkp-nex04/micro/netperf/120s-200%-UDP_STREAM
              1754844.30     +3034.9%  55013405.80  lkp-nex05/micro/netperf/120s-200%-UDP_STREAM
             22053438.30      +263.9%  80244072.30  lkp-sb03/micro/netperf/120s-200%-UDP_STREAM
             29524414.20     +1020.0% 330661711.60  TOTAL lock_stat.&(&zone->lock)->rlock.contentions.__free_pages_ok

    35596b2796713c6a9dc0      734d2725db879f3f6fcd  
------------------------  ------------------------  
              4303082.60     +2809.1% 125179298.60  lkp-ib03/micro/netperf/120s-200%-UDP_STREAM
              1638845.10     +3613.5%  60859041.30  lkp-nex04/micro/netperf/120s-200%-UDP_STREAM
              1883848.20     +3012.7%  58638288.40  lkp-nex05/micro/netperf/120s-200%-UDP_STREAM
             22293114.60      +216.3%  70505512.80  lkp-sb03/micro/netperf/120s-200%-UDP_STREAM
             30118890.50      +946.5% 315182141.10  TOTAL lock_stat.&(&zone->lock)->rlock.contentions.get_page_from_freelist

    35596b2796713c6a9dc0      734d2725db879f3f6fcd  
------------------------  ------------------------  
               291821.83       -96.6%      9957.12  lkp-nex04/micro/netperf/120s-200%-UDP_STREAM
               400941.64       -94.0%     24083.56  lkp-nex05/micro/netperf/120s-200%-UDP_STREAM
               692763.47       -95.1%     34040.68  TOTAL lock_stat.&mm->mmap_sem/1.holdtime-max

    35596b2796713c6a9dc0      734d2725db879f3f6fcd  
------------------------  ------------------------  
             38718739.60       -95.8%   1637024.20  lkp-ib03/micro/netperf/120s-200%-UDP_STREAM
             13064940.40       -84.5%   2031181.40  lkp-nex05/micro/netperf/120s-200%-UDP_STREAM
             64201496.70       -98.2%   1166711.50  lkp-sb03/micro/netperf/120s-200%-UDP_STREAM
              2302513.00       -97.9%     48685.00  lkp-t410/micro/netperf/120s-200%-UDP_STREAM
            118287689.70       -95.9%   4883602.10  TOTAL lock_stat.&(&base->lock)->rlock.acquisitions
......

commit 734d2725db879f3f6fcdc2b1d2a5deae105f5e95
Author: Eric Dumazet <edumazet@google.com>
Date:   Sun Aug 18 19:08:07 2013 -0700

    ipv4: raise IP_MAX_MTU to theoretical limit
    
    As discussed last year [1], there is no compelling reason
    to limit IPv4 MTU to 0xFFF0, while real limit is 0xFFFF
    
    [1] : http://marc.info/?l=linux-netdev&m=135607247609434&w=2
    

:040000 040000 f2085347b7781a6a42020dc1bc5ca090f7077361 fcb2c411b5073c8ac5009a41f7a1908a352c23d5 M	net
bisect run success

# bad: [272b98c6455f00884f0350f775c5342358ebb73f] Linux 3.12-rc1
# good: [6e4664525b1db28f8c4e1130957f70a94c19213e] Linux 3.11
git bisect start '272b98c6455f00884f0350f775c5342358ebb73f' '6e4664525b1db28f8c4e1130957f70a94c19213e' '--'
# good: [57d730924d5cc2c3e280af16a9306587c3a511db] Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 57d730924d5cc2c3e280af16a9306587c3a511db
# bad: [27c7651a6a5f143eccd66db38c7a3035e1f8bcfb] Merge tag 'gpio-v3.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
git bisect bad 27c7651a6a5f143eccd66db38c7a3035e1f8bcfb
# bad: [06c54055bebf919249aa1eb68312887c3cfe77b4] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect bad 06c54055bebf919249aa1eb68312887c3cfe77b4
# bad: [d75ea942b360690a380da8012a51eaf6a6ebb1b1] can: flexcan: use platform_set_drvdata()
git bisect bad d75ea942b360690a380da8012a51eaf6a6ebb1b1
# good: [926489be1d2b030c17d38fa10b5921bf3409d91d] drivers: net: cpsw: Add support for new CPSW IP version present in AM43xx SoC
git bisect good 926489be1d2b030c17d38fa10b5921bf3409d91d
# good: [a5354ccaaf54ac61c6d1b350e8d3e4234dd28849] ath9k: Enable WLAN/BT Ant Diversity for WB225/WB195
git bisect good a5354ccaaf54ac61c6d1b350e8d3e4234dd28849
# good: [84ce1ddfefc3d5a8af5ede6fe16546c143117616] 6lowpan: init ipv6hdr buffer to zero
git bisect good 84ce1ddfefc3d5a8af5ede6fe16546c143117616
# bad: [a0e186003be7892fd75613a23aaafaf09f3611e6] net: fsl_pq_mdio: use platform_{get,set}_drvdata()
git bisect bad a0e186003be7892fd75613a23aaafaf09f3611e6
# good: [89d5e23210f53ab53b7ff64843bce62a106d454f] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
git bisect good 89d5e23210f53ab53b7ff64843bce62a106d454f
# good: [0aa857f83fe2674f973ad89ec7c0202f5c5d9554] moxa: fix missing unlock on error in moxart_mac_start_xmit()
git bisect good 0aa857f83fe2674f973ad89ec7c0202f5c5d9554
# bad: [0dde80268ee0a5a1511935bdb9c547191d616aa9] myri10ge: Add support for ndo_busy_poll
git bisect bad 0dde80268ee0a5a1511935bdb9c547191d616aa9
# good: [397b41746333ad386d91d23ea0f79481320dcdcc] tcp: trivial: Remove nocache argument from tcp_v4_send_synack
git bisect good 397b41746333ad386d91d23ea0f79481320dcdcc
# bad: [734d2725db879f3f6fcdc2b1d2a5deae105f5e95] ipv4: raise IP_MAX_MTU to theoretical limit
git bisect bad 734d2725db879f3f6fcdc2b1d2a5deae105f5e95
# good: [35596b2796713c6a9dc05759837fa9f0e156a200] vhost: Include linux/uio.h instead of linux/socket.h
git bisect good 35596b2796713c6a9dc05759837fa9f0e156a200
# first bad commit: [734d2725db879f3f6fcdc2b1d2a5deae105f5e95] ipv4: raise IP_MAX_MTU to theoretical limit


Compare of all good/bad commits during the bisect.

                                  iostat.cpu.user

   2.2 ++-------------------------------------------------------------------+
     2 ++               O   O              O                                |
       |                  O  O O O OO O O O  O OO O O OO   OO O O O         |
   1.8 ++                                                O                  |
   1.6 ++                                                                   |
       |                                                                    |
   1.4 O+OO O O OO O O O                                                    |
   1.2 ++                                                                   |
     1 ++                                                                   |
       |                                                                    |
   0.8 ++                                                                   |
   0.6 ++                                                                   |
       |                                                                    |
   0.4 ++               *.*.           .*.**.    .*.                       .*
   0.2 *+**-*-*-**-*-*-*----**-*-*-**-*------*-**---*-**-*-**-*-*-**-*-*-**-+


                               netperf.Throughput_Mbps

   2000 ++------------------------------------------------------------------+
   1800 ++               O OO O O OO O OO O OO O O OO O OO O OO O O         |
        |                                                                   |
   1600 ++                                                                  |
   1400 ++                                                                  |
        O OO O O OO O OO                                                    |
   1200 ++                                                                  |
   1000 ++                                                                  |
    800 ++                                                                  |
        |                                                                   |
    600 ++                                                                  |
    400 ++                                                                  |
        |                                                                   |
    200 *+**.*.*.**.*.**.*.**.*.*.**.*.**.*.**.*.*.**.*.**.*.**.*.*.**.*.**.*
      0 ++------------------------------------------------------------------+


                                  vmstat.system.in

   34000 ++---------------------O-----------O-------------------------------+
         |                     O  O OO O O O  O    O  O O OO O OO O         |
   32000 ++                                     OO   O                      |
   30000 ++                                                                 |
         |                                                                  |
   28000 ++                                                                 |
   26000 O+   O OO O O  O                                                   |
         | OO         O                                                     |
   24000 ++                                                                 |
   22000 ++                                                                 |
         |                                                                  |
   20000 ++                O                                                |
   18000 ++                                    .*                           |
         *.**.*.**.*.**.*.  .*.**.*.**.*.*.**.*  *.*.**.*.**.*.**.*.**.*.**.*
   16000 ++---------------**------------------------------------------------+


                                   vmstat.system.cs

   900000 ++---------------O------------------------------------------------+
          |                   O                                             |
   800000 ++                O  O O O    OO O OO O OO O OO   OO OO O         |
   700000 ++                        O O                   O                 |
          |                                                                 |
   600000 ++                                                                |
   500000 O+OO O OO O OO O                                                  |
          |                                                                 |
   400000 ++                                                                |
   300000 ++                                                                |
          |                                                                 |
   200000 ++                                                                |
   100000 ++                                                                |
          *.**.*.**.*.**.*.**.**.*.**.*.**.*.**.*.**.*.**.*.**.**.*.**.*.**.*
        0 ++----------------------------------------------------------------+


                     lock_stat.&(&zone->lock)->rlock.contentions

   7e+07 ++---------------O-------------------------------------------------+
         |                 O O                                              |
   6e+07 ++                    OO O      O O  O OO O OO O OO O OO O         |
         |                          OO O    O                               |
   5e+07 O+OO O OO O OO O                                                   |
         |                                                                  |
   4e+07 ++                                                                 |
         |                                                                  |
   3e+07 ++                                                                 |
         |                                                                  |
   2e+07 ++                                                                 |
         |                                                                  |
   1e+07 ++                                                                 |
         |               .**.*.                                             |
       0 *+**-*-**-*-**-*------**-*-**-*-*-**-*-**-*-**-*-**-*-**-*-**-*-**-*


             lock_stat.&(&zone->lock)->rlock.contentions.__free_pages_ok

   7e+07 ++-----------------------------------------------------------------+
         |                O  O                                              |
   6e+07 ++                O   OO O           O OO O O  O  O O  O O         |
         |                          OO O O OO         O   O                 |
   5e+07 O+OO O OO O OO O                                                   |
         |                                                                  |
   4e+07 ++                                                                 |
         |                                                                  |
   3e+07 ++                                                    O            |
         |                                                                  |
   2e+07 ++                                                                 |
         |                                                                  |
   1e+07 ++                                                                 |
         |                                                                  |
       0 *+**-*-**-*-**-*-**-*-**-*-**-*-*-**-*-**-*-**-*-**-*-**-*-**-*-**-*


         lock_stat.&(&zone->lock)->rlock.contentions.get_page_from_freelist

   7e+07 ++---------------O-------------------------------------------------+
         |                 O O                                              |
   6e+07 ++                    OO O OO O O OO O OO O OO O OO O OO O         |
         |                                                                  |
   5e+07 O+OO O OO O OO O                                                   |
         |                                                                  |
   4e+07 ++                                                                 |
         |                                                                  |
   3e+07 ++                                                                 |
         |                                                                  |
   2e+07 ++                                                                 |
         |                                                                  |
   1e+07 ++                                                                 |
         |               .**.*.                                             |
       0 *+**-*-**-*-**-*------**-*-**-*-*-**-*-**-*-**-*-**-*-**-*-**-*-**-*


                            lock_stat.&rq->lock.contentions

   300000 ++---------------------O------OO-O-O---------------O--------------+
          |                    O   OO O       O O OO O OO O O  OO O         |
   250000 ++                                                                |
          |                                                                 |
          |            O                                                    |
   200000 O+OO O OO O O  O                                                  |
          |                                                                 |
   150000 ++                                                                |
          |                                                                 |
   100000 ++                                                                |
          *. *.                      .*.* .*.* .*.*   .* .*.    *. .**.    .*
          | *  *.**.*.**.*     *.*.**    *    *    *.*  *   **.*  *    *.** |
    50000 ++              :    :                                            |
          |               : O O                                             |
        0 ++---------------**-*---------------------------------------------+


               lock_stat.clockevents_lock.contentions.clockevents_notify

   120000 ++----------------------------------------------------------------+
          |                                     *                           |
   100000 ++                                    :                           |
          |                                     :                           |
          |                                    : :                          |
    80000 ++                                   : :                          |
          |           *                        : :                          |
    60000 ++         ::                        : :                          |
          |       *. : :                    .* : :                          |
    40000 ++      : *  *.       .*         * ::   :                .*       *
          *. *.  :       *.**.**  + *.*.  +   :   :*.   *. .**.* .*  *. .* +|
          | *  *.*                 *    **    *   *  *.*  *     *      *  * |
    20000 ++          O    OO OO   O  O OO O    O    O OO O  O O            |
          O OO O OO O  O O       O  O        OO   OO        O   O O         |
        0 ++----------------------------------------------------------------+


             lock_stat.&(&base->lock)->rlock.contentions.lock_timer_base

   12000 ++-----------------------------------------------------------------+
         |              *                                                   |
   10000 *+* .*.**.    + :     * .*.          *.  .*.            .*. *.     |
         |  *      *.**  :    : *   *        +  **   **.*.**.*. *   *  *.**.*
         |                :   :      *.*.*.**                  *            |
    8000 ++               **.*                                              |
         |                                                                  |
    6000 ++                                                                 |
         |                                                                  |
    4000 ++                                                                 |
         |                                                                  |
         |                                                                  |
    2000 ++                                                                 |
         |                                                                  |
       0 O+OO-O-OO-O-OO-O-OO-O-OO-O-OO-O-O-OO-O-OO-O-OO-O-OO-O-OO-O---------+


            lock_stat.&(&base->lock)->rlock.contentions.run_timer_softirq

   7000 ++------------------------------------------------------------------+
        |                        .*                                         |
   6000 ++*            *      *.* :          *.   .*             .*. *.     |
        *  :.*.*.* .*. :      :    :         : *.*  *.*.**.    .*   *  *.**.*
   5000 ++ *      *   * :    :     :         :             *.**             |
        |               :    :     *.*.*    :                               |
   4000 ++               :  :           *.*.*                               |
        |                *.**                                               |
   3000 ++                                                                  |
        |                                                                   |
   2000 ++                                                                  |
        |                                                                   |
   1000 ++                                                                  |
        |                                                                   |
      0 O+OO-O-O-OO-O-OO-O-OO-O-O-OO-O-OO-O-OO-O-O-OO-O-OO-O-OO-O-O---------+


                           lock_stat.rcu_node_1.contentions

   120000 ++----------------------------------------------------------------+
          |                                                                 |
   100000 ++                O OO O      OO                     OO O         |
          |                O       OO      O O  O O  O OO O O               |
          | OO O O       O            O       O    O         O              |
    80000 O+      O O OO                                                    |
          |                                                                 |
    60000 ++                                                                |
          |                                                                 |
    40000 ++                                                                |
          |                                                                 |
          |                                                                 |
    20000 ++                                                                |
          |                                                                 |
        0 *+**-*-**-*-**-*-**-**-*-**-*-**-*-**-*-**-*-**-*-**-**-*-**-*-**-*


                lock_stat.rcu_node_1.contentions.rcu_process_callbacks

   200000 ++------------------O---------------------------------------------+
   180000 ++                O  O O      OO O O       O O  O    OO O         |
          |                O       OO           O O     O   OO              |
   160000 ++OO O O       O            O       O    O                        |
   140000 O+      O O OO                                                    |
          |                                                                 |
   120000 ++                                                                |
   100000 ++                                                                |
    80000 ++                                                                |
          |                                                                 |
    60000 ++                                                                |
    40000 ++                                                                |
          |                                                                 |
    20000 ++                                                                |
        0 *+**-*-**-*-**-*-**-**-*-**-*-**-*-**-*-**-*-**-*-**-**-*-**-*-**-*


                    lock_stat.rcu_node_1.contentions.force_qs_rnp

   20000 ++-----------------------------------------------------------------+
   18000 ++               O  O                                              |
         |                 O                                                |
   16000 ++                                                                 |
   14000 ++                                                                 |
         |                               O                                  |
   12000 ++                    OO O O  O   OO   OO   OO O OO O OO O         |
   10000 O+OO O OO O    O            O        O    O                        |
    8000 ++          OO                                                     |
         |                                                                  |
    6000 ++                                                                 |
    4000 ++                                                                 |
         |                                                                  |
    2000 ++                                                                 |
       0 *+**-*-**-*-**-*-**-*-**-*-**-*-*-**-*-**-*-**-*-**-*-**-*-**-*-**-*


                          lock_stat.slock-AF_INET.contentions

   600000 ++----------------------------------------------------------------+
          |                    O O  O   O          O   OO O  O              |
   500000 ++                       O  O  O O OO   O         O  OO O         |
          |                                     O    O                      |
          |                                                                 |
   400000 ++                O                                               |
          |                                                                 |
   300000 O+OO O OO O OO O                                                  |
          |                                                                 |
   200000 ++                                                                |
          |                                                                 |
          |                                                                 |
   100000 ++                                                                |
          |                   O                                             |
        0 *+**-*-**-*-**-*-**-**-*-**-*-**-*-**-*-**-*-**-*-**-**-*-**-*-**-*


                  lock_stat.slock-AF_INET.contentions.lock_sock_fast

   600000 ++----------------------------------------------------------------+
          |                    O O                        O                 |
   500000 ++                       OO O OO O OO   OO   OO   OO OO O         |
          |                                     O    O                      |
          |                                                                 |
   400000 ++                                                                |
          |                                                                 |
   300000 O+OO O OO O OO O  O                                               |
          |                                                                 |
   200000 ++                                                                |
          |                                                                 |
          |                                                                 |
   100000 ++                                                                |
          |                   O                                             |
        0 *+**-*-**-*-**-*-**-**-*-**-*-**-*-**-*-**-*-**-*-**-**-*-**-*-**-*


                 lock_stat.slock-AF_INET.contentions.udp_queue_rcv_skb

   600000 ++----------------------------------------------O-----------------+
          |                    O O  O   O    O     O   OO    O              |
   500000 ++                       O  O  O O  O   O         O  OO O         |
          |                 O                   O    O                      |
          |                                                                 |
   400000 ++                                                                |
          |                                                                 |
   300000 O+OO O OO O OO O                                                  |
          |                                                                 |
   200000 ++                                                                |
          |                                                                 |
          |                                                                 |
   100000 ++                                                                |
          |                   O                                             |
        0 *+**-*-**-*-**-*-**-**-*-**-*-**-*-**-*-**-*-**-*-**-**-*-**-*-**-*


                     lock_stat.&(&list->lock)->rlock#2.contentions

   300000 ++----------------------------------------------------------------+
          |                           O O                                   |
   250000 ++                        O    O O OO   O    OO                   |
          |                    O O O            O  O      O OO OO O         |
          |                                          O                      |
   200000 ++                                                                |
          |                                                                 |
   150000 ++                                                                |
          O OO O OO O OO O                                                  |
   100000 ++                                                                |
          |                                                                 |
          |                 O                                               |
    50000 ++                                                                |
          |                   O                                             |
        0 *+**-*-**-*-**-*-**-**-*-**-*-**-*-**-*-**-*-**-*-**-**-*-**-*-**-*


           lock_stat.&(&list->lock)->rlock#2.contentions.__skb_recv_datagram

   300000 ++----------------------------------------------------------------+
          |                           O O                                   |
   250000 ++                        O    O O OO   O    OO                   |
          |                    O O O            O  O      O OO OO O         |
          |                                          O                      |
   200000 ++                                                                |
          |                                                                 |
   150000 ++                                                                |
          O OO O OO O OO O                                                  |
   100000 ++                                                                |
          |                                                                 |
          |                 O                                               |
    50000 ++                                                                |
          |                   O                                             |
        0 *+**-*-**-*-**-*-**-**-*-**-*-**-*-**-*-**-*-**-*-**-**-*-**-*-**-*


           lock_stat.&(&list->lock)->rlock#2.contentions.sock_queue_rcv_skb

   300000 ++----------------------------------------------------------------+
          |                           O O                                   |
   250000 ++                        O    O O OO   O    OO                   |
          |                    O O O            O  O      O OO OO O         |
          |                                          O                      |
   200000 ++                                                                |
          |                                                                 |
   150000 ++                                                                |
          O OO O OO O OO O                                                  |
   100000 ++                                                                |
          |                                                                 |
          |                 O                                               |
    50000 ++                                                                |
          |                   O                                             |
        0 *+**-*-**-*-**-*-**-**-*-**-*-**-*-**-*-**-*-**-*-**-**-*-**-*-**-*


                            lock_stat.&wq->wait.contentions

   100000 ++-------------------O--------------------------------------------+
    90000 ++                     O  O    O O  O O OO O OO O    O            |
          |                        O  O O    O              OO  O O         |
    80000 ++                                                                |
    70000 ++                                                                |
          |                                                                 |
    60000 ++     O                                                          |
    50000 O+OO O  O O OO O                                                  |
    40000 ++                                                                |
          |                                                                 |
    30000 ++                                                                |
    20000 ++                                                                |
          |                 O                                               |
    10000 ++                  O                                             |
        0 *+**-*-**-*-**-*-**-**-*-**-*-**-*-**-*-**-*-**-*-**-**-*-**-*-**-*


                      lock_stat.&wq->wait.contentions.finish_wait

   100000 ++----------------------------------------------------------------+
    90000 ++                   O         O O  O O O    OO O    O            |
          |                      O OO O O    O     O O       O  O O         |
    80000 ++                                                O               |
    70000 ++                                                                |
          |                                                                 |
    60000 ++                                                                |
    50000 O+OO O OO O OO O                                                  |
    40000 ++                                                                |
          |                                                                 |
    30000 ++                                                                |
    20000 ++                                                                |
          |                                                                 |
    10000 ++                O                                               |
        0 *+**-*-**-*-**-*-**-**-*-**-*-**-*-**-*-**-*-**-*-**-**-*-**-*-**-*


                  lock_stat.&wq->wait.contentions.__wake_up_sync_key

   100000 ++----------------------------------------------------------------+
    90000 ++                   O         O O  O O OO   OO O    O            |
          |                      O OO O O    O       O      OO  O O         |
    80000 ++                                                                |
    70000 ++                                                                |
          |                                                                 |
    60000 ++     O                                                          |
    50000 O+OO O  O O OO O                                                  |
    40000 ++                                                                |
          |                                                                 |
    30000 ++                                                                |
    20000 ++                                                                |
          |                 O                                               |
    10000 ++                  O                                             |
        0 *+**-*-**-*-**-*-**-**-*-**-*-**-*-**-*-**-*-**-*-**-**-*-**-*-**-*

^ permalink raw reply

* Re: Device tree node for Freescale Gianfar PTP reference clock source selection
From: Scott Wood @ 2013-09-26  1:30 UTC (permalink / raw)
  To: Aida Mynzhasova
  Cc: netdev, Richard Cochran, linuxppc-dev, Claudiu Manoil, devicetree
In-Reply-To: <523FEFCD.70409@skitlab.ru>

On Mon, 2013-09-23 at 11:37 +0400, Aida Mynzhasova wrote:
> Hi,
> 
> Currently, Freescale Gianfar PTP reference clock source is determined 
> through hard-coded value in gianfar_ptp driver. I don't think that 
> recompilation of the entire module (or even worse - the kernel) is a god 
> idea when we want to change one clock source to another. So, I want to 
> add new device tree binding, which can be used as:

Is this describing the hardware or how you're using it?  If the latter,
it should be a module parameter or some sort of runtime knob instead.

> 	ptp_clock@24E00 {
> 		compatible = "fsl,etsec-ptp";
> 		reg = <0x24E00 0xB0>;
> 		interrupts = <12 0x8 13 0x8>;
> 		interrupt-parent = < &ipic >;
> 		fsl,cksel = <0>; /* <-- New entry */
> 		fsl,tclk-period = <10>;
> 		fsl,tmr-prsc    = <100>;
> 		fsl,tmr-add     = <0x999999A4>;
> 		fsl,tmr-fiper1  = <0x3B9AC9F6>;
> 		fsl,tmr-fiper2  = <0x00018696>;
> 		fsl,max-adj     = <659999998>;
> 	};
> 
> fsl,cksel acceptable values:
> 
> <0> for external clock;
> <1> for eTSEC system clock;
> <2> for eTSEC1 transmit clock;
> <3> for RTC clock input.
> 
> I am new in this mailing list, and as far as I know, I have to discuss 
> all updates for device tree files here before sending patch, which uses 
> new attributes.
> 
> Also, should I define new bindings in some special way? I want to add 
> description of cksel attribute in 
> /Documentation/devicetree/bindings/net/fsl-tsec-phy.txt. Is it enough or 
> not?

Assuming this is actually describing how the hardware is wired up, yes,
that's how you'd document it (making sure that device trees without that
property are interpreted the same as today).

-Scott

^ permalink raw reply

* Re: +157.8% netperf throughput by "ipv4: raise IP_MAX_MTU to theoretical limit"
From: Fengguang Wu @ 2013-09-26  1:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexey Kuznetsov, Willem de Bruijn, lkp, netdev@vger.kernel.org,
	LKML
In-Reply-To: <20130926012144.GA9931@localhost>

On Thu, Sep 26, 2013 at 09:21:44AM +0800, Fengguang Wu wrote:
> Hi Eric,
> 
> We are glad to find that your below commit brings large increase in
> lo netperf throughput:
> 
>     35596b2796713c6a9dc0      734d2725db879f3f6fcd  
> ------------------------  ------------------------  
>                   761.80      +534.6%      4834.60  lkp-ib03/micro/netperf/120s-200%-UDP_STREAM
>                   168.10     +1317.4%      2382.70  lkp-nex04/micro/netperf/120s-200%-UDP_STREAM
>                   169.60      +979.4%      1830.70  lkp-nex05/micro/netperf/120s-200%-UDP_STREAM
>                  2154.20      +135.7%      5077.50  lkp-sb03/micro/netperf/120s-200%-UDP_STREAM
>                  3559.00        -3.5%      3435.20  lkp-t410/micro/netperf/120s-200%-TCP_STREAM
>                  6812.70      +157.8%     17560.70  TOTAL netperf.Throughput_Mbps
> 
> The side effects are some increased/decreased lock contentions:

This direct view may be more clear. Before patch:

                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquis
itions   holdtime-min   holdtime-max holdtime-total
-------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------

                 &(&nf->lru_lock)->rlock:      19017744       19034681           0.15        5884.35  5772892473.69       20428335       20
475976           0.10        1109.59    77448429.38
                 -----------------------
                 &(&nf->lru_lock)->rlock        4905538          [<ffffffff819227b9>] ip_defrag+0xa4f/0xbd3
                 &(&nf->lru_lock)->rlock        5695105          [<ffffffff8195bd6f>] inet_frag_find+0x2c7/0x30d
                 &(&nf->lru_lock)->rlock        5629414          [<ffffffff8195be74>] inet_frag_kill+0xbf/0x117
                 &(&nf->lru_lock)->rlock        2804624          [<ffffffff8195bf29>] inet_frag_evictor+0x5d/0x103
                 -----------------------
                 &(&nf->lru_lock)->rlock        6172104          [<ffffffff8195bd6f>] inet_frag_find+0x2c7/0x30d
                 &(&nf->lru_lock)->rlock        5348696          [<ffffffff8195be74>] inet_frag_kill+0xbf/0x117
                 &(&nf->lru_lock)->rlock        4421308          [<ffffffff819227b9>] ip_defrag+0xa4f/0xbd3
                 &(&nf->lru_lock)->rlock        3092573          [<ffffffff8195bf29>] inet_frag_evictor+0x5d/0x103

...............................................................................................................................................................................................

                      &(&q->lock)->rlock:       2322575        2323896           0.22        5802.58   934469091.38        3041941       13000848           0.10        5902.91  2811690638.18
                      ------------------
                      &(&q->lock)->rlock        2163449          [<ffffffff8195bf7e>] inet_frag_evictor+0xb2/0x103
                      &(&q->lock)->rlock         160447          [<ffffffff81921e91>] ip_defrag+0x127/0xbd3
                      ------------------
                      &(&q->lock)->rlock        2165896          [<ffffffff81921e91>] ip_defrag+0x127/0xbd3
                      &(&q->lock)->rlock         158000          [<ffffffff8195bf7e>] inet_frag_evictor+0xb2/0x103

...............................................................................................................................................................................................

                   &(&zone->lock)->rlock:       1845042        1851805           0.18        4917.52    19475590.18        9003807       10134386           0.13        3747.70     8347088.06
                   ---------------------
                   &(&zone->lock)->rlock         866751          [<ffffffff8116fbbc>] __free_pages_ok.part.47+0x94/0x2a1
                   &(&zone->lock)->rlock         984597          [<ffffffff8116f856>] get_page_from_freelist+0x4a3/0x6e8
                   &(&zone->lock)->rlock            112          [<ffffffff8116fe3b>] free_pcppages_bulk+0x35/0x31a
                   &(&zone->lock)->rlock            116          [<ffffffff8116f72c>] get_page_from_freelist+0x379/0x6e8
                   ---------------------
                   &(&zone->lock)->rlock         918190          [<ffffffff8116fbbc>] __free_pages_ok.part.47+0x94/0x2a1
                   &(&zone->lock)->rlock            722          [<ffffffff8116f72c>] get_page_from_freelist+0x379/0x6e8
                   &(&zone->lock)->rlock            861          [<ffffffff8116fe3b>] free_pcppages_bulk+0x35/0x31a
                   &(&zone->lock)->rlock         922607          [<ffffffff8116f856>] get_page_from_freelist+0x4a3/0x6e8


After patch, top contented locks become:

                   &(&zone->lock)->rlock:      58469530       58470181           0.16        4838.84   238618042.87      107374530      107408478           0.13        3610.05    73617127.93
                   ---------------------
                   &(&zone->lock)->rlock       29783268          [<ffffffff8116f856>] get_page_from_freelist+0x4a3/0x6e8
                   &(&zone->lock)->rlock            837          [<ffffffff8116f72c>] get_page_from_freelist+0x379/0x6e8
                   &(&zone->lock)->rlock           1105          [<ffffffff8116fe3b>] free_pcppages_bulk+0x35/0x31a
                   &(&zone->lock)->rlock       28684627          [<ffffffff8116fbbc>] __free_pages_ok.part.47+0x94/0x2a1
                   ---------------------
                   &(&zone->lock)->rlock          11356          [<ffffffff8116fe3b>] free_pcppages_bulk+0x35/0x31a
                   &(&zone->lock)->rlock           6741          [<ffffffff8116f72c>] get_page_from_freelist+0x379/0x6e8
                   &(&zone->lock)->rlock       28880589          [<ffffffff8116f856>] get_page_from_freelist+0x4a3/0x6e8
                   &(&zone->lock)->rlock       29558251          [<ffffffff8116fbbc>] __free_pages_ok.part.47+0x94/0x2a1

...............................................................................................................................................................................................

                           slock-AF_INET:        507780         508036           0.20        1167.78     2564695.48       11115246      106594271           0.12        1196.01   989718694.82
                           -------------
                           slock-AF_INET         434691          [<ffffffff818ed738>] lock_sock_fast+0x2f/0x84
                           slock-AF_INET          73294          [<ffffffff819482dc>] udp_queue_rcv_skb+0x1ba/0x3aa
                           slock-AF_INET             51          [<ffffffff818ed6b5>] lock_sock_nested+0x34/0x88
                           -------------
                           slock-AF_INET         434615          [<ffffffff819482dc>] udp_queue_rcv_skb+0x1ba/0x3aa
                           slock-AF_INET          73370          [<ffffffff818ed738>] lock_sock_fast+0x2f/0x84
                           slock-AF_INET             51          [<ffffffff8193fae1>] tcp_v4_rcv+0x390/0x978

..............................................................................................................................................................................................

                               &rq->lock:        286309         286456           0.21         294.85     1768779.90        5887506      244517912           0.09        1080.71   315600465.71
                               ---------
                               &rq->lock          92057          [<ffffffff81a0aa65>] __schedule+0x103/0x852
                               &rq->lock          18386          [<ffffffff810ecc02>] try_to_wake_up+0x95/0x26c
                               &rq->lock            730          [<ffffffff810f13fb>] update_blocked_averages+0x30/0x47f
                               &rq->lock            304          [<ffffffff81a0af43>] __schedule+0x5e1/0x852
                               ---------
                               &rq->lock         107807          [<ffffffff810ecd7b>] try_to_wake_up+0x20e/0x26c
                               &rq->lock         144391          [<ffffffff81a0aa65>] __schedule+0x103/0x852
                               &rq->lock            924          [<ffffffff810ecc02>] try_to_wake_up+0x95/0x26c
                               &rq->lock             29          [<ffffffff810e9090>] task_rq_lock+0x4b/0x85

Thanks,
Fengguang

^ permalink raw reply

* Re: BUG: MARK in OUTPUT + ip_tunnel causes kernel panic
From: Konstantin Kuzov @ 2013-09-26  2:06 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: netdev
In-Reply-To: <20130925085947.GY7660@secunet.com>

On Wed, 25 Sep 2013 10:59:47 +0200
Steffen Klassert wrote:

 > > > When trying to tunnel traffic originating from the same machine as the
 > > > tunnel endpoint, I am experiencing kernel panics for some types of
 > > > traffic (ICMP and UDP). TCP seems not to be affected by this, at least
 > > > I have not been able to trigger the panic.
 > > > 
 > > > I have one tunnel (without an IP address) and use policy routing to
 > > > steer some traffic through the tunnels.  
 > > [...]  
 > > > An interesting thing is that I have seen different kernel panics being
 > > > triggered. The other one I have seen has RIP pointing to
 > > > e1000_xmit_frame() and the message "protocol 0800 is buggy". However,
 > > > the one I have posted is by far the most common.  
 > > I'm experiencing the same issue on two different machines. It happens on any 
 > > kernel starting from 3.10 when ip_tunnel/ip_tunnel_core were introduced.
 > >   
 > Can you please try the patch below?
 > I've posted the same patch already to netdev in the morning.
 > 
 > Subject: [PATCH net 1/2] ip_tunnel: Fix a memory corruption in ip_tunnel_xmit  
 
Thank you very much. All works fine with that patch applied.

^ permalink raw reply

* Re: [PATCH] moxa: drop free_irq of devm_request_irq allocated irq
From: Wei Yongjun @ 2013-09-26  2:12 UTC (permalink / raw)
  To: jg1.han
  Cc: jonas.jensen, davem, grant.likely, rob.herring, yongjun_wei,
	netdev, sachin.kamat
In-Reply-To: <000001ceba51$f8b45c10$ea1d1430$%han@samsung.com>

On 09/26/2013 08:47 AM, Jingoo Han wrote:
> On Wednesday, September 25, 2013 4:33 PM, Wei Yongjun wrote:
>> From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
>>
>> irq allocated with devm_request_irq should not be freed using
>> free_irq, because doing so causes a dangling pointer, and a
>> subsequent double free.
>>
>> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
>> ---
>>  drivers/net/ethernet/moxa/moxart_ether.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/moxa/moxart_ether.c b/drivers/net/ethernet/moxa/moxart_ether.c
>> index 83c2091..9a7fcb5 100644
>> --- a/drivers/net/ethernet/moxa/moxart_ether.c
>> +++ b/drivers/net/ethernet/moxa/moxart_ether.c
>> @@ -531,7 +531,6 @@ static int moxart_remove(struct platform_device *pdev)
>>  	struct net_device *ndev = platform_get_drvdata(pdev);
>>
>>  	unregister_netdev(ndev);
>> -	free_irq(ndev->irq, ndev);
>>  	moxart_mac_free_memory(ndev);
>>  	free_netdev(ndev);
> CC'ed Sachin Kamat,
>
> In this case, the free_irq() will be called, after calling
> free_netdev(). 'ndev' is freed by free_netdev(). Then, 'ndev->irq'
> is used by free_irq(). Is it right?
>
> In my humble opinion, it seems to make the problem.
>

devm_request_irq() has recorded the irq and dev_id, so free_irq() by devm_*
will not touch 'ndev' which has been freed by free_netdev().
So, if we not need to call free_irq() before free_netdev(), there will be
no problem.

^ permalink raw reply

* Re: [PATCH v5 net-next 11/27] bonding: rework bond_3ad_xmit_xor() to use bond_for_each_slave() only
From: Ding Tianhong @ 2013-09-26  2:31 UTC (permalink / raw)
  To: Veaceslav Falico; +Cc: netdev, jiri, Jay Vosburgh, Andy Gospodarek
In-Reply-To: <1380093632-1842-12-git-send-email-vfalico@redhat.com>

On 2013/9/25 15:20, Veaceslav Falico wrote:
> Currently, there are two loops - first we find the first slave in an
> aggregator after the xmit_hash_policy() returned number, and after that we
> loop from that slave, over bonding head, and till that slave to find any
> suitable slave to send the packet through.
> 
> Replace it by just one bond_for_each_slave() loop, which first loops
> through the requested number of slaves, saving the first suitable one, and
> after that we've hit the requested number of slaves to skip - search for
> any up slave to send the packet through. If we don't find such kind of
> slave - then just send the packet through the first suitable slave found.
> 
> Logic remains unchainged, and we skip two loops. Also, refactor it a bit
> for readability.
> 
> CC: Jay Vosburgh <fubar@us.ibm.com>
> CC: Andy Gospodarek <andy@greyhouse.net>
> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> ---
> 
> Notes:
>     v4  -> v5
>     No change.
>     
>     v3  -> v4:
>     No change.
>     
>     v2  -> v3:
>     No change.
>     
>     v1  -> v2:
>     No changes.
>     
>     RFC -> v1:
>     New patch.
> 
>  drivers/net/bonding/bond_3ad.c | 46 ++++++++++++++++++++----------------------
>  1 file changed, 22 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index 3847aee..c861ee7 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -2417,15 +2417,15 @@ int bond_3ad_get_active_agg_info(struct bonding *bond, struct ad_info *ad_info)
>  
>  int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
>  {
> -	struct slave *slave, *start_at;
>  	struct bonding *bond = netdev_priv(dev);
> +	struct slave *slave, *first_ok_slave;
> +	struct aggregator *agg;
> +	struct ad_info ad_info;
>  	struct list_head *iter;
> -	int slave_agg_no;
>  	int slaves_in_agg;
> -	int agg_id;
> -	int i;
> -	struct ad_info ad_info;
> +	int slave_agg_no;
>  	int res = 1;
> +	int agg_id;
>  
>  	read_lock(&bond->lock);
>  	if (__bond_3ad_get_active_agg_info(bond, &ad_info)) {
> @@ -2438,20 +2438,28 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
>  	agg_id = ad_info.aggregator_id;
>  
>  	if (slaves_in_agg == 0) {
> -		/*the aggregator is empty*/
>  		pr_debug("%s: Error: active aggregator is empty\n", dev->name);
>  		goto out;
>  	}
>  
>  	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
> +	first_ok_slave = NULL;
>  
>  	bond_for_each_slave(bond, slave, iter) {
> -		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
> +		agg = SLAVE_AD_INFO(slave).port.aggregator;
> +		if (!agg || agg->aggregator_identifier != agg_id)
> +			continue;
>  
> -		if (agg && (agg->aggregator_identifier == agg_id)) {
> +		if (slave_agg_no >= 0) {
> +			if (!first_ok_slave && SLAVE_IS_OK(slave))
> +				first_ok_slave = slave;
>  			slave_agg_no--;
> -			if (slave_agg_no < 0)
> -				break;
> +			continue;
> +		}
> +
> +		if (SLAVE_IS_OK(slave)) {
> +			res = bond_dev_queue_xmit(bond, skb, slave->dev);
> +			goto out;
>  		}

I think you miss something, you will send skb always by the first suitable port,
it could not support load balance.
pls consult my function.
 		if (agg && (agg->aggregator_identifier == agg_id)) {
-			slave_agg_no--;
-			if (slave_agg_no < 0)
-				break;
+			if (--slave_agg_no < 0) {
+				if (SLAVE_IS_OK(slave)) {
+					res = bond_dev_queue_xmit(bond,
+						skb, slave->dev);
+					goto out;
+				}
+			}
 		}
 	}

>  	}
>  
> @@ -2461,20 +2469,10 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
>  		goto out;
>  	}
>  
> -	start_at = slave;
> -
> -	bond_for_each_slave_from(bond, slave, i, start_at) {
> -		int slave_agg_id = 0;
> -		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
> -
> -		if (agg)
> -			slave_agg_id = agg->aggregator_identifier;
> -
> -		if (SLAVE_IS_OK(slave) && agg && (slave_agg_id == agg_id)) {
> -			res = bond_dev_queue_xmit(bond, skb, slave->dev);
> -			break;
> -		}
> -	}
> +	/* we couldn't find any suitable slave after the agg_no, so use the
> +	 * first suitable found, if found. */
> +	if (first_ok_slave)
> +		res = bond_dev_queue_xmit(bond, skb, first_ok_slave->dev);
>  
>  out:
>  	read_unlock(&bond->lock);
> 

^ permalink raw reply

* Re: [PATCH net-next v5 1/6] bonding: simplify and use RCU protection for 3ad xmit path
From: Ding Tianhong @ 2013-09-26  2:33 UTC (permalink / raw)
  To: Veaceslav Falico
  Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <20130925105000.GC23575@redhat.com>

On 2013/9/25 18:50, Veaceslav Falico wrote:
> On Wed, Sep 25, 2013 at 05:52:15PM +0800, Ding Tianhong wrote:
>> The commit 278b20837511776dc9d5f6ee1c7fabd5479838bb
>> (bonding: initial RCU conversion) has convert the roundrobin, active-backup,
>> broadcast and xor xmit path to rcu protection, the performance will be better
>> for these mode, so this time, convert xmit path for 3ad mode.
>>
>> Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
>> Suggested-by: Veaceslav Falico <vfalico@redhat.com>
>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> Signed-off-by: Wang Yufen <wangyufen@huawei.com>
>> Cc: Nikolay Aleksandrov <nikolay@redhat.com>
>> Cc: Veaceslav Falico <vfalico@redhat.com>
>> ---
>> drivers/net/bonding/bond_3ad.c | 32 ++++++++++++++------------------
>> drivers/net/bonding/bonding.h  | 30 +++++++++++++++++++++++++++++-
>> 2 files changed, 43 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>> index 0d8f427..13f1deb 100644
>> --- a/drivers/net/bonding/bond_3ad.c
>> +++ b/drivers/net/bonding/bond_3ad.c
>> @@ -143,7 +143,7 @@ static inline struct bonding *__get_bond_by_port(struct port *port)
>>  */
>> static inline struct port *__get_first_port(struct bonding *bond)
>> {
>> -    struct slave *first_slave = bond_first_slave(bond);
>> +    struct slave *first_slave = bond_first_slave_rcu(bond);
>>
>>     return first_slave ? &(SLAVE_AD_INFO(first_slave).port) : NULL;
>> }
>> @@ -163,7 +163,7 @@ static inline struct port *__get_next_port(struct port *port)
>>     // If there's no bond for this port, or this is the last slave
>>     if (bond == NULL)
>>         return NULL;
>> -    slave_next = bond_next_slave(bond, slave);
>> +    slave_next = bond_next_slave_rcu(bond, slave);
>>     if (!slave_next || bond_is_first_slave(bond, slave_next))
>>         return NULL;
>>
>> @@ -2417,16 +2417,14 @@ int bond_3ad_get_active_agg_info(struct bonding *bond, struct ad_info *ad_info)
>>
>> int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
>> {
>> -    struct slave *slave, *start_at;
>>     struct bonding *bond = netdev_priv(dev);
>> +    struct slave *slave;
>>     int slave_agg_no;
>>     int slaves_in_agg;
>>     int agg_id;
>> -    int i;
>>     struct ad_info ad_info;
>>     int res = 1;
>>
>> -    read_lock(&bond->lock);
>>     if (__bond_3ad_get_active_agg_info(bond, &ad_info)) {
>>         pr_debug("%s: Error: __bond_3ad_get_active_agg_info failed\n",
>>              dev->name);
>> @@ -2444,13 +2442,17 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
>>
>>     slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
>>
>> -    bond_for_each_slave(bond, slave) {
>> +    bond_for_each_slave_rcu(bond, slave) {
>>         struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
>>
>>         if (agg && (agg->aggregator_identifier == agg_id)) {
>> -            slave_agg_no--;
>> -            if (slave_agg_no < 0)
>> -                break;
>> +            if (--slave_agg_no < 0) {
>> +                if (SLAVE_IS_OK(slave)) {
>> +                    res = bond_dev_queue_xmit(bond,
>> +                        skb, slave->dev);
>> +                    goto out;
>> +                }
>> +            }
>>         }
>>     }
>>
>> @@ -2460,23 +2462,17 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
>>         goto out;
>>     }
>>
>> -    start_at = slave;
>> -
>> -    bond_for_each_slave_from(bond, slave, i, start_at) {
>> -        int slave_agg_id = 0;
>> +    bond_for_each_slave_rcu(bond, slave) {
>>         struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
>>
>> -        if (agg)
>> -            slave_agg_id = agg->aggregator_identifier;
>> -
>> -        if (SLAVE_IS_OK(slave) && agg && (slave_agg_id == agg_id)) {
>> +        if (SLAVE_IS_OK(slave) && agg &&
>> +            agg->aggregator_identifier == agg_id) {
>>             res = bond_dev_queue_xmit(bond, skb, slave->dev);
>>             break;
>>         }
>>     }
>>
>> out:
>> -    read_unlock(&bond->lock);
>>     if (res) {
>>         /* no suitable interface, frame not sent */
>>         kfree_skb(skb);
>> diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>> index 03cf3fd..eb36f57 100644
>> --- a/drivers/net/bonding/bonding.h
>> +++ b/drivers/net/bonding/bonding.h
>> @@ -74,13 +74,31 @@
>> /* slave list primitives */
>> #define bond_to_slave(ptr) list_entry(ptr, struct slave, list)
>>
>> +/* slave list primitives, Caller must hold rcu_read_lock */
>> +#define bond_to_slave_rcu(ptr) list_entry_rcu(ptr, struct slave, list)
>> +
>> +/* bond_is_empty return NULL if slave list is empty*/
>> +#define bond_is_empty(bond) \
>> +    (list_empty(&(bond)->slave_list))
>> +
>> +/* bond_is_empty_rcu return NULL if slave list is empty*/
>> +#define bond_is_empty_rcu(bond) \
>> +    (!list_first_or_null_rcu(&(bond)->slave_list, struct slave, list))
> 
> Useless/dangerous function. It's not used in this patch and there can be no
> constructs like:
> 
> 1 if (!bond_is_empty_rcu(bond)) {
> 2     slave = bond_first_slave_rcu(bond);
> 3     do_something(slave);
> 

agree, forgot to remove it.

> because between 1) and 2) the slave can go away, and we'll end up with
> slave == NULL.
> 
>> +
>> /* IMPORTANT: bond_first/last_slave can return NULL in case of an empty list */
>> #define bond_first_slave(bond) \
>>     list_first_entry_or_null(&(bond)->slave_list, struct slave, list)
>> #define bond_last_slave(bond) \
>> -    (list_empty(&(bond)->slave_list) ? NULL : \
>> +    (bond_is_empty(bond) ? NULL : \
>>                        bond_to_slave((bond)->slave_list.prev))
>>
>> +/**
>> + * IMPORTANT: bond_first/last_slave_rcu can return NULL in case of an empty list
>> + * Caller must hold rcu_read_lock
>> + */
>> +#define bond_first_slave_rcu(bond) \
>> +    list_first_or_null_rcu(&(bond)->slave_list, struct slave, list);
> 
> bond_first_slave_or_null_rcu() ?
> 

both is suitable.

>> +
>> #define bond_is_first_slave(bond, pos) ((pos)->list.prev == &(bond)->slave_list)
>> #define bond_is_last_slave(bond, pos) ((pos)->list.next == &(bond)->slave_list)
>>
>> @@ -93,6 +111,16 @@
>>     (bond_is_first_slave(bond, pos) ? bond_last_slave(bond) : \
>>                       bond_to_slave((pos)->list.prev))
>>
>> +/* Since bond_first/last_slave_rcu can return NULL, these can return NULL too */
>> +#define bond_next_slave_rcu(bond, pos) \
>> +    ({struct list_head *__slave_list = &(bond)->slave_list; \
>> +     struct list_head __rcu *__next = list_next_rcu(__slave_list); \
>> +     struct list_head __rcu *__pos_next = list_next_rcu(&(pos)->list); \
>> +     likely(__pos_next != __slave_list) ? \
>> +     container_of(__pos_next, struct slave, list) : \
>> +     container_of(__next, struct slave, list); \
>> +     })
> 
> And if bond has no slaves, whilst pos being the slave that has just
> detached? Then bond->slave_list->next == pos->list->next ==
> &bond->slave_list, and we'll return the container_of(bond->slave_list),
> which is garbage, but not NULL.
> 
>> +
>> /**
>>  * bond_for_each_slave_from - iterate the slaves list from a starting point
>>  * @bond:    the bond holding this list.
>> -- 
>> 1.8.0
>>
>>
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
> 

^ permalink raw reply

* Re: [PATCH] moxa: drop free_irq of devm_request_irq allocated irq
From: Jingoo Han @ 2013-09-26  2:36 UTC (permalink / raw)
  To: 'Wei Yongjun'
  Cc: 'Jonas Jensen', 'David S. Miller',
	'Grant Likely', rob.herring, yongjun_wei, netdev,
	'Sachin Kamat', 'Jingoo Han'
In-Reply-To: <CAPgLHd9VXaPx18ujDKbUik6XjOsnARcK-C-orCOynyJaWFREjw@mail.gmail.com>

On Thursday, September 26, 2013 11:13 AM, Wei Yongjun wrote:
> On 09/26/2013 08:47 AM, Jingoo Han wrote:
> > On Wednesday, September 25, 2013 4:33 PM, Wei Yongjun wrote:
> >> From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
> >>
> >> irq allocated with devm_request_irq should not be freed using
> >> free_irq, because doing so causes a dangling pointer, and a
> >> subsequent double free.
> >>
> >> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
> >> ---
> >>  drivers/net/ethernet/moxa/moxart_ether.c | 1 -
> >>  1 file changed, 1 deletion(-)
> >>
> >> diff --git a/drivers/net/ethernet/moxa/moxart_ether.c b/drivers/net/ethernet/moxa/moxart_ether.c
> >> index 83c2091..9a7fcb5 100644
> >> --- a/drivers/net/ethernet/moxa/moxart_ether.c
> >> +++ b/drivers/net/ethernet/moxa/moxart_ether.c
> >> @@ -531,7 +531,6 @@ static int moxart_remove(struct platform_device *pdev)
> >>  	struct net_device *ndev = platform_get_drvdata(pdev);
> >>
> >>  	unregister_netdev(ndev);
> >> -	free_irq(ndev->irq, ndev);
> >>  	moxart_mac_free_memory(ndev);
> >>  	free_netdev(ndev);
> > CC'ed Sachin Kamat,
> >
> > In this case, the free_irq() will be called, after calling
> > free_netdev(). 'ndev' is freed by free_netdev(). Then, 'ndev->irq'
> > is used by free_irq(). Is it right?
> >
> > In my humble opinion, it seems to make the problem.
> >
> 
> devm_request_irq() has recorded the irq and dev_id, so free_irq() by devm_*
> will not touch 'ndev' which has been freed by free_netdev().
> So, if we not need to call free_irq() before free_netdev(), there will be
> no problem.

However, 'dev_id' is a pointer, not a value.
It seems to make the problem that references the invalid pointer.

Best regards,
Jingoo Han

^ permalink raw reply

* Re: Kernel Panic Sending Frames Using dev_queue_xmit()
From: Merlin Davis @ 2013-09-26  2:47 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Ben Hutchings, netdev
In-Reply-To: <20130925220351.GA4653@electric-eye.fr.zoreil.com>

On Wed, Sep 25, 2013 at 3:03 PM, Francois Romieu <romieu@fr.zoreil.com> wrote:
> Macro expansion bug.

Yes!  Excellent.  If I change the code to:

   err = dev_queue_xmit(skb);
   err = net_xmit_eval(err);

then the panic seems to disappear (I've run for at least 30 minutes
now without a crash anyway).  Would the best fix for this in the
kernel code be to:

A.) document the net_xmit_eval() macro with a warning,
B.) add local scope the the macro with ({ ... }) to remove the double
evaluation, or
C.) do nothing in the kernel code and stop being such a dumbass when calling it?

Thank you,
Merlin

^ permalink raw reply

* Re: [PATCH RFC] net: introduce support for lazy initialization of secret keys
From: Hannes Frederic Sowa @ 2013-09-26  2:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, fw, edumazet, davem, ycheng
In-Reply-To: <1380148946.3165.166.camel@edumazet-glaptop>

On Wed, Sep 25, 2013 at 03:42:26PM -0700, Eric Dumazet wrote:
> BTW, build_ehash_secret() is called like that :
> 
> if (unlikely(!inet_ehash_secret))
>     if (sock->type != SOCK_RAW && sock->type != SOCK_DGRAM)
>         build_ehash_secret();
> 
> So it would be better to make sure inet_ehash_secret is not 0 by
> accident.

This is the most idiomatic version which does not have this problem. It
does add a very few instructions on every hashing call and splits
inet6_ehash_secret and inet_ehash_secret. I just did this in a hurry
and did not test it well, so this is only a preview if this way would
be acceptable.

I would finish a final version by tomorrow based on the feedback. Thanks! ;)

[PATCH RFC] net: introduce support for lazy initialization of secret keys

net_get_random_once is a new macro which handles the initialization
of secret keys. It is possible to call it in the fast path. Only the
initialization depends on the spinlock and is rather slow. Otherwise
it should get used just before the key is used to delay the entropy
extration as late as possible to get better randomness. It returns true
if the key got initialized.

This patch splits the secret key for syncookies between ipv4 and ipv6
and initializes them with net_get_random_once.

Changed key initialization of tcp_fastopen cookies to net_get_random_once.

Also initialize the secret keys for tcp connection hashing via
net_get_random_once and split inet_ehash_secret and inet6_ehash_secret.
Thus inet_hash_secret need not be exported any more.

Cc: Florian Westphal <fw@strlen.de>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/linux/net.h            | 14 ++++++++++++++
 include/net/inet6_hashtables.h |  9 +++++++--
 include/net/inet_sock.h        |  4 ++--
 include/net/ipv6.h             |  7 ++++++-
 include/net/tcp.h              |  3 +--
 net/core/secure_seq.c          | 14 ++------------
 net/core/utils.c               | 21 +++++++++++++++++++++
 net/ipv4/af_inet.c             | 25 -------------------------
 net/ipv4/syncookies.c          | 16 ++++++----------
 net/ipv4/sysctl_net_ipv4.c     |  5 +++++
 net/ipv4/tcp_fastopen.c        | 21 ++++++++++-----------
 net/ipv6/af_inet6.c            |  5 -----
 net/ipv6/inet6_hashtables.c    |  3 +++
 net/ipv6/syncookies.c          | 10 ++++++++--
 14 files changed, 85 insertions(+), 72 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index 4f27575..d14fad5 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -243,6 +243,20 @@ do {								\
 #define net_random()		prandom_u32()
 #define net_srandom(seed)	prandom_seed((__force u32)(seed))
 
+bool __net_get_random_once(void *buf, int nbytes, bool *done);
+
+/* BE CAREFUL: this function is not interrupt safe */
+#define net_get_random_once(buf, nbytes)				\
+	({								\
+		static bool ___done = false;				\
+		bool ___ret = false;					\
+		if (unlikely(!___done))					\
+			___ret = __net_get_random_once(buf,		\
+						       nbytes,		\
+						       &___done);	\
+		___ret;							\
+	})
+
 extern int   	     kernel_sendmsg(struct socket *sock, struct msghdr *msg,
 				    struct kvec *vec, size_t num, size_t len);
 extern int   	     kernel_recvmsg(struct socket *sock, struct msghdr *msg,
diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index f52fa88..8b66189 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -28,16 +28,21 @@
 
 struct inet_hashinfo;
 
+extern u32 inet6_ehash_secret;
+
 static inline unsigned int inet6_ehashfn(struct net *net,
 				const struct in6_addr *laddr, const u16 lport,
 				const struct in6_addr *faddr, const __be16 fport)
 {
-	u32 ports = (((u32)lport) << 16) | (__force u32)fport;
+	u32 ports;
+
+	net_get_random_once(&inet6_ehash_secret, sizeof(inet6_ehash_secret));
 
+	ports = (((u32)lport) << 16) | (__force u32)fport;
 	return jhash_3words((__force u32)laddr->s6_addr32[3],
 			    ipv6_addr_jhash(faddr),
 			    ports,
-			    inet_ehash_secret + net_hash_mix(net));
+			    inet6_ehash_secret + net_hash_mix(net));
 }
 
 static inline int inet6_sk_ehashfn(const struct sock *sk)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 636d203..d5a5353 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -202,13 +202,13 @@ static inline void inet_sk_copy_descendant(struct sock *sk_to,
 int inet_sk_rebuild_header(struct sock *sk);
 
 extern u32 inet_ehash_secret;
-extern u32 ipv6_hash_secret;
-void build_ehash_secret(void);
 
 static inline unsigned int inet_ehashfn(struct net *net,
 					const __be32 laddr, const __u16 lport,
 					const __be32 faddr, const __be16 fport)
 {
+	net_get_random_once(&inet_ehash_secret, sizeof(inet_ehash_secret));
+
 	return jhash_3words((__force __u32) laddr,
 			    (__force __u32) faddr,
 			    ((__u32) lport) << 16 | (__force __u32)fport,
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index fe1c7f6..c683020 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -538,11 +538,16 @@ static inline u32 ipv6_addr_hash(const struct in6_addr *a)
 #endif
 }
 
+extern u32 ipv6_hash_secret;
+
 /* more secured version of ipv6_addr_hash() */
 static inline u32 ipv6_addr_jhash(const struct in6_addr *a)
 {
-	u32 v = (__force u32)a->s6_addr32[0] ^ (__force u32)a->s6_addr32[1];
+	u32 v;
+
+	net_get_random_once(&ipv6_hash_secret, sizeof(ipv6_hash_secret));
 
+	v = (__force u32)a->s6_addr32[0] ^ (__force u32)a->s6_addr32[1];
 	return jhash_3words(v,
 			    (__force u32)a->s6_addr32[2],
 			    (__force u32)a->s6_addr32[3],
diff --git a/include/net/tcp.h b/include/net/tcp.h
index de870ee..2a26100 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -475,7 +475,6 @@ int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size);
 void inet_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb);
 
 /* From syncookies.c */
-extern __u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS];
 int __cookie_v4_check(const struct iphdr *iph, const struct tcphdr *th,
 		      u32 cookie);
 struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
@@ -1323,7 +1322,7 @@ extern struct tcp_fastopen_context __rcu *tcp_fastopen_ctx;
 int tcp_fastopen_reset_cipher(void *key, unsigned int len);
 void tcp_fastopen_cookie_gen(__be32 src, __be32 dst,
 			     struct tcp_fastopen_cookie *foc);
-
+void tcp_fastopen_init_key_once(bool publish);
 #define TCP_FASTOPEN_KEY_LENGTH 16
 
 /* Fastopen key context */
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 3f1ec15..b02fd16 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -7,6 +7,7 @@
 #include <linux/hrtimer.h>
 #include <linux/ktime.h>
 #include <linux/string.h>
+#include <linux/net.h>
 
 #include <net/secure_seq.h>
 
@@ -16,18 +17,7 @@ static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned;
 
 static void net_secret_init(void)
 {
-	u32 tmp;
-	int i;
-
-	if (likely(net_secret[0]))
-		return;
-
-	for (i = NET_SECRET_SIZE; i > 0;) {
-		do {
-			get_random_bytes(&tmp, sizeof(tmp));
-		} while (!tmp);
-		cmpxchg(&net_secret[--i], 0, tmp);
-	}
+	net_get_random_once(net_secret, sizeof(net_secret));
 }
 
 #ifdef CONFIG_INET
diff --git a/net/core/utils.c b/net/core/utils.c
index aa88e23..b420547 100644
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -338,3 +338,24 @@ void inet_proto_csum_replace16(__sum16 *sum, struct sk_buff *skb,
 				  csum_unfold(*sum)));
 }
 EXPORT_SYMBOL(inet_proto_csum_replace16);
+
+bool __net_get_random_once(void *buf, int nbytes, bool *done)
+{
+	static DEFINE_SPINLOCK(lock);
+
+	spin_lock_bh(&lock);
+	if (*done) {
+		spin_unlock_bh(&lock);
+		return false;
+	}
+
+	get_random_bytes(buf, nbytes);
+	/* Make sure random data is published before toggeling done.
+	 * There is no corresponding rmb.
+	 */
+	smp_wmb();
+	*done = true;
+	spin_unlock_bh(&lock);
+	return true;
+}
+EXPORT_SYMBOL(__net_get_random_once);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index cfeb85c..49a5c48 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -246,27 +246,6 @@ out:
 EXPORT_SYMBOL(inet_listen);
 
 u32 inet_ehash_secret __read_mostly;
-EXPORT_SYMBOL(inet_ehash_secret);
-
-u32 ipv6_hash_secret __read_mostly;
-EXPORT_SYMBOL(ipv6_hash_secret);
-
-/*
- * inet_ehash_secret must be set exactly once, and to a non nul value
- * ipv6_hash_secret must be set exactly once.
- */
-void build_ehash_secret(void)
-{
-	u32 rnd;
-
-	do {
-		get_random_bytes(&rnd, sizeof(rnd));
-	} while (rnd == 0);
-
-	if (cmpxchg(&inet_ehash_secret, 0, rnd) == 0)
-		get_random_bytes(&ipv6_hash_secret, sizeof(ipv6_hash_secret));
-}
-EXPORT_SYMBOL(build_ehash_secret);
 
 /*
  *	Create an inet socket.
@@ -284,10 +263,6 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
 	int try_loading_module = 0;
 	int err;
 
-	if (unlikely(!inet_ehash_secret))
-		if (sock->type != SOCK_RAW && sock->type != SOCK_DGRAM)
-			build_ehash_secret();
-
 	sock->state = SS_UNCONNECTED;
 
 	/* Look for the requested type/protocol pair. */
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 15e0241..af79b84 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -25,15 +25,7 @@
 
 extern int sysctl_tcp_syncookies;
 
-__u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS];
-EXPORT_SYMBOL(syncookie_secret);
-
-static __init int init_syncookies(void)
-{
-	get_random_bytes(syncookie_secret, sizeof(syncookie_secret));
-	return 0;
-}
-__initcall(init_syncookies);
+static u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS];
 
 #define COOKIEBITS 24	/* Upper bits store count */
 #define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)
@@ -44,7 +36,11 @@ static DEFINE_PER_CPU(__u32 [16 + 5 + SHA_WORKSPACE_WORDS],
 static u32 cookie_hash(__be32 saddr, __be32 daddr, __be16 sport, __be16 dport,
 		       u32 count, int c)
 {
-	__u32 *tmp = __get_cpu_var(ipv4_cookie_scratch);
+	__u32 *tmp;
+
+	net_get_random_once(syncookie_secret, sizeof(syncookie_secret));
+
+	tmp = __get_cpu_var(ipv4_cookie_scratch);
 
 	memcpy(tmp + 4, syncookie_secret[c], sizeof(syncookie_secret[c]));
 	tmp[0] = (__force u32)saddr;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 540279f..d2b5140 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -267,6 +267,11 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write,
 			ret = -EINVAL;
 			goto bad_key;
 		}
+		/* Generate a dummy secret but don't publish it. This
+		 * is needed so we don't regenerate a new key on the
+		 * first invocation of tcp_fastopen_cookie_gen
+		 */
+		tcp_fastopen_init_key_once(false);
 		tcp_fastopen_reset_cipher(user_key, TCP_FASTOPEN_KEY_LENGTH);
 	}
 
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index ab7bd35..316bfdc 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -14,6 +14,14 @@ struct tcp_fastopen_context __rcu *tcp_fastopen_ctx;
 
 static DEFINE_SPINLOCK(tcp_fastopen_ctx_lock);
 
+void tcp_fastopen_init_key_once(bool publish)
+{
+	static u8 key[TCP_FASTOPEN_KEY_LENGTH];
+
+	if (net_get_random_once(key, sizeof(key)) && publish)
+		tcp_fastopen_reset_cipher(key, sizeof(key));
+}
+
 static void tcp_fastopen_ctx_free(struct rcu_head *head)
 {
 	struct tcp_fastopen_context *ctx =
@@ -70,6 +78,8 @@ void tcp_fastopen_cookie_gen(__be32 src, __be32 dst,
 	__be32 path[4] = { src, dst, 0, 0 };
 	struct tcp_fastopen_context *ctx;
 
+	tcp_fastopen_init_key_once(true);
+
 	rcu_read_lock();
 	ctx = rcu_dereference(tcp_fastopen_ctx);
 	if (ctx) {
@@ -78,14 +88,3 @@ void tcp_fastopen_cookie_gen(__be32 src, __be32 dst,
 	}
 	rcu_read_unlock();
 }
-
-static int __init tcp_fastopen_init(void)
-{
-	__u8 key[TCP_FASTOPEN_KEY_LENGTH];
-
-	get_random_bytes(key, sizeof(key));
-	tcp_fastopen_reset_cipher(key, sizeof(key));
-	return 0;
-}
-
-late_initcall(tcp_fastopen_init);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 4966b12..5bd9b25 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -110,11 +110,6 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
 	int try_loading_module = 0;
 	int err;
 
-	if (sock->type != SOCK_RAW &&
-	    sock->type != SOCK_DGRAM &&
-	    !inet_ehash_secret)
-		build_ehash_secret();
-
 	/* Look for the requested type/protocol pair. */
 lookup_protocol:
 	err = -ESOCKTNOSUPPORT;
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 32b4a16..4015720 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -23,6 +23,9 @@
 #include <net/secure_seq.h>
 #include <net/ip.h>
 
+u32 inet6_ehash_secret __read_mostly;
+u32 ipv6_hash_secret __read_mostly;
+
 int __inet6_hash(struct sock *sk, struct inet_timewait_sock *tw)
 {
 	struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index d703218..a454cef 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -21,6 +21,8 @@
 #include <net/ipv6.h>
 #include <net/tcp.h>
 
+static u32 syncookie6_secret[2][16-4+SHA_DIGEST_WORDS];
+
 #define COOKIEBITS 24	/* Upper bits store count */
 #define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)
 
@@ -61,14 +63,18 @@ static DEFINE_PER_CPU(__u32 [16 + 5 + SHA_WORKSPACE_WORDS],
 static u32 cookie_hash(const struct in6_addr *saddr, const struct in6_addr *daddr,
 		       __be16 sport, __be16 dport, u32 count, int c)
 {
-	__u32 *tmp = __get_cpu_var(ipv6_cookie_scratch);
+	__u32 *tmp;
+
+	net_get_random_once(syncookie6_secret, sizeof(syncookie6_secret));
+
+	tmp = __get_cpu_var(ipv6_cookie_scratch);
 
 	/*
 	 * we have 320 bits of information to hash, copy in the remaining
 	 * 192 bits required for sha_transform, from the syncookie_secret
 	 * and overwrite the digest with the secret
 	 */
-	memcpy(tmp + 10, syncookie_secret[c], 44);
+	memcpy(tmp + 10, syncookie6_secret[c], 44);
 	memcpy(tmp, saddr, 16);
 	memcpy(tmp + 4, daddr, 16);
 	tmp[8] = ((__force u32)sport << 16) + (__force u32)dport;
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH net-next v5 2/6] bonding: remove the no effect lock for bond_3ad_lacpdu_recv()
From: Ding Tianhong @ 2013-09-26  2:51 UTC (permalink / raw)
  To: Veaceslav Falico
  Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <20130925103300.GB23575@redhat.com>

On 2013/9/25 18:33, Veaceslav Falico wrote:
> On Wed, Sep 25, 2013 at 05:52:19PM +0800, Ding Tianhong wrote:
>> There is no pointer needed read lock protection, remove the unnecessary lock
>> and improve performance for the 3ad recv path.
> 
> I don't really understand it. Here's the code path:
> 
> rx_handler (holding rcu_read_lock()) -> bond_handle_frame() ->
> bond->recv_probe -> bond_3ad_lacpdu_recv(). So we're holding only the
> rcu_read_lock() there. What stops us from racing with
> bond_3ad_unbind_slave(), for example?
> 
> As in:
> 
> CPU0                CPU1
> --------            -----------
> ...                bond_3ad_unbind_slave()
> bond_3ad_rx_indication()    ...
> if (!port->slave) {        ...            //slave is ok
>                 port->slave = NULL;
> ad_marker_info_received()    ...
> ad_marker_send()        ...
> slave = port->slave;        ...
> skb->dev = slave->dev;        ...
>        ^^^ NULL pointer dereference.
> 
> 
> I'm not saying that this approach is wrong, maybe I'm missing something,
> but when removing locks it's usually a good thing to do - to comment it in
> depth in the commit message why it's not already needed.
> 

no, it will not happend, pls review the cold:
	netdev_rx_handler_unregister(slave_dev);
	write_lock_bh(&bond->lock);

	/* Inform AD package of unbinding of slave. */
	if (bond->params.mode == BOND_MODE_8023AD) {
		/* must be called before the slave is
		 * detached from the list
		 */
		bond_3ad_unbind_slave(slave);
	}
netdev_rx_handler_unregiste() will remvoe the rx_handle before the bond_3ad_unbind_slave(),
it was safe to run bond_3ad_rx_indication(). 

Best regards
Ding Tianhong


>>
>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> Cc: Nikolay Aleksandrov <nikolay@redhat.com>
>> ---
>> drivers/net/bonding/bond_3ad.c | 2 --
>> 1 file changed, 2 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>> index 7a3860f..c134f43 100644
>> --- a/drivers/net/bonding/bond_3ad.c
>> +++ b/drivers/net/bonding/bond_3ad.c
>> @@ -2494,9 +2494,7 @@ int bond_3ad_lacpdu_recv(const struct sk_buff *skb, struct bonding *bond,
>>     if (!lacpdu)
>>         return ret;
>>
>> -    read_lock(&bond->lock);
>>     ret = bond_3ad_rx_indication(lacpdu, slave, skb->len);
>> -    read_unlock(&bond->lock);
>>     return ret;
>> }
>>
>> -- 
>> 1.8.2.1
>>
>>
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
> 

^ permalink raw reply

* Re: [PATCH RFC] net: introduce support for lazy initialization of secret keys
From: Hannes Frederic Sowa @ 2013-09-26  3:03 UTC (permalink / raw)
  To: Eric Dumazet, netdev, fw, edumazet, davem, ycheng
In-Reply-To: <20130926025024.GC30920@order.stressinduktion.org>

On Thu, Sep 26, 2013 at 04:50:24AM +0200, Hannes Frederic Sowa wrote:
>  static inline unsigned int inet6_ehashfn(struct net *net,
>  				const struct in6_addr *laddr, const u16 lport,
>  				const struct in6_addr *faddr, const __be16 fport)
>  {
> -	u32 ports = (((u32)lport) << 16) | (__force u32)fport;
> +	u32 ports;
> +
> +	net_get_random_once(&inet6_ehash_secret, sizeof(inet6_ehash_secret));

I was too quick with sending, this is broken. I cannot call this in a
static inline. Sorry for the noise.

^ permalink raw reply

* Re: [PATCH] moxa: drop free_irq of devm_request_irq allocated irq
From: Wei Yongjun @ 2013-09-26  3:23 UTC (permalink / raw)
  To: jg1.han
  Cc: jonas.jensen, davem, grant.likely, rob.herring, yongjun_wei,
	netdev, sachin.kamat

On 09/26/2013 10:36 AM, Jingoo Han wrote:
> On Thursday, September 26, 2013 11:13 AM, Wei Yongjun wrote:
>> On 09/26/2013 08:47 AM, Jingoo Han wrote:
>>> On Wednesday, September 25, 2013 4:33 PM, Wei Yongjun wrote:
>>>> From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
>>>>
>>>> irq allocated with devm_request_irq should not be freed using
>>>> free_irq, because doing so causes a dangling pointer, and a
>>>> subsequent double free.
>>>>
>>>> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
>>>> ---
>>>>  drivers/net/ethernet/moxa/moxart_ether.c | 1 -
>>>>  1 file changed, 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/moxa/moxart_ether.c b/drivers/net/ethernet/moxa/moxart_ether.c
>>>> index 83c2091..9a7fcb5 100644
>>>> --- a/drivers/net/ethernet/moxa/moxart_ether.c
>>>> +++ b/drivers/net/ethernet/moxa/moxart_ether.c
>>>> @@ -531,7 +531,6 @@ static int moxart_remove(struct platform_device *pdev)
>>>>  	struct net_device *ndev = platform_get_drvdata(pdev);
>>>>
>>>>  	unregister_netdev(ndev);
>>>> -	free_irq(ndev->irq, ndev);
>>>>  	moxart_mac_free_memory(ndev);
>>>>  	free_netdev(ndev);
>>> CC'ed Sachin Kamat,
>>>
>>> In this case, the free_irq() will be called, after calling
>>> free_netdev(). 'ndev' is freed by free_netdev(). Then, 'ndev->irq'
>>> is used by free_irq(). Is it right?
>>>
>>> In my humble opinion, it seems to make the problem.
>>>
>> devm_request_irq() has recorded the irq and dev_id, so free_irq() by devm_*
>> will not touch 'ndev' which has been freed by free_netdev().
>> So, if we not need to call free_irq() before free_netdev(), there will be
>> no problem.
> However, 'dev_id' is a pointer, not a value.
> It seems to make the problem that references the invalid pointer.

free_irq using dev_id as a raw data, so you mean the irq handle?

----
 free_irq(...)
 {
   ...
     if (action->dev_id == dev_id)
   ...
 }
---

Regards,
Yongjun Wei

^ permalink raw reply

* Re: [PATCH net-next v5 2/6] bonding: remove the no effect lock for bond_3ad_lacpdu_recv()
From: Jay Vosburgh @ 2013-09-26  4:23 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <5243A127.80404@huawei.com>

Ding Tianhong <dingtianhong@huawei.com> wrote:

>On 2013/9/25 18:33, Veaceslav Falico wrote:
>> On Wed, Sep 25, 2013 at 05:52:19PM +0800, Ding Tianhong wrote:
>>> There is no pointer needed read lock protection, remove the unnecessary lock
>>> and improve performance for the 3ad recv path.

	How much does removing the lock around the LACPDU receive
processing improve performance?  This is not high rate traffic; the
"fast" rate is one LACPDU per second (per port); the default rate is one
every 30 seconds.

>> I don't really understand it. Here's the code path:
>> 
>> rx_handler (holding rcu_read_lock()) -> bond_handle_frame() ->
>> bond->recv_probe -> bond_3ad_lacpdu_recv(). So we're holding only the
>> rcu_read_lock() there. What stops us from racing with
>> bond_3ad_unbind_slave(), for example?
>> 
>> As in:
>> 
>> CPU0                CPU1
>> --------            -----------
>> ...                bond_3ad_unbind_slave()
>> bond_3ad_rx_indication()    ...
>> if (!port->slave) {        ...            //slave is ok
>>                 port->slave = NULL;
>> ad_marker_info_received()    ...
>> ad_marker_send()        ...
>> slave = port->slave;        ...
>> skb->dev = slave->dev;        ...
>>        ^^^ NULL pointer dereference.
>> 
>> 
>> I'm not saying that this approach is wrong, maybe I'm missing something,
>> but when removing locks it's usually a good thing to do - to comment it in
>> depth in the commit message why it's not already needed.
>> 
>
>no, it will not happend, pls review the cold:
>	netdev_rx_handler_unregister(slave_dev);
>	write_lock_bh(&bond->lock);
>
>	/* Inform AD package of unbinding of slave. */
>	if (bond->params.mode == BOND_MODE_8023AD) {
>		/* must be called before the slave is
>		 * detached from the list
>		 */
>		bond_3ad_unbind_slave(slave);
>	}
>netdev_rx_handler_unregiste() will remvoe the rx_handle before the bond_3ad_unbind_slave(),
>it was safe to run bond_3ad_rx_indication(). 

	I'm not sure this is safe if bond_3ad_rx_indication is started
prior to the unbind, e.g.,

CPU 0				CPU 1
----				-----
bond_3ad_rx_indication
[ pass port->slave test ]
[ ... ]				rx_handler_unregister

[ state machine lock could be
  contended, forcing us to wait ]
__get_state_machine_lock

				write_lock(&bond->lock)
				bond_3ad_unbind_slave()
				[ ... ]
				port->slave = NULL;

[ got the lock ]
ad_rx_machine(lacpdu, port)
[ detect loopback ]
pr_err(... port->slave->bond->dev->name)

	or that ad_marker case that Veaceslav describes.

	-J

>Best regards
>Ding Tianhong
>
>
>>>
>>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>>> Cc: Nikolay Aleksandrov <nikolay@redhat.com>
>>> ---
>>> drivers/net/bonding/bond_3ad.c | 2 --
>>> 1 file changed, 2 deletions(-)
>>>
>>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>>> index 7a3860f..c134f43 100644
>>> --- a/drivers/net/bonding/bond_3ad.c
>>> +++ b/drivers/net/bonding/bond_3ad.c
>>> @@ -2494,9 +2494,7 @@ int bond_3ad_lacpdu_recv(const struct sk_buff *skb, struct bonding *bond,
>>>     if (!lacpdu)
>>>         return ret;
>>>
>>> -    read_lock(&bond->lock);
>>>     ret = bond_3ad_rx_indication(lacpdu, slave, skb->len);
>>> -    read_unlock(&bond->lock);
>>>     return ret;
>>> }
>>>
>>> -- 
>>> 1.8.2.1

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time
From: Jason Wang @ 2013-09-26  4:30 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20130923071620.GB31886@redhat.com>

On 09/23/2013 03:16 PM, Michael S. Tsirkin wrote:
> On Thu, Sep 05, 2013 at 10:54:44AM +0800, Jason Wang wrote:
>> > On 09/04/2013 07:59 PM, Michael S. Tsirkin wrote:
>>> > > On Mon, Sep 02, 2013 at 04:40:59PM +0800, Jason Wang wrote:
>>>> > >> Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
>>>> > >> upend_idx != done_idx we still set zcopy_used to true and rollback this choice
>>>> > >> later. This could be avoided by determining zerocopy once by checking all
>>>> > >> conditions at one time before.
>>>> > >>
>>>> > >> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> > >> ---
>>>> > >>  drivers/vhost/net.c |   47 ++++++++++++++++++++---------------------------
>>>> > >>  1 files changed, 20 insertions(+), 27 deletions(-)
>>>> > >>
>>>> > >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>>> > >> index 8a6dd0d..3f89dea 100644
>>>> > >> --- a/drivers/vhost/net.c
>>>> > >> +++ b/drivers/vhost/net.c
>>>> > >> @@ -404,43 +404,36 @@ static void handle_tx(struct vhost_net *net)
>>>> > >>  			       iov_length(nvq->hdr, s), hdr_size);
>>>> > >>  			break;
>>>> > >>  		}
>>>> > >> -		zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
>>>> > >> -				       nvq->upend_idx != nvq->done_idx);
>>>> > >> +
>>>> > >> +		zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
>>>> > >> +				   && (nvq->upend_idx + 1) % UIO_MAXIOV !=
>>>> > >> +				      nvq->done_idx
>>> > > Thinking about this, this looks strange.
>>> > > The original idea was that once we start doing zcopy, we keep
>>> > > using the heads ring even for short packets until no zcopy is outstanding.
>> > 
>> > What's the reason for keep using the heads ring?
> To keep completions in order.

Ok, I will do some test to see the impact.
>>> > >
>>> > > What's the logic behind (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx
>>> > > here?
>> > 
>> > Because we initialize both upend_idx and done_idx to zero, so upend_idx
>> > != done_idx could not be used to check whether or not the heads ring
>> > were full.
> But what does ring full have to do with zerocopy use?
>

It was used to forbid the zerocopy when heads ring are full, but since
we have the limitation now, it could be removed.

^ permalink raw reply

* Re: [PATCH] ipvs: improved SH fallback strategy
From: Julian Anastasov @ 2013-09-26  5:30 UTC (permalink / raw)
  To: Alexander Frolkin
  Cc: Simon Horman, Sergei Shtylyov, lvs-devel, Wensong Zhang, netdev,
	linux-kernel
In-Reply-To: <20130925092638.GD19768@eldamar.org.uk>


	Hello,

On Wed, 25 Sep 2013, Alexander Frolkin wrote:

> Improve the SH fallback realserver selection strategy.
> 
> With sh and sh-fallback, if a realserver is down, this attempts to
> distribute the traffic that would have gone to that server evenly
> among the remaining servers.
>  
> Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
> 
> --
> diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
> index 3588fae..3d5ab7c 100644
> --- a/net/netfilter/ipvs/ip_vs_sh.c
> +++ b/net/netfilter/ipvs/ip_vs_sh.c
> @@ -115,27 +115,47 @@ ip_vs_sh_get(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
>  }
>  
>  
> -/* As ip_vs_sh_get, but with fallback if selected server is unavailable */
> +/* As ip_vs_sh_get, but with fallback if selected server is unavailable
> + *
> + * The fallback strategy loops around the table starting from a "random"
> + * point (in fact, it is chosen to be the original hash value to make the
> + * algorithm deterministic) to find a new server.
> + */
>  static inline struct ip_vs_dest *
>  ip_vs_sh_get_fallback(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
>  		      const union nf_inet_addr *addr, __be16 port)
>  {
> -	unsigned int offset;
> -	unsigned int hash;
> +	unsigned int offset, roffset;
> +	unsigned int hash, ihash;
>  	struct ip_vs_dest *dest;
>  
> -	for (offset = 0; offset < IP_VS_SH_TAB_SIZE; offset++) {
> -		hash = ip_vs_sh_hashkey(svc->af, addr, port, offset);
> -		dest = rcu_dereference(s->buckets[hash].dest);
> -		if (!dest)
> -			break;
> -		if (is_unavailable(dest))
> -			IP_VS_DBG_BUF(6, "SH: selected unavailable server "
> -				      "%s:%d (offset %d)",
> +	/* first try the dest it's supposed to go to */
> +	ihash = ip_vs_sh_hashkey(svc->af, addr, port, 0);
> +	dest = rcu_dereference(s->buckets[ihash].dest);
> +	if (!dest)
> +		return NULL;

	Can we reduce the indentation here, eg:

	if (!is_unavailable(dest))
		return dest;
	IP_VS_DBG_BUF(6, "SH: selected unavailable server "
	...
	for ()...
	...
	return NULL;

> +	if (is_unavailable(dest)) {
> +		IP_VS_DBG_BUF(6, "SH: selected unavailable server "
> +		      "%s:%d, reselecting",
> +		      IP_VS_DBG_ADDR(svc->af, &dest->addr),
> +		      ntohs(dest->port));
> +		/* if the original dest is unavailable, loop around the table
> +		 * starting from ihash to find a new dest
> +		 */
> +		for (offset = 0; offset < IP_VS_SH_TAB_SIZE; offset++) {
> +			roffset = (offset + ihash) % IP_VS_SH_TAB_SIZE;
> +			hash = ip_vs_sh_hashkey(svc->af, addr, port, roffset);
> +			dest = rcu_dereference(s->buckets[hash].dest);

	Every result of rcu_dereference should be checked
for NULL (no dests anymore):
			if (!dest)
				break;

	Then make sure there is correct indentation
for IP_VS_DBG_BUF parameters.

> +			if (is_unavailable(dest))
> +				IP_VS_DBG_BUF(6, "SH: selected unavailable "
> +				      "server %s:%d (offset %d), reselecting",
>  				      IP_VS_DBG_ADDR(svc->af, &dest->addr),
> -				      ntohs(dest->port), offset);
> -		else
> -			return dest;
> +				      ntohs(dest->port), roffset);
> +			else
> +				return dest;
> +		}
> +	} else {
> +		return dest;
>  	}
>  
>  	return NULL;

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [PATCH v5 net-next 11/27] bonding: rework bond_3ad_xmit_xor() to use bond_for_each_slave() only
From: Veaceslav Falico @ 2013-09-26  5:43 UTC (permalink / raw)
  To: Ding Tianhong; +Cc: netdev, jiri, Jay Vosburgh, Andy Gospodarek
In-Reply-To: <52439C87.6050809@huawei.com>

On Thu, Sep 26, 2013 at 10:31:35AM +0800, Ding Tianhong wrote:

First of all - thanks a lot for the review! Answered below.

>On 2013/9/25 15:20, Veaceslav Falico wrote:
>> Currently, there are two loops - first we find the first slave in an
>> aggregator after the xmit_hash_policy() returned number, and after that we
>> loop from that slave, over bonding head, and till that slave to find any
>> suitable slave to send the packet through.
>>
>> Replace it by just one bond_for_each_slave() loop, which first loops
>> through the requested number of slaves, saving the first suitable one, and
>> after that we've hit the requested number of slaves to skip - search for
>> any up slave to send the packet through. If we don't find such kind of
>> slave - then just send the packet through the first suitable slave found.
>>
>> Logic remains unchainged, and we skip two loops. Also, refactor it a bit
>> for readability.
>>
>> CC: Jay Vosburgh <fubar@us.ibm.com>
>> CC: Andy Gospodarek <andy@greyhouse.net>
>> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>> ---
>>
>> Notes:
>>     v4  -> v5
>>     No change.
>>
>>     v3  -> v4:
>>     No change.
>>
>>     v2  -> v3:
>>     No change.
>>
>>     v1  -> v2:
>>     No changes.
>>
>>     RFC -> v1:
>>     New patch.
>>
>>  drivers/net/bonding/bond_3ad.c | 46 ++++++++++++++++++++----------------------
>>  1 file changed, 22 insertions(+), 24 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>> index 3847aee..c861ee7 100644
>> --- a/drivers/net/bonding/bond_3ad.c
>> +++ b/drivers/net/bonding/bond_3ad.c
>> @@ -2417,15 +2417,15 @@ int bond_3ad_get_active_agg_info(struct bonding *bond, struct ad_info *ad_info)
>>
>>  int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
>>  {
>> -	struct slave *slave, *start_at;
>>  	struct bonding *bond = netdev_priv(dev);
>> +	struct slave *slave, *first_ok_slave;
>> +	struct aggregator *agg;
>> +	struct ad_info ad_info;
>>  	struct list_head *iter;
>> -	int slave_agg_no;
>>  	int slaves_in_agg;
>> -	int agg_id;
>> -	int i;
>> -	struct ad_info ad_info;
>> +	int slave_agg_no;
>>  	int res = 1;
>> +	int agg_id;
>>
>>  	read_lock(&bond->lock);
>>  	if (__bond_3ad_get_active_agg_info(bond, &ad_info)) {
>> @@ -2438,20 +2438,28 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
>>  	agg_id = ad_info.aggregator_id;
>>
>>  	if (slaves_in_agg == 0) {
>> -		/*the aggregator is empty*/
>>  		pr_debug("%s: Error: active aggregator is empty\n", dev->name);
>>  		goto out;
>>  	}
>>
>>  	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
>> +	first_ok_slave = NULL;
>>
>>  	bond_for_each_slave(bond, slave, iter) {
>> -		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
>> +		agg = SLAVE_AD_INFO(slave).port.aggregator;
>> +		if (!agg || agg->aggregator_identifier != agg_id)
>> +			continue;
>>
>> -		if (agg && (agg->aggregator_identifier == agg_id)) {
>> +		if (slave_agg_no >= 0) {
>> +			if (!first_ok_slave && SLAVE_IS_OK(slave))
>> +				first_ok_slave = slave;
>>  			slave_agg_no--;
>> -			if (slave_agg_no < 0)
>> -				break;
>> +			continue;
>> +		}

[1]		^^^^^^^^^^^^^^^^^^^^^^^^^

>> +
>> +		if (SLAVE_IS_OK(slave)) {
>> +			res = bond_dev_queue_xmit(bond, skb, slave->dev);
>> +			goto out;
>>  		}

[2]		^^^^^^^^^^^^^^^^^^^^^^^^^

>
>I think you miss something, you will send skb always by the first suitable port,
>it could not support load balance.

Well, yes, it will send always by the first suitable port AFTER
slave_agg_no, returned by the xmit_hash_policy(), which is the whole point
in the hash function and in the load balance.

It will first loop through the slaves, decrementing slave_agg_no to a
negative value, while saving the first good to send slave, as shown in [1].
Notice the "continue;":

if (slave_agg_no >= 0) {
	if (!first_ok_slave && SLAVE_IS_OK(slave))
		first_ok_slave = slave;
	slave_agg_no--;
	continue;
}

Once we hit the negative value - which means we've skipped enough slaves,
as requested by the hash function - we can start looking for the first
slave that is good to send AFTER those all skipped slaves, as shown in [2].

Down the patch we also use that 'first_ok_slave' - in case we didn't find
any suitable one after we've skipped first slave_agg_no and till the last
slave, so it implements the same 'circular' logic as was in
bond_for_each_slave_from().

>pls consult my function.
> 		if (agg && (agg->aggregator_identifier == agg_id)) {
>-			slave_agg_no--;
>-			if (slave_agg_no < 0)
>-				break;
>+			if (--slave_agg_no < 0) {
>+				if (SLAVE_IS_OK(slave)) {
>+					res = bond_dev_queue_xmit(bond,
>+						skb, slave->dev);
>+					goto out;
>+				}
>+			}

I'll review your function in your patch.

> 		}
> 	}
>
>>  	}
>>
>> @@ -2461,20 +2469,10 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
>>  		goto out;
>>  	}
>>
>> -	start_at = slave;
>> -
>> -	bond_for_each_slave_from(bond, slave, i, start_at) {
>> -		int slave_agg_id = 0;
>> -		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
>> -
>> -		if (agg)
>> -			slave_agg_id = agg->aggregator_identifier;
>> -
>> -		if (SLAVE_IS_OK(slave) && agg && (slave_agg_id == agg_id)) {
>> -			res = bond_dev_queue_xmit(bond, skb, slave->dev);
>> -			break;
>> -		}
>> -	}
>> +	/* we couldn't find any suitable slave after the agg_no, so use the
>> +	 * first suitable found, if found. */
>> +	if (first_ok_slave)
>> +		res = bond_dev_queue_xmit(bond, skb, first_ok_slave->dev);
>>
>>  out:
>>  	read_unlock(&bond->lock);
>>
>
>

^ permalink raw reply

* Established sockets remain open after iface down or address lost
From: Chris Verges @ 2013-09-26  6:04 UTC (permalink / raw)
  To: davem, kuznet, jmorris, yoshfuji, kaber, netdev

Hello all,

I've encountered a behavior that appears to be known, but am seeking
some clarity on its rationale.  The scenario is as follows:

  (0) A TCP server socket listens on :: (v4/v6).
  (1) Connect a USB/Ethernet adapter to a Linux system.
  (2) Adapter is brought up as 'eth0' with an IP address.
  (3) A remote TCP client connects to the server socket.
  (4) 'netstat -anp' shows the socket as ESTABLISHED
  (5) The TCP server starts a blocking read waiting for data.
  (6) Physically disconnect the USB/Ethernet adapter from the USB bus.
  (7) Linux removes the 'eth0' interface and associated IP address.

At this point, the socket _still_ shows as ESTABLISHED under netstat.

This is the paradox.  Why is the blocking read not interrupted with a
socket error to indicate that the socket is no longer viable?

Thank you in advance,
Chris

P.S.  I apologize in advance if I missed this answer in the netdev
archives.

^ permalink raw reply

* Re: [PATCH net-next v5 1/6] bonding: simplify and use RCU protection for 3ad xmit path
From: Veaceslav Falico @ 2013-09-26  6:07 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <5242B24F.5000309@huawei.com>

On Wed, Sep 25, 2013 at 05:52:15PM +0800, Ding Tianhong wrote:
>The commit 278b20837511776dc9d5f6ee1c7fabd5479838bb
>(bonding: initial RCU conversion) has convert the roundrobin, active-backup,
>broadcast and xor xmit path to rcu protection, the performance will be better
>for these mode, so this time, convert xmit path for 3ad mode.
>
>Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
>Suggested-by: Veaceslav Falico <vfalico@redhat.com>
>Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>Signed-off-by: Wang Yufen <wangyufen@huawei.com>
>Cc: Nikolay Aleksandrov <nikolay@redhat.com>
>Cc: Veaceslav Falico <vfalico@redhat.com>
>---
> drivers/net/bonding/bond_3ad.c | 32 ++++++++++++++------------------
> drivers/net/bonding/bonding.h  | 30 +++++++++++++++++++++++++++++-
> 2 files changed, 43 insertions(+), 19 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index 0d8f427..13f1deb 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -143,7 +143,7 @@ static inline struct bonding *__get_bond_by_port(struct port *port)
>  */
> static inline struct port *__get_first_port(struct bonding *bond)
> {
>-	struct slave *first_slave = bond_first_slave(bond);
>+	struct slave *first_slave = bond_first_slave_rcu(bond);
>
> 	return first_slave ? &(SLAVE_AD_INFO(first_slave).port) : NULL;
> }
>@@ -163,7 +163,7 @@ static inline struct port *__get_next_port(struct port *port)
> 	// If there's no bond for this port, or this is the last slave
> 	if (bond == NULL)
> 		return NULL;
>-	slave_next = bond_next_slave(bond, slave);
>+	slave_next = bond_next_slave_rcu(bond, slave);
> 	if (!slave_next || bond_is_first_slave(bond, slave_next))
> 		return NULL;
>
>@@ -2417,16 +2417,14 @@ int bond_3ad_get_active_agg_info(struct bonding *bond, struct ad_info *ad_info)
>
> int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
> {
>-	struct slave *slave, *start_at;
> 	struct bonding *bond = netdev_priv(dev);
>+	struct slave *slave;
> 	int slave_agg_no;
> 	int slaves_in_agg;
> 	int agg_id;
>-	int i;
> 	struct ad_info ad_info;
> 	int res = 1;
>
>-	read_lock(&bond->lock);
> 	if (__bond_3ad_get_active_agg_info(bond, &ad_info)) {
> 		pr_debug("%s: Error: __bond_3ad_get_active_agg_info failed\n",
> 			 dev->name);
>@@ -2444,13 +2442,17 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
>
> 	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
>
>-	bond_for_each_slave(bond, slave) {
>+	bond_for_each_slave_rcu(bond, slave) {
> 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
>
> 		if (agg && (agg->aggregator_identifier == agg_id)) {
>-			slave_agg_no--;
>-			if (slave_agg_no < 0)
>-				break;
>+			if (--slave_agg_no < 0) {
>+				if (SLAVE_IS_OK(slave)) {
>+					res = bond_dev_queue_xmit(bond,
>+						skb, slave->dev);
>+					goto out;
>+				}
>+			}
> 		}
> 	}

So here you are checking for any suitable slave from slave number
slave_agg_no+1 and till the last slave. Ok.

Some nitpicks, though not critical - slave_agg_no will always get
decremented, even if it's negative. It's ok though harder to
read/understand. The triple cascaded ifs can be omited, from (used the same
code):

if (agg && (agg->aggregator_identifier == agg_id)) {
	if (--slave_agg_no < 0) {
		if (SLAVE_IS_OK(slave)) {
			do_something();
			goto out;
		}
	}
}

to (again, used the same code):

if (!agg || agg->aggregator_identifier != agg_id)
	continue;

if (--slave_agg_no >= 0)
	continue;

if (SLAVE_IS_OK(slave)) {
	do_something();
	goto out;
}


Which is a lot easier to read/understand. Though, again, there are small
nitpicks and I'm ok with your approach.

>
>@@ -2460,23 +2462,17 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
> 		goto out;
> 	}
>
>-	start_at = slave;
>-
>-	bond_for_each_slave_from(bond, slave, i, start_at) {
>-		int slave_agg_id = 0;
>+	bond_for_each_slave_rcu(bond, slave) {
> 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
>
>-		if (agg)
>-			slave_agg_id = agg->aggregator_identifier;
>-
>-		if (SLAVE_IS_OK(slave) && agg && (slave_agg_id == agg_id)) {
>+		if (SLAVE_IS_OK(slave) && agg &&
>+			agg->aggregator_identifier == agg_id) {
> 			res = bond_dev_queue_xmit(bond, skb, slave->dev);
> 			break;
> 		}
> 	}

Ok, so if the first approach fails - you loop through *all* the slaves
again, checking the slaves already checked by the first loop - i.e. the
slaves from slave number slave_agg_no and till the last slave.

It's suboptimal, but still should work. You can optimize this by storing a
'first ok slave' in the first loop, and if the first loop fails to find any
slave *after* slave_agg_no, you can use that first_ok_slave (if found) to
send. This way you'll drop the second loop entirely. It's the way that I've
done in my patch - pls consult my function :) :

http://patchwork.ozlabs.org/patch/277701/

bond_for_each_slave(bond, slave, iter) {
         agg = SLAVE_AD_INFO(slave).port.aggregator;
         if (!agg || agg->aggregator_identifier != agg_id)
                 continue;

         if (slave_agg_no >= 0) {
                 if (!first_ok_slave && SLAVE_IS_OK(slave))
                         first_ok_slave = slave;
                 slave_agg_no--;
                 continue;
         }

         if (SLAVE_IS_OK(slave)) {
                 res = bond_dev_queue_xmit(bond, skb, slave->dev);
                 goto out;
         }
}

...

if (first_ok_slave)
	res = bond_dev_queue_xmit(bond, skb, first_ok_slave->dev);

>
> out:
>-	read_unlock(&bond->lock);
> 	if (res) {
> 		/* no suitable interface, frame not sent */
> 		kfree_skb(skb);
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index 03cf3fd..eb36f57 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -74,13 +74,31 @@
> /* slave list primitives */
> #define bond_to_slave(ptr) list_entry(ptr, struct slave, list)
>
>+/* slave list primitives, Caller must hold rcu_read_lock */
>+#define bond_to_slave_rcu(ptr) list_entry_rcu(ptr, struct slave, list)
>+
>+/* bond_is_empty return NULL if slave list is empty*/
>+#define bond_is_empty(bond) \
>+	(list_empty(&(bond)->slave_list))
>+
>+/* bond_is_empty_rcu return NULL if slave list is empty*/
>+#define bond_is_empty_rcu(bond) \
>+	(!list_first_or_null_rcu(&(bond)->slave_list, struct slave, list))
>+
> /* IMPORTANT: bond_first/last_slave can return NULL in case of an empty list */
> #define bond_first_slave(bond) \
> 	list_first_entry_or_null(&(bond)->slave_list, struct slave, list)
> #define bond_last_slave(bond) \
>-	(list_empty(&(bond)->slave_list) ? NULL : \
>+	(bond_is_empty(bond) ? NULL : \
> 					   bond_to_slave((bond)->slave_list.prev))
>
>+/**
>+ * IMPORTANT: bond_first/last_slave_rcu can return NULL in case of an empty list
>+ * Caller must hold rcu_read_lock
>+ */
>+#define bond_first_slave_rcu(bond) \
>+	list_first_or_null_rcu(&(bond)->slave_list, struct slave, list);
>+
> #define bond_is_first_slave(bond, pos) ((pos)->list.prev == &(bond)->slave_list)
> #define bond_is_last_slave(bond, pos) ((pos)->list.next == &(bond)->slave_list)
>
>@@ -93,6 +111,16 @@
> 	(bond_is_first_slave(bond, pos) ? bond_last_slave(bond) : \
> 					  bond_to_slave((pos)->list.prev))
>
>+/* Since bond_first/last_slave_rcu can return NULL, these can return NULL too */
>+#define bond_next_slave_rcu(bond, pos) \
>+	({struct list_head *__slave_list = &(bond)->slave_list; \
>+	 struct list_head __rcu *__next = list_next_rcu(__slave_list); \
>+	 struct list_head __rcu *__pos_next = list_next_rcu(&(pos)->list); \
>+	 likely(__pos_next != __slave_list) ? \
>+	 container_of(__pos_next, struct slave, list) : \
>+	 container_of(__next, struct slave, list); \
>+	 })
>+
> /**
>  * bond_for_each_slave_from - iterate the slaves list from a starting point
>  * @bond:	the bond holding this list.
>-- 
>1.8.0
>
>
>

^ permalink raw reply

* Re: [PATCH net-next v5 2/6] bonding: remove the no effect lock for bond_3ad_lacpdu_recv()
From: Ding Tianhong @ 2013-09-26  6:19 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <11033.1380169422@death.nxdomain>

On 2013/9/26 12:23, Jay Vosburgh wrote:
> Ding Tianhong <dingtianhong@huawei.com> wrote:
> 
>> On 2013/9/25 18:33, Veaceslav Falico wrote:
>>> On Wed, Sep 25, 2013 at 05:52:19PM +0800, Ding Tianhong wrote:
>>>> There is no pointer needed read lock protection, remove the unnecessary lock
>>>> and improve performance for the 3ad recv path.
> 
> 	How much does removing the lock around the LACPDU receive
> processing improve performance?  This is not high rate traffic; the
> "fast" rate is one LACPDU per second (per port); the default rate is one
> every 30 seconds.
> 

agree.

>>> I don't really understand it. Here's the code path:
>>>
>>> rx_handler (holding rcu_read_lock()) -> bond_handle_frame() ->
>>> bond->recv_probe -> bond_3ad_lacpdu_recv(). So we're holding only the
>>> rcu_read_lock() there. What stops us from racing with
>>> bond_3ad_unbind_slave(), for example?
>>>
>>> As in:
>>>
>>> CPU0                CPU1
>>> --------            -----------
>>> ...                bond_3ad_unbind_slave()
>>> bond_3ad_rx_indication()    ...
>>> if (!port->slave) {        ...            //slave is ok
>>>                 port->slave = NULL;
>>> ad_marker_info_received()    ...
>>> ad_marker_send()        ...
>>> slave = port->slave;        ...
>>> skb->dev = slave->dev;        ...
>>>        ^^^ NULL pointer dereference.
>>>
>>>
>>> I'm not saying that this approach is wrong, maybe I'm missing something,
>>> but when removing locks it's usually a good thing to do - to comment it in
>>> depth in the commit message why it's not already needed.
>>>
>>
>> no, it will not happend, pls review the cold:
>> 	netdev_rx_handler_unregister(slave_dev);
>> 	write_lock_bh(&bond->lock);
>>
>> 	/* Inform AD package of unbinding of slave. */
>> 	if (bond->params.mode == BOND_MODE_8023AD) {
>> 		/* must be called before the slave is
>> 		 * detached from the list
>> 		 */
>> 		bond_3ad_unbind_slave(slave);
>> 	}
>> netdev_rx_handler_unregiste() will remvoe the rx_handle before the bond_3ad_unbind_slave(),
>> it was safe to run bond_3ad_rx_indication(). 
> 
> 	I'm not sure this is safe if bond_3ad_rx_indication is started
> prior to the unbind, e.g.,
> 
> CPU 0				CPU 1
> ----				-----
> bond_3ad_rx_indication
> [ pass port->slave test ]
> [ ... ]				rx_handler_unregister
> 
> [ state machine lock could be
>   contended, forcing us to wait ]
> __get_state_machine_lock
> 
> 				write_lock(&bond->lock)
> 				bond_3ad_unbind_slave()
> 				[ ... ]
> 				port->slave = NULL;
> 
> [ got the lock ]
> ad_rx_machine(lacpdu, port)
> [ detect loopback ]
> pr_err(... port->slave->bond->dev->name)
> 
> 	or that ad_marker case that Veaceslav describes.
> 
> 	-J
> 

yes, I miss one thing here, there  is no rcu_read_lock() here, so when enter
bond_3ad_unbind_slave(), bond_3ad_rx_indication() was running.
we should miss the patch, thanks.

>> Best regards
>> Ding Tianhong
>>



>>
>>>>
>>>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>>>> Cc: Nikolay Aleksandrov <nikolay@redhat.com>
>>>> ---
>>>> drivers/net/bonding/bond_3ad.c | 2 --
>>>> 1 file changed, 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>>>> index 7a3860f..c134f43 100644
>>>> --- a/drivers/net/bonding/bond_3ad.c
>>>> +++ b/drivers/net/bonding/bond_3ad.c
>>>> @@ -2494,9 +2494,7 @@ int bond_3ad_lacpdu_recv(const struct sk_buff *skb, struct bonding *bond,
>>>>     if (!lacpdu)
>>>>         return ret;
>>>>
>>>> -    read_lock(&bond->lock);
>>>>     ret = bond_3ad_rx_indication(lacpdu, slave, skb->len);
>>>> -    read_unlock(&bond->lock);
>>>>     return ret;
>>>> }
>>>>
>>>> -- 
>>>> 1.8.2.1
> 
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
> 
> 
> .
> 

^ permalink raw reply

* Re: [PATCH v5 net-next 11/27] bonding: rework bond_3ad_xmit_xor() to use bond_for_each_slave() only
From: Ding Tianhong @ 2013-09-26  6:39 UTC (permalink / raw)
  To: Veaceslav Falico; +Cc: netdev, jiri, Jay Vosburgh, Andy Gospodarek
In-Reply-To: <20130926054317.GA2547@redhat.com>

On 2013/9/26 13:43, Veaceslav Falico wrote:
> On Thu, Sep 26, 2013 at 10:31:35AM +0800, Ding Tianhong wrote:
> 
> First of all - thanks a lot for the review! Answered below.
> 
>> On 2013/9/25 15:20, Veaceslav Falico wrote:
>>> Currently, there are two loops - first we find the first slave in an
>>> aggregator after the xmit_hash_policy() returned number, and after that we
>>> loop from that slave, over bonding head, and till that slave to find any
>>> suitable slave to send the packet through.
>>>
>>> Replace it by just one bond_for_each_slave() loop, which first loops
>>> through the requested number of slaves, saving the first suitable one, and
>>> after that we've hit the requested number of slaves to skip - search for
>>> any up slave to send the packet through. If we don't find such kind of
>>> slave - then just send the packet through the first suitable slave found.
>>>
>>> Logic remains unchainged, and we skip two loops. Also, refactor it a bit
>>> for readability.
>>>
>>> CC: Jay Vosburgh <fubar@us.ibm.com>
>>> CC: Andy Gospodarek <andy@greyhouse.net>
>>> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>>> ---
>>>
>>> Notes:
>>>     v4  -> v5
>>>     No change.
>>>
>>>     v3  -> v4:
>>>     No change.
>>>
>>>     v2  -> v3:
>>>     No change.
>>>
>>>     v1  -> v2:
>>>     No changes.
>>>
>>>     RFC -> v1:
>>>     New patch.
>>>
>>>  drivers/net/bonding/bond_3ad.c | 46 ++++++++++++++++++++----------------------
>>>  1 file changed, 22 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>>> index 3847aee..c861ee7 100644
>>> --- a/drivers/net/bonding/bond_3ad.c
>>> +++ b/drivers/net/bonding/bond_3ad.c
>>> @@ -2417,15 +2417,15 @@ int bond_3ad_get_active_agg_info(struct bonding *bond, struct ad_info *ad_info)
>>>
>>>  int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
>>>  {
>>> -    struct slave *slave, *start_at;
>>>      struct bonding *bond = netdev_priv(dev);
>>> +    struct slave *slave, *first_ok_slave;
>>> +    struct aggregator *agg;
>>> +    struct ad_info ad_info;
>>>      struct list_head *iter;
>>> -    int slave_agg_no;
>>>      int slaves_in_agg;
>>> -    int agg_id;
>>> -    int i;
>>> -    struct ad_info ad_info;
>>> +    int slave_agg_no;
>>>      int res = 1;
>>> +    int agg_id;
>>>
>>>      read_lock(&bond->lock);
>>>      if (__bond_3ad_get_active_agg_info(bond, &ad_info)) {
>>> @@ -2438,20 +2438,28 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
>>>      agg_id = ad_info.aggregator_id;
>>>
>>>      if (slaves_in_agg == 0) {
>>> -        /*the aggregator is empty*/
>>>          pr_debug("%s: Error: active aggregator is empty\n", dev->name);
>>>          goto out;
>>>      }
>>>
>>>      slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
>>> +    first_ok_slave = NULL;
>>>
>>>      bond_for_each_slave(bond, slave, iter) {
>>> -        struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
>>> +        agg = SLAVE_AD_INFO(slave).port.aggregator;
>>> +        if (!agg || agg->aggregator_identifier != agg_id)
>>> +            continue;
>>>
>>> -        if (agg && (agg->aggregator_identifier == agg_id)) {
>>> +        if (slave_agg_no >= 0) {
>>> +            if (!first_ok_slave && SLAVE_IS_OK(slave))
>>> +                first_ok_slave = slave;
>>>              slave_agg_no--;
>>> -            if (slave_agg_no < 0)
>>> -                break;
>>> +            continue;
>>> +        }
> 
> [1]        ^^^^^^^^^^^^^^^^^^^^^^^^^
> 
>>> +
>>> +        if (SLAVE_IS_OK(slave)) {
>>> +            res = bond_dev_queue_xmit(bond, skb, slave->dev);
>>> +            goto out;
>>>          }
> 
> [2]        ^^^^^^^^^^^^^^^^^^^^^^^^^
>

>>
>> I think you miss something, you will send skb always by the first suitable port,
>> it could not support load balance.
> 
> Well, yes, it will send always by the first suitable port AFTER
> slave_agg_no, returned by the xmit_hash_policy(), which is the whole point
> in the hash function and in the load balance.
> 
> It will first loop through the slaves, decrementing slave_agg_no to a
> negative value, while saving the first good to send slave, as shown in [1].
> Notice the "continue;":
> 
> if (slave_agg_no >= 0) {
>     if (!first_ok_slave && SLAVE_IS_OK(slave))
>         first_ok_slave = slave;
>     slave_agg_no--;
>     continue;
> }
> 

yes, I got your opinion, tnanks.

> Once we hit the negative value - which means we've skipped enough slaves,
> as requested by the hash function - we can start looking for the first
> slave that is good to send AFTER those all skipped slaves, as shown in [2].
> 
> Down the patch we also use that 'first_ok_slave' - in case we didn't find
> any suitable one after we've skipped first slave_agg_no and till the last
> slave, so it implements the same 'circular' logic as was in
> bond_for_each_slave_from().
> 
>> pls consult my function.
>>         if (agg && (agg->aggregator_identifier == agg_id)) {
>> -            slave_agg_no--;
>> -            if (slave_agg_no < 0)
>> -                break;
>> +            if (--slave_agg_no < 0) {
>> +                if (SLAVE_IS_OK(slave)) {
>> +                    res = bond_dev_queue_xmit(bond,
>> +                        skb, slave->dev);
>> +                    goto out;
>> +                }
>> +            }
> 
> I'll review your function in your patch.
> 
>>         }
>>     }
>>
>>>      }
>>>
>>> @@ -2461,20 +2469,10 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
>>>          goto out;
>>>      }
>>>
>>> -    start_at = slave;
>>> -
>>> -    bond_for_each_slave_from(bond, slave, i, start_at) {
>>> -        int slave_agg_id = 0;
>>> -        struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
>>> -
>>> -        if (agg)
>>> -            slave_agg_id = agg->aggregator_identifier;
>>> -
>>> -        if (SLAVE_IS_OK(slave) && agg && (slave_agg_id == agg_id)) {
>>> -            res = bond_dev_queue_xmit(bond, skb, slave->dev);
>>> -            break;
>>> -        }
>>> -    }
>>> +    /* we couldn't find any suitable slave after the agg_no, so use the
>>> +     * first suitable found, if found. */
>>> +    if (first_ok_slave)
>>> +        res = bond_dev_queue_xmit(bond, skb, first_ok_slave->dev);
>>>
>>>  out:
>>>      read_unlock(&bond->lock);
>>>
>>
>>
> 
> .
> 

^ permalink raw reply

* Re: [PATCH] ARM: shmobile: Lager: add Micrel KSZ8041 PHY fixup
From: Magnus Damm @ 2013-09-26  6:50 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Simon Horman [Horms], SH-Linux, Russell King - ARM Linux,
	linux-arm-kernel@lists.infradead.org, Laurent Pinchart, netdev
In-Reply-To: <201309140429.23474.sergei.shtylyov@cogentembedded.com>

On Fri, Sep 13, 2013 at 5:29 PM, Sergei Shtylyov
<sergei.shtylyov@cogentembedded.com> wrote:
> Currently on the Lager board NFS timeouts/delays are seen when booting.  That
> turned out to happen because the SoC's ETH_LINK signal turns on and off after
> each packet.  It is connected to Micrel KSZ8041 PHY's LED0 signal. Ether LEDs
> on the Lager board are named LINK and ACTIVE which corresponds to non-default
> 01 setting of the PHY control register 1 bits 14-15. The 'sh_eth' driver resets
> the PHY when opening the network device, so we have to set the mentioned bits
> back to 01 from the default 00 value which causes bouncing of ETH_LINK.  That
> can be achieved using the PHY platform fixup mechanism if we also modify the
> driver to use it..
>
> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

Hi Sergei,

Thanks for your efforts on this. Nice to see that Ethernet for Lager
board support is improving.

Can you please share with us with link speeds you tested? I suspect
that this patch is only needed for some case, like for instance 100
MBit Full Duplex. Fixing the PHY settings makes sense even though only
a single mode needs it, but knowing which link speeds that are known
to work would help a lot.

Cheers,

/ magnus

^ permalink raw reply

* Re: [PATCH 04/11] ath: Remove extern from function prototypes
From: Kalle Valo @ 2013-09-26  7:30 UTC (permalink / raw)
  To: Joe Perches
  Cc: netdev, linux-wireless, Jouni Malinen, Luis R. Rodriguez, ath10k,
	John W. Linville, ath9k-devel, David S. Miller, linux-kernel
In-Reply-To: <a3dabaf02d36dbb4051188b706a3e66e6465c56b.1380137609.git.joe@perches.com>

Joe Perches <joe@perches.com> writes:

> There are a mix of function prototypes with and without extern
> in the kernel sources.  Standardize on not using extern for
> function prototypes.
>
> Function prototypes don't need to be written with extern.
> extern is assumed by the compiler.  Its use is as unnecessary as
> using auto to declare automatic/local variables in a block.
>
> Signed-off-by: Joe Perches <joe@perches.com>
> ---
>  drivers/net/wireless/ath/ath10k/debug.h  | 8 ++++----
>  drivers/net/wireless/ath/ath6kl/common.h | 3 +--
>  drivers/net/wireless/ath/ath6kl/debug.h  | 9 ++++-----
>  drivers/net/wireless/ath/ath9k/ath9k.h   | 2 +-
>  4 files changed, 10 insertions(+), 12 deletions(-)

For the ath10k and ath6kl changes:

Acked-by: Kalle Valo <kvalo@qca.qualcomm.com>

-- 
Kalle Valo

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox