From mboxrd@z Thu Jan  1 00:00:00 1970
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=cumulusnetworks.com; s=google;
        h=from:subject:to:references:cc:message-id:date:user-agent
         :mime-version:in-reply-to:content-type:content-transfer-encoding;
        bh=oA8cs2dcP5/D4fHJw3+b1dQiPnjDGv2NsTfNzYGt7fI=;
        b=E1FIJLjDZIeRbDqqeWUBAg4xGKhw0XP6ovrEiqFNk1sPkHeknl40aQlUFuVyK2OUtK
         2BigN9WBYl7mQCiTXvc3J5Q754+MwoIE3XQc13DJ/6KYtlidyCCU+Ei4TQFbhbFPxtVF
         PjALez7M6c6NSkqGWkyARmzdXz1t0phg5d5cM=
From: Nikolay Aleksandrov
References: <1440549295-3979-1-git-send-email-razor@blackwall.org>
 <20150825.194222.390859854071446877.davem@davemloft.net>
 <20150825.230641.773630246486190390.davem@davemloft.net>
 <55DE98AF.8000503@cumulusnetworks.com>
 <2125A434-6529-4D5A-BA6B-9F64C6B7A8C0@cumulusnetworks.com>
 <55DFA1A3.30601@redhat.com>
 <691CF770-DDF3-4AC9-B99C-9640992037C5@cumulusnetworks.com>
 <55E05486.5090500@redhat.com>
 <55E106DC.5040802@redhat.com>
Message-ID: <55F578AB.9000705@cumulusnetworks.com>
Date: Sun, 13 Sep 2015 15:22:51 +0200
MIME-Version: 1.0
In-Reply-To: <55E106DC.5040802@redhat.com>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
Subject: Re: [Bridge] [PATCH net-next v2] bridge: vlan: allow to suppress local mac install for all vlans
List-Id: Linux Ethernet Bridging
To: vyasevic@redhat.com
Cc: bridge@lists.linux-foundation.org, netdev@vger.kernel.org, roopa,
 wkok@cumulusnetworks.com, David Miller

On 08/29/2015 03:11 AM, Vlad Yasevich wrote:
> On 08/28/2015 11:26 AM, Nikolay Aleksandrov wrote:
>>
>>> On Aug 28, 2015, at 5:31 AM, Vlad Yasevich wrote:
>>>
>>> On 08/27/2015 10:17 PM, Nikolay Aleksandrov wrote:
<<>>
>>>
>>> I don't remember learning being all that complicated.  The hash only changed under
>>> rtnl when vlans were added/removed.  The nice this is that we wouldn't need
>>> to rebalance, because if the vlan is removed all fdb links get removed too.
>>> They don't move to another bucket (But that was with static hash.  Need to look at rhash in
>>> more detail).
>>>
>>> If you want, I might still have patches hanging around on my machine that had a hash
>>> table implementation.  I can send them to you.
>>>
>>> -vlad
>>>
>>
>> :-) Okay, I’m putting the crystal ball away. If you could send me these patches it’d be great so
>> I don’t have to start this from scratch.
>>
>
> So, I forgot that I lost an old disk that had all that code, so I am a bit bummed about
> that.  I did however find the series that got posted.
> http://www.spinics.net/lists/netdev/msg219737.html
>
> That was the series where I briefly switch from bitmaps to hash and list.
> However, I see that the fdb code that was playing with never got posted...
>
> Sorry.
>
> -vlad
>

So I've been looking into this for some time now and did a basic implementation of vlan
handling using rhashtables. Here are some thoughts and a slightly different proposal.

First a few scenarios (the memory footprint counts only the extra memory needed for the vlans):
Current memory footprint for 48 ports & 2000 vlans ~ 50k

1. Bridge with vlan hash with port bitmaps (similar to Vlad's first set)
 - On input we have a hash lookup + a bitmap lookup.
 - If an (r)hashtable is used we need an additional list to get stable walks, which are
   needed all over the place, from error handling to compressed vlan dumps. The dumps
   actually need this list to be kept sorted, since the already exposed user interfaces
   have to keep working without visible changes; they also allow per-port compressed
   vlan dumping, which isn't easy to handle.
   The stability issue with rhashtable is mostly about resizing, since these entries
   change only under rtnl; the sorted order is needed because of the compressed dump.
   An alternative is to build the sorted list each time a dump is requested, but that
   again falls under the workarounds needed to satisfy current behaviour requirements.
   If this option is chosen, my preference is to also keep the vlans in a list that stays
   sorted for the walks, so the compressed request can be satisfied more easily.
 - memory footprint for 2000 vlans with 48 ports ~ 1.5 MB

2. Bridge with vlan hash, ports with vlan hashes (needs a special per-port struct because
   of the tagged/untagged case; we basically need per-port per-vlan flags)
 - On input we have only 1 hash lookup, in the port vlan hash, where we get a pointer to
   the bridge's vlan entry, so we get the global vlan context as well as the local one.
 - The same rhashtable handling requirements apply, plus more complexity & memory due to
   having to keep multiple (per-port, per-bridge global) rhashtables in sync.
 - memory footprint for 2000 vlans with 48 ports ~ 2.6 MB

Up until now I've partially done point 1 to see how much churn it would take, and the
basic change is huge. The memory footprint also increases a lot.

So I'd propose a third option, which you may call a middle ground between the current
implementation (which is very fast and compact) and points 1 & 2:
What do you think about adding an auxiliary per-vlan global context using rhashtable
which is not used in the ingress/egress decision making? We can contain it via either
a Kconfig option (so it can be compiled out) or a dynamic run-time option, so people
who would like more features can enable it on demand if they are willing to trade some
performance and memory for them.
This way we won't have to change most of the current API and won't have to add workarounds
to keep the user-facing behaviour the same. The syncing is also reduced to a refcount
and the memory footprint is kept minimal.
The initial new features I'd like to introduce are per-vlan counters and per-vlan flags,
which at first will be used to enable/disable multicast on a per-vlan basis.
In terms of performance, if this is enabled it is close to point 1, but without the
changes all over the API and, more importantly, with a much smaller memory footprint.
The memory footprint of this option with 2000 vlans & 48 ports ~ +70k (without the per-cpu
counters, any additional feature will naturally add to this). This is because we don't
have a per-port increase for each vlan added and only keep the global context.

If it's acceptable to take the performance/memory hit and the huge churn, then I can
continue with 1 or 2, but I'm not a big fan of that idea. Feedback before I go any
further on this would be much appreciated.

Thank you,
 Nik
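
For reference, a rough cross-check of the ~50k baseline quoted in the mail for the
current bitmap-based implementation. This is only a minimal sketch under stated
assumptions, not kernel code: it assumes each bitmap holder (every port plus the
bridge itself) keeps two fixed-size bitmaps covering the whole 12-bit VLAN ID space,
one for membership and one for the untagged flag. The function names are hypothetical
and the small non-bitmap fields of the per-port structure are ignored.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define VLAN_N_VID 4096u  /* size of the 12-bit VLAN ID space */

/* Bytes needed for one bitmap with a bit per VLAN ID, rounded up to whole longs. */
size_t vlan_bitmap_bytes(void)
{
    size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
    size_t longs = (VLAN_N_VID + bits_per_long - 1) / bits_per_long;

    return longs * sizeof(unsigned long);
}

/*
 * Estimated bitmap memory for a bridge with `ports` ports.
 * Assumption: two bitmaps per holder (membership + untagged),
 * and holders = ports + 1 for the bridge device itself.
 */
size_t bitmap_footprint_bytes(size_t ports)
{
    size_t holders;

    assert(ports > 0); /* the estimate is only meant for a bridge with ports */
    holders = ports + 1;

    return holders * 2 * vlan_bitmap_bytes();
}
```

Under these assumptions, bitmap_footprint_bytes(48) gives 49 * 2 * 512 = 50176 bytes,
which lines up with the ~50k figure. Note the estimate does not depend on how many of
the 2000 vlans are actually configured, because the bitmaps are fixed-size; the
hash-based options grow with the number of vlans instead.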