netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH net/for-next V1 1/1] IB/ipoib: break linkage to neighbouring system
       [not found]         ` <alpine.DEB.2.00.1207191023130.29808-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
@ 2012-07-19 16:20           ` Or Gerlitz
       [not found]             ` <500833D9.8000001-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Or Gerlitz @ 2012-07-19 16:20 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Shlomo Pongartz, roland-DgEjT+Ai2ygdnm+yROfE0A,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	erezsh-VPRAkNaXOzVWk0Htik3J/w,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On 7/19/2012 6:24 PM, Christoph Lameter wrote:
> On Thu, 19 Jul 2012, Shlomo Pongartz wrote:
>
>> The garbage collection and stale times follow the default ipv4/6 neigh.default.gc_yyy
>> sysctl values, for example
>>
>> net.ipv4.neigh.default.gc_interval = 30
>> net.ipv4.neigh.default.gc_stale_time = 60
>>
>> If given access to these values from IPoIB, we will be happy
>> to integrate them into that logic
>
> It looks like the values are hardcoded right now.

Two points here,

1s, they are indeed hard-coded since there's no define/enum
that holds their default values (or maybe we should add one now?), see
this code snippest from net/ipv4/arp.c

>         .gc_interval    = 30 * HZ,
>         .gc_thresh1     = 128,
>         .gc_thresh2     = 512,
>         .gc_thresh3     = 1024,

2nd, and even more interesting, the little challenge here is how
to integrate with the sysctl's that allow for changing these values,
the mechanism that uses neigh_sysctl_table in net/core/neighbour.c isn't
exported to the rest of the world. And there's no point to define new
sysctl entries just for managing the IPoIB neighbours, ideas welcome.


>> Please clarify what do you mean by group expiration.
>
> If you have neighbor expiration periods of 4 hrs and it is necessary to
> run the expiration logic then please expire all the neighbor entries due a
> certain period after that as well to avoid running the expiration again in
> the next minute or so.


This is still a bit unclear here... do you mean to say that at a certain 
point in time,
**all** entries need to be deleted irrelevant of their (jiffies) age? why?

> I guess the fuzz factor needs to scale depending on the expiration period.
>
>

and this is what happens now, the factor is 0.5, entry would be deleted when
if  (60m <= unused < 90s) holds

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net/for-next V1 1/1] IB/ipoib: break linkage to neighbouring system
       [not found]             ` <500833D9.8000001-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2012-07-19 17:08               ` David Miller
       [not found]                 ` <20120719.100850.1932622478297549573.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  2012-07-24 14:23               ` Christoph Lameter
  1 sibling, 1 reply; 9+ messages in thread
From: David Miller @ 2012-07-19 17:08 UTC (permalink / raw)
  To: ogerlitz-VPRAkNaXOzVWk0Htik3J/w
  Cc: cl-vYTEC60ixJUAvxtiuMwx3w, shlomop-VPRAkNaXOzVWk0Htik3J/w,
	roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	erezsh-VPRAkNaXOzVWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA

From: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Date: Thu, 19 Jul 2012 19:20:41 +0300

> On 7/19/2012 6:24 PM, Christoph Lameter wrote:
>> On Thu, 19 Jul 2012, Shlomo Pongartz wrote:
>>
>>> The garbage collection and stale times follow the default ipv4/6
>>> neigh.default.gc_yyy
>>> sysctl values, for example
>>>
>>> net.ipv4.neigh.default.gc_interval = 30
>>> net.ipv4.neigh.default.gc_stale_time = 60
>>>
>>> If given access to these values from IPoIB, we will be happy
>>> to integrate them into that logic
>>
>> It looks like the values are hardcoded right now.
> 
> Two points here,
> 
> 1s, they are indeed hard-coded since there's no define/enum
> that holds their default values (or maybe we should add one now?), see
> this code snippest from net/ipv4/arp.c

These numbers come from the IPV6 Neighbour Discovery RFCs.  IPV4
replicates the Neighbour Unreachability Detection schemes of IPV6 in
pretty much it's entirety, and therefore takes on the same timeout et
al. parameters.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net/for-next V1 1/1] IB/ipoib: break linkage to neighbouring system
       [not found] ` <1342703938-29904-2-git-send-email-ogerlitz@mellanox.com>
       [not found]   ` <alpine.DEB.2.00.1207190938190.28115@router.home>
@ 2012-07-20 15:49   ` Or Gerlitz
       [not found]     ` <CAJZOPZ+kRcBjJgB_HaMqeuB5E-SLSqskgoaLZ_hvVx4KffHgpA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 9+ messages in thread
From: Or Gerlitz @ 2012-07-20 15:49 UTC (permalink / raw)
  To: roland, davem; +Cc: linux-rdma, erezsh, Shlomo Pongratz, Or Gerlitz, netdev

On Thu, Jul 19, 2012 at 4:18 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> From: Shlomo Pongratz <shlomop@mellanox.com>
>
> Dave Miller <davem@davemloft.net> provided a detailed description of why the
> way IPoIB is using neighbours for its own ipoib_neigh struct is buggy:
[...]

> This patch aims to solve the race conditions found in the IPoIB driver.
>
> The patch breaks the connection between the core networking neighbour structure
> and the ipoib_neigh structure. Except for avoiding the race, it allows to in
> under a setup where SKBs carrying IP packets that don't have any associated
> neighbour are transmitted through IPoIB.
>
> We add an ipoib_neigh hash table with 1024 buckets. The hash table key is the destin
> hardware address. Thus the ipoib_neigh is fetched from the hash table and not
> dereferenced from the stashed location at the neighbour structure. The hash table uses
> both RCU and reference count mechanisms to guarantee that no ipoib_neigh instance is
> ever deleted while in use.
>
> Fetching the ipoib_neigh structure instance from the hash also makes the special
> code in ipoib_start_xmit that handles remote and local bonding failover redundant.
>
> Aged ipoib_neigh instances are deleted by a garbage collection task that runs every
> 30 seconds and deletes every ipoib_neigh instance that was idle for at least 60
> seconds. The deletion is safe since the ipoib_neigh instances are protected
> using RCU and reference count mechanisms.

Hi Dave, Roland, Eric

So how does this look? in the right direction? anything that need to be fixed?

Or.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net/for-next V1 1/1] IB/ipoib: break linkage to neighbouring system
       [not found]                 ` <20120719.100850.1932622478297549573.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2012-07-22  5:29                   ` Or Gerlitz
       [not found]                     ` <500B8FBE.4030600-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Or Gerlitz @ 2012-07-22  5:29 UTC (permalink / raw)
  To: David Miller, cl-vYTEC60ixJUAvxtiuMwx3w
  Cc: shlomop-VPRAkNaXOzVWk0Htik3J/w, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, erezsh-VPRAkNaXOzVWk0Htik3J/w,
	netdev-u79uwXL29TY76Z2rM5mHXA

On 7/19/2012 8:08 PM, David Miller wrote:
> These numbers come from the IPV6 Neighbour Discovery RFCs.  IPV4 replicates the Neighbour Unreachability Detection schemes of IPV6 in pretty much it's entirety, and therefore takes on the same timeout et al. parameters.

OK, got it. At this point, I guess we should enhance the patch to use 
the values plugged into the IPv4 arp table at the time IPoIB is loaded, 
with arp_tbl being exported its easy to achieve. This would allow users 
to use non-default values by the ipoib neigh handling logic. In a later 
step, we need to see if/how to allow ipoib to use the already existing 
sysctl entries, makes sense?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net/for-next V1 1/1] IB/ipoib: break linkage to neighbouring system
       [not found]     ` <CAJZOPZ+kRcBjJgB_HaMqeuB5E-SLSqskgoaLZ_hvVx4KffHgpA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-07-23 16:58       ` Or Gerlitz
  2012-07-23 17:17         ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Or Gerlitz @ 2012-07-23 16:58 UTC (permalink / raw)
  To: Eric Dumazet, roland-DgEjT+Ai2ygdnm+yROfE0A,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, Christoph Lameter
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, erezsh-VPRAkNaXOzVWk0Htik3J/w,
	Shlomo Pongratz, netdev-u79uwXL29TY76Z2rM5mHXA

On 20/07/2012 18:49, Or Gerlitz wrote:
> On Thu, Jul 19, 2012 at 4:18 PM, Or Gerlitz<ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>  wrote:
>> From: Shlomo Pongratz<shlomop-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> Dave Miller<davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>  provided a detailed description of why the
>> way IPoIB is using neighbours for its own ipoib_neigh struct is buggy:
> [...]
>
>> This patch aims to solve the race conditions found in the IPoIB driver.
>>
>> The patch breaks the connection between the core networking neighbour structure
>> and the ipoib_neigh structure. Except for avoiding the race, it allows to in
>> under a setup where SKBs carrying IP packets that don't have any associated
>> neighbour are transmitted through IPoIB.
>>
>> We add an ipoib_neigh hash table with 1024 buckets. The hash table key is the destin
>> hardware address. Thus the ipoib_neigh is fetched from the hash table and not
>> dereferenced from the stashed location at the neighbour structure. The hash table uses
>> both RCU and reference count mechanisms to guarantee that no ipoib_neigh instance is
>> ever deleted while in use.
>>
>> Fetching the ipoib_neigh structure instance from the hash also makes the special
>> code in ipoib_start_xmit that handles remote and local bonding failover redundant.
>>
>> Aged ipoib_neigh instances are deleted by a garbage collection task that runs every
>> 30 seconds and deletes every ipoib_neigh instance that was idle for at least 60
>> seconds. The deletion is safe since the ipoib_neigh instances are protected
>> using RCU and reference count mechanisms.
>
> Hi Dave, Roland, Eric
>
> So how does this look? in the right direction? anything that need to be fixed?

Sorry for the possible spam, but resending (last time forgot to put Eric 
on the "to" list so he might missed it, also added Christoph) - this is 
a fix for very long time bug in IPoIB and we do want it to be reviewed 
&& and hopefully accepted, or if needed get feedback and fix/change.

So, any further feedback? there was one feedback on V0, not to use read 
lock for RCU protected
hash table lookup, and it was addressed in V1.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net/for-next V1 1/1] IB/ipoib: break linkage to neighbouring system
  2012-07-23 16:58       ` Or Gerlitz
@ 2012-07-23 17:17         ` Eric Dumazet
  2012-07-23 18:37           ` Or Gerlitz
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2012-07-23 17:17 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: roland, davem, Christoph Lameter, linux-rdma, erezsh,
	Shlomo Pongratz, netdev

On Mon, 2012-07-23 at 19:58 +0300, Or Gerlitz wrote:

> Sorry for the possible spam, but resending (last time forgot to put Eric 
> on the "to" list so he might missed it, also added Christoph) - this is 
> a fix for very long time bug in IPoIB and we do want it to be reviewed 
> && and hopefully accepted, or if needed get feedback and fix/change.
> 
> So, any further feedback? there was one feedback on V0, not to use read 
> lock for RCU protected
> hash table lookup, and it was addressed in V1.
> 

I have no idea of what you are talking about, I have not the patch or a
copy of it ;)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net/for-next V1 1/1] IB/ipoib: break linkage to neighbouring system
  2012-07-23 17:17         ` Eric Dumazet
@ 2012-07-23 18:37           ` Or Gerlitz
  0 siblings, 0 replies; 9+ messages in thread
From: Or Gerlitz @ 2012-07-23 18:37 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Or Gerlitz, roland-DgEjT+Ai2ygdnm+yROfE0A,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, Christoph Lameter,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, erezsh-VPRAkNaXOzVWk0Htik3J/w,
	Shlomo Pongratz, netdev-u79uwXL29TY76Z2rM5mHXA

On Mon, Jul 23, 2012 at 8:17 PM, Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> I have no idea of what you are talking about, I have not the patch or a
> copy of it ;)

http://marc.info/?t=134271491300002&r=1&w=2
http://marc.info/?t=134271491300005&r=1&w=2
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net/for-next V1 1/1] IB/ipoib: break linkage to neighbouring system
       [not found]             ` <500833D9.8000001-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2012-07-19 17:08               ` David Miller
@ 2012-07-24 14:23               ` Christoph Lameter
  1 sibling, 0 replies; 9+ messages in thread
From: Christoph Lameter @ 2012-07-24 14:23 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Shlomo Pongartz, roland-DgEjT+Ai2ygdnm+yROfE0A,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	erezsh-VPRAkNaXOzVWk0Htik3J/w,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Thu, 19 Jul 2012, Or Gerlitz wrote:

> > If you have neighbor expiration periods of 4 hrs and it is necessary to
> > run the expiration logic then please expire all the neighbor entries due a
> > certain period after that as well to avoid running the expiration again in
> > the next minute or so.
>
>
> This is still a bit unclear here... do you mean to say that at a certain point
> in time,
> **all** entries need to be deleted irrelevant of their (jiffies) age? why?

No. Just the ones in a certain time frame.
>
> > I guess the fuzz factor needs to scale depending on the expiration period.
> >
> >
>
> and this is what happens now, the factor is 0.5, entry would be deleted when
> if  (60m <= unused < 90s) holds

Ok that sounds good and that is what I meant.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net/for-next V1 1/1] IB/ipoib: break linkage to neighbouring system
       [not found]                     ` <500B8FBE.4030600-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2012-07-24 14:24                       ` Christoph Lameter
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Lameter @ 2012-07-24 14:24 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: David Miller, shlomop-VPRAkNaXOzVWk0Htik3J/w,
	roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	erezsh-VPRAkNaXOzVWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA

On Sun, 22 Jul 2012, Or Gerlitz wrote:

> On 7/19/2012 8:08 PM, David Miller wrote:
> > These numbers come from the IPV6 Neighbour Discovery RFCs.  IPV4 replicates
> > the Neighbour Unreachability Detection schemes of IPV6 in pretty much it's
> > entirety, and therefore takes on the same timeout et al. parameters.
>
> OK, got it. At this point, I guess we should enhance the patch to use the
> values plugged into the IPv4 arp table at the time IPoIB is loaded, with
> arp_tbl being exported its easy to achieve. This would allow users to use
> non-default values by the ipoib neigh handling logic. In a later step, we need
> to see if/how to allow ipoib to use the already existing sysctl entries, makes
> sense?

Sounds about right.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-07-24 14:24 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1342703938-29904-1-git-send-email-ogerlitz@mellanox.com>
     [not found] ` <1342703938-29904-2-git-send-email-ogerlitz@mellanox.com>
     [not found]   ` <alpine.DEB.2.00.1207190938190.28115@router.home>
     [not found]     ` <50082183.5000402@mellanox.com>
     [not found]       ` <alpine.DEB.2.00.1207191023130.29808@router.home>
     [not found]         ` <alpine.DEB.2.00.1207191023130.29808-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
2012-07-19 16:20           ` [PATCH net/for-next V1 1/1] IB/ipoib: break linkage to neighbouring system Or Gerlitz
     [not found]             ` <500833D9.8000001-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2012-07-19 17:08               ` David Miller
     [not found]                 ` <20120719.100850.1932622478297549573.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2012-07-22  5:29                   ` Or Gerlitz
     [not found]                     ` <500B8FBE.4030600-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2012-07-24 14:24                       ` Christoph Lameter
2012-07-24 14:23               ` Christoph Lameter
2012-07-20 15:49   ` Or Gerlitz
     [not found]     ` <CAJZOPZ+kRcBjJgB_HaMqeuB5E-SLSqskgoaLZ_hvVx4KffHgpA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-07-23 16:58       ` Or Gerlitz
2012-07-23 17:17         ` Eric Dumazet
2012-07-23 18:37           ` Or Gerlitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).