Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] netfilter: log: protect nf_log_register against double registering
From: Marcelo Ricardo Leitner @ 2014-10-22 12:33 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, netdev
In-Reply-To: <20141022120255.GA21821@salvia>

On 22-10-2014 10:02, Pablo Neira Ayuso wrote:
> On Mon, Oct 20, 2014 at 07:58:03PM -0200, Marcelo Ricardo Leitner wrote:
>> Currently, despite the comment right before the function,
>> nf_log_register allows registering two loggers on with the same type and
>> end up overwriting the previous register.
>>
>> Not a real issue today as current tree doesn't have two loggers for the
>> same type but it's better to get this protected.
>>
>> Also make sure that all of its callers do error checking.
>
> No major objetions to this sanity check. Some comment below.
>
>> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
>> ---
>>
>> Notes:
>>      Please let me know if you have any issues with the identation on
>>      nf_log_register. I just couldn't find a better one.
>
> You can split nf_log_register() in two functions to avoid this.

Sorry but I don't follow this one. You mean having the check on 
nf_log_register() and then calling a __nf_log_register() to actually register it?

Now I'm thinking on wrapping
             rcu_dereference_protected(loggers[i][logger->type],
+					lockdep_is_held(&nf_log_mutex))
into a macro or something like that, because that's the issue in there and 
this construction is called several times. Something like:

#define logger_deref_protected(pf, type) \
         rcu_dereference_protected(loggers[pf][type], \
                                   lockdep_is_held(&nf_log_mutex));

WDYT?

>>      Thanks
>>
>>   net/ipv4/netfilter/nf_log_arp.c  |  8 +++++++-
>>   net/ipv4/netfilter/nf_log_ipv4.c |  8 +++++++-
>>   net/ipv6/netfilter/nf_log_ipv6.c |  8 +++++++-
>>   net/netfilter/nf_log.c           | 13 ++++++++++++-
>>   4 files changed, 33 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/ipv4/netfilter/nf_log_arp.c b/net/ipv4/netfilter/nf_log_arp.c
>> index ccfc78db12ee8acae68faf451f2cf6bc5597f2c1..8b39174b7be390397a110ec9d3ed497bf8ce6d26 100644
>> --- a/net/ipv4/netfilter/nf_log_arp.c
>> +++ b/net/ipv4/netfilter/nf_log_arp.c
>> @@ -130,7 +130,13 @@ static int __init nf_log_arp_init(void)
>>   	if (ret < 0)
>>   		return ret;
>>
>> -	nf_log_register(NFPROTO_ARP, &nf_arp_logger);
>> +	ret = nf_log_register(NFPROTO_ARP, &nf_arp_logger);
>> +	if (ret < 0) {
>> +		pr_err("log: failed to register logger\n");
>
> I think you can use pr_fmt to avoid appending log, it's also going to
> append useful information to know which logger has indeed failed.

Sure. It was taken from nfnetlink_log_init() so I'll take the shot to update 
it too.

>> +		unregister_pernet_subsys(&nf_log_arp_net_ops);
>> +		return ret;
>
> Could you add a goto err1; instead? Just in case we need to extend
> this again later on, we'll skip some refactoring.

Yes. I'll send a v2

Thanks,
Marcelo


^ permalink raw reply

* Re: [PATCH net-next 2/6] ethernet: wiznet: remove unnecessary check
From: Varka Bhadram @ 2014-10-22 12:24 UTC (permalink / raw)
  To: Sergei Shtylyov, netdev; +Cc: Varka Bhadram
In-Reply-To: <544798AE.1040209@cogentembedded.com>

On 10/22/2014 05:14 PM, Sergei Shtylyov wrote:
> On 10/22/2014 8:16 AM, Varka Bhadram wrote:
>
>> devm_ioremap_resource checks platform_get_resource() return value.
>> We can remove the duplicate check here.
>
>> Signed-off-by: Varka Bhadram <varkab@cdac.in>
>> ---
>>   drivers/net/ethernet/wiznet/w5300.c |    3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>
>> diff --git a/drivers/net/ethernet/wiznet/w5300.c 
>> b/drivers/net/ethernet/wiznet/w5300.c
>> index f961f14..315d090 100644
>> --- a/drivers/net/ethernet/wiznet/w5300.c
>> +++ b/drivers/net/ethernet/wiznet/w5300.c
>> @@ -558,8 +558,7 @@ static int w5300_hw_probe(struct platform_device 
>> *pdev)
>>       }
>>
>>       mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> -    if (!mem)
>> -        return -ENXIO;
>> +
>>       mem_size = resource_size(mem);
>
>    Same problem as in the patch #1
>
I will fix it . Thanks

>>
>>       priv->base = devm_ioremap_resource(&pdev->dev, mem);
>>
>
> WBR, Sergei
>


-- 
Regards,
Varka Bhadram.

^ permalink raw reply

* Re: [PATCH] nf_conntrack_proto_tcp: allow server to become a client in TW handling
From: Pablo Neira Ayuso @ 2014-10-22 12:28 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Marcelo Ricardo Leitner, netfilter-devel, netdev
In-Reply-To: <alpine.DEB.2.10.1410150854460.19251@blackhole.kfki.hu>

On Wed, Oct 15, 2014 at 09:27:43AM +0200, Jozsef Kadlecsik wrote:
> On Mon, 13 Oct 2014, Marcelo Ricardo Leitner wrote:
> 
> > When a port that was used to listen for inbound connections gets closed
> > and reused for outgoing connections (like rsh ends up doing for stderr
> > flow), current we may reject the SYN/ACK packet for the new connection
> > because tcp_conntracks states forbirds a port to become a client while
> > there is still a TIME_WAIT entry in there for it.
> > 
> > As TCP may expire the TIME_WAIT socket in 60s and conntrack's timeout
> > for it is 120s, there is a ~60s window that the application can end up
> > opening a port that conntrack will end up blocking.
> > 
> > This patch fixes this by simply allowing such state transition: if we
> > see a SYN, in TIME_WAIT state, on REPLY direction, move it to sSS. Note
> > that the rest of the code already handles this situation, more
> > specificly in tcp_packet(), first switch clause.
> 
> In those code branch if there was a valid FIN in either direction, we 
> destroy the old connection and a new will be created. That way the rules 
> about NEW connections will be applied, so the policies are not bypassed. 
> Otherwise we just ignore the SYN packet, so if it's invalid, we'll catch 
> the RST from the other side and destroy the conntrack entry. The event 
> flow looks OK to me.
> 
> > Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> 
> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next 1/6] ethernet: wiznet: remove unnecessary check
From: Varka Bhadram @ 2014-10-22 12:23 UTC (permalink / raw)
  To: Sergei Shtylyov, netdev; +Cc: Varka Bhadram
In-Reply-To: <54479876.5050202@cogentembedded.com>

On 10/22/2014 05:13 PM, Sergei Shtylyov wrote:
> Hello.
>
> On 10/22/2014 8:16 AM, Varka Bhadram wrote:
>
>> devm_ioremap_resource checks platform_get_resource() return value.
>> We can remove the duplicate check here.
>
>> Signed-off-by: Varka Bhadram <varkab@cdac.in>
>> ---
>>   drivers/net/ethernet/wiznet/w5100.c |    3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>
>> diff --git a/drivers/net/ethernet/wiznet/w5100.c 
>> b/drivers/net/ethernet/wiznet/w5100.c
>> index 0f56b1c..bf195e3 100644
>> --- a/drivers/net/ethernet/wiznet/w5100.c
>> +++ b/drivers/net/ethernet/wiznet/w5100.c
>> @@ -638,8 +638,7 @@ static int w5100_hw_probe(struct platform_device 
>> *pdev)
>>       }
>>
>>       mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> -    if (!mem)
>> -        return -ENXIO;
>> +
>>       mem_size = resource_size(mem);
>
>    This would cause a NULL dereference on resource_size() call if 
> 'mem' is NULL. You can't remove the NULL check because of that.
>
Ok i will fix it . Thankx for the review.

>>
>>       priv->base = devm_ioremap_resource(&pdev->dev, mem);
>>
>
> WBR, Sergei
>


-- 
Regards,
Varka Bhadram.

^ permalink raw reply

* Re: [PATCH net-next 5/6] ethernet: renesas: remove unnecessary check
From: Varka Bhadram @ 2014-10-22 12:22 UTC (permalink / raw)
  To: Sergei Shtylyov, netdev; +Cc: Varka Bhadram
In-Reply-To: <5447995F.5080602@cogentembedded.com>

On 10/22/2014 05:17 PM, Sergei Shtylyov wrote:
> On 10/22/2014 8:16 AM, Varka Bhadram wrote:
>
>> devm_ioremap_resource checks platform_get_resource() return value.
>> We can remove the duplicate check here.
>
>> Signed-off-by: Varka Bhadram <varkab@cdac.in>
>> ---
>>   drivers/net/ethernet/renesas/sh_eth.c |    4 ----
>>   1 file changed, 4 deletions(-)
>
>> diff --git a/drivers/net/ethernet/renesas/sh_eth.c 
>> b/drivers/net/ethernet/renesas/sh_eth.c
>> index 60e9c2c..d824ba5 100644
>> --- a/drivers/net/ethernet/renesas/sh_eth.c
>> +++ b/drivers/net/ethernet/renesas/sh_eth.c
>> @@ -2769,10 +2769,6 @@ static int sh_eth_drv_probe(struct 
>> platform_device *pdev)
>>
>>       /* get base addr */
>>       res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> -    if (unlikely(res == NULL)) {
>> -        dev_err(&pdev->dev, "invalid resource\n");
>> -        return -EINVAL;
>> -    }
>
>    The driver dereferences 'res' further on, so you can't remove this 
> check.
>
Yes its my mistake . I will fix it . Thankx

> WBR, Sergei
>
-- 
Regards,
Varka Bhadram.

^ permalink raw reply

* Re: [RFC PATCH v2 net-next] net: ipv6: Add a sysctl to make optimistic addresses useful candidates
From: Hannes Frederic Sowa @ 2014-10-22 12:13 UTC (permalink / raw)
  To: Lorenzo Colitti; +Cc: Erik Kline, netdev, Ben Hutchings
In-Reply-To: <CAKD1Yr08muZ0Dwwt5Xmi+mEFP9OBBYe9CceswsEzpm5PkMUCMg@mail.gmail.com>

On Mi, 2014-10-22 at 20:50 +0900, Lorenzo Colitti wrote:
> On Wed, Oct 22, 2014 at 8:36 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > In the near future we also must introduce specific DAD events needed for
> > RFC7217 addresses (to count DAD progress and safe the information per
> > prefix in user space). I would prefer to keep the logic for RTM_NEWADDR
> > that the kernel will definitely allow the use of the address but
> > RTM_NEWADDR should be handled idempotent by user space.
> 
> Duplicate RTM_NEWADDR notifications aren't a problem though, right?
> The current code send RTM_NEWADDR every time an RA increases the
> lifetime of an address, so I think it's reasonable to assume that apps
> have to be able to deal with it, because it's pretty frequent.

I don't think it is a serious problem (or a problem at all). I would
just try to reduce the number of notifications, that's all.

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH nf] netfilter: nf_tables: check for NULL in nf_tables_newchain pcpu stats allocation
From: Pablo Neira Ayuso @ 2014-10-22 12:09 UTC (permalink / raw)
  To: Sabrina Dubroca; +Cc: kaber, kadlec, netfilter-devel, coreteam, netdev
In-Reply-To: <20141021090821.GA31730@kria>

On Tue, Oct 21, 2014 at 11:08:21AM +0200, Sabrina Dubroca wrote:
> alloc_percpu returns NULL on failure, not a negative error code.

Good catch. Applied, thanks.

^ permalink raw reply

* Re: [PATCH] netfilter: log: protect nf_log_register against double registering
From: Pablo Neira Ayuso @ 2014-10-22 12:02 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: netfilter-devel, netdev
In-Reply-To: <c7fcb46fa24c1a64f00c9322bb5cddbde977da0a.1413842217.git.mleitner@redhat.com>

On Mon, Oct 20, 2014 at 07:58:03PM -0200, Marcelo Ricardo Leitner wrote:
> Currently, despite the comment right before the function,
> nf_log_register allows registering two loggers on with the same type and
> end up overwriting the previous register.
> 
> Not a real issue today as current tree doesn't have two loggers for the
> same type but it's better to get this protected.
> 
> Also make sure that all of its callers do error checking.

No major objetions to this sanity check. Some comment below.

> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> ---
> 
> Notes:
>     Please let me know if you have any issues with the identation on
>     nf_log_register. I just couldn't find a better one.

You can split nf_log_register() in two functions to avoid this.

>     Thanks
> 
>  net/ipv4/netfilter/nf_log_arp.c  |  8 +++++++-
>  net/ipv4/netfilter/nf_log_ipv4.c |  8 +++++++-
>  net/ipv6/netfilter/nf_log_ipv6.c |  8 +++++++-
>  net/netfilter/nf_log.c           | 13 ++++++++++++-
>  4 files changed, 33 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/netfilter/nf_log_arp.c b/net/ipv4/netfilter/nf_log_arp.c
> index ccfc78db12ee8acae68faf451f2cf6bc5597f2c1..8b39174b7be390397a110ec9d3ed497bf8ce6d26 100644
> --- a/net/ipv4/netfilter/nf_log_arp.c
> +++ b/net/ipv4/netfilter/nf_log_arp.c
> @@ -130,7 +130,13 @@ static int __init nf_log_arp_init(void)
>  	if (ret < 0)
>  		return ret;
>  
> -	nf_log_register(NFPROTO_ARP, &nf_arp_logger);
> +	ret = nf_log_register(NFPROTO_ARP, &nf_arp_logger);
> +	if (ret < 0) {
> +		pr_err("log: failed to register logger\n");

I think you can use pr_fmt to avoid appending log, it's also going to
append useful information to know which logger has indeed failed.

> +		unregister_pernet_subsys(&nf_log_arp_net_ops);
> +		return ret;

Could you add a goto err1; instead? Just in case we need to extend
this again later on, we'll skip some refactoring.

Thanks.

^ permalink raw reply

* Re: [RFC PATCH v2 net-next] net: ipv6: Add a sysctl to make optimistic addresses useful candidates
From: Lorenzo Colitti @ 2014-10-22 11:50 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Erik Kline, netdev, Ben Hutchings
In-Reply-To: <1413977771.2337.50.camel@localhost>

On Wed, Oct 22, 2014 at 8:36 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> In the near future we also must introduce specific DAD events needed for
> RFC7217 addresses (to count DAD progress and safe the information per
> prefix in user space). I would prefer to keep the logic for RTM_NEWADDR
> that the kernel will definitely allow the use of the address but
> RTM_NEWADDR should be handled idempotent by user space.

Duplicate RTM_NEWADDR notifications aren't a problem though, right?
The current code send RTM_NEWADDR every time an RA increases the
lifetime of an address, so I think it's reasonable to assume that apps
have to be able to deal with it, because it's pretty frequent.

^ permalink raw reply

* Re: [PATCH net-next 5/6] ethernet: renesas: remove unnecessary check
From: Sergei Shtylyov @ 2014-10-22 11:47 UTC (permalink / raw)
  To: Varka Bhadram, netdev; +Cc: Varka Bhadram
In-Reply-To: <1413951386-29645-6-git-send-email-varkab@cdac.in>

On 10/22/2014 8:16 AM, Varka Bhadram wrote:

> devm_ioremap_resource checks platform_get_resource() return value.
> We can remove the duplicate check here.

> Signed-off-by: Varka Bhadram <varkab@cdac.in>
> ---
>   drivers/net/ethernet/renesas/sh_eth.c |    4 ----
>   1 file changed, 4 deletions(-)

> diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
> index 60e9c2c..d824ba5 100644
> --- a/drivers/net/ethernet/renesas/sh_eth.c
> +++ b/drivers/net/ethernet/renesas/sh_eth.c
> @@ -2769,10 +2769,6 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
>
>   	/* get base addr */
>   	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> -	if (unlikely(res == NULL)) {
> -		dev_err(&pdev->dev, "invalid resource\n");
> -		return -EINVAL;
> -	}

    The driver dereferences 'res' further on, so you can't remove this check.

WBR, Sergei

^ permalink raw reply

* Re: [PATCH net-next 2/6] ethernet: wiznet: remove unnecessary check
From: Sergei Shtylyov @ 2014-10-22 11:44 UTC (permalink / raw)
  To: Varka Bhadram, netdev; +Cc: Varka Bhadram
In-Reply-To: <1413951386-29645-3-git-send-email-varkab@cdac.in>

On 10/22/2014 8:16 AM, Varka Bhadram wrote:

> devm_ioremap_resource checks platform_get_resource() return value.
> We can remove the duplicate check here.

> Signed-off-by: Varka Bhadram <varkab@cdac.in>
> ---
>   drivers/net/ethernet/wiznet/w5300.c |    3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)

> diff --git a/drivers/net/ethernet/wiznet/w5300.c b/drivers/net/ethernet/wiznet/w5300.c
> index f961f14..315d090 100644
> --- a/drivers/net/ethernet/wiznet/w5300.c
> +++ b/drivers/net/ethernet/wiznet/w5300.c
> @@ -558,8 +558,7 @@ static int w5300_hw_probe(struct platform_device *pdev)
>   	}
>
>   	mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> -	if (!mem)
> -		return -ENXIO;
> +
>   	mem_size = resource_size(mem);

    Same problem as in the patch #1

>
>   	priv->base = devm_ioremap_resource(&pdev->dev, mem);
>

WBR, Sergei

^ permalink raw reply

* Re: [PATCH net-next 1/6] ethernet: wiznet: remove unnecessary check
From: Sergei Shtylyov @ 2014-10-22 11:43 UTC (permalink / raw)
  To: Varka Bhadram, netdev; +Cc: Varka Bhadram
In-Reply-To: <1413951386-29645-2-git-send-email-varkab@cdac.in>

Hello.

On 10/22/2014 8:16 AM, Varka Bhadram wrote:

> devm_ioremap_resource checks platform_get_resource() return value.
> We can remove the duplicate check here.

> Signed-off-by: Varka Bhadram <varkab@cdac.in>
> ---
>   drivers/net/ethernet/wiznet/w5100.c |    3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)

> diff --git a/drivers/net/ethernet/wiznet/w5100.c b/drivers/net/ethernet/wiznet/w5100.c
> index 0f56b1c..bf195e3 100644
> --- a/drivers/net/ethernet/wiznet/w5100.c
> +++ b/drivers/net/ethernet/wiznet/w5100.c
> @@ -638,8 +638,7 @@ static int w5100_hw_probe(struct platform_device *pdev)
>   	}
>
>   	mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> -	if (!mem)
> -		return -ENXIO;
> +
>   	mem_size = resource_size(mem);

    This would cause a NULL dereference on resource_size() call if 'mem' is 
NULL. You can't remove the NULL check because of that.

>
>   	priv->base = devm_ioremap_resource(&pdev->dev, mem);
>

WBR, Sergei

^ permalink raw reply

* Re: [RFC PATCH v2 net-next] net: ipv6: Add a sysctl to make optimistic addresses useful candidates
From: Hannes Frederic Sowa @ 2014-10-22 11:36 UTC (permalink / raw)
  To: Erik Kline; +Cc: netdev, Ben Hutchings, Lorenzo Colitti
In-Reply-To: <CAAedzxpG1TEVVm_kHgRJcOO7qkFmyyYnfPL0ONN7GcypmtRq1g@mail.gmail.com>

Hi,

On Mi, 2014-10-22 at 20:15 +0900, Erik Kline wrote:
> >> >>       case IPV6_SADDR_RULE_PREFIX:
> >> >>               /* Rule 8: Use longest matching prefix */
> >> >>               ret = ipv6_addr_diff(&score->ifa->addr, dst->addr);
> >> >> @@ -3222,8 +3240,13 @@ static void addrconf_dad_begin(struct inet6_ifaddr *ifp)
> >> >>        * Optimistic nodes can start receiving
> >> >>        * Frames right away
> >> >>        */
> >> >> -     if (ifp->flags & IFA_F_OPTIMISTIC)
> >> >> +     if (ifp->flags & IFA_F_OPTIMISTIC) {
> >> >>               ip6_ins_rt(ifp->rt);
> >> >> +             /* Because optimistic nodes can receive frames, notify
> >> >> +              * listeners. If DAD fails, RTM_DELADDR is sent.
> >> >> +              */
> >> >> +             ipv6_ifa_notify(RTM_NEWADDR, ifp);
> >> >> +     }
> >> >
> >> > I wonder if we can now delete the ipv6_ifa_notify(RTM_NEWADDR, ifp) in
> >> > addrconf_dad_completed.
> >>
> >> I don't know what everyone's general preference would be, but mine
> >> would be to err on the side of notifying on state changes.  It seems
> >> harmless to me to keep it in, and something in userspace might want to
> >> know if/when DAD completes.
> >
> > Userspace expects to communicate with an address which gets announced
> > via RTM_NEWADDR, so I would make the RTM_NEWADDR notifications
> > conditional on use_optimistic flag in both, the completed and the
> > dad_begin function. This sounds like the best option to me, no?
> 
> That's easy enough to do in addrconf_dad_begin().  Unfortunately, by
> the time we get to addrconf_dad_completed() the IFA_F_OPTIMISTIC flag
> has already been cleared.
> 
> I have a version that attempts to take its best guess as to whether an
> RTM_NEWADDR _should_ have already been sent--something like:
> 
> #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
>         // We probably already sent a notification in addrconf_dad_begin().
>         if (!ifp->idev->cnf.optimistic_dad || !ifp->idev->cnf.use_optimistic)
> #endif
>         ipv6_ifa_notify(RTM_NEWADDR, ifp);
> 
> but that doesn't seem that clean to me (apart from the ifdef-y
> messiness of it).  Do you think this "best guess" approach is
> reasonable?

I would definitely ack a patch which removes the macros and the config
option CONFIG_IPV6_OPTIMISTIC_DAD completely.

You also can split the ifdefed part into a small helper function:

static inline bool ipv6_use_optimistic_addr(struct inet6_dev *idev)
{
#ifdef CONFIG_IPV6_OPTIMISTIC_DAD
        return idev->cnf.optimistic_dad && idev->cnf.use_optimistic;
#else
        return false;
#endif
}

...

and then in both locations update the RTM_NEWADDR like this:

addrconf_dad_begin:

if (ipv6_use_optimisitic_addr(idev))
   notify...

and the dad_completed with a negated check. I think this is the easiest
option.

> With just the "use_optimistic" check in addrconf_dad_begin(),
> userspace can still communicate with addresses it gets via
> RTM_NEWADDR, it will just get /two/ such notifications: one when it
> can probably use it and one when it definitely can.
> 
> Thoughts?

In the near future we also must introduce specific DAD events needed for
RFC7217 addresses (to count DAD progress and safe the information per
prefix in user space). I would prefer to keep the logic for RTM_NEWADDR
that the kernel will definitely allow the use of the address but
RTM_NEWADDR should be handled idempotent by user space.

Thanks,
Hannes

^ permalink raw reply

* Re: [stable request <= 3.11] net/mlx4_en: Fix BlueFlame race
From: Amir Vadai @ 2014-10-22 11:15 UTC (permalink / raw)
  To: David S. Miller
  Cc: Or Gerlitz, Cong Wang, Vinson Lee, Or Gerlitz, Jack Morgenstein,
	Eugenia Emantayev, Matan Barak, netdev, saeedm
In-Reply-To: <CAJ3xEMiOEZ9fDtLd4qqPmqAGb3hZneBXidyaxA9Pv1Ki6cKnAg@mail.gmail.com>

On 10/22/2014 7:17 AM, Or Gerlitz wrote:
> On Wed, Oct 22, 2014 at 2:15 AM, Cong Wang <cwang@twopensource.com> wrote:
>> On Sat, Oct 18, 2014 at 2:14 PM, Vinson Lee <vlee@twopensource.com> wrote:
>>> Hi.
>>>
>>> Please consider backporting upstream commit
>>> 2d4b646613d6b12175b017aca18113945af1faf3 "net/mlx4_en: Fix BlueFlame
>>> race" to stable kernels <= 3.11.
>>>
>>
>> David, could you take care of it if you have time? It fixes a real
>> bug in production. :)
> 
> Let out folks here look on that 1st.
> 
> Or.
> 
Verified patch above [1] on 3.10.y (v3.10.58) as is.

Dave, please pull it to stable 3.10

I don't know about an LTS 3.11 kernel - but if there is such an option,
I verified that the patch also applies cleanly on 3.11.y.

Thanks,
Amir

[1] - 2d4b646 ("net/mlx4_en: Fix BlueFlame race")

^ permalink raw reply

* Re: [RFC PATCH v2 net-next] net: ipv6: Add a sysctl to make optimistic addresses useful candidates
From: Erik Kline @ 2014-10-22 11:15 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: netdev, Ben Hutchings, Lorenzo Colitti
In-Reply-To: <1413972774.2337.37.camel@localhost>

On Wed, Oct 22, 2014 at 7:12 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Mi, 2014-10-22 at 14:25 +0900, Erik Kline wrote:
>> Hello.
>>
>> On Wed, Oct 22, 2014 at 4:45 AM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> > Hi,
>> >
>> > On Di, 2014-10-21 at 13:05 +0900, Erik Kline wrote:
>> >> Add a sysctl that causes an interface's optimistic addresses
>> >> to be considered equivalent to other non-deprecated addresses
>> >> for source address selection purposes.  Preferred addresses
>> >> will still take precedence over optimistic addresses, subject
>> >> to other ranking in the source address selection algorithm.
>> >>
>> >> This is useful where different interfaces are connected to
>> >> different networks from different ISPs (e.g., a cell network
>> >> and a home wifi network).
>> >>
>> >> The current behaviour complies with RFC 3484/6724, and it
>> >> makes sense if the host has only one interface, or has
>> >> multiple interfaces on the same network (same or cooperating
>> >> administrative domain(s), but not in the multiple distinct
>> >> networks case.
>> >>
>> >> For example, if a mobile device has an IPv6 address on an LTE
>> >> network and then connects to IPv6-enabled wifi, while the wifi
>> >> IPv6 address is undergoing DAD, IPv6 connections will try use
>> >> the wifi default route with the LTE IPv6 address, and will get
>> >> stuck until they time out.
>> >>
>> >> Also, because optimistic addresses can actually be used, issue
>> >> an RTM_NEWADDR as soon as DAD starts.  If DAD fails an separate
>> >> RTM_DELADDR will be sent.
>> >>
>> >> Also: add an entry in ip-sysctl.txt for optimistic_dad.
>> >>
>> >> Signed-off-by: Erik Kline <ek@google.com>
>> >> ---
>> >>  Documentation/networking/ip-sysctl.txt | 13 ++++++++++++
>> >>  include/linux/ipv6.h                   |  1 +
>> >>  include/uapi/linux/ipv6.h              |  1 +
>> >>  net/ipv6/addrconf.c                    | 36 ++++++++++++++++++++++++++++++++--
>> >>  4 files changed, 49 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
>> >> index 0307e28..e03cf49 100644
>> >> --- a/Documentation/networking/ip-sysctl.txt
>> >> +++ b/Documentation/networking/ip-sysctl.txt
>> >> @@ -1452,6 +1452,19 @@ suppress_frag_ndisc - INTEGER
>> >>       1 - (default) discard fragmented neighbor discovery packets
>> >>       0 - allow fragmented neighbor discovery packets
>> >>
>> >> +optimistic_dad - BOOLEAN
>> >> +     Whether to perform Optimistic Duplicate Address Detection (RFC 4429).
>> >> +             0: disabled (default)
>> >> +             1: enabled
>> >> +
>> >> +use_optimistic - BOOLEAN
>> >> +     If enabled, do not classify optimistic addresses as deprecated during
>> >> +     source address selection.  Preferred addresses will still be chosen
>> >> +     before optimistic addresses, subject to other ranking in the source
>> >> +     address selection algorithm.
>> >> +             0: disabled (default)
>> >> +             1: enabled
>> >> +
>> >>  icmp/*:
>> >>  ratelimit - INTEGER
>> >>       Limit the maximal rates for sending ICMPv6 packets.
>> >> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
>> >> index ff56053..7121a2e 100644
>> >> --- a/include/linux/ipv6.h
>> >> +++ b/include/linux/ipv6.h
>> >> @@ -42,6 +42,7 @@ struct ipv6_devconf {
>> >>       __s32           accept_ra_from_local;
>> >>  #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
>> >>       __s32           optimistic_dad;
>> >> +     __s32           use_optimistic;
>> >>  #endif
>> >>  #ifdef CONFIG_IPV6_MROUTE
>> >>       __s32           mc_forwarding;
>> >> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
>> >> index efa2666..e863d08 100644
>> >> --- a/include/uapi/linux/ipv6.h
>> >> +++ b/include/uapi/linux/ipv6.h
>> >> @@ -164,6 +164,7 @@ enum {
>> >>       DEVCONF_MLDV2_UNSOLICITED_REPORT_INTERVAL,
>> >>       DEVCONF_SUPPRESS_FRAG_NDISC,
>> >>       DEVCONF_ACCEPT_RA_FROM_LOCAL,
>> >> +     DEVCONF_USE_OPTIMISTIC,
>> >>       DEVCONF_MAX
>> >>  };
>> >>
>> >> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> >> index 725c763..c2fddb7 100644
>> >> --- a/net/ipv6/addrconf.c
>> >> +++ b/net/ipv6/addrconf.c
>> >> @@ -1169,6 +1169,9 @@ enum {
>> >>       IPV6_SADDR_RULE_LABEL,
>> >>       IPV6_SADDR_RULE_PRIVACY,
>> >>       IPV6_SADDR_RULE_ORCHID,
>> >> +#ifdef CONFIG_IPV6_OPTIMISTIC_DAD
>> >> +     IPV6_SADDR_RULE_NOT_OPTIMISTIC,
>> >> +#endif
>> >>       IPV6_SADDR_RULE_PREFIX,
>> >>       IPV6_SADDR_RULE_MAX
>> >>  };
>> >> @@ -1257,10 +1260,17 @@ static int ipv6_get_saddr_eval(struct net *net,
>> >>               score->scopedist = ret;
>> >>               break;
>> >>       case IPV6_SADDR_RULE_PREFERRED:
>> >> +         {
>> >>               /* Rule 3: Avoid deprecated and optimistic addresses */
>> >> +             u8 avoid = IFA_F_DEPRECATED;
>> >> +#ifdef CONFIG_IPV6_OPTIMISTIC_DAD
>> >> +             if (!score->ifa->idev->cnf.use_optimistic)
>> >> +                     avoid |= IFA_F_OPTIMISTIC;
>> >> +#endif
>> >>               ret = ipv6_saddr_preferred(score->addr_type) ||
>> >> -                   !(score->ifa->flags & (IFA_F_DEPRECATED|IFA_F_OPTIMISTIC));
>> >> +                   !(score->ifa->flags & avoid);
>> >>               break;
>> >> +         }
>> >>  #ifdef CONFIG_IPV6_MIP6
>> >>       case IPV6_SADDR_RULE_HOA:
>> >>           {
>> >> @@ -1299,6 +1309,14 @@ static int ipv6_get_saddr_eval(struct net *net,
>> >>               ret = !(ipv6_addr_orchid(&score->ifa->addr) ^
>> >>                       ipv6_addr_orchid(dst->addr));
>> >>               break;
>> >> +#ifdef CONFIG_IPV6_OPTIMISTIC_DAD
>> >> +     case IPV6_SADDR_RULE_NOT_OPTIMISTIC:
>> >> +             /* Optimistic addresses still have lower precedence than other
>> >> +              * preferred addresses.
>> >> +              */
>> >> +             ret = !(score->ifa->flags & IFA_F_OPTIMISTIC);
>> >> +             break;
>> >> +#endif
>> >
>> > I wonder a bit why this rule is not directly ordered after
>> > IPV6_SADDR_RULE_PREFERRED? This would e.g. matter for privacy addresses.
>>
>> Privacy addresses ("tempaddrs") that win in earlier checks are
>> preferred before optimistic in this code (i.e. a tempaddr on the same
>> outgoing interface is preferred before an optimistic address).
>>
>> Similarly, a non-tentative non-privacy address (same outgoing
>> interface, same label, ...) will match before an optimistic address,
>> but only until DAD completes and the address is no longer optimistic.
>> I think this is in keeping with the spirit of the RFC 3484/6724 rules.
>
> Oh yes, I see. I had the evaluation order messed up. ;)
>
> So the question I should be asking would be. AFAIR optimistic addresses
> should be handled like deprecated ones, so I am a bit concerned adding a
> non-conditional rule before the RULE_PREFIX check.
>
> Shouldn't we only break the tie *after* longest prefix match then? If
> you do that before longest prefix match I would prefer ret being masked
> by use_optimisitic flag.

There was some text suggesting that the longest prefix could be
ignored (like for DNS load-balancing reasons).  But I think in this
particular case it doesn't matter, so I'll move it after.  Ack.

>> After there's an RFC 7217 implementation, EUI-64-based SLAAC could be
>> disabled by folks.
>
> Ack.
>
>>
>> >>       case IPV6_SADDR_RULE_PREFIX:
>> >>               /* Rule 8: Use longest matching prefix */
>> >>               ret = ipv6_addr_diff(&score->ifa->addr, dst->addr);
>> >> @@ -3222,8 +3240,13 @@ static void addrconf_dad_begin(struct inet6_ifaddr *ifp)
>> >>        * Optimistic nodes can start receiving
>> >>        * Frames right away
>> >>        */
>> >> -     if (ifp->flags & IFA_F_OPTIMISTIC)
>> >> +     if (ifp->flags & IFA_F_OPTIMISTIC) {
>> >>               ip6_ins_rt(ifp->rt);
>> >> +             /* Because optimistic nodes can receive frames, notify
>> >> +              * listeners. If DAD fails, RTM_DELADDR is sent.
>> >> +              */
>> >> +             ipv6_ifa_notify(RTM_NEWADDR, ifp);
>> >> +     }
>> >
>> > I wonder if we can now delete the ipv6_ifa_notify(RTM_NEWADDR, ifp) in
>> > addrconf_dad_completed.
>>
>> I don't know what everyone's general preference would be, but mine
>> would be to err on the side of notifying on state changes.  It seems
>> harmless to me to keep it in, and something in userspace might want to
>> know if/when DAD completes.
>
> Userspace expects to communicate with an address which gets announced
> via RTM_NEWADDR, so I would make the RTM_NEWADDR notifications
> conditional on use_optimistic flag in both, the completed and the
> dad_begin function. This sounds like the best option to me, no?

That's easy enough to do in addrconf_dad_begin().  Unfortunately, by
the time we get to addrconf_dad_completed() the IFA_F_OPTIMISTIC flag
has already been cleared.

I have a version that attempts to take its best guess as to whether an
RTM_NEWADDR _should_ have already been sent--something like:

#ifdef CONFIG_IPV6_OPTIMISTIC_DAD
        // We probably already sent a notification in addrconf_dad_begin().
        if (!ifp->idev->cnf.optimistic_dad || !ifp->idev->cnf.use_optimistic)
#endif
        ipv6_ifa_notify(RTM_NEWADDR, ifp);

but that doesn't seem that clean to me (apart from the ifdef-y
messiness of it).  Do you think this "best guess" approach is
reasonable?

With just the "use_optimistic" check in addrconf_dad_begin(),
userspace can still communicate with addresses it gets via
RTM_NEWADDR, it will just get /two/ such notifications: one when it
can probably use it and one when it definitely can.

Thoughts?

^ permalink raw reply

* [PATCH 1/6] qeth: qeth_core_main make local functions static
From: Frank Blaschka @ 2014-10-22 10:18 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Thomas Richter, Frank Blaschka
In-Reply-To: <1413973087-18740-1-git-send-email-blaschka@linux.vnet.ibm.com>

From: Thomas Richter <tmricht@linux.vnet.ibm.com>

This patch makes some global functions static and removes
the prototypes from the header file.
Also function qeth_query_card_info is not exported anymore,
there is no external user for it, this function should never
have been exported in the first place.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
---
 drivers/s390/net/qeth_core.h      | 16 ----------------
 drivers/s390/net/qeth_core_main.c | 18 +++++++++++++++---
 2 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index e7646ce..7a8bb9f 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -380,11 +380,6 @@ enum qeth_header_ids {
 #define QETH_HDR_EXT_CSUM_TRANSP_REQ  0x20
 #define QETH_HDR_EXT_UDP	      0x40 /*bit off for TCP*/
 
-static inline int qeth_is_last_sbale(struct qdio_buffer_element *sbale)
-{
-	return (sbale->eflags & SBAL_EFLAGS_LAST_ENTRY);
-}
-
 enum qeth_qdio_buffer_states {
 	/*
 	 * inbound: read out by driver; owned by hardware in order to be filled
@@ -843,13 +838,6 @@ struct qeth_trap_id {
 /*some helper functions*/
 #define QETH_CARD_IFNAME(card) (((card)->dev)? (card)->dev->name : "")
 
-static inline struct qeth_card *CARD_FROM_CDEV(struct ccw_device *cdev)
-{
-	struct qeth_card *card = dev_get_drvdata(&((struct ccwgroup_device *)
-		dev_get_drvdata(&cdev->dev))->dev);
-	return card;
-}
-
 static inline int qeth_get_micros(void)
 {
 	return (int) (get_tod_clock() >> 12);
@@ -894,7 +882,6 @@ const char *qeth_get_cardname_short(struct qeth_card *);
 int qeth_realloc_buffer_pool(struct qeth_card *, int);
 int qeth_core_load_discipline(struct qeth_card *, enum qeth_discipline_id);
 void qeth_core_free_discipline(struct qeth_card *);
-void qeth_buffer_reclaim_work(struct work_struct *);
 
 /* exports for qeth discipline device drivers */
 extern struct qeth_card_list_struct qeth_core_card_list;
@@ -913,7 +900,6 @@ int qeth_core_hardsetup_card(struct qeth_card *);
 void qeth_print_status_message(struct qeth_card *);
 int qeth_init_qdio_queues(struct qeth_card *);
 int qeth_send_startlan(struct qeth_card *);
-int qeth_send_stoplan(struct qeth_card *);
 int qeth_send_ipa_cmd(struct qeth_card *, struct qeth_cmd_buffer *,
 		  int (*reply_cb)
 		  (struct qeth_card *, struct qeth_reply *, unsigned long),
@@ -954,8 +940,6 @@ int qeth_snmp_command(struct qeth_card *, char __user *);
 int qeth_query_oat_command(struct qeth_card *, char __user *);
 int qeth_query_switch_attributes(struct qeth_card *card,
 				  struct qeth_switch_info *sw_info);
-int qeth_query_card_info(struct qeth_card *card,
-	struct carrier_info *carrier_info);
 int qeth_send_control_data(struct qeth_card *, int, struct qeth_cmd_buffer *,
 	int (*reply_cb)(struct qeth_card *, struct qeth_reply*, unsigned long),
 	void *reply_param);
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index fd22c81..098bb0f 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -718,6 +718,13 @@ static int qeth_check_idx_response(struct qeth_card *card,
 	return 0;
 }
 
+static struct qeth_card *CARD_FROM_CDEV(struct ccw_device *cdev)
+{
+	struct qeth_card *card = dev_get_drvdata(&((struct ccwgroup_device *)
+		dev_get_drvdata(&cdev->dev))->dev);
+	return card;
+}
+
 static void qeth_setup_ccw(struct qeth_channel *channel, unsigned char *iob,
 		__u32 len)
 {
@@ -1431,6 +1438,7 @@ static void qeth_start_kernel_thread(struct work_struct *work)
 	}
 }
 
+static void qeth_buffer_reclaim_work(struct work_struct *);
 static int qeth_setup_card(struct qeth_card *card)
 {
 
@@ -3232,7 +3240,7 @@ int qeth_check_qdio_errors(struct qeth_card *card, struct qdio_buffer *buf,
 }
 EXPORT_SYMBOL_GPL(qeth_check_qdio_errors);
 
-void qeth_buffer_reclaim_work(struct work_struct *work)
+static void qeth_buffer_reclaim_work(struct work_struct *work)
 {
 	struct qeth_card *card = container_of(work, struct qeth_card,
 		buffer_reclaim_work.work);
@@ -4717,7 +4725,7 @@ static int qeth_query_card_info_cb(struct qeth_card *card,
 	return 0;
 }
 
-int qeth_query_card_info(struct qeth_card *card,
+static int qeth_query_card_info(struct qeth_card *card,
 				struct carrier_info *carrier_info)
 {
 	struct qeth_cmd_buffer *iob;
@@ -4730,7 +4738,6 @@ int qeth_query_card_info(struct qeth_card *card,
 	return qeth_send_ipa_cmd(card, iob, qeth_query_card_info_cb,
 					(void *)carrier_info);
 }
-EXPORT_SYMBOL_GPL(qeth_query_card_info);
 
 static inline int qeth_get_qdio_q_format(struct qeth_card *card)
 {
@@ -5113,6 +5120,11 @@ static inline int qeth_create_skb_frag(struct qeth_qdio_buffer *qethbuffer,
 	return 0;
 }
 
+static inline int qeth_is_last_sbale(struct qdio_buffer_element *sbale)
+{
+	return (sbale->eflags & SBAL_EFLAGS_LAST_ENTRY);
+}
+
 struct sk_buff *qeth_core_get_next_skb(struct qeth_card *card,
 		struct qeth_qdio_buffer *qethbuffer,
 		struct qdio_buffer_element **__element, int *__offset,
-- 
1.8.5.5

^ permalink raw reply related

* [PATCH 6/6] ctcm: replace sscanf by kstrto function
From: Frank Blaschka @ 2014-10-22 10:18 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Thomas Richter, Frank Blaschka
In-Reply-To: <1413973087-18740-1-git-send-email-blaschka@linux.vnet.ibm.com>

From: Thomas Richter <tmricht@linux.vnet.ibm.com>

Since a single integer value is read from the supplied buffer
use the kstrto functions instead of sscanf.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
---
 drivers/s390/net/ctcm_sysfs.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/net/ctcm_sysfs.c b/drivers/s390/net/ctcm_sysfs.c
index 6bcfbbb..47773c4 100644
--- a/drivers/s390/net/ctcm_sysfs.c
+++ b/drivers/s390/net/ctcm_sysfs.c
@@ -44,8 +44,8 @@ static ssize_t ctcm_buffer_write(struct device *dev,
 		return -ENODEV;
 	}
 
-	rc = sscanf(buf, "%u", &bs1);
-	if (rc != 1)
+	rc = kstrtouint(buf, 0, &bs1);
+	if (rc)
 		goto einval;
 	if (bs1 > CTCM_BUFSIZE_LIMIT)
 					goto einval;
@@ -151,8 +151,8 @@ static ssize_t ctcm_proto_store(struct device *dev,
 
 	if (!priv)
 		return -ENODEV;
-	rc = sscanf(buf, "%d", &value);
-	if ((rc != 1) ||
+	rc = kstrtoint(buf, 0, &value);
+	if (rc ||
 	    !((value == CTCM_PROTO_S390)  ||
 	      (value == CTCM_PROTO_LINUX) ||
 	      (value == CTCM_PROTO_MPC) ||
-- 
1.8.5.5

^ permalink raw reply related

* [PATCH 3/6] qeth: make local functions static in qeth_l3 module
From: Frank Blaschka @ 2014-10-22 10:18 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Thomas Richter, Frank Blaschka
In-Reply-To: <1413973087-18740-1-git-send-email-blaschka@linux.vnet.ibm.com>

From: Thomas Richter <tmricht@linux.vnet.ibm.com>

This patch makes 4 local functions static and removes
the prototypes from the header file.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
---
 drivers/s390/net/qeth_l3.h      | 4 ----
 drivers/s390/net/qeth_l3_main.c | 8 ++++----
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/s390/net/qeth_l3.h b/drivers/s390/net/qeth_l3.h
index 29c1c00..551a4b4 100644
--- a/drivers/s390/net/qeth_l3.h
+++ b/drivers/s390/net/qeth_l3.h
@@ -42,10 +42,6 @@ struct qeth_ipato_entry {
 };
 
 
-void qeth_l3_ipaddr4_to_string(const __u8 *, char *);
-int qeth_l3_string_to_ipaddr4(const char *, __u8 *);
-void qeth_l3_ipaddr6_to_string(const __u8 *, char *);
-int qeth_l3_string_to_ipaddr6(const char *, __u8 *);
 void qeth_l3_ipaddr_to_string(enum qeth_prot_versions, const __u8 *, char *);
 int qeth_l3_string_to_ipaddr(const char *, enum qeth_prot_versions, __u8 *);
 int qeth_l3_create_device_attributes(struct device *);
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 857a990..625227ad 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -55,12 +55,12 @@ static int qeth_l3_isxdigit(char *buf)
 	return 1;
 }
 
-void qeth_l3_ipaddr4_to_string(const __u8 *addr, char *buf)
+static void qeth_l3_ipaddr4_to_string(const __u8 *addr, char *buf)
 {
 	sprintf(buf, "%i.%i.%i.%i", addr[0], addr[1], addr[2], addr[3]);
 }
 
-int qeth_l3_string_to_ipaddr4(const char *buf, __u8 *addr)
+static int qeth_l3_string_to_ipaddr4(const char *buf, __u8 *addr)
 {
 	int count = 0, rc = 0;
 	unsigned int in[4];
@@ -78,12 +78,12 @@ int qeth_l3_string_to_ipaddr4(const char *buf, __u8 *addr)
 	return 0;
 }
 
-void qeth_l3_ipaddr6_to_string(const __u8 *addr, char *buf)
+static void qeth_l3_ipaddr6_to_string(const __u8 *addr, char *buf)
 {
 	sprintf(buf, "%pI6", addr);
 }
 
-int qeth_l3_string_to_ipaddr6(const char *buf, __u8 *addr)
+static int qeth_l3_string_to_ipaddr6(const char *buf, __u8 *addr)
 {
 	const char *end, *end_tmp, *start;
 	__u16 *in;
-- 
1.8.5.5

^ permalink raw reply related

* [PATCH 2/6] qeth: fix some trace formating issues
From: Frank Blaschka @ 2014-10-22 10:18 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Thomas Richter, Frank Blaschka
In-Reply-To: <1413973087-18740-1-git-send-email-blaschka@linux.vnet.ibm.com>

From: Thomas Richter <tmricht@linux.vnet.ibm.com>

This patch fixes trace formatting issues using the
QETH_CARD_TEXT_ macro. The total size of each trace entry
is 8 bytes. Some of the sprintf formats exceed these 8
bytes (for example using abcd:%d and the converted value
needs more than 3 bytes). The solution is to shorten the
text prepending the value or use a different format (%x).

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
---
 drivers/s390/net/qeth_core_main.c | 6 +++---
 drivers/s390/net/qeth_l2_main.c   | 2 +-
 drivers/s390/net/qeth_l3_main.c   | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 098bb0f..f407e37 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -4134,7 +4134,7 @@ static int qeth_setadp_promisc_mode_cb(struct qeth_card *card,
 
 	qeth_default_setadapterparms_cb(card, reply, (unsigned long)cmd);
 	if (cmd->hdr.return_code) {
-		QETH_CARD_TEXT_(card, 4, "prmrc%2.2x", cmd->hdr.return_code);
+		QETH_CARD_TEXT_(card, 4, "prmrc%x", cmd->hdr.return_code);
 		setparms->data.mode = SET_PROMISC_MODE_OFF;
 	}
 	card->info.promisc_mode = setparms->data.mode;
@@ -4501,13 +4501,13 @@ static int qeth_snmp_command_cb(struct qeth_card *card,
 	snmp = &cmd->data.setadapterparms.data.snmp;
 
 	if (cmd->hdr.return_code) {
-		QETH_CARD_TEXT_(card, 4, "scer1%i", cmd->hdr.return_code);
+		QETH_CARD_TEXT_(card, 4, "scer1%x", cmd->hdr.return_code);
 		return 0;
 	}
 	if (cmd->data.setadapterparms.hdr.return_code) {
 		cmd->hdr.return_code =
 			cmd->data.setadapterparms.hdr.return_code;
-		QETH_CARD_TEXT_(card, 4, "scer2%i", cmd->hdr.return_code);
+		QETH_CARD_TEXT_(card, 4, "scer2%x", cmd->hdr.return_code);
 		return 0;
 	}
 	data_len = *((__u16 *)QETH_IPA_PDU_LEN_PDU1(data));
diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index c2679bf..d02cd1a 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -1512,7 +1512,7 @@ static void qeth_bridge_state_change(struct qeth_card *card,
 
 	QETH_CARD_TEXT(card, 2, "brstchng");
 	if (qports->entry_length != sizeof(struct qeth_sbp_port_entry)) {
-		QETH_CARD_TEXT_(card, 2, "BPsz%.8d", qports->entry_length);
+		QETH_CARD_TEXT_(card, 2, "BPsz%04x", qports->entry_length);
 		return;
 	}
 	extrasize = sizeof(struct qeth_sbp_port_entry) * qports->num_entries;
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index afebb97..857a990 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -2502,7 +2502,7 @@ static int qeth_l3_arp_query(struct qeth_card *card, char __user *udata)
 			rc = -EFAULT;
 			goto free_and_out;
 		}
-		QETH_CARD_TEXT_(card, 4, "qacts");
+		QETH_CARD_TEXT(card, 4, "qacts");
 	}
 free_and_out:
 	kfree(qinfo.udata);
-- 
1.8.5.5

^ permalink raw reply related

* [PATCH 5/6] lcs: replace sscanf by kstrto function
From: Frank Blaschka @ 2014-10-22 10:18 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Thomas Richter, Frank Blaschka
In-Reply-To: <1413973087-18740-1-git-send-email-blaschka@linux.vnet.ibm.com>

From: Thomas Richter <tmricht@linux.vnet.ibm.com>

Since a single integer value is read from the supplied buffer
use the kstrto functions instead of sscanf.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
---
 drivers/s390/net/lcs.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/s390/net/lcs.c b/drivers/s390/net/lcs.c
index 0a7d87c..92190aa 100644
--- a/drivers/s390/net/lcs.c
+++ b/drivers/s390/net/lcs.c
@@ -1943,15 +1943,16 @@ static ssize_t
 lcs_portno_store (struct device *dev, struct device_attribute *attr, const char *buf, size_t count)
 {
         struct lcs_card *card;
-	int value, rc;
+	int rc;
+	s16 value;
 
 	card = dev_get_drvdata(dev);
 
         if (!card)
                 return 0;
 
-	rc = sscanf(buf, "%d", &value);
-	if (rc != 1)
+	rc = kstrtos16(buf, 0, &value);
+	if (rc)
 		return -EINVAL;
         /* TODO: sanity checks */
         card->portno = value;
@@ -2007,8 +2008,8 @@ lcs_timeout_store (struct device *dev, struct device_attribute *attr, const char
         if (!card)
                 return 0;
 
-	rc = sscanf(buf, "%u", &value);
-	if (rc != 1)
+	rc = kstrtouint(buf, 0, &value);
+	if (rc)
 		return -EINVAL;
         /* TODO: sanity checks */
         card->lancmd_timeout = value;
-- 
1.8.5.5

^ permalink raw reply related

* [PATCH 0/6 resend] s390: network patches for net-next
From: Frank Blaschka @ 2014-10-22 10:18 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Frank Blaschka

Hi Dave,

looks like there was a problem with my previous posting. Hope this time
it will work. Sorry for any inconvenience. The patches are mostly
cleanups and small enhancements for net-next

Thanks,
        Frank

Thomas Richter (6):
  qeth: qeth_core_main make local functions static
  qeth: fix some trace formating issues
  qeth: make local functions static in qeth_l3 module
  qeth: s390 ethernet device driver dependency
  lcs: replace sscanf by kstrto function
  ctcm: replace sscanf by kstrto function

 drivers/s390/net/Kconfig          |  2 +-
 drivers/s390/net/ctcm_sysfs.c     |  8 ++++----
 drivers/s390/net/lcs.c            | 11 ++++++-----
 drivers/s390/net/qeth_core.h      | 16 ----------------
 drivers/s390/net/qeth_core_main.c | 24 ++++++++++++++++++------
 drivers/s390/net/qeth_l2_main.c   |  2 +-
 drivers/s390/net/qeth_l3.h        |  4 ----
 drivers/s390/net/qeth_l3_main.c   | 10 +++++-----
 8 files changed, 35 insertions(+), 42 deletions(-)

-- 
1.8.5.5

^ permalink raw reply

* [PATCH 4/6] qeth: s390 ethernet device driver dependency
From: Frank Blaschka @ 2014-10-22 10:18 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Thomas Richter, Frank Blaschka
In-Reply-To: <1413973087-18740-1-git-send-email-blaschka@linux.vnet.ibm.com>

From: Thomas Richter <tmricht@linux.vnet.ibm.com>

Compile the s390 10GB ethernet device driver only when
ETHERNET has been defined in the kernel configuration file.
Right now the qeth device driver is always built regardless
of which network connectivity is active.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
---
 drivers/s390/net/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/s390/net/Kconfig b/drivers/s390/net/Kconfig
index 8b3f559..f1b5111 100644
--- a/drivers/s390/net/Kconfig
+++ b/drivers/s390/net/Kconfig
@@ -71,7 +71,7 @@ config CLAW
 config QETH
 	def_tristate y
 	prompt "Gigabit Ethernet device support"
-	depends on CCW && NETDEVICES && IP_MULTICAST && QDIO
+	depends on CCW && NETDEVICES && IP_MULTICAST && QDIO && ETHERNET
 	help
 	  This driver supports the IBM System z OSA Express adapters
 	  in QDIO mode (all media types), HiperSockets interfaces and z/VM
-- 
1.8.5.5

^ permalink raw reply related

* [PATCHv2 net-next] xen-netfront: always keep the Rx ring full of requests
From: David Vrabel @ 2014-10-22 10:17 UTC (permalink / raw)
  To: netdev; +Cc: David Vrabel, xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky

A full Rx ring only requires 1 MiB of memory.  This is not enough
memory that it is useful to dynamically scale the number of Rx
requests in the ring based on traffic rates, because:

a) Even the full 1 MiB is a tiny fraction of a typically modern Linux
   VM (for example, the AWS micro instance still has 1 GiB of memory).

b) Netfront would have used up to 1 MiB already even with moderate
   data rates (there was no adjustment of target based on memory
   pressure).

c) Small VMs are going to typically have one VCPU and hence only one
   queue.

Keeping the ring full of Rx requests handles bursty traffic better
than trying to converge on an optimal number of requests to keep
filled.

On a 4 core host, an iperf -P 64 -t 60 run from dom0 to a 4 VCPU guest
improved from 5.1 Gbit/s to 5.6 Gbit/s.  Gains with more bursty
traffic are expected to be higher.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
Changes in v2:
- Keep rxbuf_* sysfs files.
---
 drivers/net/xen-netfront.c |  255 +++++++++++---------------------------------
 1 file changed, 63 insertions(+), 192 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index cca8713..83e39a1 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -77,7 +77,9 @@ struct netfront_cb {
 
 #define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
 #define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
-#define TX_MAX_TARGET min_t(int, NET_TX_RING_SIZE, 256)
+
+/* Minimum number of Rx slots (includes slot for GSO metadata). */
+#define NET_RX_SLOTS_MIN (XEN_NETIF_NR_SLOTS_MIN + 1)
 
 /* Queue name is interface name with "-qNNN" appended */
 #define QUEUE_NAME_SIZE (IFNAMSIZ + 6)
@@ -137,13 +139,6 @@ struct netfront_queue {
 	struct xen_netif_rx_front_ring rx;
 	int rx_ring_ref;
 
-	/* Receive-ring batched refills. */
-#define RX_MIN_TARGET 8
-#define RX_DFL_MIN_TARGET 64
-#define RX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256)
-	unsigned rx_min_target, rx_max_target, rx_target;
-	struct sk_buff_head rx_batch;
-
 	struct timer_list rx_refill_timer;
 
 	struct sk_buff *rx_skbs[NET_RX_RING_SIZE];
@@ -251,7 +246,7 @@ static void rx_refill_timeout(unsigned long data)
 static int netfront_tx_slot_available(struct netfront_queue *queue)
 {
 	return (queue->tx.req_prod_pvt - queue->tx.rsp_cons) <
-		(TX_MAX_TARGET - MAX_SKB_FRAGS - 2);
+		(NET_TX_RING_SIZE - MAX_SKB_FRAGS - 2);
 }
 
 static void xennet_maybe_wake_tx(struct netfront_queue *queue)
@@ -265,77 +260,55 @@ static void xennet_maybe_wake_tx(struct netfront_queue *queue)
 		netif_tx_wake_queue(netdev_get_tx_queue(dev, queue->id));
 }
 
-static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
+
+static struct sk_buff *xennet_alloc_one_rx_buffer(struct netfront_queue *queue)
 {
-	unsigned short id;
 	struct sk_buff *skb;
 	struct page *page;
-	int i, batch_target, notify;
-	RING_IDX req_prod = queue->rx.req_prod_pvt;
-	grant_ref_t ref;
-	unsigned long pfn;
-	void *vaddr;
-	struct xen_netif_rx_request *req;
 
-	if (unlikely(!netif_carrier_ok(queue->info->netdev)))
-		return;
+	skb = __netdev_alloc_skb(queue->info->netdev,
+				 RX_COPY_THRESHOLD + NET_IP_ALIGN,
+				 GFP_ATOMIC | __GFP_NOWARN);
+	if (unlikely(!skb))
+		return NULL;
 
-	/*
-	 * Allocate skbuffs greedily, even though we batch updates to the
-	 * receive ring. This creates a less bursty demand on the memory
-	 * allocator, so should reduce the chance of failed allocation requests
-	 * both for ourself and for other kernel subsystems.
-	 */
-	batch_target = queue->rx_target - (req_prod - queue->rx.rsp_cons);
-	for (i = skb_queue_len(&queue->rx_batch); i < batch_target; i++) {
-		skb = __netdev_alloc_skb(queue->info->netdev,
-					 RX_COPY_THRESHOLD + NET_IP_ALIGN,
-					 GFP_ATOMIC | __GFP_NOWARN);
-		if (unlikely(!skb))
-			goto no_skb;
-
-		/* Align ip header to a 16 bytes boundary */
-		skb_reserve(skb, NET_IP_ALIGN);
-
-		page = alloc_page(GFP_ATOMIC | __GFP_NOWARN);
-		if (!page) {
-			kfree_skb(skb);
-no_skb:
-			/* Could not allocate any skbuffs. Try again later. */
-			mod_timer(&queue->rx_refill_timer,
-				  jiffies + (HZ/10));
-
-			/* Any skbuffs queued for refill? Force them out. */
-			if (i != 0)
-				goto refill;
-			break;
-		}
-
-		skb_add_rx_frag(skb, 0, page, 0, 0, PAGE_SIZE);
-		__skb_queue_tail(&queue->rx_batch, skb);
+	page = alloc_page(GFP_ATOMIC | __GFP_NOWARN);
+	if (!page) {
+		kfree_skb(skb);
+		return NULL;
 	}
+	skb_add_rx_frag(skb, 0, page, 0, 0, PAGE_SIZE);
 
-	/* Is the batch large enough to be worthwhile? */
-	if (i < (queue->rx_target/2)) {
-		if (req_prod > queue->rx.sring->req_prod)
-			goto push;
+	/* Align ip header to a 16 bytes boundary */
+	skb_reserve(skb, NET_IP_ALIGN);
+	skb->dev = queue->info->netdev;
+
+	return skb;
+}
+	
+
+static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
+{
+	RING_IDX req_prod = queue->rx.req_prod_pvt;
+	int notify;
+
+	if (unlikely(!netif_carrier_ok(queue->info->netdev)))
 		return;
-	}
 
-	/* Adjust our fill target if we risked running out of buffers. */
-	if (((req_prod - queue->rx.sring->rsp_prod) < (queue->rx_target / 4)) &&
-	    ((queue->rx_target *= 2) > queue->rx_max_target))
-		queue->rx_target = queue->rx_max_target;
+	for (req_prod = queue->rx.req_prod_pvt;
+	     req_prod - queue->rx.rsp_cons < NET_RX_RING_SIZE;
+	     req_prod++) {
+		struct sk_buff *skb;
+		unsigned short id;
+		grant_ref_t ref;
+		unsigned long pfn;
+		struct xen_netif_rx_request *req;
 
- refill:
-	for (i = 0; ; i++) {
-		skb = __skb_dequeue(&queue->rx_batch);
-		if (skb == NULL)
+		skb = xennet_alloc_one_rx_buffer(queue);
+		if (!skb)
 			break;
 
-		skb->dev = queue->info->netdev;
-
-		id = xennet_rxidx(req_prod + i);
+		id = xennet_rxidx(req_prod);
 
 		BUG_ON(queue->rx_skbs[id]);
 		queue->rx_skbs[id] = skb;
@@ -345,9 +318,8 @@ no_skb:
 		queue->grant_rx_ref[id] = ref;
 
 		pfn = page_to_pfn(skb_frag_page(&skb_shinfo(skb)->frags[0]));
-		vaddr = page_address(skb_frag_page(&skb_shinfo(skb)->frags[0]));
 
-		req = RING_GET_REQUEST(&queue->rx, req_prod + i);
+		req = RING_GET_REQUEST(&queue->rx, req_prod);
 		gnttab_grant_foreign_access_ref(ref,
 						queue->info->xbdev->otherend_id,
 						pfn_to_mfn(pfn),
@@ -357,11 +329,16 @@ no_skb:
 		req->gref = ref;
 	}
 
+	queue->rx.req_prod_pvt = req_prod;
+
+	/* Not enough requests? Try again later. */
+	if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
+		mod_timer(&queue->rx_refill_timer, jiffies + (HZ/10));
+		return;
+	}
+
 	wmb();		/* barrier so backend seens requests */
 
-	/* Above is a suitable barrier to ensure backend will see requests. */
-	queue->rx.req_prod_pvt = req_prod + i;
- push:
 	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&queue->rx, notify);
 	if (notify)
 		notify_remote_via_irq(queue->rx_irq);
@@ -1070,13 +1047,6 @@ err:
 
 	work_done -= handle_incoming_queue(queue, &rxq);
 
-	/* If we get a callback with very few responses, reduce fill target. */
-	/* NB. Note exponential increase, linear decrease. */
-	if (((queue->rx.req_prod_pvt - queue->rx.sring->rsp_prod) >
-	     ((3*queue->rx_target) / 4)) &&
-	    (--queue->rx_target < queue->rx_min_target))
-		queue->rx_target = queue->rx_min_target;
-
 	xennet_alloc_rx_buffers(queue);
 
 	if (work_done < budget) {
@@ -1643,11 +1613,6 @@ static int xennet_init_queue(struct netfront_queue *queue)
 	spin_lock_init(&queue->tx_lock);
 	spin_lock_init(&queue->rx_lock);
 
-	skb_queue_head_init(&queue->rx_batch);
-	queue->rx_target     = RX_DFL_MIN_TARGET;
-	queue->rx_min_target = RX_DFL_MIN_TARGET;
-	queue->rx_max_target = RX_MAX_TARGET;
-
 	init_timer(&queue->rx_refill_timer);
 	queue->rx_refill_timer.data = (unsigned long)queue;
 	queue->rx_refill_timer.function = rx_refill_timeout;
@@ -1670,7 +1635,7 @@ static int xennet_init_queue(struct netfront_queue *queue)
 	}
 
 	/* A grant for every tx ring slot */
-	if (gnttab_alloc_grant_references(TX_MAX_TARGET,
+	if (gnttab_alloc_grant_references(NET_TX_RING_SIZE,
 					  &queue->gref_tx_head) < 0) {
 		pr_alert("can't alloc tx grant refs\n");
 		err = -ENOMEM;
@@ -1678,7 +1643,7 @@ static int xennet_init_queue(struct netfront_queue *queue)
 	}
 
 	/* A grant for every rx ring slot */
-	if (gnttab_alloc_grant_references(RX_MAX_TARGET,
+	if (gnttab_alloc_grant_references(NET_RX_RING_SIZE,
 					  &queue->gref_rx_head) < 0) {
 		pr_alert("can't alloc rx grant refs\n");
 		err = -ENOMEM;
@@ -2146,129 +2111,35 @@ static const struct ethtool_ops xennet_ethtool_ops =
 };
 
 #ifdef CONFIG_SYSFS
-static ssize_t show_rxbuf_min(struct device *dev,
-			      struct device_attribute *attr, char *buf)
-{
-	struct net_device *netdev = to_net_dev(dev);
-	struct netfront_info *info = netdev_priv(netdev);
-	unsigned int num_queues = netdev->real_num_tx_queues;
-
-	if (num_queues)
-		return sprintf(buf, "%u\n", info->queues[0].rx_min_target);
-	else
-		return sprintf(buf, "%u\n", RX_MIN_TARGET);
-}
-
-static ssize_t store_rxbuf_min(struct device *dev,
-			       struct device_attribute *attr,
-			       const char *buf, size_t len)
+static ssize_t show_rxbuf(struct device *dev,
+			  struct device_attribute *attr, char *buf)
 {
-	struct net_device *netdev = to_net_dev(dev);
-	struct netfront_info *np = netdev_priv(netdev);
-	unsigned int num_queues = netdev->real_num_tx_queues;
-	char *endp;
-	unsigned long target;
-	unsigned int i;
-	struct netfront_queue *queue;
-
-	if (!capable(CAP_NET_ADMIN))
-		return -EPERM;
-
-	target = simple_strtoul(buf, &endp, 0);
-	if (endp == buf)
-		return -EBADMSG;
-
-	if (target < RX_MIN_TARGET)
-		target = RX_MIN_TARGET;
-	if (target > RX_MAX_TARGET)
-		target = RX_MAX_TARGET;
-
-	for (i = 0; i < num_queues; ++i) {
-		queue = &np->queues[i];
-		spin_lock_bh(&queue->rx_lock);
-		if (target > queue->rx_max_target)
-			queue->rx_max_target = target;
-		queue->rx_min_target = target;
-		if (target > queue->rx_target)
-			queue->rx_target = target;
-
-		xennet_alloc_rx_buffers(queue);
-
-		spin_unlock_bh(&queue->rx_lock);
-	}
-	return len;
-}
-
-static ssize_t show_rxbuf_max(struct device *dev,
-			      struct device_attribute *attr, char *buf)
-{
-	struct net_device *netdev = to_net_dev(dev);
-	struct netfront_info *info = netdev_priv(netdev);
-	unsigned int num_queues = netdev->real_num_tx_queues;
-
-	if (num_queues)
-		return sprintf(buf, "%u\n", info->queues[0].rx_max_target);
-	else
-		return sprintf(buf, "%u\n", RX_MAX_TARGET);
+	return sprintf(buf, "%lu\n", NET_RX_RING_SIZE);
 }
 
-static ssize_t store_rxbuf_max(struct device *dev,
-			       struct device_attribute *attr,
-			       const char *buf, size_t len)
+static ssize_t store_rxbuf(struct device *dev,
+			   struct device_attribute *attr,
+			   const char *buf, size_t len)
 {
-	struct net_device *netdev = to_net_dev(dev);
-	struct netfront_info *np = netdev_priv(netdev);
-	unsigned int num_queues = netdev->real_num_tx_queues;
 	char *endp;
 	unsigned long target;
-	unsigned int i = 0;
-	struct netfront_queue *queue = NULL;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
-
+ 
 	target = simple_strtoul(buf, &endp, 0);
 	if (endp == buf)
 		return -EBADMSG;
 
-	if (target < RX_MIN_TARGET)
-		target = RX_MIN_TARGET;
-	if (target > RX_MAX_TARGET)
-		target = RX_MAX_TARGET;
-
-	for (i = 0; i < num_queues; ++i) {
-		queue = &np->queues[i];
-		spin_lock_bh(&queue->rx_lock);
-		if (target < queue->rx_min_target)
-			queue->rx_min_target = target;
-		queue->rx_max_target = target;
-		if (target < queue->rx_target)
-			queue->rx_target = target;
-
-		xennet_alloc_rx_buffers(queue);
+	/* rxbuf_min and rxbuf_max are no longer configurable. */
 
-		spin_unlock_bh(&queue->rx_lock);
-	}
 	return len;
 }
 
-static ssize_t show_rxbuf_cur(struct device *dev,
-			      struct device_attribute *attr, char *buf)
-{
-	struct net_device *netdev = to_net_dev(dev);
-	struct netfront_info *info = netdev_priv(netdev);
-	unsigned int num_queues = netdev->real_num_tx_queues;
-
-	if (num_queues)
-		return sprintf(buf, "%u\n", info->queues[0].rx_target);
-	else
-		return sprintf(buf, "0\n");
-}
-
 static struct device_attribute xennet_attrs[] = {
-	__ATTR(rxbuf_min, S_IRUGO|S_IWUSR, show_rxbuf_min, store_rxbuf_min),
-	__ATTR(rxbuf_max, S_IRUGO|S_IWUSR, show_rxbuf_max, store_rxbuf_max),
-	__ATTR(rxbuf_cur, S_IRUGO, show_rxbuf_cur, NULL),
+	__ATTR(rxbuf_min, S_IRUGO|S_IWUSR, show_rxbuf, store_rxbuf),
+	__ATTR(rxbuf_max, S_IRUGO|S_IWUSR, show_rxbuf, store_rxbuf),
+	__ATTR(rxbuf_cur, S_IRUGO, show_rxbuf, NULL),
 };
 
 static int xennet_sysfs_addif(struct net_device *netdev)
-- 
1.7.10.4

^ permalink raw reply related

* Re: [RFC PATCH v2 net-next] net: ipv6: Add a sysctl to make optimistic addresses useful candidates
From: Hannes Frederic Sowa @ 2014-10-22 10:12 UTC (permalink / raw)
  To: Erik Kline; +Cc: netdev, Ben Hutchings, Lorenzo Colitti
In-Reply-To: <CAAedzxpsaEGt+oO0xJWLxcjapdRYauHP4rAsKkf_skcdzbGEmg@mail.gmail.com>

On Mi, 2014-10-22 at 14:25 +0900, Erik Kline wrote:
> Hello.
> 
> On Wed, Oct 22, 2014 at 4:45 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > Hi,
> >
> > On Di, 2014-10-21 at 13:05 +0900, Erik Kline wrote:
> >> Add a sysctl that causes an interface's optimistic addresses
> >> to be considered equivalent to other non-deprecated addresses
> >> for source address selection purposes.  Preferred addresses
> >> will still take precedence over optimistic addresses, subject
> >> to other ranking in the source address selection algorithm.
> >>
> >> This is useful where different interfaces are connected to
> >> different networks from different ISPs (e.g., a cell network
> >> and a home wifi network).
> >>
> >> The current behaviour complies with RFC 3484/6724, and it
> >> makes sense if the host has only one interface, or has
> >> multiple interfaces on the same network (same or cooperating
> >> administrative domain(s), but not in the multiple distinct
> >> networks case.
> >>
> >> For example, if a mobile device has an IPv6 address on an LTE
> >> network and then connects to IPv6-enabled wifi, while the wifi
> >> IPv6 address is undergoing DAD, IPv6 connections will try use
> >> the wifi default route with the LTE IPv6 address, and will get
> >> stuck until they time out.
> >>
> >> Also, because optimistic addresses can actually be used, issue
> >> an RTM_NEWADDR as soon as DAD starts.  If DAD fails an separate
> >> RTM_DELADDR will be sent.
> >>
> >> Also: add an entry in ip-sysctl.txt for optimistic_dad.
> >>
> >> Signed-off-by: Erik Kline <ek@google.com>
> >> ---
> >>  Documentation/networking/ip-sysctl.txt | 13 ++++++++++++
> >>  include/linux/ipv6.h                   |  1 +
> >>  include/uapi/linux/ipv6.h              |  1 +
> >>  net/ipv6/addrconf.c                    | 36 ++++++++++++++++++++++++++++++++--
> >>  4 files changed, 49 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> >> index 0307e28..e03cf49 100644
> >> --- a/Documentation/networking/ip-sysctl.txt
> >> +++ b/Documentation/networking/ip-sysctl.txt
> >> @@ -1452,6 +1452,19 @@ suppress_frag_ndisc - INTEGER
> >>       1 - (default) discard fragmented neighbor discovery packets
> >>       0 - allow fragmented neighbor discovery packets
> >>
> >> +optimistic_dad - BOOLEAN
> >> +     Whether to perform Optimistic Duplicate Address Detection (RFC 4429).
> >> +             0: disabled (default)
> >> +             1: enabled
> >> +
> >> +use_optimistic - BOOLEAN
> >> +     If enabled, do not classify optimistic addresses as deprecated during
> >> +     source address selection.  Preferred addresses will still be chosen
> >> +     before optimistic addresses, subject to other ranking in the source
> >> +     address selection algorithm.
> >> +             0: disabled (default)
> >> +             1: enabled
> >> +
> >>  icmp/*:
> >>  ratelimit - INTEGER
> >>       Limit the maximal rates for sending ICMPv6 packets.
> >> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> >> index ff56053..7121a2e 100644
> >> --- a/include/linux/ipv6.h
> >> +++ b/include/linux/ipv6.h
> >> @@ -42,6 +42,7 @@ struct ipv6_devconf {
> >>       __s32           accept_ra_from_local;
> >>  #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
> >>       __s32           optimistic_dad;
> >> +     __s32           use_optimistic;
> >>  #endif
> >>  #ifdef CONFIG_IPV6_MROUTE
> >>       __s32           mc_forwarding;
> >> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> >> index efa2666..e863d08 100644
> >> --- a/include/uapi/linux/ipv6.h
> >> +++ b/include/uapi/linux/ipv6.h
> >> @@ -164,6 +164,7 @@ enum {
> >>       DEVCONF_MLDV2_UNSOLICITED_REPORT_INTERVAL,
> >>       DEVCONF_SUPPRESS_FRAG_NDISC,
> >>       DEVCONF_ACCEPT_RA_FROM_LOCAL,
> >> +     DEVCONF_USE_OPTIMISTIC,
> >>       DEVCONF_MAX
> >>  };
> >>
> >> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> >> index 725c763..c2fddb7 100644
> >> --- a/net/ipv6/addrconf.c
> >> +++ b/net/ipv6/addrconf.c
> >> @@ -1169,6 +1169,9 @@ enum {
> >>       IPV6_SADDR_RULE_LABEL,
> >>       IPV6_SADDR_RULE_PRIVACY,
> >>       IPV6_SADDR_RULE_ORCHID,
> >> +#ifdef CONFIG_IPV6_OPTIMISTIC_DAD
> >> +     IPV6_SADDR_RULE_NOT_OPTIMISTIC,
> >> +#endif
> >>       IPV6_SADDR_RULE_PREFIX,
> >>       IPV6_SADDR_RULE_MAX
> >>  };
> >> @@ -1257,10 +1260,17 @@ static int ipv6_get_saddr_eval(struct net *net,
> >>               score->scopedist = ret;
> >>               break;
> >>       case IPV6_SADDR_RULE_PREFERRED:
> >> +         {
> >>               /* Rule 3: Avoid deprecated and optimistic addresses */
> >> +             u8 avoid = IFA_F_DEPRECATED;
> >> +#ifdef CONFIG_IPV6_OPTIMISTIC_DAD
> >> +             if (!score->ifa->idev->cnf.use_optimistic)
> >> +                     avoid |= IFA_F_OPTIMISTIC;
> >> +#endif
> >>               ret = ipv6_saddr_preferred(score->addr_type) ||
> >> -                   !(score->ifa->flags & (IFA_F_DEPRECATED|IFA_F_OPTIMISTIC));
> >> +                   !(score->ifa->flags & avoid);
> >>               break;
> >> +         }
> >>  #ifdef CONFIG_IPV6_MIP6
> >>       case IPV6_SADDR_RULE_HOA:
> >>           {
> >> @@ -1299,6 +1309,14 @@ static int ipv6_get_saddr_eval(struct net *net,
> >>               ret = !(ipv6_addr_orchid(&score->ifa->addr) ^
> >>                       ipv6_addr_orchid(dst->addr));
> >>               break;
> >> +#ifdef CONFIG_IPV6_OPTIMISTIC_DAD
> >> +     case IPV6_SADDR_RULE_NOT_OPTIMISTIC:
> >> +             /* Optimistic addresses still have lower precedence than other
> >> +              * preferred addresses.
> >> +              */
> >> +             ret = !(score->ifa->flags & IFA_F_OPTIMISTIC);
> >> +             break;
> >> +#endif
> >
> > I wonder a bit why this rule is not directly ordered after
> > IPV6_SADDR_RULE_PREFERRED? This would e.g. matter for privacy addresses.
> 
> Privacy addresses ("tempaddrs") that win in earlier checks are
> preferred before optimistic in this code (i.e. a tempaddr on the same
> outgoing interface is preferred before an optimistic address).
> 
> Similarly, a non-tentative non-privacy address (same outgoing
> interface, same label, ...) will match before an optimistic address,
> but only until DAD completes and the address is no longer optimistic.
> I think this is in keeping with the spirit of the RFC 3484/6724 rules.

Oh yes, I see. I had the evaluation order messed up. ;)

So the question I should be asking would be. AFAIR optimistic addresses
should be handled like deprecated ones, so I am a bit concerned adding a
non-conditional rule before the RULE_PREFIX check.

Shouldn't we only break the tie *after* longest prefix match then? If
you do that before longest prefix match I would prefer ret being masked
by use_optimisitic flag.

> After there's an RFC 7217 implementation, EUI-64-based SLAAC could be
> disabled by folks.

Ack.

> 
> >>       case IPV6_SADDR_RULE_PREFIX:
> >>               /* Rule 8: Use longest matching prefix */
> >>               ret = ipv6_addr_diff(&score->ifa->addr, dst->addr);
> >> @@ -3222,8 +3240,13 @@ static void addrconf_dad_begin(struct inet6_ifaddr *ifp)
> >>        * Optimistic nodes can start receiving
> >>        * Frames right away
> >>        */
> >> -     if (ifp->flags & IFA_F_OPTIMISTIC)
> >> +     if (ifp->flags & IFA_F_OPTIMISTIC) {
> >>               ip6_ins_rt(ifp->rt);
> >> +             /* Because optimistic nodes can receive frames, notify
> >> +              * listeners. If DAD fails, RTM_DELADDR is sent.
> >> +              */
> >> +             ipv6_ifa_notify(RTM_NEWADDR, ifp);
> >> +     }
> >
> > I wonder if we can now delete the ipv6_ifa_notify(RTM_NEWADDR, ifp) in
> > addrconf_dad_completed.
> 
> I don't know what everyone's general preference would be, but mine
> would be to err on the side of notifying on state changes.  It seems
> harmless to me to keep it in, and something in userspace might want to
> know if/when DAD completes.

Userspace expects to communicate with an address which gets announced
via RTM_NEWADDR, so I would make the RTM_NEWADDR notifications
conditional on use_optimistic flag in both, the completed and the
dad_begin function. This sounds like the best option to me, no?

Bye && thanks,
Hannes

^ permalink raw reply

* Re: IPv6 UFO for VMs
From: Hannes Frederic Sowa @ 2014-10-22  9:35 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, virtualization
In-Reply-To: <1413935045.5994.2.camel@decadent.org.uk>

On Mi, 2014-10-22 at 00:44 +0100, Ben Hutchings wrote:
> There are several ways that VMs can take advantage of UFO and get the
> host to do fragmentation for them:
> 
> drivers/net/macvtap.c:                  gso_type = SKB_GSO_UDP;
> drivers/net/tun.c:                      skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> drivers/net/virtio_net.c:                       skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> 
> Our implementation of UFO for IPv6 does:
> 
> 		fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
> 		fptr->nexthdr = nexthdr;
> 		fptr->reserved = 0;
> 		fptr->identification = skb_shinfo(skb)->ip6_frag_id;
> 
> which assumes ip6_frag_id has been set.  That's only true if the local
> stack constructed the skb; otherwise it appears we get zero.
> 
> This seems to be a regression as a result of:
> 
> commit 916e4cf46d0204806c062c8c6c4d1f633852c5b6
> Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date:   Fri Feb 21 02:55:35 2014 +0100
> 
>     ipv6: reuse ip6_frag_id from ip6_ufo_append_data
> 
> However, that change seems reasonable - we *shouldn't* be choosing IDs
> for any other stack.  Any paravirt net driver that can use IPv6 UFO
> needs to have some way of passing a fragmentation ID to put in
> skb_shared_info::ip6_frag_id.

Do we really gain a lot of performance by enabling UFO on those devices
or would it make sense to just drop support? It only helps fragmenting
large UDP packets, so I don't think it is worth it.

Otherwise I agree with Ben, we need to pass a fragmentation id from the
host over to the system segmenting the gso frame. Fragmentation ids must
be generated by the end system.

Hmm...

Bye,
Hannes

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox