Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 0/2] net: phy: relax error checking when creating sysfs link netdev->phydev
From: Grygorii Strashko @ 2018-03-16 20:13 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, David S. Miller, netdev, Greg Kroah-Hartman,
	Sekhar Nori, linux-kernel, linux-omap
In-Reply-To: <20180316195403.GA8735@lunn.ch>



On 03/16/2018 02:54 PM, Andrew Lunn wrote:
>> The phydrv->mdiodrv.flags can be accessible only after call to of_phy_connect()/phy_connect(),
> 
> You need to use a function like of_phy_find_device() to get the
> phydev, set the flag, and then call phy_connect_direct().


So, do you propose me to replace direct calls of of_phy_connect()/phy_connect() in
CPSW driver with buddies of the same functions? Right?

cpsw_slave_open()
{
....
	if (slave->data->phy_node) {
		phy = of_phy_connect(priv->ndev, slave->data->phy_node,
				 &cpsw_adjust_link, 0, slave->data->phy_if);
----- replace ^^^^ with below
{
	struct phy_device *phy = of_phy_find_device(phy_np);
	int ret;

	if (!phy)
		return NULL;

	phy->dev_flags = flags;

-----	[set flag in phydrv->mdiodrv.flags]

	ret = phy_connect_direct(dev, phy, hndlr, iface);

	/* refcount is held by phy_connect_direct() on success */
	put_device(&phy->mdio.dev);

	return ret ? NULL : phy;
}
-----
		if (!phy) {
			dev_err(priv->dev, "phy \"%pOF\" not found on slave %d\n",
				slave->data->phy_node,
				slave->slave_num);
			return;
		}
	} else {
		phy = phy_connect(priv->ndev, slave->data->phy_id,
				 &cpsw_adjust_link, slave->data->phy_if);
----- replace ^^^^ with below
{
	struct phy_device *phydev;
	struct device *d;
	int rc;

	/* Search the list of PHY devices on the mdio bus for the
	 * PHY with the requested name
	 */
	d = bus_find_device_by_name(&mdio_bus_type, NULL, bus_id);
	if (!d) {
		pr_err("PHY %s not found\n", bus_id);
		return ERR_PTR(-ENODEV);
	}
	phydev = to_phy_device(d);

-----	[set flag in phydrv->mdiodrv.flags]

	rc = phy_connect_direct(dev, phydev, handler, interface);
	put_device(d);
	if (rc)
		return ERR_PTR(rc);

	return phydev;
}
-----
		if (IS_ERR(phy)) {
			dev_err(priv->dev,
				"phy \"%s\" not found on slave %d, err %ld\n",
				slave->data->phy_id, slave->slave_num,
				PTR_ERR(phy));
			return;
		}
	}
}

and all above just to set a flag which will be used by just one driver as of now.

Hm. Is this some sort of punishment ;) Sry. I'll probably will take a pause.

-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()
From: Al Viro @ 2018-03-16 20:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Florian Weimer, Kees Cook, Andrew Morton, Josh Poimboeuf,
	Rasmus Villemoes, Randy Dunlap, Miguel Ojeda, Ingo Molnar,
	David Laight, Ian Abbott, linux-input, linux-btrfs,
	Network Development, Linux Kernel Mailing List, Kernel Hardening
In-Reply-To: <CA+55aFzsrVF7W+YQye=EeF4Wk3yDOTbb_vSiM8s1zkKbE4JV4Q@mail.gmail.com>

On Fri, Mar 16, 2018 at 12:27:23PM -0700, Linus Torvalds wrote:

> But it sure isn't "variable" either as far as the standard is
> concerned, because the standard doesn't even have that concept (it
> uses "variable" for argument numbers and for variables).

Huh?  6.7.5.2p4:

If the size is not present, the array type is an incomplete type.
If the size is * instead of being an expression, the array type is
a variable length array type of unspecified size, which can only be
used in declarations with function prototype scope [footnote]; such
arrays are nonetheless complete types.  If the size is an integer
constant expression and the element type has a known constant size,
the array type is not a variable length array type; otherwise, the
array type is a variable length array type.

footnote: Thus, * can be used only in function declarations that are
not definitions (see 6.7.5.3).

That's C99, straight from N1256.pdf (C99-TC3)...

^ permalink raw reply

* Re: [PATCH net-next 3/9] selftests: pmtu: Introduce support for multiple tests
From: Stefano Brivio @ 2018-03-16 20:03 UTC (permalink / raw)
  To: David Ahern; +Cc: David S . Miller, Sabrina Dubroca, Steffen Klassert, netdev
In-Reply-To: <626b0eb0-846a-b00e-58a2-d5c2e3b24a95@gmail.com>

On Fri, 16 Mar 2018 12:53:23 -0700
David Ahern <dsahern@gmail.com> wrote:

> On 3/16/18 12:29 PM, Stefano Brivio wrote:
> > This might be equally counter-intuitive for somebody. If one does a lot
> > of explicit error checking with early returns, my return convention is
> > also rather practical. E.g. in the setup() function I can do:
> > 
> > 	eval setup_${arg} && echo "  ${arg} not supported" && return 0
> > 
> > instead of:
> > 
> > 	eval setup_${arg} || { echo "  ${arg} not supported" && return 0; }  
> 
> I think it is weird to have 'a && b' where b is done when a fails as
> opposed to succeeds hence the comment. I think a common convention
> across scripts is important but having the tests is more so. Just a
> suggestion.

Yeah, also true. I'll change this.

> Look at fib_tests and fib-onlink-tests. As the number of tests grows,
> output consistency makes your life easier. With printf:
> 
> ...
> TEST: IPv4 linkdown flag set                                  [ OK ]
> TEST: IPv6 linkdown flag set                                  [ OK ]
> TEST: Directly connected nexthop, unicast address             [ OK ]
> TEST: Directly connected nexthop, unicast address with device [ OK ]
> TEST: Gateway is linklocal address                            [ OK ]
> TEST: Gateway is linklocal address, no device                 [ OK ]
> TEST: Gateway can not be local unicast address                [ OK ]
> TEST: Gateway can not be local unicast address, with device   [ OK ]
> TEST: Gateway can not be a local linklocal address            [ OK ]
> TEST: Gateway can be local address in a VRF                   [FAIL]
> TEST: Gateway can be local address in a VRF, with device      [FAIL]
> TEST: Gateway can be local linklocal address in a VRF         [ OK ]
> TEST: Redirect to VRF lookup                                  [ OK ]
> ...
> 
> the FAIL cases jump out versus echo
> 
> ...
> TEST: IPv4 linkdown flag set [ OK ]
> TEST: IPv6 linkdown flag set [ OK ]
> TEST: Directly connected nexthop, unicast address [ OK ]
> TEST: Directly connected nexthop, unicast address with device [ OK ]
> TEST: Gateway is linklocal address [ OK ]
> TEST: Gateway is linklocal address, no device [ OK ]
> TEST: Gateway can not be local unicast address [ OK ]
> TEST: Gateway can not be local unicast address, with device [ OK ]
> TEST: Gateway can not be a local linklocal address [ OK ]
> TEST: Gateway can be local address in a VRF [FAIL]
> TEST: Gateway can be local address in a VRF, with device [FAIL]
> TEST: Gateway can be local linklocal address in a VRF [ OK ]
> TEST: Redirect to VRF lookup [ OK ]
> ...
> 
> where your mind has a lot more work to do to find the tests broken by a
> change.
> 
> That is also why I chose OK versus PASS -- ok at 2 letters, fail at 4
> the failures really stand out.

I see your point. I'll use printf and make it a bit prettier in v2.

-- 
Stefano

^ permalink raw reply

* Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()
From: Miguel Ojeda @ 2018-03-16 20:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Florian Weimer, Kees Cook, Andrew Morton, Josh Poimboeuf,
	Rasmus Villemoes, Randy Dunlap, Ingo Molnar, David Laight,
	Ian Abbott, linux-input, linux-btrfs, Network Development,
	Linux Kernel Mailing List, Kernel Hardening
In-Reply-To: <CA+55aFzsrVF7W+YQye=EeF4Wk3yDOTbb_vSiM8s1zkKbE4JV4Q@mail.gmail.com>

On Fri, Mar 16, 2018 at 8:27 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Mar 16, 2018 at 10:55 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> That's not them, that's C standard regarding ICE.
>
> Yes. The C standard talks about "integer constant expression". I know.
> It's come up in this very thread before.
>
> The C standard at no point talks about - or forbids - "variable length
> arrays". That never comes up in the whole standard, I checked.
>
> So we are right now hindered by a _syntactic_ check, without any way
> to have a _semantic_ check.
>
> That's my problem. The warnings are misleading and imply semantics.
>
> And apparently there is no way to actually check semantics.
>
>> 1,100 is *not* a constant expression as far as the standard is concerned,
>
> I very much know.
>
> But it sure isn't "variable" either as far as the standard is
> concerned, because the standard doesn't even have that concept (it
> uses "variable" for argument numbers and for variables).
>
> So being pedantic doesn't really change anything.
>
>> Would you argue that in
>> void foo(char c)
>> {
>>         int a[(c<<1) + 10 - c + 2 - c];
>
> Yeah, I don't think that even counts as a constant value, even if it
> can be optimized to one. I would not at all be unhppy to see
> __builtin_constant_p() to return zero.
>
> But that is very much different from the syntax issue.
>
> So I would like to get some way to get both type-checking and constant
> checking without the annoying syntax issue.
>
>> expr, constant_expression is not a constant_expression.  And in
>> this particular case the standard is not insane - the only reason
>> for using that is typechecking and _that_ can be achieved without
>> violating 6.6p6:
>>         sizeof(expr,0) * 0 + ICE
>> *is* an integer constant expression, and it gives you exact same
>> typechecking.  So if somebody wants to play odd games, they can
>> do that just fine, without complicating the logics for compilers...
>
> Now that actually looks like a good trick. Maybe we can use that
> instead of the comma expression that causes problems.
>
> And using sizeof() to make sure that __builtin_choose_expression()
> really gets an integer constant expression and that there should be no
> ambiguity looks good.
>
> Hmm.
>
> This works for me, and I'm being *very* careful (those casts to
> pointer types are inside that sizeof, because it's not an integral
> type, and non-integral casts are not valid in an ICE either) but
> somebody needs to check gcc-4.4:
>
>   #define __typecheck(a,b) \
>         (!!(sizeof((typeof(a)*)1==(typeof(b)*)1)))
>
>   #define __no_side_effects(a,b) \
>         (__builtin_constant_p(a)&&__builtin_constant_p(b))
>
>   #define __safe_cmp(a,b) \
>         (__typecheck(a,b) && __no_side_effects(a,b))
>
>   #define __cmp(a,b,op) ((a)op(b)?(a):(b))
>
>   #define __cmp_once(a,b,op) ({ \
>         typeof(a) __a = (a);            \
>         typeof(b) __b = (b);            \
>         __cmp(__a,__b,op); })
>
>   #define __careful_cmp(a,b,op) \
>         __builtin_choose_expr(__safe_cmp(a,b), __cmp(a,b,op),
> __cmp_once(a,b,op))
>
>   #define min(a,b)              __careful_cmp(a,b,<)
>   #define max(a,b)              __careful_cmp(a,b,>)
>   #define min_t(t,a,b)  __careful_cmp((t)(a),(t)(b),<)
>   #define max_t(t,a,b)  __careful_cmp((t)(a),(t)(b),>)
>
> and yes, it does cause new warnings for that
>
>     comparison between ‘enum tis_defaults’ and ‘enum tpm2_const’
>
> in drivers/char/tpm/tpm_tis_core.h due to
>
>    #define TIS_TIMEOUT_A_MAX       max(TIS_SHORT_TIMEOUT, TPM2_TIMEOUT_A)
>
> but technically that warning is actually correct, I'm just confused
> why gcc cares about the cast placement or something.
>
> That warning is easy to fix by turning it into a "max_t(int, enum1,
> enum2)' and that is technically the right thing to do, it's just not
> warned about for some odd reason with the current code.
>
> Kees - is there some online "gcc-4.4 checker" somewhere? This does
> seem to work with my gcc. I actually tested some of those files you
> pointed at now.

I use this one:

  https://godbolt.org/

Cheers,
Miguel

^ permalink raw reply

* Re: Fw: [Bug 199109] New: pptp: kernel printk "recursion detected", and then reboot itself
From: Guillaume Nault @ 2018-03-16 20:02 UTC (permalink / raw)
  To: xu heng; +Cc: Stephen Hemminger, xeb, netdev
In-Reply-To: <1622d9239ec.fb8c9df225155.9220077804249612652@zoho.com>

On Fri, Mar 16, 2018 at 02:49:40PM +0800, xu heng wrote:
> 
>         For testing, in __ppp_channel_push(), disable sending anything from the attached unit, just disable __ppp_xmit_process(ppp) in __ppp_channel_push(). In my opinion, __ppp_xmit_process() should only called by ppp_xmit_process(), because of ppp-&gt;xmit_recursion __percpu.
> 
ppp_channel_push() needs to call __ppp_xmit_process() because some
drivers (like ppp_async) need to notify ppp_generic when they can
accept new packets. This is done by ppp_output_wakeup() which then
calls ppp_channel_push(). So we have to handle the unit backlog there.


Please try the following patch (untested).

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index b883af93929c..af22eb11bbaa 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -255,7 +255,7 @@ struct ppp_net {
 /* Prototypes. */
 static int ppp_unattached_ioctl(struct net *net, struct ppp_file *pf,
 			struct file *file, unsigned int cmd, unsigned long arg);
-static void ppp_xmit_process(struct ppp *ppp);
+static void ppp_xmit_process(struct ppp *ppp, struct sk_buff *skb);
 static void ppp_send_frame(struct ppp *ppp, struct sk_buff *skb);
 static void ppp_push(struct ppp *ppp);
 static void ppp_channel_push(struct channel *pch);
@@ -511,13 +511,12 @@ static ssize_t ppp_write(struct file *file, const char __user *buf,
 		goto out;
 	}
 
-	skb_queue_tail(&pf->xq, skb);
-
 	switch (pf->kind) {
 	case INTERFACE:
-		ppp_xmit_process(PF_TO_PPP(pf));
+		ppp_xmit_process(PF_TO_PPP(pf), skb);
 		break;
 	case CHANNEL:
+		skb_queue_tail(&pf->xq, skb);
 		ppp_channel_push(PF_TO_CHANNEL(pf));
 		break;
 	}
@@ -1260,8 +1259,8 @@ ppp_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	put_unaligned_be16(proto, pp);
 
 	skb_scrub_packet(skb, !net_eq(ppp->ppp_net, dev_net(dev)));
-	skb_queue_tail(&ppp->file.xq, skb);
-	ppp_xmit_process(ppp);
+	ppp_xmit_process(ppp, skb);
+
 	return NETDEV_TX_OK;
 
  outf:
@@ -1415,13 +1414,14 @@ static void ppp_setup(struct net_device *dev)
  */
 
 /* Called to do any work queued up on the transmit side that can now be done */
-static void __ppp_xmit_process(struct ppp *ppp)
+static void __ppp_xmit_process(struct ppp *ppp, struct sk_buff *skb)
 {
-	struct sk_buff *skb;
-
 	ppp_xmit_lock(ppp);
 	if (!ppp->closing) {
 		ppp_push(ppp);
+
+		if (skb)
+			skb_queue_tail(&ppp->file.xq, skb);
 		while (!ppp->xmit_pending &&
 		       (skb = skb_dequeue(&ppp->file.xq)))
 			ppp_send_frame(ppp, skb);
@@ -1435,7 +1435,7 @@ static void __ppp_xmit_process(struct ppp *ppp)
 	ppp_xmit_unlock(ppp);
 }
 
-static void ppp_xmit_process(struct ppp *ppp)
+static void ppp_xmit_process(struct ppp *ppp, struct sk_buff *skb)
 {
 	local_bh_disable();
 
@@ -1443,7 +1443,7 @@ static void ppp_xmit_process(struct ppp *ppp)
 		goto err;
 
 	(*this_cpu_ptr(ppp->xmit_recursion))++;
-	__ppp_xmit_process(ppp);
+	__ppp_xmit_process(ppp, skb);
 	(*this_cpu_ptr(ppp->xmit_recursion))--;
 
 	local_bh_enable();
@@ -1452,6 +1452,7 @@ static void ppp_xmit_process(struct ppp *ppp)
 
 err:
 	local_bh_enable();
+	kfree_skb(skb);
 
 	if (net_ratelimit())
 		netdev_err(ppp->dev, "recursion detected\n");
@@ -1937,7 +1938,7 @@ static void __ppp_channel_push(struct channel *pch)
 	if (skb_queue_empty(&pch->file.xq)) {
 		ppp = pch->ppp;
 		if (ppp)
-			__ppp_xmit_process(ppp);
+			__ppp_xmit_process(ppp, NULL);
 	}
 }
 

^ permalink raw reply related

* Re: [PATCH net-next 2/2] net-tcp_bbr: set tp->snd_ssthresh to BDP upon STARTUP exit
From: Eric Dumazet @ 2018-03-16 20:02 UTC (permalink / raw)
  To: Yousuk Seung, David Miller
  Cc: netdev, Neal Cardwell, Priyaranjan Jha, Soheil Hassas Yeganeh,
	Yuchung Cheng
In-Reply-To: <20180316175149.224421-1-ysseung@google.com>



On 03/16/2018 10:51 AM, Yousuk Seung wrote:
> Set tp->snd_ssthresh to BDP upon STARTUP exit. This allows us
> to check if a BBR flow exited STARTUP and the BDP at the
> time of STARTUP exit with SCM_TIMESTAMPING_OPT_STATS. Since BBR does not
> use snd_ssthresh this fix has no impact on BBR's behavior.


Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH net-next 3/9] selftests: pmtu: Introduce support for multiple tests
From: David Ahern @ 2018-03-16 19:53 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: David S . Miller, Sabrina Dubroca, Steffen Klassert, netdev
In-Reply-To: <20180316202920.3dc280be@epycfail>

On 3/16/18 12:29 PM, Stefano Brivio wrote:
> On Fri, 16 Mar 2018 11:06:07 -0700
> David Ahern <dsahern@gmail.com> wrote:
> 
>> On 3/15/18 9:18 AM, Stefano Brivio wrote:
>>>  trap cleanup EXIT
>>>  
>>> -test_pmtu_vti6_exception
>>> +exitcode=0
>>> +for name in ${tests}; do
>>> +	echo "${name}: START"
>>> +	eval test_${name}
>>> +	ret=$?
>>> +	cleanup
>>> +
>>> +	if [ $ret -eq 0 ]; then echo "${name}: FAIL"; exitcode=1  
>>
>> ret = 0 == failure is counterintuitive for Linux.
> 
> However, in POSIX shell's AND and OR lists with function calls, a
> function returning zero behaves in a similar fashion to a C function
> evaluating to true, and a function returning non-zero behaves like a C
> function evaluating to false [1]:
> 
> 	a() {
> 		return 0
> 	}
> 
> 	b() {
> 		return 1
> 	}
> 
> 	a && echo this gets printed
> 	b && echo and this does not
> 
> This might be equally counter-intuitive for somebody. If one does a lot
> of explicit error checking with early returns, my return convention is
> also rather practical. E.g. in the setup() function I can do:
> 
> 	eval setup_${arg} && echo "  ${arg} not supported" && return 0
> 
> instead of:
> 
> 	eval setup_${arg} || { echo "  ${arg} not supported" && return 0; }

I think it is weird to have 'a && b' where b is done when a fails as
opposed to succeeds hence the comment. I think a common convention
across scripts is important but having the tests is more so. Just a
suggestion.


> 
> Still, I went ahead and reversed return codes in the whole script, and
> it doesn't look *too* bad with the setup() function from patch 3/9. It
> would have been quite ugly earlier.
> 
> So I don't have a strong preference. If you still prefer that I reverse
> my return codes, I will re-spin the series (and probably I'll need a
> first patch that reverses the existing logic, too).
> 
>>> +	elif [ $ret -eq 1 ]; then echo "${name}: PASS"
>>> +	elif [ $ret -eq 2 ]; then echo "${name}: SKIP"  
>>
>> I use printf in other scripts so that the pass/fail verdict lineup. e.g.,
>> printf "        %-60s  [PASS]\n" "${name}"
> 
> I avoided 'printf' so far because it's not a built-in utility on some
> shells (e.g. csh), being a "recent" addition to the POSIX Base
> Specifications (issue 4, while 'echo' is from issue 2), and it might be
> unavailable on some (embedded?) systems.
> 
> I don't have a strong preference about this either. It's a very minor
> portability concern vs. a very minor readability improvement, what do
> you think? 

Look at fib_tests and fib-onlink-tests. As the number of tests grows,
output consistency makes your life easier. With printf:

...
TEST: IPv4 linkdown flag set                                  [ OK ]
TEST: IPv6 linkdown flag set                                  [ OK ]
TEST: Directly connected nexthop, unicast address             [ OK ]
TEST: Directly connected nexthop, unicast address with device [ OK ]
TEST: Gateway is linklocal address                            [ OK ]
TEST: Gateway is linklocal address, no device                 [ OK ]
TEST: Gateway can not be local unicast address                [ OK ]
TEST: Gateway can not be local unicast address, with device   [ OK ]
TEST: Gateway can not be a local linklocal address            [ OK ]
TEST: Gateway can be local address in a VRF                   [FAIL]
TEST: Gateway can be local address in a VRF, with device      [FAIL]
TEST: Gateway can be local linklocal address in a VRF         [ OK ]
TEST: Redirect to VRF lookup                                  [ OK ]
...

the FAIL cases jump out versus echo

...
TEST: IPv4 linkdown flag set [ OK ]
TEST: IPv6 linkdown flag set [ OK ]
TEST: Directly connected nexthop, unicast address [ OK ]
TEST: Directly connected nexthop, unicast address with device [ OK ]
TEST: Gateway is linklocal address [ OK ]
TEST: Gateway is linklocal address, no device [ OK ]
TEST: Gateway can not be local unicast address [ OK ]
TEST: Gateway can not be local unicast address, with device [ OK ]
TEST: Gateway can not be a local linklocal address [ OK ]
TEST: Gateway can be local address in a VRF [FAIL]
TEST: Gateway can be local address in a VRF, with device [FAIL]
TEST: Gateway can be local linklocal address in a VRF [ OK ]
TEST: Redirect to VRF lookup [ OK ]
...

where your mind has a lot more work to do to find the tests broken by a
change.

That is also why I chose OK versus PASS -- ok at 2 letters, fail at 4
the failures really stand out.

^ permalink raw reply

* Re: [PATCH 0/2] net: phy: relax error checking when creating sysfs link netdev->phydev
From: Andrew Lunn @ 2018-03-16 19:54 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Florian Fainelli, David S. Miller, netdev, Greg Kroah-Hartman,
	Sekhar Nori, linux-kernel, linux-omap
In-Reply-To: <f9f7ea0b-4e23-d837-c1b1-73ed7d6fb686@ti.com>

> The phydrv->mdiodrv.flags can be accessible only after call to of_phy_connect()/phy_connect(), 

You need to use a function like of_phy_find_device() to get the
phydev, set the flag, and then call phy_connect_direct(). 

	Andrew

^ permalink raw reply

* Re: [PATCH -next 00/22] remove in-kernel syscall invocations (part 2 == netdev)
From: Linus Torvalds @ 2018-03-16 19:42 UTC (permalink / raw)
  To: David Miller
  Cc: Dominik Brodowski, Linux Kernel Mailing List, Network Development
In-Reply-To: <20180316.143021.1158193720540239477.davem@davemloft.net>

On Fri, Mar 16, 2018 at 11:30 AM, David Miller <davem@davemloft.net> wrote:
>
> I imagine one of the things you'd like to do is declare that syscall
> entries use a different (better) argument passing scheme.  For
> example, passing values in registers instead of on the stack.

Actually, it's almost exactly the reverse.

On x86-64, we'd like to just pass the 'struct pt_regs *' pointer, and
have the sys_xyz() function itself just pick out the arguments it
needs from there.

That has a few reasons for it:

 - we can clear all registers at system call entry, which helps defeat
some of the "pass seldom used register with user-controlled value that
survives deep into the callchain" things that people used to leak
information

 - we can streamline the low-level system call code, which needs to
pass around 'struct pt_regs *' anyway, and the system call only picks
up the values it actually needs

 - it's really quite easy(*) to just make the SYSCALL_DEFINEx() macros
just do it all with a wrapper inline function

but it fundamentally means that you *cannot* call 'sys_xyz()' from
within the kernel, unless you then do it with something crazy like

    struct pt_regs myregs;
    ... fill in the right registers for this architecture _if_ this
architecture uses ptregs ..
    sys_xyz(&regs);

which I somehow really doubt you want to do in the networking code.

Now, I did do one version that just created two entrypoints for every
single system call - the "kernel version" and the "real" system call
version. That sucks, because you have two choices:

 - either pointlessly generate extra code for the 200+ system calls
that are *not* used by the kernel

 - or let gcc just merge the two, and make code generation suck where
the real system call just loads the registers and jumps to the common
code.

That second option really does suck, because if you let the compiler
just generate the _single_ system call, it will do the "load actual
value from ptregs" much more nicely, and only when it needs it, and
schedules it all into the system call code.

So just making the rule be: "you mustn't call the SYSCALL_DEFINEx()
functions from anything but the system call code" really makes
everything better.

Then you only need to fix up the *handful* of so system calls that
actually have in-kernel callers.

Many of them end up being things that could be improved on further
anyway (ie there's discussion about further cleanup and trying to
avoid using "set_fs()" for arguments etc, because there already exists
helper functions that take the kernel-space versions, and the
sys_xyz() version is actually just going through stupid extra work for
a kernel user).

                    Linus

(*) The "really quite easy" is only true on 64-bit architectures.
32-bit architectures have issues with packing 64-bit values into two
registers, so using macro expansion with just the number of arguments
doesn't work.

^ permalink raw reply

* Re: [PATCH 0/2] net: phy: relax error checking when creating sysfs link netdev->phydev
From: Grygorii Strashko @ 2018-03-16 19:41 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn
  Cc: David S. Miller, netdev, Greg Kroah-Hartman, Sekhar Nori,
	linux-kernel, linux-omap
In-Reply-To: <44DFE938-3FF8-4608-87DD-BC84705EFFF8@gmail.com>



On 03/16/2018 02:11 PM, Florian Fainelli wrote:
> On March 16, 2018 11:42:21 AM PDT, Grygorii Strashko <grygorii.strashko@ti.com> wrote:
>>
>>
>> On 03/16/2018 12:34 PM, Florian Fainelli wrote:
>>>
>>>
>>> On 03/16/2018 10:22 AM, Andrew Lunn wrote:
>>>> On Wed, Mar 14, 2018 at 05:26:22PM -0500, Grygorii Strashko wrote:
>>>>> Some ethernet drivers (like TI CPSW) may connect and manage >1 Net
>> PHYs per
>>>>> one netdevice, as result such drivers will produce warning during
>> system
>>>>> boot and fail to connect second phy to netdevice when PHYLIB
>> framework
>>>>> will try to create sysfs link netdev->phydev for second PHY
>>>>> in phy_attach_direct(), because sysfs link with the same name has
>> been
>>>>> created already for the first PHY.
>>>>> As result, second CPSW external port will became unusable.
>>>>> This issue was introduced by commits:
>>>>> 5568363f0cb3 ("net: phy: Create sysfs reciprocal links for
>> attached_dev/phydev"
>>>>> a3995460491d ("net: phy: Relax error checking on
>> sysfs_create_link()"
>>>>
>>>> I wonder if it would be better to add a flag to the phydev that
>>>> indicates it is the second PHY connected to a MAC? Add a bit to
>>>> phydrv->mdiodrv.flags. If that bit is set, don't create the sysfs
>>>> file.
>>>
>>> We could indeed do that, I am fine with Grygorii's approach though in
>>> making the creation more silent and non fatal.
>>
>> The link phydev->netdev still can be created. And failure to create
>> links
>> is non fatal error in my opinion.
> 
> They should not be fatal I agree, but it's nice to know when you are doing something wrong anyway.
> 
>>
>>>
>>>>
>>>> For 99% of MAC drivers, having two PHYs is an error, so we want to
>> aid
>>>> debug by reporting the sysfs error.
>>> That is true, either way is fine with me, really.
>>>
>>
>> Error still will be reported, just not warning and it will be
>> non-fatal.
>> So, with this patch set it will be possible now to continue boot (NFS
>> for example),
>> connect to the system and gather logs.
> 
> The point Andrew is trying to make is that you address one particular failure in the PHY creation path when using >
 1 PHY devices with a network device. Using a flag would easily allow us to be more future proof with other parts of PHYLIB
  for your particular use case if that becomes necessary. This gives you less incentive to fix this use case though.
> 

That's true, I'm fixing use case with >1 and I'll try to re-implement using flag as requested.
But note, this patch in its current form fixes 1:1 (phydev:netdev) use case also (at least as i understand it),
because current code will just kill net connection if create sysfs link fails, so in case of net boot -
failure logs will not be accessible without direct access to the device.

Actually, how can i pass this flag "<name of the flag>" from CPSW to of_phy_connect()->phy_attach_direct()?
The parameter "flags" == phy_device->dev_flags is used to pass PHY driver's specific options, so can't be used.

The phydrv->mdiodrv.flags can be accessible only after call to of_phy_connect()/phy_connect(), 
but sysfs links are created inside these functions.

Thanks.
-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH -next 00/22] remove in-kernel syscall invocations (part 2 == netdev)
From: Dominik Brodowski @ 2018-03-16 19:39 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, torvalds, netdev
In-Reply-To: <20180316.143021.1158193720540239477.davem@davemloft.net>

On Fri, Mar 16, 2018 at 02:30:21PM -0400, David Miller wrote:
> From: Dominik Brodowski <linux@dominikbrodowski.net>
> Date: Fri, 16 Mar 2018 18:05:52 +0100
> 
> > The rationale of this change is described in patch 1 of part 1[*] as follows:
> > 
> > 	The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
> > 	and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
> > 	through kernel entry points, but not from the kernel itself. This
> > 	will allow cleanups and optimizations to the entry paths *and* to
> > 	the parts of the kernel code which currently need to pretend to be
> > 	userspace in order to make use of syscalls.
> > 
> > At present, these patches are based on v4.16-rc5; there is one trivial
> > conflict against net-next. Dave, I presume that you prefer to take them
> > through net-next? If you want to, I can re-base them against net-next.
> > If you prefer otherwise, though, I can route them as part of my whole
> > syscall series.
> 
> So the transformations themeselves are relatively trivial, so on that
> aspect I don't have any problems with these changes.

Thank you for your fast feedback.

> But overall I have to wonder.
> 
> I imagine one of the things you'd like to do is declare that syscall
> entries use a different (better) argument passing scheme.  For
> example, passing values in registers instead of on the stack.

Well, sort of. Currently, x86-64 decodes all six registers unconditionally:

		regs->ax = sys_call_table[nr](
			regs->di, regs->si, regs->dx,
			regs->r10, regs->r8, regs->r9);

so that in do_syscall_64(), we have to get six parameters from the
stack:

	mov    0x38(%rbx),%rcx
	mov    0x60(%rbx),%rdx
	mov    0x68(%rbx),%rsi
	mov    0x70(%rbx),%rdi
	mov    0x40(%rbx),%r9
	mov    0x48(%rbx),%r8

Instead, the aim is to do

	regs->ax = sys_call_table[nr](regs)

... which results in just a register rename operation:

	mov    %rbp,%rdi

> But in situations where you split out the system call function
> completely into one of these "helpers", the compiler is going
> to have two choices:
> 
> 1) Expand the helper into the syscall function inline, thus we end up
>    with two copies of the function.

That's only sensible for very short stubs, which just call another function
(e.g. __compat_sys_sendmsg()).

> 2) Call the helper from the syscall function.  Well, then the compiler
>    will need to pop the syscal obtained arguments from the registers
>    onto the stack.
> 
> So this doesn't seem like such a total win to me.
> 
> Maybe you can explain things better to ease my concerns.

For example, for sys_recv() and sys_recvfrom(), if all is complete, this
results in:

sys_x86_64_recv:
	callq <__fentry__>
	/* decode struct pt_regs for exactly those parameters
	 * we care about
	 */
	mov    0x38(%rdi),%rcx
	xor    %r9d,%r9d
	xor    %r8d,%r8d
	mov    0x60(%rdi),%rdx
	mov    0x68(%rdi),%rsi
	mov    0x70(%rdi),%rdi

	/* call __sys_recvfrom */
	callq  <__sys_recvfrom>

	/* cleanup and return */
	cltq
	retq

That's only obtaining four entries from the stack, and two register clearing
operations; sys_x86_64_recvfrom is similar (6 movs from stack, one register
rename mov, no xor).

__sys_recvfrom() then does the actual work, starting with pushing some
register contect out of the way and moving registers around, more or less
what SyS_recvfrom() does today.

So the result is nothing spectacular or unusual, but pretty equivalent and
possibly even shorter compared to current codepath.

Thanks,
	Dominik

^ permalink raw reply

* Re: [v2] vhost: add vsock compat ioctl
From: David Miller @ 2018-03-16 19:30 UTC (permalink / raw)
  To: sonnyrao; +Cc: netdev


Although the top level ioctls are probably size and layout compatible,
I do not think that the deeper ioctls can be called by compat binaries
without some translations in order for them to work.

^ permalink raw reply

* Re: [PATCH net-next 3/9] selftests: pmtu: Introduce support for multiple tests
From: Stefano Brivio @ 2018-03-16 19:29 UTC (permalink / raw)
  To: David Ahern; +Cc: David S . Miller, Sabrina Dubroca, Steffen Klassert, netdev
In-Reply-To: <b12fdfeb-87b8-ef88-11c3-7a13a352fd00@gmail.com>

On Fri, 16 Mar 2018 11:06:07 -0700
David Ahern <dsahern@gmail.com> wrote:

> On 3/15/18 9:18 AM, Stefano Brivio wrote:
> >  trap cleanup EXIT
> >  
> > -test_pmtu_vti6_exception
> > +exitcode=0
> > +for name in ${tests}; do
> > +	echo "${name}: START"
> > +	eval test_${name}
> > +	ret=$?
> > +	cleanup
> > +
> > +	if [ $ret -eq 0 ]; then echo "${name}: FAIL"; exitcode=1  
> 
> ret = 0 == failure is counterintuitive for Linux.

However, in POSIX shell's AND and OR lists with function calls, a
function returning zero behaves in a similar fashion to a C function
evaluating to true, and a function returning non-zero behaves like a C
function evaluating to false [1]:

	a() {
		return 0
	}

	b() {
		return 1
	}

	a && echo this gets printed
	b && echo and this does not

This might be equally counter-intuitive for somebody. If one does a lot
of explicit error checking with early returns, my return convention is
also rather practical. E.g. in the setup() function I can do:

	eval setup_${arg} && echo "  ${arg} not supported" && return 0

instead of:

	eval setup_${arg} || { echo "  ${arg} not supported" && return 0; }

Still, I went ahead and reversed return codes in the whole script, and
it doesn't look *too* bad with the setup() function from patch 3/9. It
would have been quite ugly earlier.

So I don't have a strong preference. If you still prefer that I reverse
my return codes, I will re-spin the series (and probably I'll need a
first patch that reverses the existing logic, too).

> > +	elif [ $ret -eq 1 ]; then echo "${name}: PASS"
> > +	elif [ $ret -eq 2 ]; then echo "${name}: SKIP"  
> 
> I use printf in other scripts so that the pass/fail verdict lineup. e.g.,
> printf "        %-60s  [PASS]\n" "${name}"

I avoided 'printf' so far because it's not a built-in utility on some
shells (e.g. csh), being a "recent" addition to the POSIX Base
Specifications (issue 4, while 'echo' is from issue 2), and it might be
unavailable on some (embedded?) systems.

I don't have a strong preference about this either. It's a very minor
portability concern vs. a very minor readability improvement, what do
you think? 


[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_03_06

-- 
Stefano

^ permalink raw reply

* Re: [PATCH RFC 0/2] Add support for warnings to extack
From: Marcelo Ricardo Leitner @ 2018-03-16 19:27 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Aring, Jiri Pirko, Jakub Kicinski
In-Reply-To: <cover.1521226621.git.marcelo.leitner@gmail.com>

On Fri, Mar 16, 2018 at 04:23:08PM -0300, Marcelo Ricardo Leitner wrote:
> Currently we have the limitation that warnings cannot be reported though
> extack. For example, when tc flower failed to get offloaded but got
> installed on software datapath. The hardware failure is not fatal and
> thus extack is not even shared with the driver, so the error is simply
> omitted from any logging.
> 
> The idea here is to allow such kind of warnings to get through and be
> available for the sysadmin or the tool managing such commands (like Open
> vSwitch), so that if this happens, we will have such log message in a
> file later.
> 
> The first patch extends extack to support more than one message and with
> different log level (currently only error and warning). The second
> shares extack with the drivers regardless of skip_sw.
> 
> The iproute patch also follows.
> 
> This kernel change is backward compatible with older iproute because
> iproute will only process the last message, which should be the error
> one in case of failure, or a warning if it suceeded. 
> 
> The iproute change is compatible with older kernels because it will find
> only one message to be processed and will handle it properly.
> 
> With this patches, this is now possible:
> # tc qdisc add dev p7p1 ingress
> # tc filter add dev p7p1 parent ffff: protocol ip prio 1 flower \
> 	src_mac ec:13:db:00:00:00 dst_mac ec:14:c2:00:00:00 \
> 	src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
> Warning: TC offload is disabled on net device.
> # echo $?
> 0

Will have more and better examples once we actually have more than one
message being added.

> 
> Marcelo Ricardo Leitner (2):
>   netlink: extend extack so it can carry more than one message
>   sched: pass extack through even if skip_sw is not set
> 
>  include/linux/netlink.h  | 50 +++++++++++++++++++++++++++++-------------------
>  include/net/pkt_cls.h    |  3 +--
>  net/netlink/af_netlink.c | 12 +++++++-----
>  3 files changed, 38 insertions(+), 27 deletions(-)
> 
> -- 
> 2.14.3
> 

^ permalink raw reply

* Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()
From: Linus Torvalds @ 2018-03-16 19:27 UTC (permalink / raw)
  To: Al Viro
  Cc: Florian Weimer, Kees Cook, Andrew Morton, Josh Poimboeuf,
	Rasmus Villemoes, Randy Dunlap, Miguel Ojeda, Ingo Molnar,
	David Laight, Ian Abbott, linux-input, linux-btrfs,
	Network Development, Linux Kernel Mailing List, Kernel Hardening
In-Reply-To: <20180316175502.GE30522@ZenIV.linux.org.uk>

[-- Attachment #1: Type: text/plain, Size: 3975 bytes --]

On Fri, Mar 16, 2018 at 10:55 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> That's not them, that's C standard regarding ICE.

Yes. The C standard talks about "integer constant expression". I know.
It's come up in this very thread before.

The C standard at no point talks about - or forbids - "variable length
arrays". That never comes up in the whole standard, I checked.

So we are right now hindered by a _syntactic_ check, without any way
to have a _semantic_ check.

That's my problem. The warnings are misleading and imply semantics.

And apparently there is no way to actually check semantics.

> 1,100 is *not* a constant expression as far as the standard is concerned,

I very much know.

But it sure isn't "variable" either as far as the standard is
concerned, because the standard doesn't even have that concept (it
uses "variable" for argument numbers and for variables).

So being pedantic doesn't really change anything.

> Would you argue that in
> void foo(char c)
> {
>         int a[(c<<1) + 10 - c + 2 - c];

Yeah, I don't think that even counts as a constant value, even if it
can be optimized to one. I would not at all be unhppy to see
__builtin_constant_p() to return zero.

But that is very much different from the syntax issue.

So I would like to get some way to get both type-checking and constant
checking without the annoying syntax issue.

> expr, constant_expression is not a constant_expression.  And in
> this particular case the standard is not insane - the only reason
> for using that is typechecking and _that_ can be achieved without
> violating 6.6p6:
>         sizeof(expr,0) * 0 + ICE
> *is* an integer constant expression, and it gives you exact same
> typechecking.  So if somebody wants to play odd games, they can
> do that just fine, without complicating the logics for compilers...

Now that actually looks like a good trick. Maybe we can use that
instead of the comma expression that causes problems.

And using sizeof() to make sure that __builtin_choose_expression()
really gets an integer constant expression and that there should be no
ambiguity looks good.

Hmm.

This works for me, and I'm being *very* careful (those casts to
pointer types are inside that sizeof, because it's not an integral
type, and non-integral casts are not valid in an ICE either) but
somebody needs to check gcc-4.4:

  #define __typecheck(a,b) \
        (!!(sizeof((typeof(a)*)1==(typeof(b)*)1)))

  #define __no_side_effects(a,b) \
        (__builtin_constant_p(a)&&__builtin_constant_p(b))

  #define __safe_cmp(a,b) \
        (__typecheck(a,b) && __no_side_effects(a,b))

  #define __cmp(a,b,op) ((a)op(b)?(a):(b))

  #define __cmp_once(a,b,op) ({ \
        typeof(a) __a = (a);            \
        typeof(b) __b = (b);            \
        __cmp(__a,__b,op); })

  #define __careful_cmp(a,b,op) \
        __builtin_choose_expr(__safe_cmp(a,b), __cmp(a,b,op),
__cmp_once(a,b,op))

  #define min(a,b)              __careful_cmp(a,b,<)
  #define max(a,b)              __careful_cmp(a,b,>)
  #define min_t(t,a,b)  __careful_cmp((t)(a),(t)(b),<)
  #define max_t(t,a,b)  __careful_cmp((t)(a),(t)(b),>)

and yes, it does cause new warnings for that

    comparison between ‘enum tis_defaults’ and ‘enum tpm2_const’

in drivers/char/tpm/tpm_tis_core.h due to

   #define TIS_TIMEOUT_A_MAX       max(TIS_SHORT_TIMEOUT, TPM2_TIMEOUT_A)

but technically that warning is actually correct, I'm just confused
why gcc cares about the cast placement or something.

That warning is easy to fix by turning it into a "max_t(int, enum1,
enum2)' and that is technically the right thing to do, it's just not
warned about for some odd reason with the current code.

Kees - is there some online "gcc-4.4 checker" somewhere? This does
seem to work with my gcc. I actually tested some of those files you
pointed at now.

                  Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2965 bytes --]

 include/linux/kernel.h | 77 +++++++++++++-------------------------------------
 1 file changed, 20 insertions(+), 57 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3fd291503576..23c31bf1d7fb 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -787,37 +787,29 @@ static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
  * strict type-checking.. See the
  * "unnecessary" pointer comparison.
  */
-#define __min(t1, t2, min1, min2, x, y) ({		\
-	t1 min1 = (x);					\
-	t2 min2 = (y);					\
-	(void) (&min1 == &min2);			\
-	min1 < min2 ? min1 : min2; })
+#define __typecheck(a,b) \
+	(!!(sizeof((typeof(a)*)1==(typeof(b)*)1)))
 
-/**
- * min - return minimum of two values of the same or compatible types
- * @x: first value
- * @y: second value
- */
-#define min(x, y)					\
-	__min(typeof(x), typeof(y),			\
-	      __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),	\
-	      x, y)
+#define __no_side_effects(a,b) \
+	(__builtin_constant_p(a)&&__builtin_constant_p(b))
 
-#define __max(t1, t2, max1, max2, x, y) ({		\
-	t1 max1 = (x);					\
-	t2 max2 = (y);					\
-	(void) (&max1 == &max2);			\
-	max1 > max2 ? max1 : max2; })
+#define __safe_cmp(a,b) \
+	(__typecheck(a,b) && __no_side_effects(a,b))
 
-/**
- * max - return maximum of two values of the same or compatible types
- * @x: first value
- * @y: second value
- */
-#define max(x, y)					\
-	__max(typeof(x), typeof(y),			\
-	      __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),	\
-	      x, y)
+#define __cmp(a,b,op) ((a)op(b)?(a):(b))
+
+#define __cmp_once(a,b,op) ({	\
+	typeof(a) __a = (a);	\
+	typeof(b) __b = (b);	\
+	__cmp(__a,__b,op); })
+
+#define __careful_cmp(a,b,op) \
+	__builtin_choose_expr(__safe_cmp(a,b), __cmp(a,b,op), __cmp_once(a,b,op))
+
+#define min(a,b)	__careful_cmp(a,b,<)
+#define max(a,b)	__careful_cmp(a,b,>)
+#define min_t(t,a,b)	__careful_cmp((t)(a),(t)(b),<)
+#define max_t(t,a,b)	__careful_cmp((t)(a),(t)(b),>)
 
 /**
  * min3 - return minimum of three values
@@ -856,35 +848,6 @@ static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
  */
 #define clamp(val, lo, hi) min((typeof(val))max(val, lo), hi)
 
-/*
- * ..and if you can't take the strict
- * types, you can specify one yourself.
- *
- * Or not use min/max/clamp at all, of course.
- */
-
-/**
- * min_t - return minimum of two values, using the specified type
- * @type: data type to use
- * @x: first value
- * @y: second value
- */
-#define min_t(type, x, y)				\
-	__min(type, type,				\
-	      __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),	\
-	      x, y)
-
-/**
- * max_t - return maximum of two values, using the specified type
- * @type: data type to use
- * @x: first value
- * @y: second value
- */
-#define max_t(type, x, y)				\
-	__max(type, type,				\
-	      __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),	\
-	      x, y)
-
 /**
  * clamp_t - return a value clamped to a given range using a given type
  * @type: the type of variable to use

^ permalink raw reply related

* Re: [PATCH] net: ethernet: arc: Fix a potential memory leak if an optional regulator is deferred
From: David Miller @ 2018-03-16 19:27 UTC (permalink / raw)
  To: christophe.jaillet
  Cc: heiko, branislav, netdev, linux-arm-kernel, linux-rockchip,
	linux-kernel, kernel-janitors
In-Reply-To: <7fd57a4e7f7a41dccf55430cbc43de3b0a7f7826.1521061530.git.christophe.jaillet@wanadoo.fr>

From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date: Wed, 14 Mar 2018 22:09:34 +0100

> diff --git a/drivers/net/ethernet/arc/emac_rockchip.c b/drivers/net/ethernet/arc/emac_rockchip.c
> index 16f9bee992fe..8ee9dfd0e363 100644
> --- a/drivers/net/ethernet/arc/emac_rockchip.c
> +++ b/drivers/net/ethernet/arc/emac_rockchip.c
> @@ -169,8 +169,10 @@ static int emac_rockchip_probe(struct platform_device *pdev)
>  	/* Optional regulator for PHY */
>  	priv->regulator = devm_regulator_get_optional(dev, "phy");
>  	if (IS_ERR(priv->regulator)) {
> -		if (PTR_ERR(priv->regulator) == -EPROBE_DEFER)
> -			return -EPROBE_DEFER;
> +		if (PTR_ERR(priv->regulator) == -EPROBE_DEFER) {
> +			ret = -EPROBE_DEFER;
> +			goto out_clk_disable;
> +		}

Please build test your changes.

There is no 'ret' variable in this function, perhaps you meant 'err'.

^ permalink raw reply

* [PATCH RFC 2/2] sched: pass extack through even if skip_sw is not set
From: Marcelo Ricardo Leitner @ 2018-03-16 19:23 UTC (permalink / raw)
  To: netdev; +Cc: marcelo.leitner, Alexander Aring, Jiri Pirko, Jakub Kicinski
In-Reply-To: <cover.1521226621.git.marcelo.leitner@gmail.com>

Currently, when the rule is not to be exclusively executed by the
hardware, extack is not passed along. The idea was that hardware
failures are okay because the rule will get executed in software then.

This is not helpful in case one needs to understand why a certain rule
failed to get offloaded. Considering it may have been a temporary
failure, like resources exceeded or so, reproducing it later and knowing
it is triggering the same reason may be challenging.

This patch thus removes the condition and makes it always pass extack
structure along.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 include/net/pkt_cls.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index e828d31be5dae0ae8c69016dfde50379296484aa..7cec393bb47974b48a6d510b8aa84534a7a98594 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -705,8 +705,7 @@ tc_cls_common_offload_init(struct tc_cls_common_offload *cls_common,
 	cls_common->chain_index = tp->chain->index;
 	cls_common->protocol = tp->protocol;
 	cls_common->prio = tp->prio;
-	if (tc_skip_sw(flags))
-		cls_common->extack = extack;
+	cls_common->extack = extack;
 }
 
 enum tc_fl_command {
-- 
2.14.3

^ permalink raw reply related

* [PATCH RFC 1/2] netlink: extend extack so it can carry more than one message
From: Marcelo Ricardo Leitner @ 2018-03-16 19:23 UTC (permalink / raw)
  To: netdev; +Cc: marcelo.leitner, Alexander Aring, Jiri Pirko, Jakub Kicinski
In-Reply-To: <cover.1521226621.git.marcelo.leitner@gmail.com>

Currently extack can carry only a single message, which is usually the
error message.

This imposes a limitation on a more verbose error reporting. For
example, it's not able to carry warning messages together with the error
message, or 2 warning messages.

One use case is when dealing with tc offloading. If it failed to
offload, and also failed to install on software, it will report only
regarding the error about the software datapath, but the error from the
hardware path would also have been welcomed.

This patch extends extack so it now can carry up to 8 messages and these
messages may be prefixed similarly to printk/pr_warning, so thus they
can be tagged either was warning or error.

Fixed number of messages because supporting a dynamic limit seem to be
an overkill for the moment. Remember that this is not meant to be a
trace tool, but an error reporting one.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 include/linux/netlink.h  | 50 +++++++++++++++++++++++++++++-------------------
 net/netlink/af_netlink.c | 12 +++++++-----
 2 files changed, 37 insertions(+), 25 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index f3075d6c7e8229c999ab650537f1e3b11e1f457b..d9780836cf263d4c436d732e9b7a8cde0739ac23 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -71,43 +71,53 @@ netlink_kernel_create(struct net *net, int unit, struct netlink_kernel_cfg *cfg)
  * @cookie: cookie data to return to userspace (for success)
  * @cookie_len: actual cookie data length
  */
+#define NETLINK_MAX_EXTACK_MSGS 8
 struct netlink_ext_ack {
-	const char *_msg;
+	const char *_msg[NETLINK_MAX_EXTACK_MSGS];
 	const struct nlattr *bad_attr;
 	u8 cookie[NETLINK_MAX_COOKIE_LEN];
 	u8 cookie_len;
+	u8 _msg_count;
 };
 
-/* Always use this macro, this allows later putting the
- * message into a separate section or such for things
- * like translation or listing all possible messages.
- * Currently string formatting is not supported (due
- * to the lack of an output buffer.)
+/* Always use these macros, this allows later putting
+ * the message into a separate section or such for
+ * things like translation or listing all possible
+ * messages.  Currently string formatting is not
+ * supported (due to the lack of an output buffer.)
  */
-#define NL_SET_ERR_MSG(extack, msg) do {		\
-	static const char __msg[] = msg;		\
-	struct netlink_ext_ack *__extack = (extack);	\
-							\
-	if (__extack)					\
-		__extack->_msg = __msg;			\
+#define NL_SET_MSG(extack, msg) do {					\
+	static const char __msg[] = msg;				\
+	struct netlink_ext_ack *__extack = (extack);			\
+									\
+	if (__extack &&							\
+	    !WARN_ON(__extack->_msg_count >= NETLINK_MAX_EXTACK_MSGS))	\
+		__extack->_msg[__extack->_msg_count++] = __msg;		\
 } while (0)
 
+#define NL_SET_ERR_MSG(extack, msg)	NL_SET_MSG(extack, msg)
+#define NL_SET_WARN_MSG(extack, msg)	NL_SET_MSG(extack, KERN_WARNING msg)
+
 #define NL_SET_ERR_MSG_MOD(extack, msg)			\
 	NL_SET_ERR_MSG((extack), KBUILD_MODNAME ": " msg)
+#define NL_SET_WARN_MSG_MOD(extack, msg)		\
+	NL_SET_WARN_MSG((extack), KBUILD_MODNAME ": " msg)
+
 
 #define NL_SET_BAD_ATTR(extack, attr) do {		\
 	if ((extack))					\
 		(extack)->bad_attr = (attr);		\
 } while (0)
 
-#define NL_SET_ERR_MSG_ATTR(extack, attr, msg) do {	\
-	static const char __msg[] = msg;		\
-	struct netlink_ext_ack *__extack = (extack);	\
-							\
-	if (__extack) {					\
-		__extack->_msg = __msg;			\
-		__extack->bad_attr = (attr);		\
-	}						\
+#define NL_SET_ERR_MSG_ATTR(extack, attr, msg) do {			\
+	static const char __msg[] = msg;				\
+	struct netlink_ext_ack *__extack = (extack);			\
+									\
+	if (__extack) {							\
+		if (!WARN_ON(__extack->_msg_count >= NETLINK_MAX_EXTACK_MSGS)) \
+			__extack->_msg[__extack->_msg_count++] = __msg;	\
+		__extack->bad_attr = (attr);				\
+	}								\
 } while (0)
 
 extern void netlink_kernel_release(struct sock *sk);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 5d10dcfe6411e745e2abcda925fe7b6fabdac0fc..71b4ece5c760979e082b0c644af9d2f222e1b721 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2350,13 +2350,15 @@ void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err,
 	struct netlink_sock *nlk = nlk_sk(NETLINK_CB(in_skb).sk);
 	unsigned int flags = 0;
 	bool nlk_has_extack = nlk->flags & NETLINK_F_EXT_ACK;
+	int i;
 
 	/* Error messages get the original request appened, unless the user
 	 * requests to cap the error message, and get extra error data if
 	 * requested.
 	 */
-	if (nlk_has_extack && extack && extack->_msg)
-		tlvlen += nla_total_size(strlen(extack->_msg) + 1);
+	if (nlk_has_extack && extack)
+		for (i = 0; i < extack->_msg_count; i++)
+			tlvlen += nla_total_size(strlen(extack->_msg[i]) + 1);
 
 	if (err) {
 		if (!(nlk->flags & NETLINK_F_CAP_ACK))
@@ -2389,10 +2391,10 @@ void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err,
 	memcpy(&errmsg->msg, nlh, payload > sizeof(*errmsg) ? nlh->nlmsg_len : sizeof(*nlh));
 
 	if (nlk_has_extack && extack) {
-		if (extack->_msg) {
+		for (i = 0; i < extack->_msg_count; i++)
 			WARN_ON(nla_put_string(skb, NLMSGERR_ATTR_MSG,
-					       extack->_msg));
-		}
+					       extack->_msg[i]));
+
 		if (err) {
 			if (extack->bad_attr &&
 			    !WARN_ON((u8 *)extack->bad_attr < in_skb->data ||
-- 
2.14.3

^ permalink raw reply related

* [PATCH RFC iproute2] libnetlink: allow reading more than one message from extack
From: Marcelo Ricardo Leitner @ 2018-03-16 19:23 UTC (permalink / raw)
  To: netdev; +Cc: marcelo.leitner, Alexander Aring, Jiri Pirko, Jakub Kicinski
In-Reply-To: <cover.1521226621.git.marcelo.leitner@gmail.com>

This patch introduces support for reading more than one message from
extack's and to adjust their level (warning/error) accordingly.

Yes, there is a FIXME tag in the callback call for now.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 lib/libnetlink.c | 55 ++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index 928de1dd16d84b7802c06d0a659f5d73e1bbcb4b..7c4c81e11e02ea857888190eb5e7a9e99d159bb3 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -43,9 +43,16 @@ static const enum mnl_attr_data_type extack_policy[NLMSGERR_ATTR_MAX + 1] = {
 	[NLMSGERR_ATTR_OFFS]	= MNL_TYPE_U32,
 };
 
+#define NETLINK_MAX_EXTACK_MSGS 8
+struct extack_args {
+	const struct nlattr *msg[NETLINK_MAX_EXTACK_MSGS];
+	const struct nlattr *offs;
+	int msg_count;
+};
+
 static int err_attr_cb(const struct nlattr *attr, void *data)
 {
-	const struct nlattr **tb = data;
+	struct extack_args *tb = data;
 	uint16_t type;
 
 	if (mnl_attr_type_valid(attr, NLMSGERR_ATTR_MAX) < 0) {
@@ -60,19 +67,23 @@ static int err_attr_cb(const struct nlattr *attr, void *data)
 		return MNL_CB_ERROR;
 	}
 
-	tb[type] = attr;
+	if (type == NLMSGERR_ATTR_OFFS)
+		tb->offs = attr;
+	else if (tb->msg_count < NETLINK_MAX_EXTACK_MSGS)
+		tb->msg[tb->msg_count++] = attr;
 	return MNL_CB_OK;
 }
 
 /* dump netlink extended ack error message */
 int nl_dump_ext_ack(const struct nlmsghdr *nlh, nl_ext_ack_fn_t errfn)
 {
-	struct nlattr *tb[NLMSGERR_ATTR_MAX + 1] = {};
+	struct extack_args tb = {};
 	const struct nlmsgerr *err = mnl_nlmsg_get_payload(nlh);
 	const struct nlmsghdr *err_nlh = NULL;
 	unsigned int hlen = sizeof(*err);
-	const char *msg = NULL;
+	const char *msg[NETLINK_MAX_EXTACK_MSGS] = {};
 	uint32_t off = 0;
+	int ret, i;
 
 	/* no TLVs, nothing to do here */
 	if (!(nlh->nlmsg_flags & NLM_F_ACK_TLVS))
@@ -82,14 +93,14 @@ int nl_dump_ext_ack(const struct nlmsghdr *nlh, nl_ext_ack_fn_t errfn)
 	if (!(nlh->nlmsg_flags & NLM_F_CAPPED))
 		hlen += mnl_nlmsg_get_payload_len(&err->msg);
 
-	if (mnl_attr_parse(nlh, hlen, err_attr_cb, tb) != MNL_CB_OK)
+	if (mnl_attr_parse(nlh, hlen, err_attr_cb, &tb) != MNL_CB_OK)
 		return 0;
 
-	if (tb[NLMSGERR_ATTR_MSG])
-		msg = mnl_attr_get_str(tb[NLMSGERR_ATTR_MSG]);
+	for (i = 0; i < NETLINK_MAX_EXTACK_MSGS && tb.msg[i]; i++)
+		msg[i] = mnl_attr_get_str(tb.msg[i]);
 
-	if (tb[NLMSGERR_ATTR_OFFS]) {
-		off = mnl_attr_get_u32(tb[NLMSGERR_ATTR_OFFS]);
+	if (tb.offs) {
+		off = mnl_attr_get_u32(tb.offs);
 
 		if (off > nlh->nlmsg_len) {
 			fprintf(stderr,
@@ -100,21 +111,35 @@ int nl_dump_ext_ack(const struct nlmsghdr *nlh, nl_ext_ack_fn_t errfn)
 	}
 
 	if (errfn)
-		return errfn(msg, off, err_nlh);
+		return errfn(*msg, off, err_nlh); /* FIXME */
 
-	if (msg && *msg != '\0') {
+	ret = 0;
+	for (i = 0; i < NETLINK_MAX_EXTACK_MSGS && msg[i]; i++) {
 		bool is_err = !!err->error;
+		const char *_msg = msg[i];
+
+		/* Message tagging has precedence.
+		 * KERN_WARNING = ASCII Start Of Header ('\001') + '4'
+		 */
+		if (!strncmp(_msg, "\0014", 2)) {
+			is_err = false;
+			_msg += 2;
+		}
+		/* But we can't have Error if it didn't fail. */
+		if (is_err && !err->error)
+			is_err = 0;
 
 		fprintf(stderr, "%s: %s",
-			is_err ? "Error" : "Warning", msg);
-		if (msg[strlen(msg) - 1] != '.')
+			is_err ? "Error" : "Warning", _msg);
+		if (_msg[strlen(_msg) - 1] != '.')
 			fprintf(stderr, ".");
 		fprintf(stderr, "\n");
 
-		return is_err ? 1 : 0;
+		if (is_err)
+			ret = 1;
 	}
 
-	return 0;
+	return ret;
 }
 #else
 #warning "libmnl required for error support"
-- 
2.14.3

^ permalink raw reply related

* [PATCH RFC 0/2] Add support for warnings to extack
From: Marcelo Ricardo Leitner @ 2018-03-16 19:23 UTC (permalink / raw)
  To: netdev; +Cc: marcelo.leitner, Alexander Aring, Jiri Pirko, Jakub Kicinski

Currently we have the limitation that warnings cannot be reported though
extack. For example, when tc flower failed to get offloaded but got
installed on software datapath. The hardware failure is not fatal and
thus extack is not even shared with the driver, so the error is simply
omitted from any logging.

The idea here is to allow such kind of warnings to get through and be
available for the sysadmin or the tool managing such commands (like Open
vSwitch), so that if this happens, we will have such log message in a
file later.

The first patch extends extack to support more than one message and with
different log level (currently only error and warning). The second
shares extack with the drivers regardless of skip_sw.

The iproute patch also follows.

This kernel change is backward compatible with older iproute because
iproute will only process the last message, which should be the error
one in case of failure, or a warning if it suceeded. 

The iproute change is compatible with older kernels because it will find
only one message to be processed and will handle it properly.

With this patches, this is now possible:
# tc qdisc add dev p7p1 ingress
# tc filter add dev p7p1 parent ffff: protocol ip prio 1 flower \
	src_mac ec:13:db:00:00:00 dst_mac ec:14:c2:00:00:00 \
	src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
Warning: TC offload is disabled on net device.
# echo $?
0

Marcelo Ricardo Leitner (2):
  netlink: extend extack so it can carry more than one message
  sched: pass extack through even if skip_sw is not set

 include/linux/netlink.h  | 50 +++++++++++++++++++++++++++++-------------------
 include/net/pkt_cls.h    |  3 +--
 net/netlink/af_netlink.c | 12 +++++++-----
 3 files changed, 38 insertions(+), 27 deletions(-)

-- 
2.14.3

^ permalink raw reply

* Re: [PATCH 0/2] net: phy: relax error checking when creating sysfs link netdev->phydev
From: Florian Fainelli @ 2018-03-16 19:11 UTC (permalink / raw)
  To: Grygorii Strashko, Andrew Lunn
  Cc: David S. Miller, netdev, Greg Kroah-Hartman, Sekhar Nori,
	linux-kernel, linux-omap
In-Reply-To: <e28f0101-cf3e-58f0-82f3-7e45bbe4bf74@ti.com>

On March 16, 2018 11:42:21 AM PDT, Grygorii Strashko <grygorii.strashko@ti.com> wrote:
>
>
>On 03/16/2018 12:34 PM, Florian Fainelli wrote:
>> 
>> 
>> On 03/16/2018 10:22 AM, Andrew Lunn wrote:
>>> On Wed, Mar 14, 2018 at 05:26:22PM -0500, Grygorii Strashko wrote:
>>>> Some ethernet drivers (like TI CPSW) may connect and manage >1 Net
>PHYs per
>>>> one netdevice, as result such drivers will produce warning during
>system
>>>> boot and fail to connect second phy to netdevice when PHYLIB
>framework
>>>> will try to create sysfs link netdev->phydev for second PHY
>>>> in phy_attach_direct(), because sysfs link with the same name has
>been
>>>> created already for the first PHY.
>>>> As result, second CPSW external port will became unusable.
>>>> This issue was introduced by commits:
>>>> 5568363f0cb3 ("net: phy: Create sysfs reciprocal links for
>attached_dev/phydev"
>>>> a3995460491d ("net: phy: Relax error checking on
>sysfs_create_link()"
>>>
>>> I wonder if it would be better to add a flag to the phydev that
>>> indicates it is the second PHY connected to a MAC? Add a bit to
>>> phydrv->mdiodrv.flags. If that bit is set, don't create the sysfs
>>> file.
>> 
>> We could indeed do that, I am fine with Grygorii's approach though in
>> making the creation more silent and non fatal.
>
>The link phydev->netdev still can be created. And failure to create
>links
>is non fatal error in my opinion. 

They should not be fatal I agree, but it's nice to know when you are doing something wrong anyway.

>
>> 
>>>
>>> For 99% of MAC drivers, having two PHYs is an error, so we want to
>aid
>>> debug by reporting the sysfs error.
>> That is true, either way is fine with me, really.
>> 
>
>Error still will be reported, just not warning and it will be
>non-fatal.
>So, with this patch set it will be possible now to continue boot (NFS
>for example),
>connect to the system and gather logs.

The point Andrew is trying to make is that you address one particular failure in the PHY creation path when using > 1 PHY devices with a network device. Using a flag would easily allow us to be more future proof with other parts of PHYLIB  for your particular use case if that becomes necessary. This gives you less incentive to fix this use case though.

-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next 2/2] net-tcp_bbr: set tp->snd_ssthresh to BDP upon STARTUP exit
From: David Miller @ 2018-03-16 19:08 UTC (permalink / raw)
  To: ysseung; +Cc: netdev, ncardwell, priyarjha, soheil, ycheng
In-Reply-To: <20180316175149.224421-1-ysseung@google.com>

From: Yousuk Seung <ysseung@google.com>
Date: Fri, 16 Mar 2018 10:51:49 -0700

> Set tp->snd_ssthresh to BDP upon STARTUP exit. This allows us
> to check if a BBR flow exited STARTUP and the BDP at the
> time of STARTUP exit with SCM_TIMESTAMPING_OPT_STATS. Since BBR does not
> use snd_ssthresh this fix has no impact on BBR's behavior.
> 
> Signed-off-by: Yousuk Seung <ysseung@google.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>

Applied.

Please submit future patch series with a proper cover letter.

^ permalink raw reply

* Re: [PATCH net-next 1/2] tcp: add snd_ssthresh stat in SCM_TIMESTAMPING_OPT_STATS
From: David Miller @ 2018-03-16 19:08 UTC (permalink / raw)
  To: ysseung; +Cc: netdev, ncardwell, priyarjha, soheil, ycheng
In-Reply-To: <20180316175107.224147-1-ysseung@google.com>

From: Yousuk Seung <ysseung@google.com>
Date: Fri, 16 Mar 2018 10:51:07 -0700

> This patch adds TCP_NLA_SND_SSTHRESH stat into SCM_TIMESTAMPING_OPT_STATS
> that reports tcp_sock.snd_ssthresh.
> 
> Signed-off-by: Yousuk Seung <ysseung@google.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>

Applied.

Please submit future patch series with a proper cover letter.

^ permalink raw reply

* Re: [PATCH net-next v3 1/1] selftests/txtimestamp: Add more configurable parameters
From: David Miller @ 2018-03-16 19:06 UTC (permalink / raw)
  To: vinicius.gomes; +Cc: netdev, randy.e.witt, eric.dumazet
In-Reply-To: <20180316174114.4575-2-vinicius.gomes@intel.com>

From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Date: Fri, 16 Mar 2018 10:41:14 -0700

> Add a way to configure if poll() should wait forever for an event, the
> number of packets that should be sent for each and if there should be
> any delay between packets.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next] liquidio: Simplified napi poll
From: David Miller @ 2018-03-16 19:03 UTC (permalink / raw)
  To: felix.manlunas
  Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
	intiyaz.basha
In-Reply-To: <20180316172131.GA2285@felix-thinkpad.cavium.com>

From: Felix Manlunas <felix.manlunas@cavium.com>
Date: Fri, 16 Mar 2018 10:21:31 -0700

> From: Intiyaz Basha <intiyaz.basha@cavium.com>
> 
> 1) Moved interrupt enable related code from octeon_process_droq_poll_cmd()
>    to separate function octeon_enable_irq().
> 2) Removed wrapper function octeon_process_droq_poll_cmd(), and directlyi
>    using octeon_droq_process_poll_pkts().
> 3) Removed unused macros POLL_EVENT_XXX.
>  
> Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>

Looks good, applied, thank you.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox