Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 1/1] tc-testing: initial version of tunnel_key unit tests
From: Davide Caratti @ 2018-06-28 17:26 UTC (permalink / raw)
  To: Lucas Bates
  Cc: Keara Leibovitz, David Miller, Linux Kernel Network Developers,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko
In-Reply-To: <CAMDBHYLmhtmtNR00yiK2i9Pt=r7Z-mRxjj7X=bR6JiDgKvCEVA@mail.gmail.com>

hello Lucas,

On Wed, 2018-06-27 at 14:50 -0400, Lucas Bates wrote:
> On Tue, Jun 26, 2018 at 10:51 AM, Davide Caratti <dcaratti@redhat.com> wrote:
> > On Tue, 2018-06-26 at 09:17 -0400, Keara Leibovitz wrote:
> > > Create unittests for the tc tunnel_key action.
> > > 
> > > 
> > > Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
> > > ---
> > >  .../tc-testing/tc-tests/actions/tunnel_key.json    | 676 +++++++++++++++++++++
> > >  1 file changed, 676 insertions(+)
> > >  create mode 100644 tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json
> > > 
> > > diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json b/tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json
> > > new file mode 100644
> > > index 000000000000..bfe522ac8177
> > 
> > hello Keara!
> > 
> > I think the 'teardown' stage in some of these tests should be reviewed.
> > Those that are meant to test invalid configurations (like dc6b) should
> > allow non-zero exit codes in the teardown stage, if the wrong
> > configuration is catched by the userspace TC tool, before talking to the
> > kernel.
> > 
> > Otherwise, those tests will fail when they are invoked one by one with the
> > act_tunnel_key module unloaded.
> > 
> 
> Hi Davide, I thought I'd weigh in here.

glad to hear your feedback!

> In the short term, I think this is reasonable, but it's not a feasible
> long-term solution.  Here's why:
> 
> Allowing non-zero exit codes on setup and teardown was a precaution
> that needed to be implemented as flushing actions in a freshly-booted
> kernel returned errors - certain actions would only allow you to flush
> after that action had been added.

I guess this is a desired behavior, and it's common to all TC actions:

# grep bpf /proc/modules
# tc actions flush action bpf
RTNETLINK answers: Invalid argument
We have an error flushing
# modprobe act_bpf
 tc actions flush action bpf
# echo $?
0

> But, doing this on so many test cases means that we can lose control
> of the test environment, especially since a lot of commands get copied
> between test cases.  One test's command under test becomes the next
> test case's setup command, etc.  This can cause false results and
> potentially waste a lot of time for someone trying to track down a
> bug... Or cause bugs to be missed.

I understand, you want to ensure that 'teardown' leaves the scenario in a
status which is the same as before the 'setup' phase. Whether or not this
happened successfully, it's sane not to ignore the error code: otherwise,
test X will perturbate test X+1.

> So, how to fix: we've had some discussions about it already.  Jiri had
> requested the addition of a config file (like the one at
> tools/testing/selftests/net/forwarding/config, and maybe an addition
> to the README for tdc for explanation.  People would then possibly be
> restricted to running one test case file at a time based on what
> options they had loaded...  This is still not ideal.

All this depends on where the error condition is catched. Some parameters
(like the invalid 'index' in act_bpf) are rejected within userspace TC,
some others (like the invalid bytecode for test f84a) in the kernel.

> I think the best possible fix is to add a new plugin for tdc to
> exclude tests based on the kernel config.  This would require the
> addition of a new optional field to the test case format, where any
> and all included modules required for the test to work would be
> listed.  The plugin would look at this information, do its best to
> determine if the currently running kernel supports it, and allows the
> test to run or be skipped as a result.
> 
> Let me show an example of the new field:
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json
> > > @@ -0,0 +1,676 @@
> > > 
> > 
> > ...
> > 
> > > +    {
> > > +        "id": "dc6b",
> > > +        "name": "Add tunnel_key set action with missing mandatory src_ip parameter",
> > > +        "category": [
> > > +            "actions",
> > > +            "tunnel_key"
> > > +        ],
> 
>                "reqModules": [
>                    "CONFIG_NET_ACT_TUNNEL_KEY"
>                ],
> > > +        "setup": [
> > > +            [
> > > +                "$TC actions flush action tunnel_key",
> > > +                0,
> > > +                1,
> > > +                255
> > > +            ]
> > > +        ],
> > > +        "cmdUnderTest": "$TC actions add action tunnel_key set dst_ip 20.20.20.2 id 100",
> > > +        "expExitCode": "255",
> > > +        "verifyCmd": "$TC actions list action tunnel_key",
> > > +        "matchPattern": "action order [0-9]+: tunnel_key set.*dst_ip 20.20.20.2.*key_id 100",
> > > +        "matchCount": "0",
> > > +        "teardown": [
> > > +            "$TC actions flush action tunnel_key"
> > > +        ]
> > > +    },
> 
> As we venture into more and more complicated tests, where different
> modules would start getting mixed together, this might be the most
> effective route.
> 
> This plugin will require some changes I've made to our local version
> of tdc that I've been testing out - they change the way tdc handles
> its test results, and also give it the ability to skip tests without
> affecting the rest of the test run.

LGTM.  To maintain the possibility to test automatic module loading based
on the action, we only need to add another test per module (tipically the
first one) where the 'reqModules' line is not present.

> Until I'm able to submit everything, I'd be OK with having Keara add
> the non-zero exit codes to the teardown on her tests.  In the meantime
> we'll get the README updated and config file added as well.
> 
> How does this sound?

it sounds good to me, but at this point we can also leave the code of
tunnel_key as-is. there are many other items failing in this script:

for act in $ACT; do
while IFS=':' read -r id _ ; do modprobe -r act_${act} ; sleep 1 ; [ -n "$id" ] && ./tdc.py -p /home/davide/iproute2/tc/tc -e $id ; done <<EOF
`./tdc.py -l | grep ${act}`
EOF
done

So, it's ok for me if they are fixed all together in a series, and I
volunteer for testing it when they land on netdev list.

regards,
-- 
davide

^ permalink raw reply

* Re: [PATCH] test_bpf: flag tests that cannot be jited on s390
From: Song Liu @ 2018-06-28 17:31 UTC (permalink / raw)
  To: Kleber Sacilotto de Souza
  Cc: linux-s390, Networking, Alexei Starovoitov, Daniel Borkmann
In-Reply-To: <20180627151921.15018-1-kleber.souza@canonical.com>

On Wed, Jun 27, 2018 at 8:19 AM, Kleber Sacilotto de Souza
<kleber.souza@canonical.com> wrote:
> Flag with FLAG_EXPECTED_FAIL the BPF_MAXINSNS tests that cannot be jited
> on s390 because they exceed BPF_SIZE_MAX and fail when
> CONFIG_BPF_JIT_ALWAYS_ON is set. Also set .expected_errcode to -ENOTSUPP
> so the tests pass in that case.
>
> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  lib/test_bpf.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/lib/test_bpf.c b/lib/test_bpf.c
> index 60aedc879361..08d3d59dca17 100644
> --- a/lib/test_bpf.c
> +++ b/lib/test_bpf.c
> @@ -5282,21 +5282,31 @@ static struct bpf_test tests[] = {
>         {       /* Mainly checking JIT here. */
>                 "BPF_MAXINSNS: Ctx heavy transformations",
>                 { },
> +#if defined(CONFIG_BPF_JIT_ALWAYS_ON) && defined(CONFIG_S390)
> +               CLASSIC | FLAG_EXPECTED_FAIL,
> +#else
>                 CLASSIC,
> +#endif
>                 { },
>                 {
>                         {  1, !!(SKB_VLAN_TCI & VLAN_TAG_PRESENT) },
>                         { 10, !!(SKB_VLAN_TCI & VLAN_TAG_PRESENT) }
>                 },
>                 .fill_helper = bpf_fill_maxinsns6,
> +               .expected_errcode = -ENOTSUPP,
>         },
>         {       /* Mainly checking JIT here. */
>                 "BPF_MAXINSNS: Call heavy transformations",
>                 { },
> +#if defined(CONFIG_BPF_JIT_ALWAYS_ON) && defined(CONFIG_S390)
> +               CLASSIC | FLAG_NO_DATA | FLAG_EXPECTED_FAIL,
> +#else
>                 CLASSIC | FLAG_NO_DATA,
> +#endif
>                 { },
>                 { { 1, 0 }, { 10, 0 } },
>                 .fill_helper = bpf_fill_maxinsns7,
> +               .expected_errcode = -ENOTSUPP,
>         },
>         {       /* Mainly checking JIT here. */
>                 "BPF_MAXINSNS: Jump heavy test",
> @@ -5347,18 +5357,28 @@ static struct bpf_test tests[] = {
>         {
>                 "BPF_MAXINSNS: exec all MSH",
>                 { },
> +#if defined(CONFIG_BPF_JIT_ALWAYS_ON) && defined(CONFIG_S390)
> +               CLASSIC | FLAG_EXPECTED_FAIL,
> +#else
>                 CLASSIC,
> +#endif
>                 { 0xfa, 0xfb, 0xfc, 0xfd, },
>                 { { 4, 0xababab83 } },
>                 .fill_helper = bpf_fill_maxinsns13,
> +               .expected_errcode = -ENOTSUPP,
>         },
>         {
>                 "BPF_MAXINSNS: ld_abs+get_processor_id",
>                 { },
> +#if defined(CONFIG_BPF_JIT_ALWAYS_ON) && defined(CONFIG_S390)
> +               CLASSIC | FLAG_EXPECTED_FAIL,
> +#else
>                 CLASSIC,
> +#endif
>                 { },
>                 { { 1, 0xbee } },
>                 .fill_helper = bpf_fill_ld_abs_get_processor_id,
> +               .expected_errcode = -ENOTSUPP,
>         },
>         /*
>          * LD_IND / LD_ABS on fragmented SKBs
> --
> 2.17.1
>

^ permalink raw reply

* Re: [patch net-next 0/9] net: sched: introduce chain templates support with offloading to mlxsw
From: Cong Wang @ 2018-06-28 17:32 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: sridhar.samudrala, Jakub Kicinski,
	Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Simon Horman, john.hurley, David Ahern, mlxsw
In-Reply-To: <20180628061817.GB2413@nanopsycho>

On Wed, Jun 27, 2018 at 11:19 PM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Wed, Jun 27, 2018 at 07:04:32PM CEST, xiyou.wangcong@gmail.com wrote:
> >On Wed, Jun 27, 2018 at 9:46 AM Samudrala, Sridhar
> ><sridhar.samudrala@intel.com> wrote:
> >>
> >> On 6/27/2018 12:50 AM, Jiri Pirko wrote:
> >> > if you don't like "tc filter template add dev dummy0 ingress", how
> >> > about:
> >> > "tc template add dev dummy0 ingress ..."
> >> > "tc template add dev dummy0 ingress chain 22 ..."
> >> > that makes more sense I think.
> >
> >Better than 'tc filter template', but this doesn't reflect 'template'
> >is a template of tc filter, it could be an action etc., since it is in the
>
> It's a template of filter per chain. I don't understand how it could be
> an action...

It's because you have that in your mind from very beginning.

Think about what a new TC user's reaction is to 'tc template'
after he/she learns 'tc qdisc/filter/action'. It could be a template
of either of these 3 literately...


>
>
> >same position with 'tc action/filter/qdisc'.
> >
> >
> >>
> >> Isn't it possible to avoid introducing another keyword 'template',
> >>
> >> Can't we just do
> >>        tc chain add dev dummy0 ingress flower chain_index 0
> >> to create a chain that takes any types of flower rules with index 0
> >> and
> >>       tc chain add dev dummy0 ingress flower chain_index 22
> >>              dst_mac 00:00:00:00:00:00/00:00:00:00:FF:FF
> >>       tc chain add dev dummy0 ingress flower chain_index 23
> >>              dst_ip 192.168.0.0/16
> >> to create 2 chains 22 and 23 that allow rules with specific fields.
> >
> >Sounds good too. Since filter chain can be shared by qdiscs,
> >a 'tc chain' sub-command makes sense, and would probably make
> >it easier to be shared.
>
> We don't have such specific object. It is implicit. We create it
> whenever someone users it. Either filter of chain. I don't like new "tc
> chain" object in cmdline. It really isn't.

I discussed this with you at netconf, it is similar to tc actions,
tc actions can be shared not because they are implicitly created,
but because they could be created alone via `tc action add ...`.

If you don't share the chain, it is perfectly fine to create it
implicitly. If you do share, as in current code base, making it
standalone is reasonable.

^ permalink raw reply

* Re: [patch net-next v2 0/9] net: sched: introduce chain templates support with offloading to mlxsw
From: Cong Wang @ 2018-06-28 17:38 UTC (permalink / raw)
  To: David Miller
  Cc: Jiri Pirko, Linux Kernel Network Developers, Jamal Hadi Salim,
	Jakub Kicinski, Simon Horman, john.hurley, David Ahern, mlxsw
In-Reply-To: <20180628.134826.1129904828597717313.davem@davemloft.net>

On Wed, Jun 27, 2018 at 9:48 PM David Miller <davem@davemloft.net> wrote:
>
> This series doesn't apply cleanly to net-next, and also there seems to still
> be some discussion about how the iproute2 command line should look.
>

I am sure you know this, so just to be clear:

A redesign of "how iproute2 command line should look" usually means a
redesign in the kernel code too. Apparently, 'tc chaintemplate' is a new
subsystem under TC, while a 'tc filter template' is merely a new TC filter
attribute.

^ permalink raw reply

* [PATCH] can: peak_usb: mark expected switch fall-throughs
From: Gustavo A. R. Silva @ 2018-06-28 17:41 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde, David S. Miller
  Cc: linux-can, netdev, linux-kernel, Gustavo A. R. Silva

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 drivers/net/can/usb/peak_usb/pcan_usb.c      | 1 +
 drivers/net/can/usb/peak_usb/pcan_usb_core.c | 1 +
 drivers/net/can/usb/peak_usb/pcan_usb_pro.c  | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/drivers/net/can/usb/peak_usb/pcan_usb.c b/drivers/net/can/usb/peak_usb/pcan_usb.c
index f530a80..13238a7 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb.c
@@ -423,6 +423,7 @@ static int pcan_usb_decode_error(struct pcan_usb_msg_context *mc, u8 n,
 			new_state = CAN_STATE_ERROR_WARNING;
 			break;
 		}
+		/* else: fall through */
 
 	case CAN_STATE_ERROR_WARNING:
 		if (n & PCAN_USB_ERROR_BUS_HEAVY) {
diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.c b/drivers/net/can/usb/peak_usb/pcan_usb_core.c
index 50e9114..611f9d3 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_core.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.c
@@ -353,6 +353,7 @@ static netdev_tx_t peak_usb_ndo_start_xmit(struct sk_buff *skb,
 		default:
 			netdev_warn(netdev, "tx urb submitting failed err=%d\n",
 				    err);
+			/* fall through */
 		case -ENOENT:
 			/* cable unplugged */
 			stats->tx_dropped++;
diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_pro.c b/drivers/net/can/usb/peak_usb/pcan_usb_pro.c
index 0105fbf..d516def 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_pro.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_pro.c
@@ -141,8 +141,10 @@ static int pcan_msg_add_rec(struct pcan_usb_pro_msg *pm, u8 id, ...)
 	switch (id) {
 	case PCAN_USBPRO_TXMSG8:
 		i += 4;
+		/* fall through */
 	case PCAN_USBPRO_TXMSG4:
 		i += 4;
+		/* fall through */
 	case PCAN_USBPRO_TXMSG0:
 		*pc++ = va_arg(ap, int);
 		*pc++ = va_arg(ap, int);
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
From: Al Viro @ 2018-06-28 18:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP
In-Reply-To: <CA+55aFxy=GFVumxzOqv0LH9kr5aV10yccVUpFWGY3BX2pTsVHw@mail.gmail.com>

On Thu, Jun 28, 2018 at 09:40:21AM -0700, Linus Torvalds wrote:
> On Thu, Jun 28, 2018 at 7:21 AM Christoph Hellwig <hch@lst.de> wrote:
> >
> > Note that for this removes the possibility of actually returning an
> > error before waiting in poll.  We could still do this with an ERR_PTR
> > in f_poll_head with a little bit of WRITE_ONCE/READ_ONCE magic, but
> > I'd like to defer that until actually required.
> 
> I'm still going to just revert the whole poll mess for now.
>
> It's still completely broken. This helps things, but it doesn't fix
> the fundamental issue: the new interface is strictly worse than the
> old interface ever was.
> 
> So Christoph, if you don't like the tradoitional ->poll() method, and
> you want something else for aio polling, I seriously will suggest that
> you introduce a new f_op for *that*. Don't mess with the existing
> ->poll() function, don't make select() and poll() use a slower and
> less capable function just because aio wants something else.
> 
> In other words, you need to see AIO as the less important case, not as
> the driver for this.
> 
> I also want to understand what the AIO race was, and what the problem
> with the poll() thing was. You claimed it was racy. I don't see it,
> and it was never ever explained in the whole series. I should never
> have pulled it in the first place if only for that reason, but I tend
> to trust Al when it comes to the VFS layer, so I did. My bad.

... and I should have pushed back harder, rather than getting sidetracked
into fixing the fs/aio.c-side races in this series ;-/

As for what can be salvaged out of the whole mess,
	* probably the most serious lesson is that INDIRECT CALLS ARE
COSTLY NOW and shouldn't be used lightly.  That had been slow to sink
in and we'd better all realize how much the things have changed.
That, BTW, has implications going a lot further than poll-related stuff -
e.g. the whole "we'll deal with revoke-like issues in procfs/sysfs/debugfs
by wrapping method calls" needs to be reexamined.  And in poll-related
area, note that we have a lot of indirection levels for socket poll.
	* having an optional pointer to wait_queue_head in struct file
is probably a good idea; a lot of ->poll() instances do have the same
form.  Even if sockets do not (and I'm not all that happy about the
solution in the latest series), the instances that do are common and
important enough.
	* a *LOT* of ->poll() instances only block in __pollwait()
called (indirectly) on the first pass.  If we annotate those in some
way (flag set in ->open(), presence of a new method, whatever) and
limit aio-poll to just those, we could deal with the aio side without
disrupting select/poll at all; just use (in place of __pollwait)
a different callback that wouldn't try to allocate poll_table_entry
and worked with the stuff embedded into aio-poll iocb.

How much do you intend to revert?  Untangling just the ->poll()-related
parts from the rest of changes in fs/aio.c will be unpleasant; we might
end up reverting the whole tail of the series and redoing the things
that are not poll-related on top of the revert... ;-/

^ permalink raw reply

* Re: [PATCH 2/3] net: phy: vitesse: Add support for VSC73xx
From: Linus Walleij @ 2018-06-28 18:23 UTC (permalink / raw)
  To: Woojung.Huh
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, UNGLinuxDriver,
	netdev, OpenWrt Development List, LEDE Development List,
	Gabor Juhos
In-Reply-To: <9235D6609DB808459E95D78E17F2E43D40DD511D@CHN-SV-EXMX02.mchp-main.com>

On Fri, Jun 15, 2018 at 9:25 PM <Woojung.Huh@microchip.com> wrote:
> > On 06/14/2018 05:35 AM, Linus Walleij wrote:

> > Probably as good as it can get given the information you have access to.
> > Maybe the guys at Mircochip could help, adding them.
>
> Microchip have completed the acquisition of Microsemi last months. It will take some time to get access to the right data.
> Hope we can help soon.

I was in contact with ex-Microsemi product support, and they kindly gave
me access to all documentation for all the Vitesse switch chips.

If you can pry out more details like some about the magic switch
internals I'd be happy to update the patch later. I can even read
the VHDL or Verilog files if there is no proper documentation,
it has happened before :D

Yours,
Linus Walleij

^ permalink raw reply

* Re: [PATCH v1 net-next 02/14] net: Add a new socket option for a future transmit time.
From: Jesus Sanchez-Palencia @ 2018-06-28 18:33 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Thomas Gleixner, jan.altenberg,
	Vinicius Gomes, kurt.kanzenbach, Henrik Austad, Richard Cochran,
	Levi Pearson, ilias.apalodimas, ivan.khoronzhuk, Miroslav Lichvar,
	Willem de Bruijn, Jamal Hadi Salim, Cong Wang,
	Jiří Pírko, Richard Cochran
In-Reply-To: <CAF=yD-KccMgPRL67YY9X3U6jP9E1WVc3C=b=wnAZvW8URs1FMQ@mail.gmail.com>

Hi Willem,


On 06/28/2018 07:40 AM, Willem de Bruijn wrote:
> On Thu, Jun 28, 2018 at 10:26 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>>
>> On Wed, Jun 27, 2018 at 6:08 PM Jesus Sanchez-Palencia
>> <jesus.sanchez-palencia@intel.com> wrote:
>>>
>>> From: Richard Cochran <rcochran@linutronix.de>
>>>
>>> This patch introduces SO_TXTIME. User space enables this option in
>>> order to pass a desired future transmit time in a CMSG when calling
>>> sendmsg(2). The argument to this socket option is a 6-bytes long struct
>>> defined as:
>>>
>>> struct sock_txtime {
>>>         clockid_t       clockid;
>>>         u16             flags;
>>> };
>>
>> clockid_t is __kernel_clockid_t is int is a variable length field.
>> Please use fixed length fields.
> 
> Sorry, int is fine, of course, and clockid_t is used between userspace and
> kernel already.


Great. So, in addition to the other feedback in sock.c, what I'm thinking here
for the v2 is:

- move this struct to and the flags definition (as enums) to
include/uapi/linux/net_tstamp.h;

- keep clockid as a clockid_t and increase flags to u32 since this already takes
8 bytes in total anyway;

- reduce sk_clockid and sk_txtime_flags from struct sock from a u16 to a u8 each.


Thanks,
Jesus



> 
>> Also, as MAX_CLOCKS is 16, only 4 bits are needed. A single u16
>> is probably sufficient as cmsg argument. To future proof, a u32 will
>> allow for more
>> than 4 flags. But in struct sock, 16 bits should be sufficient to
>> encode both clock id
>> and flags.

^ permalink raw reply

* [PATCH net-next] r8169: use standard debug output functions
From: Heiner Kallweit @ 2018-06-28 18:36 UTC (permalink / raw)
  To: David Miller, Realtek linux nic maintainers; +Cc: netdev@vger.kernel.org

I see no need to define a private debug output symbol, let's use the
standard debug output functions instead. In this context also remove
the deprecated PFX define.

The one assertion is wrong IMO anyway, this code path is used also
by chip version 01.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169.c | 36 ++++++++++------------------
 1 file changed, 12 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 70c13cc2..21ffaf10 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -34,7 +34,6 @@
 
 #define RTL8169_VERSION "2.3LK-NAPI"
 #define MODULENAME "r8169"
-#define PFX MODULENAME ": "
 
 #define FIRMWARE_8168D_1	"rtl_nic/rtl8168d-1.fw"
 #define FIRMWARE_8168D_2	"rtl_nic/rtl8168d-2.fw"
@@ -56,19 +55,6 @@
 #define FIRMWARE_8107E_1	"rtl_nic/rtl8107e-1.fw"
 #define FIRMWARE_8107E_2	"rtl_nic/rtl8107e-2.fw"
 
-#ifdef RTL8169_DEBUG
-#define assert(expr) \
-	if (!(expr)) {					\
-		printk( "Assertion failed! %s,%s,%s,line=%d\n",	\
-		#expr,__FILE__,__func__,__LINE__);		\
-	}
-#define dprintk(fmt, args...) \
-	do { printk(KERN_DEBUG PFX fmt, ## args); } while (0)
-#else
-#define assert(expr) do {} while (0)
-#define dprintk(fmt, args...)	do {} while (0)
-#endif /* RTL8169_DEBUG */
-
 #define R8169_MSG_DEFAULT \
 	(NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN)
 
@@ -2552,7 +2538,7 @@ static void rtl8169_get_mac_version(struct rtl8169_private *tp,
 
 static void rtl8169_print_mac_version(struct rtl8169_private *tp)
 {
-	dprintk("mac_version = 0x%02x\n", tp->mac_version);
+	netif_dbg(tp, drv, tp->dev, "mac_version = 0x%02x\n", tp->mac_version);
 }
 
 struct phy_reg {
@@ -4409,8 +4395,6 @@ static void rtl_phy_work(struct rtl8169_private *tp)
 	struct timer_list *timer = &tp->timer;
 	unsigned long timeout = RTL8169_PHY_TIMEOUT;
 
-	assert(tp->mac_version > RTL_GIGA_MAC_VER_01);
-
 	if (tp->phy_reset_pending(tp)) {
 		/*
 		 * A busy loop could burn quite a few cycles on nowadays CPU.
@@ -4467,7 +4451,8 @@ static void rtl8169_init_phy(struct net_device *dev, struct rtl8169_private *tp)
 	rtl_hw_phy_config(dev);
 
 	if (tp->mac_version <= RTL_GIGA_MAC_VER_06) {
-		dprintk("Set MAC Reg C+CR Offset 0x82h = 0x01h\n");
+		netif_dbg(tp, drv, dev,
+			  "Set MAC Reg C+CR Offset 0x82h = 0x01h\n");
 		RTL_W8(tp, 0x82, 0x01);
 	}
 
@@ -4477,9 +4462,11 @@ static void rtl8169_init_phy(struct net_device *dev, struct rtl8169_private *tp)
 		pci_write_config_byte(tp->pci_dev, PCI_CACHE_LINE_SIZE, 0x08);
 
 	if (tp->mac_version == RTL_GIGA_MAC_VER_02) {
-		dprintk("Set MAC Reg C+CR Offset 0x82h = 0x01h\n");
+		netif_dbg(tp, drv, dev,
+			  "Set MAC Reg C+CR Offset 0x82h = 0x01h\n");
 		RTL_W8(tp, 0x82, 0x01);
-		dprintk("Set PHY Reg 0x0bh = 0x00h\n");
+		netif_dbg(tp, drv, dev,
+			  "Set PHY Reg 0x0bh = 0x00h\n");
 		rtl_writephy(tp, 0x0b, 0x0000); //w 0x0b 15 0 0
 	}
 
@@ -5171,8 +5158,8 @@ static void rtl_hw_start_8169(struct rtl8169_private *tp)
 
 	if (tp->mac_version == RTL_GIGA_MAC_VER_02 ||
 	    tp->mac_version == RTL_GIGA_MAC_VER_03) {
-		dprintk("Set MAC Reg C+CR Offset 0xe0. "
-			"Bit-3 and bit-14 MUST be 1\n");
+		netif_dbg(tp, drv, tp->dev,
+			  "Set MAC Reg C+CR Offset 0xe0. Bit 3 and Bit 14 MUST be 1\n");
 		tp->cp_cmd |= (1 << 14);
 	}
 
@@ -6017,8 +6004,9 @@ static void rtl_hw_start_8168(struct rtl8169_private *tp)
 		break;
 
 	default:
-		printk(KERN_ERR PFX "%s: unknown chipset (mac_version = %d).\n",
-		       tp->dev->name, tp->mac_version);
+		netif_err(tp, drv, tp->dev,
+			  "unknown chipset (mac_version = %d)\n",
+			  tp->mac_version);
 		break;
 	}
 }
-- 
2.18.0

^ permalink raw reply related

* [PATCH net-next] net: phy: realtek: add support for RTL8211
From: Heiner Kallweit @ 2018-06-28 18:46 UTC (permalink / raw)
  To: David Miller, Andrew Lunn, Florian Fainelli,
	Realtek linux nic maintainers
  Cc: netdev@vger.kernel.org

In preparation of adding phylib support to the r8169 driver we need
PHY drivers for all chip-internal PHY types. Fortunately almost all
of them are either supported by the Realtek PHY driver already or work
with the genphy driver.
Still missing is support for the PHY of RTL8169s, it requires a quirk
to properly support 100Mbit-fixed mode. The quirk was copied from
r8169 driver which copied it from the vendor driver.
Based on the PHYID the internal PHY seems to be a RTL8211.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/realtek.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index 082fb40c..9757b162 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -128,6 +128,28 @@ static int rtl8211f_config_intr(struct phy_device *phydev)
 	return phy_write_paged(phydev, 0xa42, RTL821x_INER, val);
 }
 
+static int rtl8211_config_aneg(struct phy_device *phydev)
+{
+	int ret;
+
+	ret = genphy_config_aneg(phydev);
+	if (ret < 0)
+		return ret;
+
+	/* Quirk was copied from vendor driver. Unfortunately it includes no
+	 * description of the magic numbers.
+	 */
+	if (phydev->speed == SPEED_100 && phydev->autoneg == AUTONEG_DISABLE) {
+		phy_write(phydev, 0x17, 0x2138);
+		phy_write(phydev, 0x0e, 0x0260);
+	} else {
+		phy_write(phydev, 0x17, 0x2108);
+		phy_write(phydev, 0x0e, 0x0000);
+	}
+
+	return 0;
+}
+
 static int rtl8211f_config_init(struct phy_device *phydev)
 {
 	int ret;
@@ -178,6 +200,14 @@ static struct phy_driver realtek_drvs[] = {
 		.resume		= genphy_resume,
 		.read_page	= rtl821x_read_page,
 		.write_page	= rtl821x_write_page,
+	}, {
+		.phy_id		= 0x001cc910,
+		.name		= "RTL8211 Gigabit Ethernet",
+		.phy_id_mask	= 0x001fffff,
+		.features	= PHY_GBIT_FEATURES,
+		.config_aneg	= rtl8211_config_aneg,
+		.read_mmd	= &genphy_read_mmd_unsupported,
+		.write_mmd	= &genphy_write_mmd_unsupported,
 	}, {
 		.phy_id		= 0x001cc912,
 		.name		= "RTL8211B Gigabit Ethernet",
-- 
2.18.0

^ permalink raw reply related

* Re: [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned
From: Jiong Wang @ 2018-06-28 19:02 UTC (permalink / raw)
  To: Song Liu
  Cc: Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, oss-drivers,
	Networking

On Tue, Jun 26, 2018 at 7:21 AM, Song Liu <liu.song.a23@gmail.com> wrote:
> On Sun, Jun 24, 2018 at 8:54 PM, Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
>> From: Jiong Wang <jiong.wang@netronome.com>

<snip>

>> +
>> +struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec)
>> +{
>> +       struct reciprocal_value_adv R;
>> +       u32 l, post_shift;
>> +       u64 mhigh, mlow;
>> +
>> +       l = fls(d - 1);
>> +       post_shift = l;
>> +       /* NOTE: mlow/mhigh could overflow u64 when l == 32 which means d has
>> +        * MSB set. This case needs to be handled before calling
>> +        * "reciprocal_value_adv", please see the comment at
>> +        * include/linux/reciprocal_div.h.
>> +        */
>
> Shall we handle l == 32 case better? I guess the concern here is extra
> handling may
> slow down the fast path?

The implementation of "reciprocal_value_adv" hasn't considered l  ==
32 which will make the code more complex.

As described at the pseudo code about how to call
"reciprocal_value_adv" in include/linux/reciprocal_div.h, l == 32
means the MSB of dividend is set, so the result of unsigned
divisor/dividend could only be 0 or 1, so the divide result could be
easily get by a comparison then conditional move 0 or 1 to the result.

> If that's the case, we should at least add a WARNING on the slow path.

OK, I will add a pr_warn inside "reciprocal_value_adv" when l == 32 is
triggered.

Thanks,
Jiong

^ permalink raw reply

* [PATCH] net: usb: Mark expected switch fall-throughs
From: Gustavo A. R. Silva @ 2018-06-28 18:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-usb, netdev, linux-kernel, Gustavo A. R. Silva

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 drivers/net/usb/catc.c       | 1 +
 drivers/net/usb/cdc-phonet.c | 1 +
 drivers/net/usb/r8152.c      | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/drivers/net/usb/catc.c b/drivers/net/usb/catc.c
index 18d36df..424053b 100644
--- a/drivers/net/usb/catc.c
+++ b/drivers/net/usb/catc.c
@@ -869,6 +869,7 @@ static int catc_probe(struct usb_interface *intf, const struct usb_device_id *id
 		default:
 			dev_warn(&intf->dev,
 				 "Couldn't detect memory size, assuming 32k\n");
+			/* fall through */
 		case 0x87654321:
 			catc_set_reg(catc, TxBufCount, 4);
 			catc_set_reg(catc, RxBufCount, 16);
diff --git a/drivers/net/usb/cdc-phonet.c b/drivers/net/usb/cdc-phonet.c
index 3c40312..78b16eb 100644
--- a/drivers/net/usb/cdc-phonet.c
+++ b/drivers/net/usb/cdc-phonet.c
@@ -110,6 +110,7 @@ static void tx_complete(struct urb *req)
 	case -ECONNRESET:
 	case -ESHUTDOWN:
 		dev->stats.tx_aborted_errors++;
+		/* fall through */
 	default:
 		dev->stats.tx_errors++;
 		dev_dbg(&dev->dev, "TX error (%d)\n", status);
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 1fd1650..124211a 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -1376,6 +1376,7 @@ static void intr_callback(struct urb *urb)
 	case -ECONNRESET:	/* unlink */
 	case -ESHUTDOWN:
 		netif_device_detach(tp->netdev);
+		/* fall through */
 	case -ENOENT:
 	case -EPROTO:
 		netif_info(tp, intr, tp->netdev,
@@ -2741,6 +2742,7 @@ static void r8153b_ups_en(struct r8152 *tp, bool enable)
 			r8152_mdio_write(tp, MII_BMCR, data);
 
 			data = r8153_phy_status(tp, PHY_STAT_LAN_ON);
+			/* fall through */
 
 		default:
 			if (data != PHY_STAT_LAN_ON)
-- 
2.7.4

^ permalink raw reply related

* linux-next: Signed-off-by missing for commits in the net tree
From: Stephen Rothwell @ 2018-06-28 19:12 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List, Shuah Khan,
	Daniel Díaz, Eli Cohen, Saeed Mahameed

[-- Attachment #1: Type: text/plain, Size: 328 bytes --]

Hi all,

Commit

  933e671f8cff ("selftests/net: Fix permissions for fib_tests.sh")

is missing a Signed-off-by from its author.

Commit

  a8d70a054a71 ("net/mlx5: E-Switch, Disallow vlan/spoofcheck setup if not being esw manager")

is missing a Signed-off-by from its committer.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
From: Linus Torvalds @ 2018-06-28 19:31 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP
In-Reply-To: <20180628181727.GH30522@ZenIV.linux.org.uk>

On Thu, Jun 28, 2018 at 11:17 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> As for what can be salvaged out of the whole mess,
>         * probably the most serious lesson is that INDIRECT CALLS ARE
> COSTLY NOW and shouldn't be used lightly.

Note that this has always been true, it's just _way_ more obvious now.

Indirect calls are costly not just because of the nasty 20+ cycle cost
of the stupid spectre overhead (which hopefully will be a thing of the
past in a few years as people upgrade), but because they are a pretty
basic barrier to optimization, both for compilers but also just for
_people_.

Look at a lot of vfs optimization we've done, like all the name
hashing optimization etc. We basically fall flat on our face if a
filesystem implements its own name hash function, not _just_ because
of the cost of the indirect function call, but because it suddenly
means that the filesystem is doing its own thing and all the clever
work we did to integrate name hashing with copying the name no longer
works.

So I really want to avoid indirect calls. And when they *do* happen, I
want to avoid the model where people think of them as low-level object
accessor functions - the C++ disease. I want indirect function calls
to make sense at a higher level, and do some real operation.

End result: I really despised the new poll() model. Yes, the
performance report was what made me *notice*, but then when I looked
at the code I went "No". Using an indirect call as some low-level
accessor function really is fundamentally wrong. Don't do it. It's
badly designed.

Out VFS operations are _high-level_ operations, where we do one single
call for a whole "read()" operation. "->poll()" used to be the same.
The new "->get_poll_head()" and "->poll_mask()" was just bad, bad,
bad.

>         * having an optional pointer to wait_queue_head in struct file
> is probably a good idea; a lot of ->poll() instances do have the same
> form.  Even if sockets do not (and I'm not all that happy about the
> solution in the latest series), the instances that do are common and
> important enough.

Right. I don't hate the poll wait-queue pointer. That said, I do hope
that we can simply write things so as to not even need it.

>         * a *LOT* of ->poll() instances only block in __pollwait()
> called (indirectly) on the first pass.

They are *all* supposed to do it.

The whole idea with "->poll()" is that the model of operation is:

 -  add_wait_queue() and return state on the first pass

 - on subsequent passes (or if somebody else already returned a state
that means we already know we're not going to wait), the poll table is
NULL, so you *CANNOT* add_wait_queue again, so you just return state.

Additionally, once _anybody_ has returned a "I already have the
event", we also clear the poll table queue, so subsequent accesses
will purely be for returning the poll state.

So I don't understand why you're asking for annotation. The whole "you
add yourself to the poll table" is *fundamentally* only done on the
first pass. You should never do it for later passes.

> How much do you intend to revert?  Untangling just the ->poll()-related
> parts from the rest of changes in fs/aio.c will be unpleasant; we might
> end up reverting the whole tail of the series and redoing the things
> that are not poll-related on top of the revert... ;-/

I pushed out my revert. It was fairly straightforward, it just
reverted all the poll_mask/get_poll_head changes, and the aio code
that depended on them.

Btw, I really don't understand the "aio has a race". The old poll()
interface was fundamentally race-free. There simply *is* no way to
race on it, exactly because of the whole "add yourself to the wait
queue first, then ask for state afterwards" model.  The model may be
_odd_, but it has literally worked well for a quarter century exactly
because it's really simple and fundamentally cannot have races.

So I think it's the aio code that needs fixing, not the polling code.

I do want that explanation for why AIO is _so_ special that it can
introduce a race in poll().

Because I suspect it's not so special, and it's just buggy. Maybe
Christoph didn't understand the two-phase model (how you call ->poll()
_twice_ - first to add yourself to the queue, later to check status).
Or maybe AIO interfaces are just shit (again!) and don't work right.

                   Linus

^ permalink raw reply

* Re: [PATCH net-next 1/1] tc-testing: initial version of tunnel_key unit tests
From: Keara Leibovitz @ 2018-06-28 19:58 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Lucas Bates, David Miller, Linux Kernel Network Developers,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko
In-Reply-To: <2ebd38dd2f1772b73f53cb5ea6c562652bb127bb.camel@redhat.com>

>> Until I'm able to submit everything, I'd be OK with having Keara add
>> the non-zero exit codes to the teardown on her tests.  In the meantime
>> we'll get the README updated and config file added as well.
>>
>> How does this sound?
>
> it sounds good to me, but at this point we can also leave the code of
> tunnel_key as-is. there are many other items failing in this script:
>
> for act in $ACT; do
> while IFS=':' read -r id _ ; do modprobe -r act_${act} ; sleep 1 ; [ -n "$id" ] && ./tdc.py -p /home/davide/iproute2/tc/tc -e $id ; done <<EOF
> `./tdc.py -l | grep ${act}`
> EOF
> done
>
> So, it's ok for me if they are fixed all together in a series, and I
> volunteer for testing it when they land on netdev list.

Hi Davide,

I will add the non-zero exit codes and resubmit version 2 tomorrow.

Keara

^ permalink raw reply

* Re: [PATCH bpf-next 08/14] bpf: introduce the bpf_get_local_storage() helper function
From: kbuild test robot @ 2018-06-28 20:02 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: kbuild-all, netdev, linux-kernel, kernel-team, tj, Roman Gushchin,
	Alexei Starovoitov, Daniel Borkmann
In-Reply-To: <20180628164719.28215-9-guro@fb.com>

[-- Attachment #1: Type: text/plain, Size: 1351 bytes --]

Hi Roman,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Roman-Gushchin/bpf-cgroup-local-storage/20180629-031527
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: i386-randconfig-x017-201825 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   kernel/bpf/helpers.c: In function '____bpf_get_local_storage':
>> kernel/bpf/helpers.c:206:9: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
     return (u64)this_cpu_read(bpf_cgroup_storage);
            ^

vim +206 kernel/bpf/helpers.c

   198	
   199	BPF_CALL_2(bpf_get_local_storage, struct bpf_map *, map, u64, flags)
   200	{
   201		/* map and flags arguments are not used now,
   202		 * but provide an ability to extend the API
   203		 * for other types of local storages.
   204		 * verifier checks that their values are correct.
   205		 */
 > 206		return (u64)this_cpu_read(bpf_cgroup_storage);
   207	}
   208	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27415 bytes --]

^ permalink raw reply

* [PATCH net-next] openvswitch: kernel datapath clone action
From: Yifeng Sun @ 2018-06-28 15:20 UTC (permalink / raw)
  To: azhou, pshelar, netdev; +Cc: Yifeng Sun
In-Reply-To: <1485881247-6683-3-git-send-email-azhou@ovn.org>

Add 'clone' action to kernel datapath by using existing functions.
When actions within clone don't modify the current flow, the flow
key is not cloned before executing clone actions.

This is a follow up patch for this incomplete work:
https://patchwork.ozlabs.org/patch/722096/

Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
---
 include/uapi/linux/openvswitch.h |  8 +++++
 net/openvswitch/actions.c        | 33 ++++++++++++++++++
 net/openvswitch/flow_netlink.c   | 73 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 114 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 863aaba..5de8583 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -625,6 +625,11 @@ struct sample_arg {
 				      * 'OVS_SAMPLE_ATTR_PROBABILITY'.
 				      */
 };
+
+#define OVS_CLONE_ATTR_EXEC      0   /* Specify an u32 value. When nonzero,
+				      * actions in clone will not change flow
+				      * keys. False otherwise.
+				      */
 #endif
 
 /**
@@ -840,6 +845,8 @@ struct ovs_action_push_eth {
  * @OVS_ACTION_ATTR_POP_NSH: pop the outermost NSH header off the packet.
  * @OVS_ACTION_ATTR_METER: Run packet through a meter, which may drop the
  * packet, or modify the packet (e.g., change the DSCP field).
+ * @OVS_ACTION_ATTR_CLONE: make a copy of the packet and execute a list of
+ * actions without affecting the original packet and key.
  *
  * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
@@ -873,6 +880,7 @@ enum ovs_action_attr {
 	OVS_ACTION_ATTR_PUSH_NSH,     /* Nested OVS_NSH_KEY_ATTR_*. */
 	OVS_ACTION_ATTR_POP_NSH,      /* No argument. */
 	OVS_ACTION_ATTR_METER,        /* u32 meter ID. */
+	OVS_ACTION_ATTR_CLONE,        /* Nested OVS_CLONE_ATTR_*.  */
 
 	__OVS_ACTION_ATTR_MAX,	      /* Nothing past this will be accepted
 				       * from userspace. */
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 30a5df2..4444e31 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1057,6 +1057,28 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
 			     clone_flow_key);
 }
 
+/* When 'last' is true, clone() should always consume the 'skb'.
+ * Otherwise, clone() should keep 'skb' intact regardless what
+ * actions are executed within clone().
+ */
+static int clone(struct datapath *dp, struct sk_buff *skb,
+		 struct sw_flow_key *key, const struct nlattr *attr,
+		 bool last)
+{
+	struct nlattr *actions;
+	struct nlattr *clone_arg;
+	int rem = nla_len(attr);
+	bool clone_flow_key;
+
+	/* The first action is always 'OVS_CLONE_ATTR_ARG'. */
+	clone_arg = nla_data(attr);
+	clone_flow_key = !nla_get_u32(clone_arg);
+	actions = nla_next(clone_arg, &rem);
+
+	return clone_execute(dp, skb, key, 0, actions, rem, last,
+			     clone_flow_key);
+}
+
 static void execute_hash(struct sk_buff *skb, struct sw_flow_key *key,
 			 const struct nlattr *attr)
 {
@@ -1336,6 +1358,17 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 				consume_skb(skb);
 				return 0;
 			}
+			break;
+
+		case OVS_ACTION_ATTR_CLONE: {
+			bool last = nla_is_last(a, rem);
+
+			err = clone(dp, skb, key, a, last);
+			if (last)
+				return err;
+
+			break;
+		}
 		}
 
 		if (unlikely(err)) {
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 492ab0c..6938b56 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -2460,6 +2460,40 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
 	return 0;
 }
 
+static int validate_and_copy_clone(struct net *net,
+				   const struct nlattr *attr,
+				   const struct sw_flow_key *key,
+				   struct sw_flow_actions **sfa,
+				   __be16 eth_type, __be16 vlan_tci,
+				   bool log, bool last)
+{
+	int start, err;
+	u32 exec;
+
+	if (nla_len(attr) && nla_len(attr) < NLA_HDRLEN)
+		return -EINVAL;
+
+	start = add_nested_action_start(sfa, OVS_ACTION_ATTR_CLONE, log);
+	if (start < 0)
+		return start;
+
+	exec = last || !actions_may_change_flow(attr);
+
+	err = ovs_nla_add_action(sfa, OVS_CLONE_ATTR_EXEC, &exec,
+				 sizeof(exec), log);
+	if (err)
+		return err;
+
+	err = __ovs_nla_copy_actions(net, attr, key, sfa,
+				     eth_type, vlan_tci, log);
+	if (err)
+		return err;
+
+	add_nested_action_end(*sfa, start);
+
+	return 0;
+}
+
 void ovs_match_init(struct sw_flow_match *match,
 		    struct sw_flow_key *key,
 		    bool reset_key,
@@ -2844,6 +2878,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			[OVS_ACTION_ATTR_PUSH_NSH] = (u32)-1,
 			[OVS_ACTION_ATTR_POP_NSH] = 0,
 			[OVS_ACTION_ATTR_METER] = sizeof(u32),
+			[OVS_ACTION_ATTR_CLONE] = (u32)-1,
 		};
 		const struct ovs_action_push_vlan *vlan;
 		int type = nla_type(a);
@@ -3033,6 +3068,18 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			/* Non-existent meters are simply ignored.  */
 			break;
 
+		case OVS_ACTION_ATTR_CLONE: {
+			bool last = nla_is_last(a, rem);
+
+			err = validate_and_copy_clone(net, a, key, sfa,
+						      eth_type, vlan_tci,
+						      log, last);
+			if (err)
+				return err;
+			skip_copy = true;
+			break;
+		}
+
 		default:
 			OVS_NLERR(log, "Unknown Action type %d", type);
 			return -EINVAL;
@@ -3111,6 +3158,26 @@ static int sample_action_to_attr(const struct nlattr *attr,
 	return err;
 }
 
+static int clone_action_to_attr(const struct nlattr *attr,
+				struct sk_buff *skb)
+{
+	struct nlattr *start;
+	int err = 0, rem = nla_len(attr);
+
+	start = nla_nest_start(skb, OVS_ACTION_ATTR_CLONE);
+	if (!start)
+		return -EMSGSIZE;
+
+	err = ovs_nla_put_actions(nla_data(attr), rem, skb);
+
+	if (err)
+		nla_nest_cancel(skb, start);
+	else
+		nla_nest_end(skb, start);
+
+	return err;
+}
+
 static int set_action_to_attr(const struct nlattr *a, struct sk_buff *skb)
 {
 	const struct nlattr *ovs_key = nla_data(a);
@@ -3199,6 +3266,12 @@ int ovs_nla_put_actions(const struct nlattr *attr, int len, struct sk_buff *skb)
 				return err;
 			break;
 
+		case OVS_ACTION_ATTR_CLONE:
+			err = clone_action_to_attr(a, skb);
+			if (err)
+				return err;
+			break;
+
 		default:
 			if (nla_put(skb, type, nla_len(a), nla_data(a)))
 				return -EMSGSIZE;
-- 
2.7.4

^ permalink raw reply related

* [net-next  1/1] tipc: eliminate buffer cloning in function tipc_msg_extract()
From: Jon Maloy @ 2018-06-28 20:25 UTC (permalink / raw)
  To: davem, netdev
  Cc: mohan.krishna.ghanta.krishnamurthy, tung.q.nguyen, hoang.h.le,
	jon.maloy, canh.d.luu, ying.xue, tipc-discussion

From: Tung Nguyen <tung.q.nguyen@dektech.com.au>

The function tipc_msg_extract() is using skb_clone() to clone inner
messages from a message bundle buffer. Although this method is safe,
it has an undesired effect that each buffer clone inherits the
true-size of the bundling buffer. As a result, the buffer clone
almost always ends up with being copied anyway by the message
validation function. This makes the cloning into a sub-optimization.

In this commit we take the consequence of this realization, and copy
each inner message to a separately allocated buffer up front in the
extraction function.

As a bonus we can now eliminate the two cases where we had to copy
re-routed packets that may potentially go out on the wire again.

Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/msg.c | 35 +++++++++++++++--------------------
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/net/tipc/msg.c b/net/tipc/msg.c
index b6c45dc..b618910 100644
--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -416,26 +416,31 @@ bool tipc_msg_bundle(struct sk_buff *skb, struct tipc_msg *msg, u32 mtu)
  */
 bool tipc_msg_extract(struct sk_buff *skb, struct sk_buff **iskb, int *pos)
 {
-	struct tipc_msg *msg;
-	int imsz, offset;
+	struct tipc_msg *hdr, *ihdr;
+	int imsz;
 
 	*iskb = NULL;
 	if (unlikely(skb_linearize(skb)))
 		goto none;
 
-	msg = buf_msg(skb);
-	offset = msg_hdr_sz(msg) + *pos;
-	if (unlikely(offset > (msg_size(msg) - MIN_H_SIZE)))
+	hdr = buf_msg(skb);
+	if (unlikely(*pos > (msg_data_sz(hdr) - MIN_H_SIZE)))
 		goto none;
 
-	*iskb = skb_clone(skb, GFP_ATOMIC);
-	if (unlikely(!*iskb))
+	ihdr = (struct tipc_msg *)(msg_data(hdr) + *pos);
+	imsz = msg_size(ihdr);
+
+	if ((*pos + imsz) > msg_data_sz(hdr))
 		goto none;
-	skb_pull(*iskb, offset);
-	imsz = msg_size(buf_msg(*iskb));
-	skb_trim(*iskb, imsz);
+
+	*iskb = tipc_buf_acquire(imsz, GFP_ATOMIC);
+	if (!*iskb)
+		goto none;
+
+	skb_copy_to_linear_data(*iskb, ihdr, imsz);
 	if (unlikely(!tipc_msg_validate(iskb)))
 		goto none;
+
 	*pos += align(imsz);
 	return true;
 none:
@@ -531,12 +536,6 @@ bool tipc_msg_reverse(u32 own_node,  struct sk_buff **skb, int err)
 		msg_set_hdr_sz(hdr, BASIC_H_SIZE);
 	}
 
-	if (skb_cloned(_skb) &&
-	    pskb_expand_head(_skb, BUF_HEADROOM, BUF_TAILROOM, GFP_ATOMIC))
-		goto exit;
-
-	/* reassign after skb header modifications */
-	hdr = buf_msg(_skb);
 	/* Now reverse the concerned fields */
 	msg_set_errcode(hdr, err);
 	msg_set_non_seq(hdr, 0);
@@ -595,10 +594,6 @@ bool tipc_msg_lookup_dest(struct net *net, struct sk_buff *skb, int *err)
 	if (!skb_cloned(skb))
 		return true;
 
-	/* Unclone buffer in case it was bundled */
-	if (pskb_expand_head(skb, BUF_HEADROOM, BUF_TAILROOM, GFP_ATOMIC))
-		return false;
-
 	return true;
 }
 
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
From: Al Viro @ 2018-06-28 20:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP
In-Reply-To: <CA+55aFyPUpd3tWVqbTVXGKjMb42+g-9MCfvjXLpb0RGSOW8sbA@mail.gmail.com>

On Thu, Jun 28, 2018 at 12:31:14PM -0700, Linus Torvalds wrote:

> >         * a *LOT* of ->poll() instances only block in __pollwait()
> > called (indirectly) on the first pass.
> 
> They are *all* supposed to do it.

Sure, but...

static __poll_t binder_poll(struct file *filp,
                                struct poll_table_struct *wait)
{
        struct binder_proc *proc = filp->private_data;
        struct binder_thread *thread = NULL;
        bool wait_for_proc_work;

        thread = binder_get_thread(proc);
        if (!thread)
                return POLLERR;
...
static struct binder_thread *binder_get_thread(struct binder_proc *proc)
{
        struct binder_thread *thread;
        struct binder_thread *new_thread;

        binder_inner_proc_lock(proc);
        thread = binder_get_thread_ilocked(proc, NULL);
        binder_inner_proc_unlock(proc);
        if (!thread) {
                new_thread = kzalloc(sizeof(*thread), GFP_KERNEL);

And that's hardly unique - we have instances playing with timers,
allocations, whatnot.  Even straight mutex_lock(), as in
static __poll_t i915_perf_poll(struct file *file, poll_table *wait)
{
        struct i915_perf_stream *stream = file->private_data;
        struct drm_i915_private *dev_priv = stream->dev_priv;
        __poll_t ret;

        mutex_lock(&dev_priv->perf.lock);
        ret = i915_perf_poll_locked(dev_priv, stream, file, wait);
        mutex_unlock(&dev_priv->perf.lock);

        return ret;
}

or
static __poll_t cec_poll(struct file *filp,
                             struct poll_table_struct *poll)
{
        struct cec_fh *fh = filp->private_data;
        struct cec_adapter *adap = fh->adap;
        __poll_t res = 0;

        if (!cec_is_registered(adap))
                return EPOLLERR | EPOLLHUP;
        mutex_lock(&adap->lock);
        if (adap->is_configured &&
            adap->transmit_queue_sz < CEC_MAX_MSG_TX_QUEUE_SZ)
                res |= EPOLLOUT | EPOLLWRNORM;
        if (fh->queued_msgs)
                res |= EPOLLIN | EPOLLRDNORM;
        if (fh->total_queued_events)
                res |= EPOLLPRI;
        poll_wait(filp, &fh->wait, poll);
        mutex_unlock(&adap->lock);
        return res;
}

etc.

> 
> The whole idea with "->poll()" is that the model of operation is:
> 
>  -  add_wait_queue() and return state on the first pass
> 
>  - on subsequent passes (or if somebody else already returned a state
> that means we already know we're not going to wait), the poll table is
> NULL, so you *CANNOT* add_wait_queue again, so you just return state.
> 
> Additionally, once _anybody_ has returned a "I already have the
> event", we also clear the poll table queue, so subsequent accesses
> will purely be for returning the poll state.
> 
> So I don't understand why you're asking for annotation. The whole "you
> add yourself to the poll table" is *fundamentally* only done on the
> first pass. You should never do it for later passes.

Sure.  Unfortunately, adding yourself to the poll table is not the only
way to block.  And a plenty of instances in e.g. drivers/media (where
the bulk of ->poll() instances lives) are very free with grabbing mutexes
as they go.

^ permalink raw reply

* [PATCH net] net/ipv6: Fix updates to prefix route
From: dsahern @ 2018-06-28 20:36 UTC (permalink / raw)
  To: netdev; +Cc: sowmini.varadhan, David Ahern

From: David Ahern <dsahern@gmail.com>

Sowmini reported that a recent commit broke prefix routes for linklocal
addresses. The newly added modify_prefix_route is attempting to add a
new prefix route when the ifp priority does not match the route metric
however the check needs to account for the default priority. In addition,
the route add fails because the route already exists, and then the delete
removes the one that exists. Flip the order to do the delete first.

Fixes: 8308f3ff1753 ("net/ipv6: Add support for specifying metric of connected routes")
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Tested-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
---
 net/ipv6/addrconf.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index c134286d6a41..91580c62bb86 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4528,6 +4528,7 @@ static int modify_prefix_route(struct inet6_ifaddr *ifp,
 			       unsigned long expires, u32 flags)
 {
 	struct fib6_info *f6i;
+	u32 prio;
 
 	f6i = addrconf_get_prefix_route(&ifp->addr,
 					ifp->prefix_len,
@@ -4536,13 +4537,15 @@ static int modify_prefix_route(struct inet6_ifaddr *ifp,
 	if (!f6i)
 		return -ENOENT;
 
-	if (f6i->fib6_metric != ifp->rt_priority) {
+	prio = ifp->rt_priority ? : IP6_RT_PRIO_ADDRCONF;
+	if (f6i->fib6_metric != prio) {
+		/* delete old one */
+		ip6_del_rt(dev_net(ifp->idev->dev), f6i);
+
 		/* add new one */
 		addrconf_prefix_route(&ifp->addr, ifp->prefix_len,
 				      ifp->rt_priority, ifp->idev->dev,
 				      expires, flags, GFP_KERNEL);
-		/* delete old one */
-		ip6_del_rt(dev_net(ifp->idev->dev), f6i);
 	} else {
 		if (!expires)
 			fib6_clean_expires(f6i);
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
From: Al Viro @ 2018-06-28 20:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP
In-Reply-To: <20180628202837.GI30522@ZenIV.linux.org.uk>

On Thu, Jun 28, 2018 at 09:28:37PM +0100, Al Viro wrote:

> Sure.  Unfortunately, adding yourself to the poll table is not the only
> way to block.  And a plenty of instances in e.g. drivers/media (where
> the bulk of ->poll() instances lives) are very free with grabbing mutexes
> as they go.

Speaking of obvious bogosities (a lot more so than a blocking allocation
several calls down into helper):

static __poll_t ca8210_test_int_poll(
        struct file *filp,
        struct poll_table_struct *ptable
)
{
        __poll_t return_flags = 0;
        struct ca8210_priv *priv = filp->private_data;

        poll_wait(filp, &priv->test.readq, ptable);
        if (!kfifo_is_empty(&priv->test.up_fifo))
                return_flags |= (EPOLLIN | EPOLLRDNORM);
        if (wait_event_interruptible(
                priv->test.readq,
                !kfifo_is_empty(&priv->test.up_fifo))) {
                return EPOLLERR;
        }
        return return_flags;
}

In a sense, poll_wait() had been an unfortunate choice of name - it's
really "arrange to be woken up by", not "wait for something now" and
while the outright confusion like above is rare, general "blocking
is OK in that area" is not rare at all...

^ permalink raw reply

* [net-next  1/1] tipc: optimize function tipc_node_timeout()
From: Jon Maloy @ 2018-06-28 20:39 UTC (permalink / raw)
  To: davem, netdev
  Cc: mohan.krishna.ghanta.krishnamurthy, tung.q.nguyen, hoang.h.le,
	jon.maloy, canh.d.luu, ying.xue, tipc-discussion

From: Tung Nguyen <tung.q.nguyen@dektech.com.au>

In single-link usage, the function tipc_node_timeout() still iterates
over the whole link array to handle each link. Given that the maximum
number of bearers are 3, there are 2 redundant iterations with lock
grab/release. Since this function is executing very frequently it makes
sense to optimize it.

This commit adds conditional checking to exit from the loop if the
known number of configured links has already been accessed.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/node.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/tipc/node.c b/net/tipc/node.c
index 6a44eb8..8972ca1 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -551,21 +551,23 @@ static void tipc_node_timeout(struct timer_list *t)
 	struct tipc_node *n = from_timer(n, t, timer);
 	struct tipc_link_entry *le;
 	struct sk_buff_head xmitq;
+	int remains = n->link_cnt;
 	int bearer_id;
 	int rc = 0;
 
 	__skb_queue_head_init(&xmitq);
 
-	for (bearer_id = 0; bearer_id < MAX_BEARERS; bearer_id++) {
+	for (bearer_id = 0; remains && (bearer_id < MAX_BEARERS); bearer_id++) {
 		tipc_node_read_lock(n);
 		le = &n->links[bearer_id];
-		spin_lock_bh(&le->lock);
 		if (le->link) {
+			spin_lock_bh(&le->lock);
 			/* Link tolerance may change asynchronously: */
 			tipc_node_calculate_timer(n, le->link);
 			rc = tipc_link_timeout(le->link, &xmitq);
+			spin_unlock_bh(&le->lock);
+			remains--;
 		}
-		spin_unlock_bh(&le->lock);
 		tipc_node_read_unlock(n);
 		tipc_bearer_xmit(n->net, bearer_id, &xmitq, &le->maddr);
 		if (rc & TIPC_LINK_DOWN_EVT)
-- 
2.1.4

^ permalink raw reply related

* [PATCH RFC 2/2] net: phy: sfp: Add HWMON support for module sensors
From: Andrew Lunn @ 2018-06-28 20:41 UTC (permalink / raw)
  To: netdev, Florian Fainelli, Guenter Roeck, Russell King, vadimp,
	linux-hwmon
  Cc: Andrew Lunn
In-Reply-To: <1530218475-4369-1-git-send-email-andrew@lunn.ch>

SFP modules can contain a number of sensors. The EEPROM also contains
recommended alarm and critical values for each sensor, and indications
of if these have been exceeded. Export this information via
HWMON. Currently temperature, VCC, bias current, transmit power, and
possibly receiver power is supported.

The sensors in the modules can either return calibrate or uncalibrated
values. Uncalibrated values need to be manipulated, using coefficients
provided in the SFP EEPROM. Uncalibrated receive power values require
floating point maths in order to calibrate them. Performing this in
the kernel is hard. So if the SFP module indicates it uses
uncalibrated values, RX power is not made available.

With this hwmon device, it is possible to view the sensor values using
lm-sensors programs:

in0:          +3.29 V  (crit min =  +2.90 V, min =  +3.00 V)
                       (max =  +3.60 V, crit max =  +3.70 V)
temp1:        +33.0°C  (low  =  -5.0°C, high = +80.0°C)
                       (crit low = -10.0°C, crit = +85.0°C)
power1:      1000.00 nW (max = 794.00 uW, min =  50.00 uW)  ALARM (LCRIT)
                       (lcrit =  40.00 uW, crit = 1000.00 uW)
curr1:        +0.00 A  (crit min =  +0.00 A, min =  +0.00 A)  ALARM (LCRIT, MIN)
                       (max =  +0.01 A, crit max =  +0.01 A)

The scaling sensors performs on the bias current is not particularly
good. The raw values are more useful:

curr1:
  curr1_input: 0.000
  curr1_min: 0.002
  curr1_max: 0.010
  curr1_lcrit: 0.000
  curr1_crit: 0.011
  curr1_min_alarm: 1.000
  curr1_max_alarm: 0.000
  curr1_lcrit_alarm: 1.000
  curr1_crit_alarm: 0.000

In order to keep the I2C overhead to a minimum, the constant values,
such as limits and calibration coefficients are read once at module
insertion time. Thus only reading *_input and *_alarm properties
requires i2c read operations.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/phy/sfp.c | 732 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/sfp.h   |  72 ++++-
 2 files changed, 803 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index c4c92db86dfa..30428e94b694 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -1,5 +1,7 @@
+#include <linux/ctype.h>
 #include <linux/delay.h>
 #include <linux/gpio/consumer.h>
+#include <linux/hwmon.h>
 #include <linux/i2c.h>
 #include <linux/interrupt.h>
 #include <linux/jiffies.h>
@@ -131,6 +133,12 @@ struct sfp {
 	unsigned int sm_retries;
 
 	struct sfp_eeprom_id id;
+#ifdef CONFIG_HWMON
+	struct sfp_diag diag;
+	struct device *hwmon_dev;
+	char *hwmon_name;
+#endif
+
 };
 
 static bool sff_module_supported(const struct sfp_eeprom_id *id)
@@ -316,6 +324,724 @@ static unsigned int sfp_check(void *buf, size_t len)
 	return check;
 }
 
+/* hwmon */
+#ifdef CONFIG_HWMON
+static umode_t sfp_hwmon_is_visible(const void *data,
+				    enum hwmon_sensor_types type,
+				    u32 attr, int channel)
+{
+	const struct sfp *sfp = data;
+
+	switch (type) {
+	case hwmon_temp:
+		switch (attr) {
+		case hwmon_temp_input:
+		case hwmon_temp_min_alarm:
+		case hwmon_temp_max_alarm:
+		case hwmon_temp_lcrit_alarm:
+		case hwmon_temp_crit_alarm:
+		case hwmon_temp_min:
+		case hwmon_temp_max:
+		case hwmon_temp_lcrit:
+		case hwmon_temp_crit:
+			return 0444;
+		default:
+			return 0;
+		}
+	case hwmon_in:
+		switch (attr) {
+		case hwmon_in_input:
+		case hwmon_in_min_alarm:
+		case hwmon_in_max_alarm:
+		case hwmon_in_lcrit_alarm:
+		case hwmon_in_crit_alarm:
+		case hwmon_in_min:
+		case hwmon_in_max:
+		case hwmon_in_lcrit:
+		case hwmon_in_crit:
+			return 0444;
+		default:
+			return 0;
+		}
+	case hwmon_curr:
+		switch (attr) {
+		case hwmon_curr_input:
+		case hwmon_curr_min_alarm:
+		case hwmon_curr_max_alarm:
+		case hwmon_curr_lcrit_alarm:
+		case hwmon_curr_crit_alarm:
+		case hwmon_curr_min:
+		case hwmon_curr_max:
+		case hwmon_curr_lcrit:
+		case hwmon_curr_crit:
+			return 0444;
+		default:
+			return 0;
+		}
+	case hwmon_power:
+		/* External calibration of receive power requires
+		 * floating point arithmetic. Doing that in the kernel
+		 * is not easy, so just skip it. If the module does
+		 * not require external calibration, we can however
+		 * show receiver power, since FP is then not needed.
+		 */
+		if (sfp->id.ext.diagmon & SFP_DIAGMON_EXT_CAL &&
+		    channel == 1)
+			return 0;
+		switch (attr) {
+		case hwmon_power_input:
+		case hwmon_power_min_alarm:
+		case hwmon_power_max_alarm:
+		case hwmon_power_lcrit_alarm:
+		case hwmon_power_crit_alarm:
+		case hwmon_power_min:
+		case hwmon_power_max:
+		case hwmon_power_lcrit:
+		case hwmon_power_crit:
+			return 0444;
+		default:
+			return 0;
+		}
+	default:
+		return 0;
+	}
+}
+
+/* Sensors values are stored as two bytes, MSB second */
+static int sfp_hwmon_read_sensor(struct sfp *sfp, int reg, long *value)
+{
+	u8 val[2];
+	int err;
+
+	err = sfp_read(sfp, true, reg, val, 2);
+	if (err < 0)
+		return err;
+
+	*value = val[0] << 8 | val[1];
+
+	return 0;
+}
+
+static void sfp_hwmon_to_rx_power(long *value)
+{
+	*value /= 100;
+}
+
+static void sfp_hwmon_calibrate(struct sfp *sfp, unsigned int slope, int offset,
+				long *value)
+{
+	if (sfp->id.ext.diagmon & SFP_DIAGMON_EXT_CAL)
+		*value = (*value * slope) / 256 + offset;
+}
+
+static void sfp_hwmon_calibrate_temp(struct sfp *sfp, long *value)
+{
+	sfp_hwmon_calibrate(sfp, be16_to_cpu(sfp->diag.cal_t_slope),
+			    be16_to_cpu(sfp->diag.cal_t_offset), value);
+
+	if (*value >=  0x8000)
+		*value -= 0x10000;
+
+	*value = *value * 1000 / 256;
+}
+
+static void sfp_hwmon_calibrate_vcc(struct sfp *sfp, long *value)
+{
+	sfp_hwmon_calibrate(sfp, be16_to_cpu(sfp->diag.cal_v_slope),
+			    be16_to_cpu(sfp->diag.cal_v_offset), value);
+
+	*value /= 10;
+}
+
+static void sfp_hwmon_calibrate_bias(struct sfp *sfp, long *value)
+{
+	sfp_hwmon_calibrate(sfp, be16_to_cpu(sfp->diag.cal_txi_slope),
+			    be16_to_cpu(sfp->diag.cal_txi_offset), value);
+
+	*value /= 500;
+}
+
+static void sfp_hwmon_calibrate_tx_power(struct sfp *sfp, long *value)
+{
+	sfp_hwmon_calibrate(sfp, be16_to_cpu(sfp->diag.cal_txpwr_slope),
+			    be16_to_cpu(sfp->diag.cal_txpwr_offset), value);
+
+	*value /= 10;
+}
+
+static int sfp_hwmon_read_temp(struct sfp *sfp, int reg, long *value)
+{
+	int err;
+
+	err = sfp_hwmon_read_sensor(sfp, reg, value);
+	if (err < 0)
+		return err;
+
+	sfp_hwmon_calibrate_temp(sfp, value);
+
+	return 0;
+}
+
+static int sfp_hwmon_read_vcc(struct sfp *sfp, int reg, long *value)
+{
+	int err;
+
+	err = sfp_hwmon_read_sensor(sfp, reg, value);
+	if (err < 0)
+		return err;
+
+	sfp_hwmon_calibrate_vcc(sfp, value);
+
+	return 0;
+}
+
+static int sfp_hwmon_read_bias(struct sfp *sfp, int reg, long *value)
+{
+	int err;
+
+	err = sfp_hwmon_read_sensor(sfp, reg, value);
+	if (err < 0)
+		return err;
+
+	sfp_hwmon_calibrate_bias(sfp, value);
+
+	return 0;
+}
+
+static int sfp_hwmon_read_tx_power(struct sfp *sfp, int reg, long *value)
+{
+	int err;
+
+	err = sfp_hwmon_read_sensor(sfp, reg, value);
+	if (err < 0)
+		return err;
+
+	sfp_hwmon_calibrate_tx_power(sfp, value);
+
+	return 0;
+}
+
+static int sfp_hwmon_read_rx_power(struct sfp *sfp, int reg, long *value)
+{
+	int err;
+
+	err = sfp_hwmon_read_sensor(sfp, reg, value);
+	if (err < 0)
+		return err;
+
+	sfp_hwmon_to_rx_power(value);
+
+	return 0;
+}
+
+static int sfp_hwmon_temp(struct sfp *sfp, u32 attr, long *value)
+{
+	u8 status;
+	int err;
+
+	switch (attr) {
+	case hwmon_temp_input:
+		return sfp_hwmon_read_temp(sfp, SFP_TEMP, value);
+
+	case hwmon_temp_lcrit:
+		*value = be16_to_cpu(sfp->diag.temp_low_alarm);
+		sfp_hwmon_calibrate_temp(sfp, value);
+		return 0;
+
+	case hwmon_temp_min:
+		*value = be16_to_cpu(sfp->diag.temp_low_warn);
+		sfp_hwmon_calibrate_temp(sfp, value);
+		return 0;
+	case hwmon_temp_max:
+		*value = be16_to_cpu(sfp->diag.temp_high_warn);
+		sfp_hwmon_calibrate_temp(sfp, value);
+		return 0;
+
+	case hwmon_temp_crit:
+		*value = be16_to_cpu(sfp->diag.temp_high_alarm);
+		sfp_hwmon_calibrate_temp(sfp, value);
+		return 0;
+
+	case hwmon_temp_lcrit_alarm:
+		err = sfp_read(sfp, true, SFP_ALARM0, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_ALARM0_TEMP_LOW);
+		return 0;
+
+	case hwmon_temp_min_alarm:
+		err = sfp_read(sfp, true, SFP_WARN0, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_WARN0_TEMP_LOW);
+		return 0;
+
+	case hwmon_temp_max_alarm:
+		err = sfp_read(sfp, true, SFP_WARN0, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_WARN0_TEMP_HIGH);
+		return 0;
+
+	case hwmon_temp_crit_alarm:
+		err = sfp_read(sfp, true, SFP_ALARM0, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_ALARM0_TEMP_HIGH);
+		return 0;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return -EOPNOTSUPP;
+}
+
+static int sfp_hwmon_vcc(struct sfp *sfp, u32 attr, long *value)
+{
+	u8 status;
+	int err;
+
+	switch (attr) {
+	case hwmon_in_input:
+		return sfp_hwmon_read_vcc(sfp, SFP_VCC, value);
+
+	case hwmon_in_lcrit:
+		*value = be16_to_cpu(sfp->diag.volt_low_alarm);
+		sfp_hwmon_calibrate_vcc(sfp, value);
+		return 0;
+
+	case hwmon_in_min:
+		*value = be16_to_cpu(sfp->diag.volt_low_warn);
+		sfp_hwmon_calibrate_vcc(sfp, value);
+		return 0;
+
+	case hwmon_in_max:
+		*value = be16_to_cpu(sfp->diag.volt_high_warn);
+		sfp_hwmon_calibrate_vcc(sfp, value);
+		return 0;
+
+	case hwmon_in_crit:
+		*value = be16_to_cpu(sfp->diag.volt_high_alarm);
+		sfp_hwmon_calibrate_vcc(sfp, value);
+		return 0;
+
+	case hwmon_in_lcrit_alarm:
+		err = sfp_read(sfp, true, SFP_ALARM0, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_ALARM0_VCC_LOW);
+		return 0;
+
+	case hwmon_in_min_alarm:
+		err = sfp_read(sfp, true, SFP_WARN0, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_WARN0_VCC_LOW);
+		return 0;
+
+	case hwmon_in_max_alarm:
+		err = sfp_read(sfp, true, SFP_WARN0, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_WARN0_VCC_HIGH);
+		return 0;
+
+	case hwmon_in_crit_alarm:
+		err = sfp_read(sfp, true, SFP_ALARM0, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_ALARM0_VCC_HIGH);
+		return 0;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return -EOPNOTSUPP;
+}
+
+static int sfp_hwmon_bias(struct sfp *sfp, u32 attr, long *value)
+{
+	u8 status;
+	int err;
+
+	switch (attr) {
+	case hwmon_curr_input:
+		return sfp_hwmon_read_bias(sfp, SFP_TX_BIAS, value);
+
+	case hwmon_curr_lcrit:
+		*value = be16_to_cpu(sfp->diag.bias_low_alarm);
+		sfp_hwmon_calibrate_bias(sfp, value);
+		return 0;
+
+	case hwmon_curr_min:
+		*value = be16_to_cpu(sfp->diag.bias_low_warn);
+		sfp_hwmon_calibrate_bias(sfp, value);
+		return 0;
+
+	case hwmon_curr_max:
+		*value = be16_to_cpu(sfp->diag.bias_high_warn);
+		sfp_hwmon_calibrate_bias(sfp, value);
+		return 0;
+
+	case hwmon_curr_crit:
+		*value = be16_to_cpu(sfp->diag.bias_high_alarm);
+		sfp_hwmon_calibrate_bias(sfp, value);
+		return 0;
+
+	case hwmon_curr_lcrit_alarm:
+		err = sfp_read(sfp, true, SFP_ALARM0, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_ALARM0_TX_BIAS_LOW);
+		return 0;
+
+	case hwmon_curr_min_alarm:
+		err = sfp_read(sfp, true, SFP_WARN0, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_WARN0_TX_BIAS_LOW);
+		return 0;
+
+	case hwmon_curr_max_alarm:
+		err = sfp_read(sfp, true, SFP_WARN0, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_WARN0_TX_BIAS_HIGH);
+		return 0;
+
+	case hwmon_curr_crit_alarm:
+		err = sfp_read(sfp, true, SFP_ALARM0, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_ALARM0_TX_BIAS_HIGH);
+		return 0;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return -EOPNOTSUPP;
+}
+
+static int sfp_hwmon_tx_power(struct sfp *sfp, u32 attr, long *value)
+{
+	u8 status;
+	int err;
+
+	switch (attr) {
+	case hwmon_power_input:
+		return sfp_hwmon_read_tx_power(sfp, SFP_TX_POWER, value);
+
+	case hwmon_power_lcrit:
+		*value = be16_to_cpu(sfp->diag.txpwr_low_alarm);
+		sfp_hwmon_calibrate_tx_power(sfp, value);
+		return 0;
+
+	case hwmon_power_min:
+		*value = be16_to_cpu(sfp->diag.txpwr_low_warn);
+		sfp_hwmon_calibrate_tx_power(sfp, value);
+		return 0;
+
+	case hwmon_power_max:
+		*value = be16_to_cpu(sfp->diag.txpwr_high_warn);
+		sfp_hwmon_calibrate_tx_power(sfp, value);
+		return 0;
+
+	case hwmon_power_crit:
+		*value = be16_to_cpu(sfp->diag.txpwr_high_alarm);
+		sfp_hwmon_calibrate_tx_power(sfp, value);
+		return 0;
+
+	case hwmon_power_lcrit_alarm:
+		err = sfp_read(sfp, true, SFP_ALARM0, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_ALARM0_TXPWR_LOW);
+		return 0;
+
+	case hwmon_power_min_alarm:
+		err = sfp_read(sfp, true, SFP_WARN0, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_WARN0_TXPWR_LOW);
+		return 0;
+
+	case hwmon_power_max_alarm:
+		err = sfp_read(sfp, true, SFP_WARN0, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_WARN0_TXPWR_HIGH);
+		return 0;
+
+	case hwmon_power_crit_alarm:
+		err = sfp_read(sfp, true, SFP_ALARM0, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_ALARM0_TXPWR_HIGH);
+		return 0;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return -EOPNOTSUPP;
+}
+
+static int sfp_hwmon_rx_power(struct sfp *sfp, u32 attr, long *value)
+{
+	u8 status;
+	int err;
+
+	switch (attr) {
+	case hwmon_power_input:
+		return sfp_hwmon_read_rx_power(sfp, SFP_RX_POWER, value);
+
+	case hwmon_power_lcrit:
+		*value = be16_to_cpu(sfp->diag.rxpwr_low_alarm);
+		sfp_hwmon_to_rx_power(value);
+		return 0;
+
+	case hwmon_power_min:
+		*value = be16_to_cpu(sfp->diag.rxpwr_low_warn);
+		sfp_hwmon_to_rx_power(value);
+		return 0;
+
+	case hwmon_power_max:
+		*value = be16_to_cpu(sfp->diag.rxpwr_high_warn);
+		sfp_hwmon_to_rx_power(value);
+		return 0;
+
+	case hwmon_power_crit:
+		*value = be16_to_cpu(sfp->diag.rxpwr_high_alarm);
+		sfp_hwmon_to_rx_power(value);
+		return 0;
+
+	case hwmon_power_lcrit_alarm:
+		err = sfp_read(sfp, true, SFP_ALARM1, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_ALARM1_RXPWR_LOW);
+		return 0;
+
+	case hwmon_power_min_alarm:
+		err = sfp_read(sfp, true, SFP_WARN1, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_WARN1_RXPWR_LOW);
+		return 0;
+
+	case hwmon_power_max_alarm:
+		err = sfp_read(sfp, true, SFP_WARN1, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_WARN1_RXPWR_HIGH);
+		return 0;
+
+	case hwmon_power_crit_alarm:
+		err = sfp_read(sfp, true, SFP_ALARM1, &status, sizeof(status));
+		if (err < 0)
+			return err;
+
+		*value = !!(status & SFP_ALARM1_RXPWR_HIGH);
+		return 0;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return -EOPNOTSUPP;
+}
+
+static int sfp_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
+			  u32 attr, int channel, long *value)
+{
+	struct sfp *sfp = dev_get_drvdata(dev);
+
+	switch (type) {
+	case hwmon_temp:
+		return sfp_hwmon_temp(sfp, attr, value);
+	case hwmon_in:
+		return sfp_hwmon_vcc(sfp, attr, value);
+	case hwmon_curr:
+		return sfp_hwmon_bias(sfp, attr, value);
+	case hwmon_power:
+		switch (channel) {
+		case 0:
+			return sfp_hwmon_tx_power(sfp, attr, value);
+		case 1:
+			return sfp_hwmon_rx_power(sfp, attr, value);
+		default:
+			return -EOPNOTSUPP;
+		}
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static const struct hwmon_ops sfp_hwmon_ops = {
+	.is_visible = sfp_hwmon_is_visible,
+	.read = sfp_hwmon_read,
+};
+
+static u32 sfp_hwmon_chip_config[] = {
+	HWMON_C_REGISTER_TZ,
+	0,
+};
+
+static const struct hwmon_channel_info sfp_hwmon_chip = {
+	.type = hwmon_chip,
+	.config = sfp_hwmon_chip_config,
+};
+
+static u32 sfp_hwmon_temp_config[] = {
+	HWMON_T_INPUT |
+	HWMON_T_MAX | HWMON_T_MIN |
+	HWMON_T_MAX_ALARM | HWMON_T_MIN_ALARM |
+	HWMON_T_CRIT | HWMON_T_LCRIT |
+	HWMON_T_CRIT_ALARM | HWMON_T_LCRIT_ALARM,
+	0,
+};
+
+static const struct hwmon_channel_info sfp_hwmon_temp_channel_info = {
+	.type = hwmon_temp,
+	.config = sfp_hwmon_temp_config,
+};
+
+static u32 sfp_hwmon_vcc_config[] = {
+	HWMON_I_INPUT |
+	HWMON_I_MAX | HWMON_I_MIN |
+	HWMON_I_MAX_ALARM | HWMON_I_MIN_ALARM |
+	HWMON_I_CRIT | HWMON_I_LCRIT |
+	HWMON_I_CRIT_ALARM | HWMON_I_LCRIT_ALARM,
+	0,
+};
+
+static const struct hwmon_channel_info sfp_hwmon_vcc_channel_info = {
+	.type = hwmon_in,
+	.config = sfp_hwmon_vcc_config,
+};
+
+static u32 sfp_hwmon_bias_config[] = {
+	HWMON_C_INPUT |
+	HWMON_C_MAX | HWMON_C_MIN |
+	HWMON_C_MAX_ALARM | HWMON_C_MIN_ALARM |
+	HWMON_C_CRIT | HWMON_C_LCRIT |
+	HWMON_C_CRIT_ALARM | HWMON_C_LCRIT_ALARM,
+	0,
+};
+
+static const struct hwmon_channel_info sfp_hwmon_bias_channel_info = {
+	.type = hwmon_curr,
+	.config = sfp_hwmon_bias_config,
+};
+
+static u32 sfp_hwmon_power_config[] = {
+	/* Transmit power */
+	HWMON_P_INPUT |
+	HWMON_P_MAX | HWMON_P_MIN |
+	HWMON_P_MAX_ALARM | HWMON_P_MIN_ALARM |
+	HWMON_P_CRIT | HWMON_P_LCRIT |
+	HWMON_P_CRIT_ALARM | HWMON_P_LCRIT_ALARM,
+	/* Receive power */
+	HWMON_P_INPUT |
+	HWMON_P_MAX | HWMON_P_MIN |
+	HWMON_P_MAX_ALARM | HWMON_P_MIN_ALARM |
+	HWMON_P_CRIT | HWMON_P_LCRIT |
+	HWMON_P_CRIT_ALARM | HWMON_P_LCRIT_ALARM,
+	0,
+};
+
+static const struct hwmon_channel_info sfp_hwmon_power_channel_info = {
+	.type = hwmon_power,
+	.config = sfp_hwmon_power_config,
+};
+
+static const struct hwmon_channel_info *sfp_hwmon_info[] = {
+	&sfp_hwmon_chip,
+	&sfp_hwmon_vcc_channel_info,
+	&sfp_hwmon_temp_channel_info,
+	&sfp_hwmon_bias_channel_info,
+	&sfp_hwmon_power_channel_info,
+	NULL,
+};
+
+static const struct hwmon_chip_info sfp_hwmon_chip_info = {
+	.ops = &sfp_hwmon_ops,
+	.info = sfp_hwmon_info,
+};
+
+static int sfp_hwmon_insert(struct sfp *sfp)
+{
+	int err, i, j;
+
+	if (sfp->id.ext.sff8472_compliance == SFP_SFF8472_COMPLIANCE_NONE)
+		return 0;
+
+	if (!(sfp->id.ext.diagmon & SFP_DIAGMON_DDM))
+		return 0;
+
+	if (sfp->id.ext.diagmon & SFP_DIAGMON_ADDRMODE)
+		/* This driver in general does not support address
+		 * change.
+		 */
+		return 0;
+
+	err = sfp_read(sfp, true, 0, &sfp->diag, sizeof(sfp->diag));
+	if (err < 0)
+		return err;
+
+	sfp->hwmon_name = devm_kstrdup(sfp->dev, dev_name(sfp->dev),
+				       GFP_KERNEL);
+	if (!sfp->hwmon_name)
+		return -ENODEV;
+
+	for (i = j = 0; sfp->hwmon_name[i]; i++) {
+		if (isalnum(sfp->hwmon_name[i])) {
+			if (i != j)
+				sfp->hwmon_name[j] = sfp->hwmon_name[i];
+			j++;
+		}
+	}
+	sfp->hwmon_name[j] = '\0';
+
+	sfp->hwmon_dev = devm_hwmon_device_register_with_info(sfp->dev,
+				sfp->hwmon_name, sfp, &sfp_hwmon_chip_info,
+				NULL);
+
+	return PTR_ERR_OR_ZERO(sfp->hwmon_dev);
+}
+
+static void sfp_hwmon_remove(struct sfp *sfp)
+{
+	devm_hwmon_device_unregister(sfp->hwmon_dev);
+}
+#else
+static int sfp_hwmon_insert(struct sfp *sfp)
+{
+	return 0;
+}
+
+static void sfp_hwmon_remove(struct sfp *sfp)
+{
+}
+#endif
+
 /* Helpers */
 static void sfp_module_tx_disable(struct sfp *sfp)
 {
@@ -636,6 +1362,10 @@ static int sfp_sm_mod_probe(struct sfp *sfp)
 		dev_warn(sfp->dev,
 			 "module address swap to access page 0xA2 is not supported.\n");
 
+	ret = sfp_hwmon_insert(sfp);
+	if (ret < 0)
+		return ret;
+
 	ret = sfp_module_insert(sfp->sfp_bus, &sfp->id);
 	if (ret < 0)
 		return ret;
@@ -647,6 +1377,8 @@ static void sfp_sm_mod_remove(struct sfp *sfp)
 {
 	sfp_module_remove(sfp->sfp_bus);
 
+	sfp_hwmon_remove(sfp);
+
 	if (sfp->mod_phy)
 		sfp_sm_phy_detach(sfp);
 
diff --git a/include/linux/sfp.h b/include/linux/sfp.h
index ebce9e24906a..d37518e89db2 100644
--- a/include/linux/sfp.h
+++ b/include/linux/sfp.h
@@ -231,6 +231,50 @@ struct sfp_eeprom_id {
 	struct sfp_eeprom_ext ext;
 } __packed;
 
+struct sfp_diag {
+	__be16 temp_high_alarm;
+	__be16 temp_low_alarm;
+	__be16 temp_high_warn;
+	__be16 temp_low_warn;
+	__be16 volt_high_alarm;
+	__be16 volt_low_alarm;
+	__be16 volt_high_warn;
+	__be16 volt_low_warn;
+	__be16 bias_high_alarm;
+	__be16 bias_low_alarm;
+	__be16 bias_high_warn;
+	__be16 bias_low_warn;
+	__be16 txpwr_high_alarm;
+	__be16 txpwr_low_alarm;
+	__be16 txpwr_high_warn;
+	__be16 txpwr_low_warn;
+	__be16 rxpwr_high_alarm;
+	__be16 rxpwr_low_alarm;
+	__be16 rxpwr_high_warn;
+	__be16 rxpwr_low_warn;
+	__be16 laser_temp_high_alarm;
+	__be16 laser_temp_low_alarm;
+	__be16 laser_temp_high_warn;
+	__be16 laser_temp_low_warn;
+	__be16 tec_cur_high_alarm;
+	__be16 tec_cur_low_alarm;
+	__be16 tec_cur_high_warn;
+	__be16 tec_cur_low_warn;
+	__be32 cal_rxpwr4;
+	__be32 cal_rxpwr3;
+	__be32 cal_rxpwr2;
+	__be32 cal_rxpwr1;
+	__be32 cal_rxpwr0;
+	__be16 cal_txi_slope;
+	__be16 cal_txi_offset;
+	__be16 cal_txpwr_slope;
+	__be16 cal_txpwr_offset;
+	__be16 cal_t_slope;
+	__be16 cal_t_offset;
+	__be16 cal_v_slope;
+	__be16 cal_v_offset;
+} __packed;
+
 /* SFP EEPROM registers */
 enum {
 	SFP_PHYS_ID			= 0x00,
@@ -384,7 +428,33 @@ enum {
 	SFP_TEC_CUR			= 0x6c,
 
 	SFP_STATUS			= 0x6e,
-	SFP_ALARM			= 0x70,
+	SFP_ALARM0			= 0x70,
+	SFP_ALARM0_TEMP_HIGH		= BIT(7),
+	SFP_ALARM0_TEMP_LOW		= BIT(6),
+	SFP_ALARM0_VCC_HIGH		= BIT(5),
+	SFP_ALARM0_VCC_LOW		= BIT(4),
+	SFP_ALARM0_TX_BIAS_HIGH		= BIT(3),
+	SFP_ALARM0_TX_BIAS_LOW		= BIT(2),
+	SFP_ALARM0_TXPWR_HIGH		= BIT(1),
+	SFP_ALARM0_TXPWR_LOW		= BIT(0),
+
+	SFP_ALARM1			= 0x71,
+	SFP_ALARM1_RXPWR_HIGH		= BIT(7),
+	SFP_ALARM1_RXPWR_LOW		= BIT(6),
+
+	SFP_WARN0			= 0x74,
+	SFP_WARN0_TEMP_HIGH		= BIT(7),
+	SFP_WARN0_TEMP_LOW		= BIT(6),
+	SFP_WARN0_VCC_HIGH		= BIT(5),
+	SFP_WARN0_VCC_LOW		= BIT(4),
+	SFP_WARN0_TX_BIAS_HIGH		= BIT(3),
+	SFP_WARN0_TX_BIAS_LOW		= BIT(2),
+	SFP_WARN0_TXPWR_HIGH		= BIT(1),
+	SFP_WARN0_TXPWR_LOW		= BIT(0),
+
+	SFP_WARN1			= 0x75,
+	SFP_WARN1_RXPWR_HIGH		= BIT(7),
+	SFP_WARN1_RXPWR_LOW		= BIT(6),
 
 	SFP_EXT_STATUS			= 0x76,
 	SFP_VSL				= 0x78,
-- 
2.18.0.rc2

^ permalink raw reply related

* [PATCH RFC 1/2] hwmon: Add support for power min, lcrit, min_alarm and lcrit_alarm
From: Andrew Lunn @ 2018-06-28 20:41 UTC (permalink / raw)
  To: netdev, Florian Fainelli, Guenter Roeck, Russell King, vadimp,
	linux-hwmon
  Cc: Andrew Lunn
In-Reply-To: <1530218475-4369-1-git-send-email-andrew@lunn.ch>

Some sensors support reporting minimal and lower critical power, as
well as alarms when these thresholds are reached. Add support for
these attributes to the hwmon core.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/hwmon/hwmon.c | 4 ++++
 include/linux/hwmon.h | 8 ++++++++
 2 files changed, 12 insertions(+)

diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c
index e88c01961948..33d51281272b 100644
--- a/drivers/hwmon/hwmon.c
+++ b/drivers/hwmon/hwmon.c
@@ -394,12 +394,16 @@ static const char * const hwmon_power_attr_templates[] = {
 	[hwmon_power_cap_hyst] = "power%d_cap_hyst",
 	[hwmon_power_cap_max] = "power%d_cap_max",
 	[hwmon_power_cap_min] = "power%d_cap_min",
+	[hwmon_power_min] = "power%d_min",
 	[hwmon_power_max] = "power%d_max",
+	[hwmon_power_lcrit] = "power%d_lcrit",
 	[hwmon_power_crit] = "power%d_crit",
 	[hwmon_power_label] = "power%d_label",
 	[hwmon_power_alarm] = "power%d_alarm",
 	[hwmon_power_cap_alarm] = "power%d_cap_alarm",
+	[hwmon_power_min_alarm] = "power%d_min_alarm",
 	[hwmon_power_max_alarm] = "power%d_max_alarm",
+	[hwmon_power_lcrit_alarm] = "power%d_lcrit_alarm",
 	[hwmon_power_crit_alarm] = "power%d_crit_alarm",
 };
 
diff --git a/include/linux/hwmon.h b/include/linux/hwmon.h
index 1b74ad11a5a4..b217101ca76e 100644
--- a/include/linux/hwmon.h
+++ b/include/linux/hwmon.h
@@ -188,12 +188,16 @@ enum hwmon_power_attributes {
 	hwmon_power_cap_hyst,
 	hwmon_power_cap_max,
 	hwmon_power_cap_min,
+	hwmon_power_min,
 	hwmon_power_max,
 	hwmon_power_crit,
+	hwmon_power_lcrit,
 	hwmon_power_label,
 	hwmon_power_alarm,
 	hwmon_power_cap_alarm,
+	hwmon_power_min_alarm,
 	hwmon_power_max_alarm,
+	hwmon_power_lcrit_alarm,
 	hwmon_power_crit_alarm,
 };
 
@@ -214,12 +218,16 @@ enum hwmon_power_attributes {
 #define HWMON_P_CAP_HYST		BIT(hwmon_power_cap_hyst)
 #define HWMON_P_CAP_MAX			BIT(hwmon_power_cap_max)
 #define HWMON_P_CAP_MIN			BIT(hwmon_power_cap_min)
+#define HWMON_P_MIN			BIT(hwmon_power_min)
 #define HWMON_P_MAX			BIT(hwmon_power_max)
+#define HWMON_P_LCRIT			BIT(hwmon_power_lcrit)
 #define HWMON_P_CRIT			BIT(hwmon_power_crit)
 #define HWMON_P_LABEL			BIT(hwmon_power_label)
 #define HWMON_P_ALARM			BIT(hwmon_power_alarm)
 #define HWMON_P_CAP_ALARM		BIT(hwmon_power_cap_alarm)
+#define HWMON_P_MIN_ALARM		BIT(hwmon_power_max_alarm)
 #define HWMON_P_MAX_ALARM		BIT(hwmon_power_max_alarm)
+#define HWMON_P_LCRIT_ALARM		BIT(hwmon_power_lcrit_alarm)
 #define HWMON_P_CRIT_ALARM		BIT(hwmon_power_crit_alarm)
 
 enum hwmon_energy_attributes {
-- 
2.18.0.rc2

^ permalink raw reply related

* [PATCH RFC 0/2] HWMON support for SFP modules
From: Andrew Lunn @ 2018-06-28 20:41 UTC (permalink / raw)
  To: netdev, Florian Fainelli, Guenter Roeck, Russell King, vadimp,
	linux-hwmon
  Cc: Andrew Lunn

This patchset adds HWMON support to SFP modules. The first patch adds
some attributes for power sensors which are currently missing from the
hwmon core. The second patch then extends the core SFP code to export
the sensors found in SFP modules.

This code has been tested with two SFP modules:

module OEM SFP-7000-85 rev 11.0 sn M1512220075 dc 160221
module FINISAR CORP. FTLF8524E2GNL rev A sn PW40MNN dc 160725

The anonymous module uses external calibration, while the FINISAR uses
internal calibration. Thus both code paths have been tested.

Andrew Lunn (2):
  hwmon: Add support for power min, lcrit, min_alarm and lcrit_alarm
  net: phy: sfp: Add HWMON support for module sensors

 drivers/hwmon/hwmon.c |   4 +
 drivers/net/phy/sfp.c | 732 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/hwmon.h |   8 +
 include/linux/sfp.h   |  72 ++++-
 4 files changed, 815 insertions(+), 1 deletion(-)

-- 
2.18.0.rc2

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox