Netdev List

* Re: mt7601u: phy: mark expected switch fall-through
From: Kalle Valo @ 2018-04-24 17:28 UTC
  To: Gustavo A. R. Silva
  Cc: Jakub Kicinski, linux-wireless, netdev, linux-kernel,
	Gustavo A. R. Silva
In-Reply-To: <20180330211223.GA22497@embeddedgus>

"Gustavo A. R. Silva" <gustavo@embeddedor.com> wrote:

> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
> 
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
> Acked-by: Jakub Kicinski <kubakici@wp.pl>

Patch applied to wireless-drivers-next.git, thanks.

5156b054bcda mt7601u: phy: mark expected switch fall-through

-- 
https://patchwork.kernel.org/patch/10318419/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

* Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)
From: Nathan Royce @ 2018-04-24 17:28 UTC
  To: Tobias Diedrich, Oleksij Rempel, Nathan Royce,
	QCA ath9k Development, Kalle Valo, linux-wireless, netdev,
	linux-kernel, ath9k_htc_fw
In-Reply-To: <20170607223913.GD20162@yumi.tdiedrich.de>

I finally got around to applying your patch and building the toolchain
(based on master source, gcc8). Alas, while there is no longer a
firmware panic in the log, wifi drops off the face of the planet: the
SSID disappears and hostapd doesn't know wifi failed (nothing in its
log either).

On Wed, Jun 7, 2017 at 5:39 PM, Tobias Diedrich
<ranma+ath9k_htc_fw@tdiedrich.de> wrote:
> Oleksij Rempel wrote:
>> On 07.06.2017 at 02:12, Tobias Diedrich wrote:
>> > Oleksij Rempel wrote:
>> >> Yes, this is a "normal" problem. The firmware has no error handler for PCI
>> >> bus related exceptions. So if we fail to read the PCI bus the first time, we
>> >> have the choice to Ooops and stall or Ooops and reboot ASAP. So we reboot
>> >> and provide a kernel "firmware panic!" message.
>> >> Anyone who can or wants to fix this is welcome.
>> >>
>> >>> *****
>> >>> Jun 02 14:55:30 computer kernel: usb 1-1.1: ath: firmware panic!
>> >>> exccause: 0x0000000d; pc: 0x0090ae81; badvaddr: 0x10ff4038.
>> > [...]
>> >
>> >> memdmp 50ae78 50ae88
>> >
>> > 50ae78: 6c10 0412 6aa2 0c02 0088 20c0 2008 1940  l...j..........@
>> >
>> > [...copy to bin...]
>> > $ bin/objdump -b binary -m xtensa  -D /tmp/memdump.bin
>> > [..]
>> >    0:   6c1004          entry   a1, 32
>> >    3:   126aa2          l32r    a2, 0xfffdaa8c
>> >    6:   0c0200          memw
>> >    9:   8820            l32i.n  a8, a2, 0      <----------Exception cause PC still points at load
>> >    b:   c020            movi.n  a2, 0
>> >    d:   081940          extui   a9, a8, 1, 1
>> >
>> > Judging from that it should be fairly simple to at least implement
>> > some sort of retry, possibly after triggering a PCIe link retrain?
>>
>> I assume, yes.
>>
>> > There are some related PCIe root complex registers that may point to
>> > what exactly failed if they were dumped.
>> >
>> > The root complex registers live at 0x00040000 and I think match the
>> > registers described for the root complex in the AR9344 datasheet.
>>
>> Unfortunately, I don't have the ar7010 docs to tell.
>>
>> > PCIE_INT_MASK would map to 0x40050 and has a bit for SYS_ERR:
>> > "A system error. The RC Core asserts CFG_SYS_ERR_RC if any device in
>> > the hierarchy reports any of the following errors and the associated
>> > enable bit is set in the Root Control register: ERR_COR, ERR_FATAL,
>> > ERR_NONFATAL."
>> >
>> > AFAICS link retrain can be done by setting bit3 (INIT_RST,
>> > "Application request to initiate a training reset") in
>> > PCIE_APP (0x40000).
>> >
>> > See sboot/magpie_1_1/sboot/cmnos/eeprom/src/cmnos_eeprom.c (which
>> > flips some bits in the RC to enable the PCIe bus for reading the
>> > EEPROM).
>> >
>> > The root complex pci configuration space is at 0x20000 which could
>> > have further error details:
>> >> memdmp 20000 20200
>> >
>> > 020000: a02a 168c 0010 0006 0000 0001 0001 0000  .*..............
>> > 020010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 020020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 020030: 0000 0000 0000 0040 0000 0000 0000 01ff  .......@........
>> > 020040: 5bc3 5001 0000 0000 0000 0000 0000 0000  [.P.............
>> > 020050: 0080 7005 0000 0000 0000 0000 0000 0000  ..p.............
>> > 020060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 020070: 0042 0010 0000 8701 0000 2010 0013 4411  .B............D.
>> > 020080: 3011 0000 0000 0000 00c0 03c0 0000 0000  0...............
>> > 020090: 0000 0000 0000 0010 0000 0000 0000 0000  ................
>> > 0200a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 0200b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 0200c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 0200d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 0200e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 0200f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 020100: 1401 0001 0000 0000 0000 0000 0006 2030  ...............0
>> > 020110: 0000 0000 0000 2000 0000 00a0 0000 0000  ................
>> > 020120: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 020130: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 020140: 0001 0002 0000 0000 0000 0000 0000 0000  ................
>> > 020150: 0000 0000 8000 00ff 0000 0000 0000 0000  ................
>> > 020160: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 020170: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 020180: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 020190: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 0201a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 0201b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 0201c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 0201d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 0201e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> > 0201f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>> >
>> > Transformed into something suitable for feeding into lspci -F:
>> >
>> > 00:00.0 Description filled in by lspci
>> > 00: 8c 16 2a a0 06 00 10 00 01 00 00 00 00 00 01 00
>> > 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> > 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> > 30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
>> > 40: 01 50 c3 5b 00 00 00 00 00 00 00 00 00 00 00 00
>> > 50: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
>> > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> > 70: 10 00 42 00 01 87 00 00 10 20 00 00 11 44 13 00
>> > 80: 00 00 11 30 00 00 00 00 c0 03 c0 00 00 00 00 00
>> > 90: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
>> > a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> > b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> > c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> > d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> > f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> >
>> > $ lspci -F /tmp/hexdump -vvv
>> > 00:00.0 Non-VGA unclassified device: Qualcomm Atheros Device a02a (rev 01)
>> >         !!! Invalid class 0000 for header type 01
>> >         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>> >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>> >         Latency: 0
>> >         Interrupt: pin A routed to IRQ 255
>> >         Bus: primary=00, secondary=00, subordinate=00, sec-latency=0
>> >         I/O behind bridge: 00000000-00000fff
>> >         Memory behind bridge: 00000000-000fffff
>> >         Prefetchable memory behind bridge: 00000000-000fffff
>> >         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
>> >         BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
>> >                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>> >         Capabilities: [40] Power Management version 3
>> >                 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
>> >                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>> >         Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>> >                 Address: 0000000000000000  Data: 0000
>> >         Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00
>> >                 DevCap: MaxPayload 256 bytes, PhantFunc 0
>> >                         ExtTag- RBE+
>> >                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>> >                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
>> >                         MaxPayload 128 bytes, MaxReadReq 512 bytes
>> >                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>> >                 LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <1us, L1 <64us
>> >                         ClockPM- Surprise- LLActRep+ BwNot- ASPMOptComp-
>> >                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
>> >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>> >                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
>> >                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
>> >                 RootCap: CRSVisible-
>> >                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
>> >                 DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
>> >                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
>> >                 LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>> >                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>> >                          Compliance De-emphasis: -6dB
>> >                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
>> >                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>> >
>>
>> Looks promising :)
>>
>
> POC seems to work, though this may additionally need to restore wifi
> state as well, no guarantees there.
>
>>str 40018 3
> 00040018 : 00000003
>>
> Retry(1) failed PCIe access @0x10ff4038
> Before: int_mask=0 app=ffc1 reset=0
> After: int_mask=0 app=ffc1 reset=7
> wlan int status=0
>
>>str 40018 3
> 00040018 : 00000003
>>
> Retry(1) failed PCIe access @0x10ff4038
> Before: int_mask=0 app=ffc1 reset=0
> After: int_mask=0 app=ffc1 reset=7
> wlan int status=0
>>
>
>
> diff --git a/target_firmware/magpie_fw_dev/target/init/app_start.c b/target_firmware/magpie_fw_dev/target/init/app_start.c
> index 8fa9c8b..fea62c1 100644
> --- a/target_firmware/magpie_fw_dev/target/init/app_start.c
> +++ b/target_firmware/magpie_fw_dev/target/init/app_start.c
> @@ -137,6 +137,13 @@ void __section(boot) __noreturn __visible app_start(void)
>
>         A_PRINTF(" A_WDT_INIT()\n\r");
>
> +#if defined(PROJECT_MAGPIE)
> +       // For some reason needs to be called again here for the
> +       // exception handlers to work properly, at least on the XBOX
> +       // adapter.
> +       fatal_exception_func();
> +#endif
> +
>  #if defined(PROJECT_K2)
>         save_cmnos_printf = fw_cmnos_printf;
>  #endif
> diff --git a/target_firmware/magpie_fw_dev/target/init/init.c b/target_firmware/magpie_fw_dev/target/init/init.c
> index 7484c05..cad2519 100755
> --- a/target_firmware/magpie_fw_dev/target/init/init.c
> +++ b/target_firmware/magpie_fw_dev/target/init/init.c
> @@ -212,6 +212,78 @@ LOCAL void zfGenWrongEpidEvent(uint32_t epid)
>         mUSB_EP3_XFER_DONE();
>  }
>
> +static void
> +AR7010_pcie_reset(void)
> +{
> +#define PCIE_RC_ACCESS_DELAY    20
> +
> +#define PCI_RC_RESET_BIT                            BIT6
> +#define PCI_RC_PHY_RESET_BIT                        BIT7
> +#define PCI_RC_PLL_RESET_BIT                        BIT8
> +#define PCI_RC_PHY_SHIFT_RESET_BIT                  BIT10
> +
> +#define HAL_WORD_REG_WRITE(addr, val) do { *((uint32_t*)(addr)) = val; } while (0)
> +#define HAL_WORD_REG_READ(addr) (*((uint32_t*)(addr)))
> +
> +#define CMD_PCI_RC_RESET_ON()    HAL_WORD_REG_WRITE(MAGPIE_REG_RST_RESET_ADDR,  \
> +                                    (HAL_WORD_REG_READ(MAGPIE_REG_RST_RESET_ADDR)|  \
> +                                        (PCI_RC_PHY_SHIFT_RESET_BIT|PCI_RC_PLL_RESET_BIT|PCI_RC_PHY_RESET_BIT|PCI_RC_RESET_BIT)))
> +
> +#define CMD_PCI_RC_RESET_CLR()   HAL_WORD_REG_WRITE(MAGPIE_REG_RST_RESET_ADDR, \
> +                                    (HAL_WORD_REG_READ(MAGPIE_REG_RST_RESET_ADDR)&   \
> +                                        (~(PCI_RC_PHY_SHIFT_RESET_BIT|PCI_RC_PLL_RESET_BIT|PCI_RC_PHY_RESET_BIT|PCI_RC_RESET_BIT))))
> +
> +       int i;
> +
> +       CMD_PCI_RC_RESET_ON();
> +       A_DELAY_USECS(PCIE_RC_ACCESS_DELAY);
> +
> +       /* de-assert the reset */
> +       CMD_PCI_RC_RESET_CLR();
> +       A_DELAY_USECS(500);
> +
> +       /* 7. set bus master and memory space enable */
> +       DEBUG_SYSTEM_STATE = (DEBUG_SYSTEM_STATE&(~0xff)) | 0x45;
> +       HAL_WORD_REG_WRITE(0x00020004, (HAL_WORD_REG_READ(0x00020004)|(BIT1|BIT2)));
> +       A_DELAY_USECS(PCIE_RC_ACCESS_DELAY);
> +
> +       /* 7.5. assert pcie_ep reset */
> +       HAL_WORD_REG_WRITE(0x00040018, (HAL_WORD_REG_READ(0x00040018) & ~(0x1 << 2)));
> +       A_DELAY_USECS(PCIE_RC_ACCESS_DELAY);
> +
> +       /* 7.5. de-assert pcie_ep reset */
> +       HAL_WORD_REG_WRITE(0x00040018, (HAL_WORD_REG_READ(0x00040018)|(0x1 << 2)));
> +       A_DELAY_USECS(PCIE_RC_ACCESS_DELAY);
> +
> +       /* 8. set app_ltssm_enable */
> +       DEBUG_SYSTEM_STATE = (DEBUG_SYSTEM_STATE&(~0xff)) | 0x46;
> +       HAL_WORD_REG_WRITE(0x00040000, (HAL_WORD_REG_READ(0x00040000)|0xffc1));
> +
> +       /*!
> +        * Receive control (PCIE_RESET),
> +        *  0x40018, BIT0: LINK_UP, PHY Link up -PHY Link up/down indicator
> +        *  in case the link up is not ready and we access the 0x14000000,
> +        *  vmc will hang here
> +        */
> +
> +       /* poll 0x40018/bit0 (up to 10000 times) until it turns to 1 */
> +       i = 10000;
> +       while(i-->0)
> +       {
> +               uint32_t reg_value = HAL_WORD_REG_READ(0x00040018);
> +               if( reg_value & BIT0 )
> +                       break;
> +               A_DELAY_USECS(PCIE_RC_ACCESS_DELAY);
> +       }
> +
> +       HAL_WORD_REG_WRITE(0x14000004, (HAL_WORD_REG_READ(0x14000004)|0x116));
> +       A_DELAY_USECS(PCIE_RC_ACCESS_DELAY);
> +
> +       HAL_WORD_REG_WRITE(0x14000010, (HAL_WORD_REG_READ(0x14000010)|EEPROM_CTRL_BASE));
> +}
> +
> +static int exception_retries = 0;
> +
>  void
>  AR6002_fatal_exception_handler_patch(CPU_exception_frame_t *exc_frame)
>  {
> @@ -226,6 +298,32 @@ AR6002_fatal_exception_handler_patch(CPU_exception_frame_t *exc_frame)
>         dump.pc                     = exc_frame->xt_pc;
>         dump.assline                = 0;
>
> +       if (dump.badvaddr >= 0x10000000 &&
> +           dump.badvaddr <  0x18000000) {
> +               // Exception while accessing PCIe memory space.
> +               volatile uint32_t *pcie_app = (uint32_t*) 0x40000;
> +               volatile uint32_t *pcie_reset = (uint32_t*) 0x40018;
> +               volatile uint32_t *pcie_int_mask = (uint32_t*) 0x40050;
> +
> +               // Maybe retry.
> +               if (++exception_retries < 2) {
> +                       A_PRINTF("\nRetry(%d) failed PCIe access @0x%x\n",
> +                               exception_retries, dump.badvaddr);
> +                       A_PRINTF("Before: int_mask=%x app=%x reset=%x\n", *pcie_int_mask, *pcie_app, *pcie_reset);
> +
> +                       AR7010_pcie_reset();
> +
> +                       A_PRINTF("After: int_mask=%x app=%x reset=%x\n", *pcie_int_mask, *pcie_app, *pcie_reset);
> +
> +                       // This should recurse if we failed to recover.
> +                       A_PRINTF("wlan int status=%x\n", HAL_WORD_REG_READ(0x10ff4038));
> +
> +                       // Reset retry counter.
> +                       exception_retries = 0;
> +                       return;
> +               }
> +       }
> +
>         zfGenExceptionEvent(dump.exc_frame.xt_exccause, dump.pc, dump.badvaddr);
>
>  #if SYSTEM_MODULE_PRINT
>
>
> --
> Tobias                                          PGP: http://8ef7ddba.uguu.de
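
For readers following the retry idea in the quoted patch, here is a minimal
sketch of a bounded "retry after link reset" loop. It is not the firmware
code: struct bus_ops, struct fake_link and all function names below are
hypothetical stand-ins, and the fake link only simulates an access that
fails a fixed number of times.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks; the real firmware touches registers directly. */
struct bus_ops {
	bool (*read_ok)(void *ctx);     /* true when the bus access succeeds */
	void (*reset_link)(void *ctx);  /* e.g. reset and retrain the link */
};

/* Simulated link for the sketch: fails the first failures_left reads. */
struct fake_link {
	int failures_left;
	int resets;
};

bool fake_read_ok(void *ctx)
{
	struct fake_link *link = ctx;

	if (link->failures_left > 0) {
		link->failures_left--;
		return false;
	}
	return true;
}

void fake_reset_link(void *ctx)
{
	struct fake_link *link = ctx;

	link->resets++;
}

/* Try the access; on failure reset the link and retry, at most
 * max_retries times. Returns false if every attempt failed. */
bool read_with_retry(const struct bus_ops *ops, void *ctx, int max_retries)
{
	int attempt;

	assert(ops != NULL && ops->read_ok != NULL && ops->reset_link != NULL);
	assert(max_retries >= 0);

	for (attempt = 0; attempt <= max_retries; attempt++) {
		if (ops->read_ok(ctx))
			return true;
		if (attempt < max_retries)
			ops->reset_link(ctx);
	}
	return false;
}
```

Bounding the retries matters here: if the reset does not bring the link
back, the caller still reaches the existing panic path instead of looping
forever.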

* Re: [net-next 1/2] ipv6: sr: add a per namespace sysctl to control seg6 flowlabel
From: Ahmed Abdelsalam @ 2018-04-24 17:25 UTC
  To: David Miller; +Cc: dav.lebrun, kuznet, yoshfuji, netdev, linux-kernel
In-Reply-To: <20180424.131650.2276839479783300009.davem@davemloft.net>

On Tue, 24 Apr 2018 13:16:50 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:

> From: Ahmed Abdelsalam <amsalam20@gmail.com>
> Date: Mon, 23 Apr 2018 23:36:59 +0200
> 
> > This patch adds a per namespace sysctl, named 'seg6_flowlabel', to be used
> > by seg6_do_srh_encap() to control the behaviour of setting the flowlabel
> > value of outer IPv6.
> > 
> > The currently supported behaviours are as follows:
> > -1 set flowlabel to zero.
> >  0 copy flowlabel from Inner packet in case of Inner IPv6 (0 for IPv4/L2)
> >  1 Compute the flowlabel using seg6_make_flowlabel()
> > 
> > Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
> 
> There really isn't a reason to make this a separate patch.
> 
> Adding a sysctl that nothing refers to doesn't add much value.
> 
> So please combine patches #1 and #2.

Ok, I will combine them and send you a new patch. 

Thanks

-- 
Ahmed Abdelsalam <amsalam20@gmail.com>

* [PATCH net-next] liquidio: Swap VF representor Tx and Rx statistics
From: Felix Manlunas @ 2018-04-24 17:23 UTC
  To: davem
  Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
	felix.manlunas, srinivasa.jampala

From: Srinivas Jampala <srinivasa.jampala@cavium.com>

Swap VF representor tx and rx interface statistics since it is a
virtual switchdev port and tx for VM should be rx for VF representor
and vice-versa.

Signed-off-by: Srinivas Jampala <srinivasa.jampala@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
---
 drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c
index 2adafa3..ddd7431 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c
@@ -201,13 +201,14 @@ static void lio_vf_rep_get_stats64(struct net_device *dev,
 {
 	struct lio_vf_rep_desc *vf_rep = netdev_priv(dev);
 
-	stats64->tx_packets = vf_rep->stats.tx_packets;
-	stats64->tx_bytes   = vf_rep->stats.tx_bytes;
-	stats64->tx_dropped = vf_rep->stats.tx_dropped;
-
-	stats64->rx_packets = vf_rep->stats.rx_packets;
-	stats64->rx_bytes   = vf_rep->stats.rx_bytes;
-	stats64->rx_dropped = vf_rep->stats.rx_dropped;
+	/* Swap tx and rx stats as VF rep is a switch port */
+	stats64->tx_packets = vf_rep->stats.rx_packets;
+	stats64->tx_bytes   = vf_rep->stats.rx_bytes;
+	stats64->tx_dropped = vf_rep->stats.rx_dropped;
+
+	stats64->rx_packets = vf_rep->stats.tx_packets;
+	stats64->rx_bytes   = vf_rep->stats.tx_bytes;
+	stats64->rx_dropped = vf_rep->stats.tx_dropped;
 }
 
 static int
-- 
1.8.3.1

* Re: cw1200: fix spelling mistake: "Mailformed" -> "Malformed"
From: Kalle Valo @ 2018-04-24 17:23 UTC
  To: Colin Ian King
  Cc: Solomon Peachy, linux-wireless, netdev, kernel-janitors,
	linux-kernel
In-Reply-To: <20180329153818.29538-1-colin.king@canonical.com>

Colin Ian King <colin.king@canonical.com> wrote:

> From: Colin Ian King <colin.king@canonical.com>
> 
> Trivial fix to spelling mistake in wiphy_warn warning message text
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Patch applied to wireless-drivers-next.git, thanks.

0fb1abc3a0b8 cw1200: fix spelling mistake: "Mailformed" -> "Malformed"

-- 
https://patchwork.kernel.org/patch/10315587/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

* Re: fix for bnx2x panic during ethtool reporting
From: Sebastian Kuzminsky @ 2018-04-24 17:22 UTC
  To: Kalluru, Sudarsana
  Cc: Florian Fainelli, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, Elior, Ariel, Dept-Eng Everest Linux L2
In-Reply-To: <MW2PR07MB4139D11DC60825A8A6C04E068A880@MW2PR07MB4139.namprd07.prod.outlook.com>

[-- Attachment #1: Type: text/plain, Size: 126 bytes --]

Here's an updated version of the patch that is checkpatch.pl clean (I
had forgotten to sign off...).

-- 
Sebastian Kuzminsky

[-- Attachment #2: 0001-bnx2x-make-fw-str-in-a-non-crazy-way.patch --]
[-- Type: text/x-patch, Size: 1290 bytes --]

From 45fe6fb1f6a187233bd1304b031bed20b3819d26 Mon Sep 17 00:00:00 2001
From: Sebastian Kuzminsky <seb@highlab.com>
Date: Mon, 16 Apr 2018 23:24:07 +0000
Subject: [PATCH] bnx2x: make fw str in a non-crazy way

This fixes an unterminated string bug when the firmware version
(bp->fw_ver) is longer than the passed-in buffer (buf/buf_len).

Signed-off-by: Sebastian Kuzminsky <seb@highlab.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 6465414dad74..f023ee365c63 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -150,9 +150,9 @@ void bnx2x_fill_fw_str(struct bnx2x *bp, char *buf, size_t buf_len)
 		phy_fw_ver[0] = '\0';
 		bnx2x_get_ext_phy_fw_version(&bp->link_params,
 					     phy_fw_ver, PHY_FW_VER_LEN);
-		strlcpy(buf, bp->fw_ver, buf_len);
-		snprintf(buf + strlen(bp->fw_ver), 32 - strlen(bp->fw_ver),
-			 "bc %d.%d.%d%s%s",
+		snprintf(buf, buf_len,
+			 "%s bc %d.%d.%d%s%s",
+			 bp->fw_ver,
 			 (bp->common.bc_ver & 0xff0000) >> 16,
 			 (bp->common.bc_ver & 0xff00) >> 8,
 			 (bp->common.bc_ver & 0xff),
-- 
2.11.0
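
As an illustration of why a single bounded snprintf() is the safer shape
for this kind of string building, here is a small user-space sketch. The
function fill_fw_str() and its parameters are hypothetical and simplified
(the PHY suffix is left out); it is not the bnx2x helper itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the driver helper. One snprintf()
 * call is bounded by the real buffer length, so the result is always
 * NUL-terminated and never written past buf_len, however long fw_ver is.
 * Returns the length the full string would have had (as snprintf does). */
int fill_fw_str(char *buf, size_t buf_len, const char *fw_ver,
		unsigned int bc_ver)
{
	assert(buf != NULL && buf_len > 0 && fw_ver != NULL);

	return snprintf(buf, buf_len, "%s bc %u.%u.%u",
			fw_ver,
			(bc_ver & 0xff0000) >> 16,
			(bc_ver & 0xff00) >> 8,
			(bc_ver & 0xff));
}
```

A caller can compare the return value against buf_len to detect
truncation; the old two-step form computed the remaining space from a
hard-coded 32 instead of buf_len, which is what the patch removes.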


* Re: [PATCH net-next] tcp: md5: only call tp->af_specific->md5_lookup() for md5 sockets
From: David Miller @ 2018-04-24 17:20 UTC
  To: edumazet; +Cc: netdev, eric.dumazet
In-Reply-To: <20180423214625.107867-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 23 Apr 2018 14:46:25 -0700

> RETPOLINE made calls to tp->af_specific->md5_lookup() quite expensive,
> given they have no result.
> We can omit the calls for sockets that have no md5 keys.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.
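
The idea in the commit message, skipping an expensive indirect call when a
cheap check already shows it cannot return anything, can be sketched in
plain C as follows. This is only an illustration: struct conn, struct
conn_ops and the call counter are hypothetical, and the actual check used
in the TCP code is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

struct md5_key { int id; };
struct conn;

/* Per-address-family operations, reached through a function pointer. */
struct conn_ops {
	const struct md5_key *(*md5_lookup)(const struct conn *c);
};

struct conn {
	const struct conn_ops *ops;
	const struct md5_key *md5_keys;  /* NULL when no key was configured */
};

int indirect_calls;  /* counts indirect calls, for the sketch only */

const struct md5_key *slow_md5_lookup(const struct conn *c)
{
	indirect_calls++;
	return c->md5_keys;
}

const struct conn_ops demo_ops = { slow_md5_lookup };

/* Only pay for the indirect call when the connection has keys at all. */
const struct md5_key *conn_md5_lookup(const struct conn *c)
{
	assert(c != NULL && c->ops != NULL && c->ops->md5_lookup != NULL);

	if (c->md5_keys == NULL)
		return NULL;
	return c->ops->md5_lookup(c);
}
```

With retpolines every indirect call is more costly than a direct one, so
a plain pointer test in front of it is cheap by comparison.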

* Re: [PATCH] liquidio: fix liquidio_xmit()'s return type
From: Felix Manlunas @ 2018-04-24 17:18 UTC
  To: Luc Van Oostenryck
  Cc: linux-kernel, Derek Chickles, Satanand Burla, Felix Manlunas,
	Raghu Vatsavayi, netdev
In-Reply-To: <20180424131637.3707-1-luc.vanoostenryck@gmail.com>

On Tue, Apr 24, 2018 at 03:16:37PM +0200, Luc Van Oostenryck wrote:
> The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
> which is a typedef for an enum type, but the implementations in this
> driver return an 'int'.
> 
> Fix this by returning 'netdev_tx_t' in this driver too.
> 
> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
> ---
>  drivers/net/ethernet/cavium/liquidio/lio_main.c    | 2 +-
>  drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
> index 603a144d3..6d6f16815 100644
> --- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
> +++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
> @@ -2491,7 +2491,7 @@ static inline int send_nic_timestamp_pkt(struct octeon_device *oct,
>   * @returns whether the packet was transmitted to the device okay or not
>   *             (NETDEV_TX_OK or NETDEV_TX_BUSY)
>   */
> -static int liquidio_xmit(struct sk_buff *skb, struct net_device *netdev)
> +static netdev_tx_t liquidio_xmit(struct sk_buff *skb, struct net_device *netdev)
>  {
>  	struct lio *lio;
>  	struct octnet_buf_free_info *finfo;
> diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
> index f92dfa411..13ef46a85 100644
> --- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
> +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
> @@ -1560,7 +1560,7 @@ static int send_nic_timestamp_pkt(struct octeon_device *oct,
>   * @returns whether the packet was transmitted to the device okay or not
>   *             (NETDEV_TX_OK or NETDEV_TX_BUSY)
>   */
> -static int liquidio_xmit(struct sk_buff *skb, struct net_device *netdev)
> +static netdev_tx_t liquidio_xmit(struct sk_buff *skb, struct net_device *netdev)
>  {
>  	struct octnet_buf_free_info *finfo;
>  	union octnic_cmd_setup cmdsetup;
> -- 
> 2.17.0
> 

Acked-by: Felix Manlunas <felix.manlunas@cavium.com>

* Re: [PATCH net] packet: fix bitfield update race
From: David Miller @ 2018-04-24 17:17 UTC
  To: willemdebruijn.kernel
  Cc: netdev, byoungyoung, threeearcat, xiyou.wangcong, herbert,
	willemb
In-Reply-To: <20180423213703.241054-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Mon, 23 Apr 2018 17:37:03 -0400

> From: Willem de Bruijn <willemb@google.com>
> 
> Updates to the bitfields in struct packet_sock are not atomic.
> Serialize these read-modify-write cycles.
> 
> Move po->running into a separate variable. Its writes are protected by
> po->bind_lock (except for one startup case at packet_create). Also
> replace a textual precondition warning with lockdep annotation.
> 
> All others are set only in packet_setsockopt. Serialize these
> updates by holding the socket lock. Analogous to other field updates,
> also hold the lock when testing whether a ring is active (pg_vec).
> 
> Fixes: 8dc419447415 ("[PACKET]: Add optional checksum computation for recvmsg")
> Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
> Reported-by: Byoungyoung Lee <byoungyoung@purdue.edu>
> Signed-off-by: Willem de Bruijn <willemb@google.com>

Looks good, applied and queued up for -stable.

Thanks for fixing this Willem.
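
To make the race in the commit message concrete: bitfields that share one
storage word are updated with a load, a modify and a store, so two
unserialized writers can lose each other's update. The sketch below replays
that interleaving deterministically in a single thread; the flag names are
illustrative only and no real locking is shown.

```c
#include <assert.h>

/* Two flags living in the same storage word (names are illustrative). */
#define FLAG_AUXDATA (1u << 0)
#define FLAG_ORIGDEV (1u << 1)

/* Unserialized writers: both load the word before either stores it back,
 * so the second store overwrites the first writer's bit. */
unsigned int interleaved_update(unsigned int word)
{
	unsigned int a, b;

	assert((word & (FLAG_AUXDATA | FLAG_ORIGDEV)) == 0);

	a = word;            /* writer A loads */
	b = word;            /* writer B loads the same old value */
	a |= FLAG_AUXDATA;   /* A modifies its copy */
	b |= FLAG_ORIGDEV;   /* B modifies its copy */
	word = a;            /* A stores */
	word = b;            /* B stores: FLAG_AUXDATA is lost */
	return word;
}

/* Serialized writers (as if each held the same lock): every
 * read-modify-write completes before the next one starts. */
unsigned int serialized_update(unsigned int word)
{
	assert((word & (FLAG_AUXDATA | FLAG_ORIGDEV)) == 0);

	word |= FLAG_AUXDATA;
	word |= FLAG_ORIGDEV;
	return word;
}
```

Starting from 0, the interleaved version ends with only FLAG_ORIGDEV set,
while the serialized version ends with both flags set.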

* Re: [PATCH] liquidio: fix lio_vf_rep_pkt_xmit()'s return type
From: Felix Manlunas @ 2018-04-24 17:17 UTC
  To: Luc Van Oostenryck
  Cc: linux-kernel, Derek Chickles, Satanand Burla, Felix Manlunas,
	Raghu Vatsavayi, netdev
In-Reply-To: <20180424131639.3757-1-luc.vanoostenryck@gmail.com>

On Tue, Apr 24, 2018 at 03:16:39PM +0200, Luc Van Oostenryck wrote:
> The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
> which is a typedef for an enum type, but the implementation in this
> driver returns an 'int'.
> 
> Fix this by returning 'netdev_tx_t' in this driver too.
> 
> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
> ---
>  drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c
> index 2adafa366..5973a4083 100644
> --- a/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c
> +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c
> @@ -31,7 +31,7 @@
>  
>  static int lio_vf_rep_open(struct net_device *ndev);
>  static int lio_vf_rep_stop(struct net_device *ndev);
> -static int lio_vf_rep_pkt_xmit(struct sk_buff *skb, struct net_device *ndev);
> +static netdev_tx_t lio_vf_rep_pkt_xmit(struct sk_buff *skb, struct net_device *ndev);
>  static void lio_vf_rep_tx_timeout(struct net_device *netdev);
>  static int lio_vf_rep_phys_port_name(struct net_device *dev,
>  				     char *buf, size_t len);
> @@ -379,7 +379,7 @@ lio_vf_rep_packet_sent_callback(struct octeon_device *oct,
>  		netif_wake_queue(ndev);
>  }
>  
> -static int
> +static netdev_tx_t
>  lio_vf_rep_pkt_xmit(struct sk_buff *skb, struct net_device *ndev)
>  {
>  	struct lio_vf_rep_desc *vf_rep = netdev_priv(ndev);
> -- 
> 2.17.0
> 

Thanks.

Acked-by: Felix Manlunas <felix.manlunas@cavium.com>

* Re: [PATCH v3] kvmalloc: always use vmalloc if CONFIG_DEBUG_SG
From: Matthew Wilcox @ 2018-04-24 17:16 UTC
  To: Mikulas Patocka
  Cc: Michal Hocko, David Miller, Andrew Morton, linux-mm, eric.dumazet,
	edumazet, netdev, linux-kernel, mst, jasowang, virtualization,
	dm-devel, Vlastimil Babka, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim
In-Reply-To: <alpine.LRH.2.02.1804240818530.28016@file01.intranet.prod.int.rdu2.redhat.com>

On Tue, Apr 24, 2018 at 08:29:14AM -0400, Mikulas Patocka wrote:
> 
> 
> On Mon, 23 Apr 2018, Matthew Wilcox wrote:
> 
> > On Mon, Apr 23, 2018 at 08:06:16PM -0400, Mikulas Patocka wrote:
> > > Some bugs (such as buffer overflows) are better detected
> > > with kmalloc code, so we must test the kmalloc path too.
> > 
> > Well now, this brings up another item for the collective TODO list --
> > implement redzone checks for vmalloc.  Unless this is something already
> > taken care of by kasan or similar.
> 
> The kmalloc overflow testing is also not ideal - it rounds the size up to 
> the next slab size and detects buffer overflows only at this boundary.
> 
> Some time ago, I made a "kmalloc guard" patch that places a magic number
> immediately after the requested size - so that it can detect overflows at
> a byte boundary
> ( https://www.redhat.com/archives/dm-devel/2014-September/msg00018.html )
> 
> That patch found a bug in crypto code:
> ( http://lkml.iu.edu/hypermail/linux/kernel/1409.1/02325.html )

Is it still worth doing this, now we have kasan?
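
For context, the "magic number right after the requested size" idea quoted
above can be sketched in user space on top of malloc(). This is not the
dm-devel patch: guarded_alloc(), guarded_intact(), guarded_free() and the
magic bytes are all hypothetical, and the check is an explicit call rather
than something done automatically at free time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header placed before the payload; the union keeps the payload aligned. */
typedef union {
	size_t size;        /* requested size, needed to locate the guard */
	max_align_t align;
} guard_header;

static const unsigned char guard_magic[4] = { 0xde, 0xad, 0xbe, 0xef };

/* Allocate size bytes and place the magic directly after byte size - 1. */
void *guarded_alloc(size_t size)
{
	guard_header *h;

	if (size > SIZE_MAX - sizeof(*h) - sizeof(guard_magic))
		return NULL;
	h = malloc(sizeof(*h) + size + sizeof(guard_magic));
	if (h == NULL)
		return NULL;
	h->size = size;
	memcpy((unsigned char *)(h + 1) + size, guard_magic,
	       sizeof(guard_magic));
	return h + 1;
}

/* Returns 1 while the guard is untouched, 0 after an overflow hit it. */
int guarded_intact(const void *p)
{
	const guard_header *h;

	assert(p != NULL);
	h = (const guard_header *)p - 1;
	return memcmp((const unsigned char *)p + h->size, guard_magic,
		      sizeof(guard_magic)) == 0;
}

void guarded_free(void *p)
{
	if (p != NULL)
		free((guard_header *)p - 1);
}
```

Because the guard starts at the exact requested size rather than at the
rounded-up slab size, even a one-byte overflow is visible.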

* Re: [net-next 1/2] ipv6: sr: add a per namespace sysctl to control seg6 flowlabel
From: David Miller @ 2018-04-24 17:16 UTC
  To: amsalam20; +Cc: dav.lebrun, kuznet, yoshfuji, netdev, linux-kernel
In-Reply-To: <1524519420-1612-1-git-send-email-amsalam20@gmail.com>

From: Ahmed Abdelsalam <amsalam20@gmail.com>
Date: Mon, 23 Apr 2018 23:36:59 +0200

> This patch adds a per namespace sysctl, named 'seg6_flowlabel', to be used
> by seg6_do_srh_encap() to control the behaviour of setting the flowlabel
> value of outer IPv6.
> 
> The currently supported behaviours are as follows:
> -1 set flowlabel to zero.
>  0 copy flowlabel from Inner packet in case of Inner IPv6 (0 for IPv4/L2)
>  1 Compute the flowlabel using seg6_make_flowlabel()
> 
> Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>

There really isn't a reason to make this a separate patch.

Adding a sysctl that nothing refers to doesn't add much value.

So please combine patches #1 and #2.
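
For reference, the three sysctl behaviours listed in the quoted commit
message amount to a small selection function. The sketch below is not
kernel code: outer_flowlabel() and its parameters are hypothetical, and
the "computed" argument merely stands in for whatever
seg6_make_flowlabel() would return.

```c
#include <assert.h>
#include <stdint.h>

#define FLOWLABEL_MASK 0x000fffffu  /* the IPv6 flow label is 20 bits */

/* mode follows the sysctl values from the commit message:
 *   -1  set the outer flowlabel to zero
 *    0  copy it from the inner packet if that is IPv6, else use 0
 *    1  use a computed value (stand-in for seg6_make_flowlabel())
 */
uint32_t outer_flowlabel(int mode, int inner_is_ipv6,
			 uint32_t inner_label, uint32_t computed)
{
	assert(mode >= -1 && mode <= 1);

	if (mode == 1)
		return computed & FLOWLABEL_MASK;
	if (mode == 0 && inner_is_ipv6)
		return inner_label & FLOWLABEL_MASK;
	return 0;
}
```

Seen this way, the sysctl has no effect until the encapsulation path calls
such a function, which is the reason given for merging the two patches.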

* Re: [PATCH bpf-next 00/15] Introducing AF_XDP support
From: Willem de Bruijn @ 2018-04-24 17:03 UTC
  To: Björn Töpel
  Cc: Karlsson, Magnus, Alexander Duyck, Alexander Duyck,
	John Fastabend, Alexei Starovoitov, Jesper Dangaard Brouer,
	Daniel Borkmann, Michael S. Tsirkin, Network Development,
	Björn Töpel, michael.lundkvist, Brandeburg, Jesse,
	Singhai, Anjali, Zhang, Qi Z
In-Reply-To: <20180423135619.7179-1-bjorn.topel@gmail.com>

On Mon, Apr 23, 2018 at 9:56 AM, Björn Töpel <bjorn.topel@gmail.com> wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> This RFC introduces a new address family called AF_XDP that is
> optimized for high performance packet processing and, in upcoming
> patch sets, zero-copy semantics.

Overall, this looks really nice!

> In this v2 version, we have removed
> all zero-copy related code in order to make it smaller, simpler and
> hopefully more review friendly. This RFC only supports copy-mode for
> the generic XDP path (XDP_SKB) for both RX and TX and copy-mode for RX
> using the XDP_DRV path.

Please remove references to RFC when resending to bpf-next.

> An AF_XDP socket (XSK) is created with the normal socket()
> syscall. Associated with each XSK are two queues: the RX queue and the
> TX queue. A socket can receive packets on the RX queue and it can send
> packets on the TX queue. These queues are registered and sized with
> the setsockopts XDP_RX_RING and XDP_TX_RING, respectively. It is
> mandatory to have at least one of these queues for each socket. In
> contrast to AF_PACKET V2/V3 these descriptor queues are separated from
> packet buffers. An RX or TX descriptor points to a data buffer in a
> memory area called a UMEM. RX and TX can share the same UMEM so that a
> packet does not have to be copied between RX and TX. Moreover, if a
> packet needs to be kept for a while due to a possible retransmit, the
> descriptor that points to that packet can be changed to point to
> another and reused right away. This again avoids copying data.
>
> This new dedicated packet buffer area is called a UMEM. It consists of a
> number of equally sized frames and each frame has a unique frame id. A
> descriptor in one of the queues references a frame by referencing its
> frame id. The user space allocates memory for this UMEM using whatever
> means it feels is most appropriate (malloc, mmap, huge pages,
> etc). This memory area is then registered with the kernel using the new
> setsockopt XDP_UMEM_REG. The UMEM also has two queues: the FILL queue
> and the COMPLETION queue. The fill queue is used by the application to
> send down frame ids for the kernel to fill in with RX packet
> data. References to these frames will then appear in the RX queue of
> the XSK once they have been received. The completion queue, on the
> other hand, contains frame ids that the kernel has transmitted
> completely and can now be used again by user space, for either TX or
> RX. Thus, the frame ids appearing in the completion queue are ids that
> were previously transmitted using the TX queue. In summary, the RX and
> FILL queues are used for the RX path and the TX and COMPLETION queues
> are used for the TX path.
>
> The socket is then finally bound with a bind() call to a device and a
> specific queue id on that device,

The setup involves a lot of system calls. You may want to require the
caller to take these in a well defined order, and same for destruction.

Arbitrary order leads to a state explosion in paths through the code.

With AF_PACKET we've had to fix quite a few bugs due to unexpected
states of the socket, e.g., on teardown, and it is too late now to restrict
the number of states.
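
One way to picture a well defined setup order is a linear state machine in
which each step is only accepted from the state directly before it. The
sketch below is a userspace illustration, not what the patch set does: the
state names, their order, and xsk_setup_advance() are all assumptions made
for the example.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical setup stages, in the only order the sketch accepts. */
enum xsk_setup_state {
	XSK_CREATED,          /* socket() returned */
	XSK_UMEM_REGISTERED,  /* UMEM registered */
	XSK_RINGS_CONFIGURED, /* RX/TX and FILL/COMPLETION rings sized */
	XSK_BOUND,            /* bind() to device and queue id done */
	XSK_NR_STATES,
};

struct xsk_setup {
	enum xsk_setup_state state;
};

/*
 * Move to 'next' only if it is the immediate successor of the current
 * state. Anything else (skipping, repeating, going back) is rejected,
 * so the number of reachable states stays linear in the number of steps.
 */
static int xsk_setup_advance(struct xsk_setup *s, enum xsk_setup_state next)
{
	assert(next < XSK_NR_STATES);

	if ((int)next != (int)s->state + 1)
		return -EINVAL;

	s->state = next;
	return 0;
}
```

Teardown could walk the same states in reverse, which keeps the set of
socket states that the rest of the code has to handle small.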

^ permalink raw reply

* Re: [PATCH v3] kvmalloc: always use vmalloc if CONFIG_DEBUG_SG
From: Michal Hocko @ 2018-04-24 17:03 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: dm-devel, eric.dumazet, mst, netdev, linux-kernel, Matthew Wilcox,
	virtualization, linux-mm, edumazet, Andrew Morton, David Miller,
	Vlastimil Babka
In-Reply-To: <alpine.LRH.2.02.1804241250350.28995@file01.intranet.prod.int.rdu2.redhat.com>

On Tue 24-04-18 13:00:11, Mikulas Patocka wrote:
> 
> 
> On Tue, 24 Apr 2018, Michal Hocko wrote:
> 
> > On Tue 24-04-18 11:50:30, Mikulas Patocka wrote:
> > > 
> > > 
> > > On Tue, 24 Apr 2018, Michal Hocko wrote:
> > > 
> > > > On Mon 23-04-18 20:06:16, Mikulas Patocka wrote:
> > > > [...]
> > > > > @@ -404,6 +405,12 @@ void *kvmalloc_node(size_t size, gfp_t f
> > > > >  	 */
> > > > >  	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> > > > >  
> > > > > +#ifdef CONFIG_DEBUG_SG
> > > > > +	/* Catch bugs when the caller uses DMA API on the result of kvmalloc. */
> > > > > +	if (!(prandom_u32_max(2) & 1))
> > > > > +		goto do_vmalloc;
> > > > > +#endif
> > > > 
> > > > I really do not think there is anything DEBUG_SG specific here. Why do
> > > > you not simply follow the should_failslab path or even reuse the function?
> > > 
> > > CONFIG_DEBUG_SG is enabled by default in RHEL and Fedora debug kernel (if 
> > > you don't like CONFIG_DEBUG_SG, pick any other option that is enabled 
> > > there).
> > 
> > Are you telling me that you are shaping a debugging functionality based
> > on what RHEL has enabled? And you call me evil. This is just ridiculous.
> > 
> > > Fail-injection framework is off by default and it must be explicitly 
> > > enabled and configured by the user - and most users won't enable it.
> > 
> > It can be enabled easily. And if you care enough for your debugging
> > kernel then just make it enabled unconditionally.
> 
> So, should we add a new option CONFIG_KVMALLOC_FALLBACK_DEFAULT? I'm not 
> quite sure if 3 lines of debugging code need an extra option, but if you 
> don't want to reuse any existing debug option, it may be possible. Adding 
> it to the RHEL debug kernel would be trivial.

Wouldn't it be equally trivial to simply enable the fault injection? You
would get additional failure paths testing as a bonus.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: kTLS in combination with mlx4 is very unstable
From: Dave Watson @ 2018-04-24 17:01 UTC (permalink / raw)
  To: Andre Tomt; +Cc: netdev, borisp, Aviad Yehezkel
In-Reply-To: <bfb1e726-24ea-89e9-61e7-8e43e82cd23b@tomt.net>

On 04/22/18 11:21 PM, Andre Tomt wrote:
> kTLS looks fun, so I decided to play with it. It is quite spiffy - however
> with mlx4 I get kernel crashes I'm not seeing when testing on ixgbe.
> 
> For testing I'm using a git build of the "stream reflector" cubemap[1]
> configured with kTLS and 8 worker threads running on 4 physical cores,
> loading it up with a ~13Mbps MPEG-TS stream pulled from satellite TV.
> 
> The kernel seems to get increasingly unstable as I load it up with client
> connections. At about 9Gbps and 700 connections, it is okay at least for a
> while - it might run fine for say 45 minutes. Once it gets to 20 - 30Gbps,
> the kernel will usually start spewing OOPSes within minutes and the traffic
> drops.
> 
> Some bad interaction between mlx4 and kTLS?

I'm not familiar with any mlx4 specific issues, but it looks like
there is enough information here to fix the stack overflow from
recursive callbacks. I'll see if I can come up with something.

Thanks for the report.
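
The repeating tls_push_record / tls_push_sg / do_tcp_sendpages /
tls_write_space frames in the quoted trace suggest a write-space callback
re-entering the send path. A common way to break that kind of cycle is a
re-entrancy flag, sketched below in plain userspace C. This is an
illustration only, not the fix for this report: struct tx_ctx, its fields,
and both functions are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

struct tx_ctx {
	bool in_sendpages; /* set while the send path is active */
	int pending;       /* records still waiting to be pushed */
	int depth;         /* current nesting of push_records() */
	int max_depth;     /* deepest nesting observed */
};

static void push_records(struct tx_ctx *ctx);

/* Stand-in for the callback the transport invokes when space frees up. */
static void write_space(struct tx_ctx *ctx)
{
	if (ctx->in_sendpages)
		return; /* the active sender's loop will push the rest */
	push_records(ctx);
}

/* Stand-in for the send path: drains records in a loop, not by recursion. */
static void push_records(struct tx_ctx *ctx)
{
	assert(ctx->pending >= 0);

	ctx->depth++;
	if (ctx->depth > ctx->max_depth)
		ctx->max_depth = ctx->depth;

	ctx->in_sendpages = true;
	while (ctx->pending > 0) {
		ctx->pending--;
		write_space(ctx); /* transport signals free space mid-send */
	}
	ctx->in_sendpages = false;
	ctx->depth--;
}
```

With the flag in place the nesting depth stays at one no matter how many
records are pending; without the early return in write_space() the depth
would grow with every record, which is the shape of the overflow above.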

> 
> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__git.sesse.net_-3Fp-3Dcubemap&d=DwICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=vou6lT5jmE_fWQWZZgNrsMWu4RT87QAB9V07tPHlP5U&m=nXYfAmb3ozJUT-pI1JGDgMYhxb7Dq4XSorzfyyQeGWk&s=05SnCOrNbK2DHRub2qPdVxAzXW9e7utxqDMeVaGBd8k&e=
> 
> First OOPS (from 4.16.3):
> > [  660.467358] BUG: stack guard page was hit at 00000000b136e403 (stack is 00000000ded3f179..00000000835ee6c5)
> > [  660.467422] kernel stack overflow (double-fault): 0000 [#1] SMP PTI
> > [  660.467457] Modules linked in: coretemp intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp iTCO_wdt gpio_ich iTCO_vendor_support kvm_intel mxm_wmi xfs libcrc32c kvm crc32c_generic irqbypass nls_iso8859_1 crct10dif_pclmul crc32_pclmul nls_cp437 ghash_clmulni_intel vfat fat aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_pch_thermal mei_me sg mei lpc_ich mfd_core evdev ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_pad tls ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid mlx4_ib mlx4_en ib_core sd_mod ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect ehci_pci xhci_pci sysimgblt fb_sys_fops ahci libahci xhci_hcd ehci_hcd libata crc32c_intel nvme drm usbcore scsi_mod mlx4_core ixgbe i2c_core mdio usb_common devlink hwmon nvme_core rtc_cmos
> > [  660.467856] CPU: 4 PID: 660 Comm: cubemap Not tainted 4.16.0-1 #1
> > [  660.467890] Hardware name: Supermicro Super Server/X10SDV-4C-TLN2F, BIOS 1.2c 09/19/2017
> > [  660.467939] RIP: 0010:__kmalloc+0x7/0x1f0
> > [  660.467962] RSP: 0018:ffffabafc27b8000 EFLAGS: 00010206
> > [  660.467992] RAX: 000000000000000d RBX: 0000000000000010 RCX: ffffabafc27b8070
> > [  660.468030] RDX: ffff98a0d0235490 RSI: 0000000001080020 RDI: 000000000000001d
> > [  660.468069] RBP: 000000000000000d R08: ffff98a0d5be4860 R09: ffff98a0ec299180
> > [  660.468106] R10: ffffabafc27b80b8 R11: 0000000000000010 R12: 0000000000000010
> > [  660.468145] R13: ffff98a0ec299180 R14: ffff98a0ec299180 R15: 0000000000000000
> > [  660.468184] FS:  00007f8a35ffb700(0000) GS:ffff98a17fd00000(0000) knlGS:0000000000000000
> > [  660.468227] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  660.468258] CR2: ffffabafc27b7ff8 CR3: 00000004698ee001 CR4: 00000000003606e0
> > [  660.468297] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  660.468334] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [  660.468373] Call Trace:
> > [  660.468401]  gcmaes_encrypt.constprop.5+0x137/0x240 [aesni_intel]
> > [  660.468439]  ? generic_gcmaes_encrypt+0x5f/0x80 [aesni_intel]
> > [  660.468476]  ? gcmaes_wrapper_encrypt+0x36/0x80 [aesni_intel]
> > [  660.468511]  ? tls_push_record+0x1d3/0x390 [tls]
> > [  660.468537]  ? tls_push_record+0x1d3/0x390 [tls]
> > [  660.468565]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.468593]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.468618]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.468643]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.468671]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.468697]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.468722]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.468748]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.468776]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.468802]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.468826]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.468852]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.468880]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.468906]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.468931]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.468957]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.470165]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.471363]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.472555]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.473713]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.474838]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.475927]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.476977]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.477999]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.478968]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.479902]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.480790]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.481644]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.482483]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.483301]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.484099]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.484891]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.485674]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.486455]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.487220]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.487890]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.488328]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.488748]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.489167]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.489565]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.489970]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.490370]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.490771]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.491165]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.491550]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.491914]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.492274]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.492641]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.493008]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.493374]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.493787]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.494177]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.494585]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.494972]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.495359]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.495742]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.496128]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.496512]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.496901]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.497301]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.497697]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.498096]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.498490]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.498884]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.499291]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.499700]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.500103]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.500511]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.500909]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.501326]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.501737]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.502131]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.502525]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.502928]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.503331]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.503724]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.504127]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.504547]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.504949]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.505348]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.505769]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.506207]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.506622]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.507030]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.507435]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.507841]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.508518]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.509261]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.510011]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.510187] BUG: stack guard page was hit at 00000000e0315e51 (stack is 00000000bea6f919..0000000005fc5eb4)
> > [  660.510473] BUG: stack guard page was hit at 000000004b958a15 (stack is 000000001f2af2d1..000000006295a4b1)
> > [  660.510758]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.513094]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.513886]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.514680]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.515487]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.515750] BUG: stack guard page was hit at 00000000bc93cf0d (stack is 0000000031a15c9c..0000000029a82776)
> > [  660.516295]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.518017]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.518883]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.519752]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.519816] BUG: stack guard page was hit at 000000002d1db286 (stack is 00000000b5bb06d4..000000007a29c8f2)
> > [  660.520544]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.522315]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.523162]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.524006]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.524849]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.525695]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.526545]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.527399]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.528247]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.529099]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.529955]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.530797]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.531643]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.532010] BUG: stack guard page was hit at 0000000027abda92 (stack is 00000000aadcb221..00000000a587b67b)
> > [  660.532535]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.534511]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.535506]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.536500]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.537495]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.538493]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.539482]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.540462]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.541447]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.542430]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.543411]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.544395]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.545382]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.546365]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.547347]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.548334]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.549318]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.550300]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.551284]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.552267]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.553250]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.554205]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.555158]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.556083]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.557009]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.557936]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.558862]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.559786]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.560681]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.561547]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.562413]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.563279]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.564143]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.564979]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.565783]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.566587]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.567392]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.568197]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.569000]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.569804]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.570609]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.571415]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.572218]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.573023]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.573830]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.574634]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.575437]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.576210]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.576953]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.577698]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.578441]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.579183]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.579929]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.580673]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.581417]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.582159]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.582904]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.583649]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.584394]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.585137]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.585882]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.586628]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.587372]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.588115]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.588861]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.589605]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.590350]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.591093]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.591840]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.592585]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.593328]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.594072]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.594816]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.595563]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.596308]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.597050]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.597794]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.598540]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.599281]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.600025]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.600772]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.601517]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.602260]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.603003]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.603750]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.604495]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.605238]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.605981]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.606726]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.607474]  ? tls_sw_sendpage+0x14a/0x390 [tls]
> > [  660.608214]  ? direct_splice_actor+0x40/0x40
> > [  660.608951]  ? inet_sendpage+0x40/0xf0
> > [  660.609689]  ? kernel_sendpage+0x1a/0x30
> > [  660.610426]  ? sock_sendpage+0x20/0x30
> > [  660.611161]  ? pipe_to_sendpage+0x5f/0x70
> > [  660.611898]  ? __splice_from_pipe+0x80/0x180
> > [  660.612637]  ? generic_file_splice_read+0x100/0x150
> > [  660.613382]  ? direct_splice_actor+0x40/0x40
> > [  660.614128]  ? splice_from_pipe+0x4f/0x70
> > [  660.614871]  ? direct_splice_actor+0x35/0x40
> > [  660.615619]  ? splice_direct_to_actor+0xce/0x1d0
> > [  660.616368]  ? generic_pipe_buf_nosteal+0x10/0x10
> > [  660.617122]  ? do_splice_direct+0x8c/0xa0
> > [  660.617876]  ? do_sendfile+0x19d/0x380
> > [  660.618626]  ? SyS_sendfile64+0x4c/0x90
> > [  660.619376]  ? do_syscall_64+0x7a/0x390
> > [  660.620121]  ? do_page_fault+0x31/0x130
> > [  660.620863]  ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > [  660.621618] Code: 24 08 4c 89 e9 48 89 de e8 d7 66 63 00 4d 8b 17 5a 4d 85 d2 75 d7 e9 e9 fe ff ff 48 89 c5 e9 e1 fe ff ff 90 0f 1f 44 00 00 41 57 <41> 56 41 55 41 54 55 53 48 81 ff 00 20 00 00 0f 87 a4 01 00 00
> > [  660.623316] RIP: __kmalloc+0x7/0x1f0 RSP: ffffabafc27b8000
> > [  660.624168] ---[ end trace 7f6206177c0cc58f ]---
> 

^ permalink raw reply

* Re: [PATCH v3] kvmalloc: always use vmalloc if CONFIG_DEBUG_SG
From: Mikulas Patocka @ 2018-04-24 17:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: dm-devel, eric.dumazet, mst, netdev, linux-kernel, Matthew Wilcox,
	virtualization, linux-mm, edumazet, Andrew Morton, David Miller,
	Vlastimil Babka
In-Reply-To: <20180424162906.GM17484@dhcp22.suse.cz>



On Tue, 24 Apr 2018, Michal Hocko wrote:

> On Tue 24-04-18 11:50:30, Mikulas Patocka wrote:
> > 
> > 
> > On Tue, 24 Apr 2018, Michal Hocko wrote:
> > 
> > > On Mon 23-04-18 20:06:16, Mikulas Patocka wrote:
> > > [...]
> > > > @@ -404,6 +405,12 @@ void *kvmalloc_node(size_t size, gfp_t f
> > > >  	 */
> > > >  	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> > > >  
> > > > +#ifdef CONFIG_DEBUG_SG
> > > > +	/* Catch bugs when the caller uses DMA API on the result of kvmalloc. */
> > > > +	if (!(prandom_u32_max(2) & 1))
> > > > +		goto do_vmalloc;
> > > > +#endif
> > > 
> > > I really do not think there is anything DEBUG_SG specific here. Why do
> > > you not simply follow the should_failslab path or even reuse the function?
> > 
> > CONFIG_DEBUG_SG is enabled by default in RHEL and Fedora debug kernel (if 
> > you don't like CONFIG_DEBUG_SG, pick any other option that is enabled 
> > there).
> 
> Are you telling me that you are shaping a debugging functionality based
> on what RHEL has enabled? And you call me evil. This is just ridiculous.
> 
> > Fail-injection framework is off by default and it must be explicitly 
> > enabled and configured by the user - and most users won't enable it.
> 
> It can be enabled easily. And if you care enough for your debugging
> kernel then just make it enabled unconditionally.

So, should we add a new option CONFIG_KVMALLOC_FALLBACK_DEFAULT? I'm not 
quite sure if 3 lines of debugging code need an extra option, but if you 
don't want to reuse any existing debug option, it may be possible. Adding 
it to the RHEL debug kernel would be trivial.

Mikulas

^ permalink raw reply

* Re: [PATCH bpf-next 14/15] xsk: statistics support
From: Willem de Bruijn @ 2018-04-24 16:58 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Karlsson, Magnus, Alexander Duyck, Alexander Duyck,
	John Fastabend, Alexei Starovoitov, Jesper Dangaard Brouer,
	Daniel Borkmann, Michael S. Tsirkin, Network Development,
	michael.lundkvist, Brandeburg, Jesse, Singhai, Anjali,
	Zhang, Qi Z
In-Reply-To: <20180423135619.7179-15-bjorn.topel@gmail.com>

On Mon, Apr 23, 2018 at 9:56 AM, Björn Töpel <bjorn.topel@gmail.com> wrote:
> From: Magnus Karlsson <magnus.karlsson@intel.com>
>
> In this commit, a new getsockopt is added: XDP_STATISTICS. This is
> used to obtain stats from the sockets.
>
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>

> +static int xsk_getsockopt(struct socket *sock, int level, int optname,
> +                         char __user *optval, int __user *optlen)
> +{
> +       struct sock *sk = sock->sk;
> +       struct xdp_sock *xs = xdp_sk(sk);
> +       int len;
> +
> +       if (level != SOL_XDP)
> +               return -ENOPROTOOPT;
> +
> +       if (get_user(len, optlen))
> +               return -EFAULT;
> +       if (len < 0)
> +               return -EINVAL;
> +
> +       switch (optname) {
> +       case XDP_STATISTICS:
> +       {
> +               struct xdp_statistics stats;
> +
> +               if (len != sizeof(stats))
> +                       return -EINVAL;
> +
> +               mutex_lock(&xs->mutex);
> +               stats.rx_dropped = xs->rx_dropped;
> +               stats.rx_invalid_descs = xskq_nb_invalid_descs(xs->rx);
> +               stats.tx_invalid_descs = xskq_nb_invalid_descs(xs->tx);
> +               mutex_unlock(&xs->mutex);
> +
> +               if (copy_to_user(optval, &stats, sizeof(stats)))
> +                       return -EFAULT;
> +               return 0;

For forward compatibility, could allow caller to pass a struct larger
than stats and return the number of bytes filled in.

The lock can also be elided with something like gnet_stats, but it is probably
taken rarely enough that that is not worth the effort, at least right now.
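
The forward-compatibility suggestion above could look roughly like the
following length handling. It is a userspace model under assumptions:
stats_getsockopt() is a hypothetical name, the struct layout is guessed
from the field names in the quoted patch, and memcpy() plus a plain int
pointer stand in for copy_to_user() and the __user optlen.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field names follow the quoted patch; the exact layout is an assumption. */
struct xdp_statistics {
	uint64_t rx_dropped;
	uint64_t rx_invalid_descs;
	uint64_t tx_invalid_descs;
};

/*
 * Accept any caller buffer that is at least as large as the struct this
 * side knows about, fill only that many bytes, and report the number of
 * bytes written back through *optlen.
 */
static int stats_getsockopt(const struct xdp_statistics *stats,
			    void *optval, int *optlen)
{
	int len;

	assert(stats && optval && optlen);

	len = *optlen;
	if (len < 0 || (size_t)len < sizeof(*stats))
		return -EINVAL;

	memcpy(optval, stats, sizeof(*stats));
	*optlen = (int)sizeof(*stats);
	return 0;
}
```

A caller built against a newer, larger struct can then tell from the
returned length which trailing fields were not filled in.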

^ permalink raw reply

* Re: [PATCH 1/1] Revert "rds: ib: add error handle"
From: Santosh Shilimkar @ 2018-04-24 16:58 UTC (permalink / raw)
  To: Dag Moxnes, Håkon Bugge
  Cc: Zhu Yanjun, OFED mailing list, rds-devel, davem, netdev
In-Reply-To: <373de57c-4cab-0d04-0021-57b566cefe0d@oracle.com>

On 4/24/2018 4:25 AM, Dag Moxnes wrote:
> I was going to suggest the following correction:
> 
> 
> If all agree that this is the correct way of doing it, I can go ahead 
> and post it.
> 
Yes please. Go ahead and post your fix.

Regards,
Santosh
P.S: Avoid top posting please.

^ permalink raw reply

* Re: [PATCH bpf-next 13/15] xsk: support for Tx
From: Willem de Bruijn @ 2018-04-24 16:57 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Karlsson, Magnus, Alexander Duyck, Alexander Duyck,
	John Fastabend, Alexei Starovoitov, Jesper Dangaard Brouer,
	Daniel Borkmann, Michael S. Tsirkin, Network Development,
	michael.lundkvist, Brandeburg, Jesse, Singhai, Anjali,
	Zhang, Qi Z
In-Reply-To: <20180423135619.7179-14-bjorn.topel@gmail.com>

On Mon, Apr 23, 2018 at 9:56 AM, Björn Töpel <bjorn.topel@gmail.com> wrote:
> From: Magnus Karlsson <magnus.karlsson@intel.com>
>
> Here, Tx support is added. The user fills the Tx queue with frames to
> be sent by the kernel, and lets the kernel know using the sendmsg
> syscall.
>
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>

> +static int xsk_xmit_skb(struct sk_buff *skb)

This is basically packet_direct_xmit. Might be better to just move that
to net/core/dev.c and use in both AF_PACKET and AF_XDP.

Also, (eventually) AF_XDP may also want to support the regular path
through dev_queue_xmit to go through traffic shaping.

> +{
> +       struct net_device *dev = skb->dev;
> +       struct sk_buff *orig_skb = skb;
> +       struct netdev_queue *txq;
> +       int ret = NETDEV_TX_BUSY;
> +       bool again = false;
> +
> +       if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev)))
> +               goto drop;
> +
> +       skb = validate_xmit_skb_list(skb, dev, &again);
> +       if (skb != orig_skb)
> +               return NET_XMIT_DROP;

Need to free generated segment list on error, see packet_direct_xmit.

> +
> +       txq = skb_get_tx_queue(dev, skb);
> +
> +       local_bh_disable();
> +
> +       HARD_TX_LOCK(dev, txq, smp_processor_id());
> +       if (!netif_xmit_frozen_or_drv_stopped(txq))
> +               ret = netdev_start_xmit(skb, dev, txq, false);
> +       HARD_TX_UNLOCK(dev, txq);
> +
> +       local_bh_enable();
> +
> +       if (!dev_xmit_complete(ret))
> +               goto out_err;
> +
> +       return ret;
> +drop:
> +       atomic_long_inc(&dev->tx_dropped);
> +out_err:
> +       return NET_XMIT_DROP;
> +}

> +static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
> +                           size_t total_len)
> +{
> +       bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
> +       u32 max_batch = TX_BATCH_SIZE;
> +       struct xdp_sock *xs = xdp_sk(sk);
> +       bool sent_frame = false;
> +       struct xdp_desc desc;
> +       struct sk_buff *skb;
> +       int err = 0;
> +
> +       if (unlikely(!xs->tx))
> +               return -ENOBUFS;
> +       if (need_wait)
> +               return -EOPNOTSUPP;
> +
> +       mutex_lock(&xs->mutex);
> +
> +       while (xskq_peek_desc(xs->tx, &desc)) {

It is possible to pass a chain of skbs to validate_xmit_skb_list and
eventually pass this chain to xsk_xmit_skb, amortizing the cost of
taking the txq lock. Fine to ignore for this patch set.

> +               char *buffer;
> +               u32 id, len;
> +
> +               if (max_batch-- == 0) {
> +                       err = -EAGAIN;
> +                       goto out;
> +               }
> +
> +               if (xskq_reserve_id(xs->umem->cq)) {
> +                       err = -EAGAIN;
> +                       goto out;
> +               }
> +
> +               len = desc.len;
> +               if (unlikely(len > xs->dev->mtu)) {
> +                       err = -EMSGSIZE;
> +                       goto out;
> +               }
> +
> +               skb = sock_alloc_send_skb(sk, len, !need_wait, &err);
> +               if (unlikely(!skb)) {
> +                       err = -EAGAIN;
> +                       goto out;
> +               }
> +
> +               skb_put(skb, len);
> +               id = desc.idx;
> +               buffer = xdp_umem_get_data(xs->umem, id) + desc.offset;
> +               err = skb_store_bits(skb, 0, buffer, len);
> +               if (unlikely(err))
> +                       goto out_store;

As xsk_destruct_skb delays notification until consume_skb is called, this
copy can be avoided by linking the xdp buffer into the skb frags array,
analogous to tpacket_snd.

You probably don't care much about the copy slow path, and this can be
implemented later, so also no need to do in this patchset.

> +static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
> +                                             struct xdp_desc *desc)
> +{
> +       struct xdp_rxtx_ring *ring;
> +
> +       if (q->cons_tail == q->cons_head) {
> +               WRITE_ONCE(q->ring->consumer, q->cons_tail);
> +               q->cons_head = q->cons_tail + xskq_nb_avail(q, RX_BATCH_SIZE);
> +
> +               /* Order consumer and data */
> +               smp_rmb();
> +
> +               return xskq_validate_desc(q, desc);
> +       }
> +
> +       ring = (struct xdp_rxtx_ring *)q->ring;
> +       *desc = ring->desc[q->cons_tail & q->ring_mask];
> +       return desc;

This only validates descriptors if taking the branch.

^ permalink raw reply

* Re: [PATCH bpf-next 07/15] xsk: add Rx receive functions and poll support
From: Willem de Bruijn @ 2018-04-24 16:56 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Karlsson, Magnus, Alexander Duyck, Alexander Duyck,
	John Fastabend, Alexei Starovoitov, Jesper Dangaard Brouer,
	Daniel Borkmann, Michael S. Tsirkin, Network Development,
	Björn Töpel, michael.lundkvist, Brandeburg, Jesse,
	Singhai, Anjali, Zhang, Qi Z
In-Reply-To: <20180423135619.7179-8-bjorn.topel@gmail.com>

On Mon, Apr 23, 2018 at 9:56 AM, Björn Töpel <bjorn.topel@gmail.com> wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Here the actual receive functions of AF_XDP are implemented, that in a
> later commit, will be called from the XDP layers.
>
> There's one set of functions for the XDP_DRV side and another for
> XDP_SKB (generic).
>
> Support for the poll syscall is also implemented.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---

> +/* Common functions operating for both RXTX and umem queues */
> +
> +static inline u32 xskq_nb_avail(struct xsk_queue *q, u32 dcnt)
> +{
> +       u32 entries = q->prod_tail - q->cons_tail;
> +
> +       if (entries == 0) {
> +               /* Refresh the local pointer */
> +               q->prod_tail = READ_ONCE(q->ring->producer);
> +       }
> +
> +       entries = q->prod_tail - q->cons_tail;

Probably meant to be inside the branch? Though I see the same
pattern in the userspace example program.
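
For reference, a small userspace sketch with the recomputation moved
inside the branch. The struct is a hypothetical stand-in, the volatile
read only approximates READ_ONCE, and the clamp to dcnt is an assumption
about the part of the function that is not quoted above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields of struct xsk_queue. */
struct demo_ring_state {
	uint32_t prod_tail;		/* cached copy of the producer index */
	uint32_t cons_tail;		/* local consumer index */
	const uint32_t *producer;	/* shared index written by the producer */
};

static uint32_t demo_nb_avail(struct demo_ring_state *q, uint32_t dcnt)
{
	uint32_t entries = q->prod_tail - q->cons_tail;

	if (entries == 0) {
		/* Cached view is empty: refresh it and recompute here only. */
		q->prod_tail = *(const volatile uint32_t *)q->producer;
		entries = q->prod_tail - q->cons_tail;
	}

	/* Assumed: never report more than the caller asked for. */
	return entries > dcnt ? dcnt : entries;
}
```

The result is the same as with the unconditional second subtraction;
the difference is only that the common non-empty case does one
subtraction instead of two.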

> +static inline u32 *xskq_validate_id(struct xsk_queue *q)
> +{
> +       while (q->cons_tail != q->cons_head) {
> +               struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
> +               unsigned int idx = q->cons_tail & q->ring_mask;
> +
> +               if (xskq_is_valid_id(q, ring->desc[idx]))
> +                       return &ring->desc[idx];

Missing a q->cons_tail increment in this loop?
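
A userspace sketch of the loop with the increment in place. The queue
layout and the validity rule (id below nframes) are assumptions made for
the example, not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a umem id queue. */
struct demo_id_queue {
	uint32_t *ring;		/* ring_mask + 1 ids */
	uint32_t ring_mask;
	uint32_t cons_head;
	uint32_t cons_tail;
	uint32_t nframes;	/* assumed rule: valid ids are below this */
};

static uint32_t *demo_validate_id(struct demo_id_queue *q)
{
	while (q->cons_tail != q->cons_head) {
		uint32_t idx = q->cons_tail & q->ring_mask;

		if (q->ring[idx] < q->nframes)
			return &q->ring[idx];

		/* Step past the invalid id so the loop can terminate. */
		q->cons_tail++;
	}

	return NULL;
}
```

Without the increment, a single invalid id at cons_tail would keep the
loop spinning on the same slot.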


* Re: [PATCH bpf-next 08/15] bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP
From: Willem de Bruijn @ 2018-04-24 16:56 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Karlsson, Magnus, Alexander Duyck, Alexander Duyck,
	John Fastabend, Alexei Starovoitov, Jesper Dangaard Brouer,
	Daniel Borkmann, Michael S. Tsirkin, Network Development,
	Björn Töpel, michael.lundkvist, Brandeburg, Jesse,
	Singhai, Anjali, Zhang, Qi Z
In-Reply-To: <20180423135619.7179-9-bjorn.topel@gmail.com>

On Mon, Apr 23, 2018 at 9:56 AM, Björn Töpel <bjorn.topel@gmail.com> wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> The xskmap is yet another BPF map, very much inspired by
> dev/cpu/sockmap, and is a holder of AF_XDP sockets. A user application
> adds AF_XDP sockets into the map, and by using the bpf_redirect_map
> helper, an XDP program can redirect XDP frames to an AF_XDP socket.
>
> Note that a socket that is bound to certain ifindex/queue index will
> *only* accept XDP frames from that netdev/queue index. If an XDP
> program tries to redirect from a netdev/queue index other than what
> the socket is bound to, the frame will not be received on the socket.
>
> A socket can reside in multiple maps.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

> +struct xsk_map_entry {
> +       struct xdp_sock *xs;
> +       struct rcu_head rcu;
> +};

> +struct xdp_sock *__xsk_map_lookup_elem(struct bpf_map *map, u32 key)
> +{
> +       struct xsk_map *m = container_of(map, struct xsk_map, map);
> +       struct xsk_map_entry *entry;
> +
> +       if (key >= map->max_entries)
> +               return NULL;
> +
> +       entry = READ_ONCE(m->xsk_map[key]);
> +       return entry ? entry->xs : NULL;
> +}

This dynamically allocated structure adds an extra cacheline lookup. If
xdp_sock gets an rcu_head, it can be linked into the map directly.
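
Roughly what I have in mind, as a userspace sketch. All demo_* names are
hypothetical, the rcu head is only a placeholder, and a plain load
stands in for READ_ONCE.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder for struct rcu_head; only its presence matters here. */
struct demo_rcu_head {
	void *next;
	void (*func)(struct demo_rcu_head *head);
};

/* Hypothetical socket with the rcu head embedded. */
struct demo_sock {
	int queue_id;
	struct demo_rcu_head rcu;
};

struct demo_map {
	uint32_t max_entries;
	struct demo_sock **slots;	/* slots point straight at sockets */
};

static struct demo_sock *demo_map_lookup(const struct demo_map *m,
					 uint32_t key)
{
	if (key >= m->max_entries)
		return NULL;

	/* One pointer load, no intermediate entry to dereference. */
	return m->slots[key];
}
```

The lookup then touches the map slot and the socket itself, and the
separately allocated entry goes away.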


* Re: [PATCH bpf-next 05/15] xsk: add support for bind for Rx
From: Willem de Bruijn @ 2018-04-24 16:55 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Karlsson, Magnus, Alexander Duyck, Alexander Duyck,
	John Fastabend, Alexei Starovoitov, Jesper Dangaard Brouer,
	Daniel Borkmann, Michael S. Tsirkin, Network Development,
	michael.lundkvist, Brandeburg, Jesse, Singhai, Anjali,
	Zhang, Qi Z
In-Reply-To: <20180423135619.7179-6-bjorn.topel@gmail.com>

On Mon, Apr 23, 2018 at 9:56 AM, Björn Töpel <bjorn.topel@gmail.com> wrote:
> From: Magnus Karlsson <magnus.karlsson@intel.com>
>
> Here, the bind syscall is added. Binding an AF_XDP socket, means
> associating the socket to an umem, a netdev and a queue index. This
> can be done in two ways.
>
> The first way, creating a "socket from scratch". Create the umem using
> the XDP_UMEM_REG setsockopt and an associated fill queue with
> XDP_UMEM_FILL_QUEUE. Create the Rx queue using the XDP_RX_QUEUE
> setsockopt. Call bind passing ifindex and queue index ("channel" in
> ethtool speak).
>
> The second way to bind a socket, is simply skipping the
> umem/netdev/queue index, and passing another already setup AF_XDP
> socket. The new socket will then have the same umem/netdev/queue index
> as the parent so it will share the same umem. You must also set the
> flags field in the socket address to XDP_SHARED_UMEM.
>
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---

> +static struct socket *xsk_lookup_xsk_from_fd(int fd, int *err)
> +{
> +       struct socket *sock;
> +
> +       *err = -ENOTSOCK;
> +       sock = sockfd_lookup(fd, err);
> +       if (!sock)
> +               return NULL;
> +
> +       if (sock->sk->sk_family != PF_XDP) {
> +               *err = -ENOPROTOOPT;
> +               sockfd_put(sock);
> +               return NULL;
> +       }
> +
> +       *err = 0;
> +       return sock;
> +}

In this and similar cases, you can use ERR_PTR to avoid the extra argument.
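
A userspace sketch of the pattern. The demo_err_ptr, demo_ptr_err and
demo_is_err helpers are simplified imitations of the kernel's ERR_PTR,
PTR_ERR and IS_ERR, and the socket table and family constant are made up
for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO	4095
#define DEMO_FAMILY_XDP	1	/* hypothetical family value */

/* Encode a negative errno in the pointer itself. */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* The top DEMO_MAX_ERRNO addresses are reserved for error codes. */
static inline bool demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_socket {
	int family;
};

static struct demo_socket demo_table[2] = {
	{ DEMO_FAMILY_XDP },
	{ 2 },			/* some other family */
};

/* No int *err argument: the error travels in the return value. */
static struct demo_socket *demo_lookup_xsk(int fd)
{
	if (fd < 0 || fd >= 2)
		return demo_err_ptr(-ENOTSOCK);

	if (demo_table[fd].family != DEMO_FAMILY_XDP)
		return demo_err_ptr(-ENOPROTOOPT);

	return &demo_table[fd];
}
```

The caller checks the result with the IS_ERR equivalent and extracts the
code with the PTR_ERR equivalent, instead of passing a pointer to err.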


* Re: [PATCH bpf-next 03/15] xsk: add umem fill queue support and mmap
From: Willem de Bruijn @ 2018-04-24 16:55 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Michael S. Tsirkin, Björn Töpel, Karlsson, Magnus,
	Alexander Duyck, Alexander Duyck, John Fastabend,
	Alexei Starovoitov, Jesper Dangaard Brouer, Daniel Borkmann,
	Network Development, michael.lundkvist, Brandeburg, Jesse,
	Singhai, Anjali, Zhang, Qi Z
In-Reply-To: <CAJ8uoz0CSmQeO6E4fNrvivPau5zDOJhTtbXBxX_Z3HY5c3gvAQ@mail.gmail.com>

>>>> +/* Pgoff for mmaping the rings */
>>>> +#define XDP_UMEM_PGOFF_FILL_RING     0x100000000
>>>> +
>>>> +struct xdp_ring {
>>>> +     __u32 producer __attribute__((aligned(64)));
>>>> +     __u32 consumer __attribute__((aligned(64)));
>>>> +};
>>>
>>> Why 64? And do you still need these guys in uapi?
>>
>> I was just about to ask the same. You mean cacheline_aligned?
>
> Yes, I would like to have these cache aligned. How can I accomplish
> this in a uapi?

Good point. This seems fine to me.
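
To make the resulting layout concrete, a standalone C11 sketch with the
64 spelled out. It uses _Alignas instead of the attribute syntax, and
the struct name and the DEMO_RING_ALIGN constant are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed value; the real cache line size depends on the architecture. */
#define DEMO_RING_ALIGN 64

struct demo_ring {
	_Alignas(DEMO_RING_ALIGN) uint32_t producer;
	_Alignas(DEMO_RING_ALIGN) uint32_t consumer;
};

/* Each index starts its own 64-byte block. */
_Static_assert(offsetof(struct demo_ring, producer) == 0,
	       "producer starts the struct");
_Static_assert(offsetof(struct demo_ring, consumer) == DEMO_RING_ALIGN,
	       "consumer starts the second block");
_Static_assert(sizeof(struct demo_ring) == 2 * DEMO_RING_ALIGN,
	       "struct is padded to two blocks");
```

On a machine with 64-byte cache lines the two indices never share a
line. With larger lines they still can, which is the open question from
the cover letter.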

> I put a note around this in the cover letter:
>
> * How to deal with cache alignment for uapi when different
>   architectures can have different cache line sizes? We have just
>   aligned it to 64 bytes for now, which works for many popular
>   architectures, but not all. Please advise.
>
>>
>>>> +static int xsk_mmap(struct file *file, struct socket *sock,
>>>> +                 struct vm_area_struct *vma)
>>>> +{
>>>> +     unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
>>>> +     unsigned long size = vma->vm_end - vma->vm_start;
>>>> +     struct xdp_sock *xs = xdp_sk(sock->sk);
>>>> +     struct xsk_queue *q;
>>>> +     unsigned long pfn;
>>>> +     struct page *qpg;
>>>> +
>>>> +     if (!xs->umem)
>>>> +             return -EINVAL;
>>>> +
>>>> +     if (offset == XDP_UMEM_PGOFF_FILL_RING)
>>>> +             q = xs->umem->fq;
>>>> +     else
>>>> +             return -EINVAL;
>>>> +
>>>> +     qpg = virt_to_head_page(q->ring);
>>
>> Is it assured that q is initialized with a call to setsockopt
>> XDP_UMEM_FILL_RING before the call to mmap?
>
> Unfortunately not, so this is a bug. Case in point for running
> syzkaller below, definitely.
>
>> In general, with such an extensive new API, it might be worthwhile to
>> run syzkaller locally on a kernel with these patches. It is pretty
>> easy to set up (https://github.com/google/syzkaller/blob/master/docs/linux/setup.md),
>> though it also needs to be taught about any new APIs.
>
> Good idea. Will set this up and have it torture the API.
>
> Thanks: Magnus

Great, thanks. I forgot to mention how to encode the new APIs for syzkaller:

https://github.com/google/syzkaller/blob/master/docs/syscall_descriptions.md
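
On the uninitialized fill queue in xsk_mmap: a userspace sketch of the
kind of check I would expect. The demo_* types are hypothetical, and
this does not model locking or ordering against a concurrent setsockopt.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PGOFF_FILL_RING 0x100000000ULL

/* Hypothetical stand-ins for the queue, umem and socket structures. */
struct demo_queue {
	void *ring;
};

struct demo_umem {
	struct demo_queue *fq;	/* NULL until the fill queue is created */
};

struct demo_xsk {
	struct demo_umem *umem;
};

/* Returns 0 and sets *out, or a negative errno. */
static int demo_mmap_pick_queue(const struct demo_xsk *xs, uint64_t offset,
				struct demo_queue **out)
{
	struct demo_queue *q;

	if (!xs->umem)
		return -EINVAL;

	if (offset != DEMO_PGOFF_FILL_RING)
		return -EINVAL;

	q = xs->umem->fq;
	if (!q)
		return -EINVAL;	/* queue not created yet */

	*out = q;
	return 0;
}
```

A fuzzer that issues mmap before the setsockopt should then see -EINVAL
rather than a NULL dereference.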


* Re: [PATCH] net: phy: TLK10X initial driver submission
From: Florian Fainelli @ 2018-04-24 16:52 UTC (permalink / raw)
  To: Måns Andersson, Rob Herring, Mark Rutland, Andrew Lunn,
	netdev, devicetree, linux-kernel
In-Reply-To: <20180419082816.109338-1-mans.andersson@nibe.se>



On 04/19/2018 01:28 AM, Måns Andersson wrote:
> From: Mans Andersson <mans.andersson@nibe.se>
> 
> Add support for the TI TLK105 and TLK106 10/100Mbit ethernet phys.
> 
> In addition the TLK10X needs to be removed from DP83848 driver as the
> power back off support is added here for this device.

I do not think this is a compelling enough reason; you could very well
adjust the dp83848.c driver to account for the properties that you are
introducing. More comments below.

[snip]

> +#define TLK10X_INT_EN_MASK		\
> +	(TLK10X_MISR_ANC_INT_EN |	\
> +	 TLK10X_MISR_DUP_INT_EN |	\
> +	 TLK10X_MISR_SPD_INT_EN |	\
> +	 TLK10X_MISR_LINK_INT_EN)
> +
> +struct tlk10x_private {
> +	int pwrbo_level;

unsigned int

> +};
> +
> +static int tlk10x_read(struct phy_device *phydev, int reg)
> +{
> +	if (reg & ~0x1f) {
> +		/* Extended register */
> +		phy_write(phydev, TLK10X_REGCR, 0x001F);
> +		phy_write(phydev, TLK10X_ADDAR, reg);
> +		phy_write(phydev, TLK10X_REGCR, 0x401F);
> +		reg = TLK10X_ADDAR;
> +	}

Humm, this looks a bit fragile, you would likely want to create separate
helper functions for these extended registers and make sure you handle
write failures as well. Also consider making use of the page helpers
from include/linux/phy.h.
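
Something along these lines, shown here as a userspace sketch against a
mock device. The register numbers (0x0d, 0x0e), the mock bus and every
demo_* name are assumptions for the example; it does not use the page
helpers, it only shows the error propagation.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_REGCR	0x0d	/* assumed register number */
#define DEMO_ADDAR	0x0e	/* assumed register number */
#define DEMO_EXT_SIZE	0x200

/* Mock PHY: 32 direct registers plus an extended space behind ADDAR. */
struct demo_phy {
	uint16_t regs[32];
	uint16_t ext[DEMO_EXT_SIZE];
	uint16_t ext_addr;
	int writes_until_fail;	/* negative: never fail */
};

static int demo_phy_write(struct demo_phy *phy, int reg, int val)
{
	if (phy->writes_until_fail == 0)
		return -EIO;
	if (phy->writes_until_fail > 0)
		phy->writes_until_fail--;

	if (reg == DEMO_ADDAR) {
		if (phy->regs[DEMO_REGCR] & 0x4000)	/* data mode */
			phy->ext[phy->ext_addr % DEMO_EXT_SIZE] = (uint16_t)val;
		else					/* address mode */
			phy->ext_addr = (uint16_t)val;
		return 0;
	}
	phy->regs[reg & 0x1f] = (uint16_t)val;
	return 0;
}

static int demo_phy_read(struct demo_phy *phy, int reg)
{
	if (reg == DEMO_ADDAR && (phy->regs[DEMO_REGCR] & 0x4000))
		return phy->ext[phy->ext_addr % DEMO_EXT_SIZE];
	return phy->regs[reg & 0x1f];
}

/* Point ADDAR at an extended register, stopping at the first failure. */
static int demo_ext_select(struct demo_phy *phy, int reg)
{
	int ret;

	ret = demo_phy_write(phy, DEMO_REGCR, 0x001f);
	if (ret < 0)
		return ret;
	ret = demo_phy_write(phy, DEMO_ADDAR, reg);
	if (ret < 0)
		return ret;
	return demo_phy_write(phy, DEMO_REGCR, 0x401f);
}

static int demo_ext_read(struct demo_phy *phy, int reg)
{
	int ret = demo_ext_select(phy, reg);

	return ret < 0 ? ret : demo_phy_read(phy, DEMO_ADDAR);
}

static int demo_ext_write(struct demo_phy *phy, int reg, int val)
{
	int ret = demo_ext_select(phy, reg);

	return ret < 0 ? ret : demo_phy_write(phy, DEMO_ADDAR, val);
}
```

The plain read and write paths stay untouched, and a failed select never
falls through to an access of whatever ADDAR happens to point at.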

> +
> +	return phy_read(phydev, reg);
> +}
> +
> +static int tlk10x_write(struct phy_device *phydev, int reg, int val)
> +{
> +	if (reg & ~0x1f) {
> +		/* Extended register */
> +		phy_write(phydev, TLK10X_REGCR, 0x001F);
> +		phy_write(phydev, TLK10X_ADDAR, reg);
> +		phy_write(phydev, TLK10X_REGCR, 0x401F);
> +		reg = TLK10X_ADDAR;
> +	}

Same here.

> +
> +	return phy_write(phydev, reg, val);
> +}
> +
> +#ifdef CONFIG_OF_MDIO
> +static int tlk10x_of_init(struct phy_device *phydev)
> +{
> +	struct tlk10x_private *tlk10x = phydev->priv;
> +	struct device *dev = &phydev->mdio.dev;
> +	struct device_node *of_node = dev->of_node;
> +	int ret;
> +
> +	if (!of_node)
> +		return 0;
> +
> +	ret = of_property_read_u32(of_node, "ti,power-back-off",
> +				   &tlk10x->pwrbo_level);
> +	if (ret) {
> +		dev_err(dev, "missing ti,power-back-off property");
> +		tlk10x->pwrbo_level = 0;

This should not be necessary, that should be the default with a zero
initialized private data structure.

> +	}
> +
> +	return 0;
> +}
> +#else
> +static int tlk10x_of_init(struct phy_device *phydev)
> +{
> +	return 0;
> +}
> +#endif /* CONFIG_OF_MDIO */
> +
> +static int tlk10x_config_init(struct phy_device *phydev)
> +{
> +	int ret, reg;
> +	struct tlk10x_private *tlk10x;
> +
> +	ret = genphy_config_init(phydev);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (!phydev->priv) {
> +		tlk10x = devm_kzalloc(&phydev->mdio.dev, sizeof(*tlk10x),
> +				      GFP_KERNEL);
> +		if (!tlk10x)
> +			return -ENOMEM;
> +
> +		phydev->priv = tlk10x;
> +		ret = tlk10x_of_init(phydev);
> +		if (ret)
> +			return ret;
> +	} else {
> +		tlk10x = (struct tlk10x_private *)phydev->priv;
> +	}

You need to implement a probe() function that is responsible for
allocating private memory instead of doing this check.
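
The split I mean, as a userspace sketch: calloc stands in for
devm_kzalloc, and the demo_* structs and functions are hypothetical
stand-ins for the phy_device and the driver callbacks.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_priv {
	unsigned int pwrbo_level;
};

struct demo_phydev {
	void *priv;
};

/* Runs once per device: the only place that allocates private data. */
static int demo_probe(struct demo_phydev *phydev)
{
	/* Zeroed allocation, so pwrbo_level already defaults to 0. */
	struct demo_priv *priv = calloc(1, sizeof(*priv));

	if (!priv)
		return -ENOMEM;

	phydev->priv = priv;
	return 0;
}

/* May run many times: it only reads the state set up by probe. */
static unsigned int demo_config_init(const struct demo_phydev *phydev)
{
	const struct demo_priv *priv = phydev->priv;

	return priv->pwrbo_level;
}
```

With the allocation done in probe, config_init no longer needs the
if/else on phydev->priv.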

> +
> +	// Power back off
> +	if (tlk10x->pwrbo_level < 0 || tlk10x->pwrbo_level > 3)
> +		tlk10x->pwrbo_level = 0;

How can you have pwrbo_level < 0 when you use of_property_read_u32()?

> +	reg = tlk10x_read(phydev, TLK10X_PWRBOCR);
> +	reg = ((reg & ~TLK10X_PWRBOCR_MASK)
> +		| (tlk10x->pwrbo_level << 6));

One too many levels of parentheses, the outer ones should not be necessary.
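
For illustration, the same read-modify-write as a standalone helper with
an unsigned level. The mask value (a 2-bit field at bits 7:6) is an
assumption inferred from the shift by 6 and the 0..3 range; the
definition of TLK10X_PWRBOCR_MASK is not quoted here.

```c
#include <stdint.h>

/* Assumed layout: 2-bit power back-off field at bits 7:6. */
#define DEMO_PWRBOCR_SHIFT	6
#define DEMO_PWRBOCR_MASK	(0x3u << DEMO_PWRBOCR_SHIFT)

static uint16_t demo_apply_pwrbo(uint16_t reg, unsigned int level)
{
	/* level is unsigned, so only the upper bound needs checking. */
	if (level > 3)
		level = 0;

	return (uint16_t)((reg & ~DEMO_PWRBOCR_MASK) |
			  (level << DEMO_PWRBOCR_SHIFT));
}
```

All bits outside the field are preserved, and an out-of-range level
falls back to 0 as in the patch.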

> +	ret = tlk10x_write(phydev, TLK10X_PWRBOCR, reg);
> +	if (ret < 0) {
> +		dev_err(&phydev->mdio.dev,
> +			"unable to set power back-off (err=%d)\n", ret);
> +		return ret;
> +	}
> +	dev_info(&phydev->mdio.dev, "power back-off set to level %d\n",
> +		 tlk10x->pwrbo_level);

config_init() is called often, consider making this a debugging statement.

-- 
Florian


* Re: VRF: Ingress IPv6 Linklocal/Multicast destined pkt from slave VRF device does not map to Master device socket
From: David Ahern @ 2018-04-24 16:51 UTC (permalink / raw)
  To: Sukumar Gopalakrishnan, netdev
In-Reply-To: <CADiZnkRRJCrHu_QkwNb3G49gdyicJkbeB8YctrMb6jZc9uq6rg@mail.gmail.com>

On 4/23/18 11:57 PM, Sukumar Gopalakrishnan wrote:
> Get the master device from skb->dev and pass it to the socket lookup
> function for IPv6 link-local/multicast addresses.
> 
> ipv6_raw_deliver()
> {
> int mdif;
> ..
> ..
>         mdif = (((nexthdr == IPPROTO_PIM || nexthdr == 89 /* IPPROTO_OSPF */ ||
>                 nexthdr == IPPROTO_ICMPV6 || nexthdr == 112 /*IPPROTO_VRRP*/) &&
>                 (ipv6_addr_type(daddr) &
>                 (IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL))) ?
>                 l3mdev_master_ifindex_rcu(skb->dev) : inet6_iif(skb));
> 
> 
>         sk = __raw_v6_lookup(net, sk, nexthdr, daddr, saddr, mdif,
> inet6_sdif(skb));
> 

Packets destined to a linklocal and mcast address stay bound to the
actual ingress device as that is their scope.
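
As a userspace illustration of which destinations fall into that
category, a sketch using the standard address macros. The function name
is made up, and this only classifies addresses; it is not the kernel's
socket lookup logic.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/* True for destinations whose scope ties them to the ingress device. */
static bool demo_is_ingress_scoped(const char *text)
{
	struct in6_addr addr;

	/* Unparsable input is treated as not scoped. */
	if (inet_pton(AF_INET6, text, &addr) != 1)
		return false;

	return IN6_IS_ADDR_LINKLOCAL(&addr) || IN6_IS_ADDR_MULTICAST(&addr);
}
```

For example, fe80::1 and ff02::5 are classified as scoped, while a
global address such as 2001:db8::1 is not.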

