Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net v2 10/15] drivers: net: cirrus: mac89x0: Remove this driver
From: Geert Uytterhoeven @ 2026-04-23  7:07 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Michael Fritscher, Byron Stanoszek, Daniel Palmer, linux-kernel,
	netdev, linux-doc, linux-m68k
In-Reply-To: <20260422-v7-0-0-net-next-driver-removal-v1-v2-10-08a5b59784d5@lunn.ch>

CC linux-m68k

On Thu, 23 Apr 2026 at 02:29, Andrew Lunn <andrew@lunn.ch> wrote:
> The mac89x0 was written by Russell Nelson in 1996. It is an MAC

It is based on the ISA cs89x0 driver, written by Russell Nelson.

> device, so unlikely to be used with modern kernels.

Macs do run modern kernels.

>
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> ---
>  drivers/net/ethernet/cirrus/Kconfig   |  10 -
>  drivers/net/ethernet/cirrus/Makefile  |   1 -
>  drivers/net/ethernet/cirrus/cs89x0.h  | 461 ---------------------------
>  drivers/net/ethernet/cirrus/mac89x0.c | 577 ----------------------------------
>  4 files changed, 1049 deletions(-)
>
> diff --git a/drivers/net/ethernet/cirrus/Kconfig b/drivers/net/ethernet/cirrus/Kconfig
> index 1a0c7b3bfcd6..786d379e79fe 100644
> --- a/drivers/net/ethernet/cirrus/Kconfig
> +++ b/drivers/net/ethernet/cirrus/Kconfig
> @@ -25,14 +25,4 @@ config EP93XX_ETH
>           This is a driver for the ethernet hardware included in EP93xx CPUs.
>           Say Y if you are building a kernel for EP93xx based devices.
>
> -config MAC89x0
> -       tristate "Macintosh CS89x0 based ethernet cards"
> -       depends on MAC
> -       help
> -         Support for CS89x0 chipset based Ethernet cards.  If you have a
> -         Nubus or LC-PDS network (Ethernet) card of this type, say Y here.
> -
> -         To compile this driver as a module, choose M here. This module will
> -         be called mac89x0.
> -
>  endif # NET_VENDOR_CIRRUS
> diff --git a/drivers/net/ethernet/cirrus/Makefile b/drivers/net/ethernet/cirrus/Makefile
> index cb740939d976..03800af0f0e1 100644
> --- a/drivers/net/ethernet/cirrus/Makefile
> +++ b/drivers/net/ethernet/cirrus/Makefile
> @@ -4,4 +4,3 @@
>  #
>
>  obj-$(CONFIG_EP93XX_ETH) += ep93xx_eth.o
> -obj-$(CONFIG_MAC89x0) += mac89x0.o
> diff --git a/drivers/net/ethernet/cirrus/cs89x0.h b/drivers/net/ethernet/cirrus/cs89x0.h
> deleted file mode 100644
> index 210f9ec9af4b..000000000000
> --- a/drivers/net/ethernet/cirrus/cs89x0.h
> +++ /dev/null
> @@ -1,461 +0,0 @@
> -/*  Copyright, 1988-1992, Russell Nelson, Crynwr Software
> -
> -   This program is free software; you can redistribute it and/or modify
> -   it under the terms of the GNU General Public License as published by
> -   the Free Software Foundation, version 1.
> -
> -   This program is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -   GNU General Public License for more details.
> -
> -   You should have received a copy of the GNU General Public License
> -   along with this program; if not, write to the Free Software
> -   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> -   */
> -
> -
> -#define PP_ChipID 0x0000       /* offset   0h -> Corp -ID              */
> -                               /* offset   2h -> Model/Product Number  */
> -                               /* offset   3h -> Chip Revision Number  */
> -
> -#define PP_ISAIOB 0x0020       /*  IO base address */
> -#define PP_CS8900_ISAINT 0x0022        /*  ISA interrupt select */
> -#define PP_CS8920_ISAINT 0x0370        /*  ISA interrupt select */
> -#define PP_CS8900_ISADMA 0x0024        /*  ISA Rec DMA channel */
> -#define PP_CS8920_ISADMA 0x0374        /*  ISA Rec DMA channel */
> -#define PP_ISASOF 0x0026       /*  ISA DMA offset */
> -#define PP_DmaFrameCnt 0x0028  /*  ISA DMA Frame count */
> -#define PP_DmaByteCnt 0x002A   /*  ISA DMA Byte count */
> -#define PP_CS8900_ISAMemB 0x002C       /*  Memory base */
> -#define PP_CS8920_ISAMemB 0x0348 /*  */
> -
> -#define PP_ISABootBase 0x0030  /*  Boot Prom base  */
> -#define PP_ISABootMask 0x0034  /*  Boot Prom Mask */
> -
> -/* EEPROM data and command registers */
> -#define PP_EECMD 0x0040                /*  NVR Interface Command register */
> -#define PP_EEData 0x0042       /*  NVR Interface Data Register */
> -#define PP_DebugReg 0x0044     /*  Debug Register */
> -
> -#define PP_RxCFG 0x0102                /*  Rx Bus config */
> -#define PP_RxCTL 0x0104                /*  Receive Control Register */
> -#define PP_TxCFG 0x0106                /*  Transmit Config Register */
> -#define PP_TxCMD 0x0108                /*  Transmit Command Register */
> -#define PP_BufCFG 0x010A       /*  Bus configuration Register */
> -#define PP_LineCTL 0x0112      /*  Line Config Register */
> -#define PP_SelfCTL 0x0114      /*  Self Command Register */
> -#define PP_BusCTL 0x0116       /*  ISA bus control Register */
> -#define PP_TestCTL 0x0118      /*  Test Register */
> -#define PP_AutoNegCTL 0x011C   /*  Auto Negotiation Ctrl */
> -
> -#define PP_ISQ 0x0120          /*  Interrupt Status */
> -#define PP_RxEvent 0x0124      /*  Rx Event Register */
> -#define PP_TxEvent 0x0128      /*  Tx Event Register */
> -#define PP_BufEvent 0x012C     /*  Bus Event Register */
> -#define PP_RxMiss 0x0130       /*  Receive Miss Count */
> -#define PP_TxCol 0x0132                /*  Transmit Collision Count */
> -#define PP_LineST 0x0134       /*  Line State Register */
> -#define PP_SelfST 0x0136       /*  Self State register */
> -#define PP_BusST 0x0138                /*  Bus Status */
> -#define PP_TDR 0x013C          /*  Time Domain Reflectometry */
> -#define PP_AutoNegST 0x013E    /*  Auto Neg Status */
> -#define PP_TxCommand 0x0144    /*  Tx Command */
> -#define PP_TxLength 0x0146     /*  Tx Length */
> -#define PP_LAF 0x0150          /*  Hash Table */
> -#define PP_IA 0x0158           /*  Physical Address Register */
> -
> -#define PP_RxStatus 0x0400     /*  Receive start of frame */
> -#define PP_RxLength 0x0402     /*  Receive Length of frame */
> -#define PP_RxFrame 0x0404      /*  Receive frame pointer */
> -#define PP_TxFrame 0x0A00      /*  Transmit frame pointer */
> -
> -/*  Primary I/O Base Address. If no I/O base is supplied by the user, then this */
> -/*  can be used as the default I/O base to access the PacketPage Area. */
> -#define DEFAULTIOBASE 0x0300
> -#define FIRST_IO 0x020C                /*  First I/O port to check */
> -#define LAST_IO 0x037C         /*  Last I/O port to check (+10h) */
> -#define ADD_MASK 0x3000                /*  Mask it use of the ADD_PORT register */
> -#define ADD_SIG 0x3000         /*  Expected ID signature */
> -
> -/* On Macs, we only need use the ISA I/O stuff until we do MEMORY_ON */
> -#ifdef CONFIG_MAC
> -#define LCSLOTBASE 0xfee00000
> -#define MMIOBASE 0x40000
> -#endif
> -
> -#define CHIP_EISA_ID_SIG 0x630E   /*  Product ID Code for Crystal Chip (CS8900 spec 4.3) */
> -#define CHIP_EISA_ID_SIG_STR "0x630E"
> -
> -#ifdef IBMEIPKT
> -#define EISA_ID_SIG 0x4D24     /*  IBM */
> -#define PART_NO_SIG 0x1010     /*  IBM */
> -#define MONGOOSE_BIT 0x0000    /*  IBM */
> -#else
> -#define EISA_ID_SIG 0x630E     /*  PnP Vendor ID (same as chip id for Crystal board) */
> -#define PART_NO_SIG 0x4000     /*  ID code CS8920 board (PnP Vendor Product code) */
> -#define MONGOOSE_BIT 0x2000    /*  PART_NO_SIG + MONGOOSE_BUT => ID of mongoose */
> -#endif
> -
> -#define PRODUCT_ID_ADD 0x0002   /*  Address of product ID */
> -
> -/*  Mask to find out the types of  registers */
> -#define REG_TYPE_MASK 0x001F
> -
> -/*  Eeprom Commands */
> -#define ERSE_WR_ENBL 0x00F0
> -#define ERSE_WR_DISABLE 0x0000
> -
> -/*  Defines Control/Config register quintuplet numbers */
> -#define RX_BUF_CFG 0x0003
> -#define RX_CONTROL 0x0005
> -#define TX_CFG 0x0007
> -#define TX_COMMAND 0x0009
> -#define BUF_CFG 0x000B
> -#define LINE_CONTROL 0x0013
> -#define SELF_CONTROL 0x0015
> -#define BUS_CONTROL 0x0017
> -#define TEST_CONTROL 0x0019
> -
> -/*  Defines Status/Count registers quintuplet numbers */
> -#define RX_EVENT 0x0004
> -#define TX_EVENT 0x0008
> -#define BUF_EVENT 0x000C
> -#define RX_MISS_COUNT 0x0010
> -#define TX_COL_COUNT 0x0012
> -#define LINE_STATUS 0x0014
> -#define SELF_STATUS 0x0016
> -#define BUS_STATUS 0x0018
> -#define TDR 0x001C
> -
> -/* PP_RxCFG - Receive  Configuration and Interrupt Mask bit definition -  Read/write */
> -#define SKIP_1 0x0040
> -#define RX_STREAM_ENBL 0x0080
> -#define RX_OK_ENBL 0x0100
> -#define RX_DMA_ONLY 0x0200
> -#define AUTO_RX_DMA 0x0400
> -#define BUFFER_CRC 0x0800
> -#define RX_CRC_ERROR_ENBL 0x1000
> -#define RX_RUNT_ENBL 0x2000
> -#define RX_EXTRA_DATA_ENBL 0x4000
> -
> -/* PP_RxCTL - Receive Control bit definition - Read/write */
> -#define RX_IA_HASH_ACCEPT 0x0040
> -#define RX_PROM_ACCEPT 0x0080
> -#define RX_OK_ACCEPT 0x0100
> -#define RX_MULTCAST_ACCEPT 0x0200
> -#define RX_IA_ACCEPT 0x0400
> -#define RX_BROADCAST_ACCEPT 0x0800
> -#define RX_BAD_CRC_ACCEPT 0x1000
> -#define RX_RUNT_ACCEPT 0x2000
> -#define RX_EXTRA_DATA_ACCEPT 0x4000
> -#define RX_ALL_ACCEPT (RX_PROM_ACCEPT|RX_BAD_CRC_ACCEPT|RX_RUNT_ACCEPT|RX_EXTRA_DATA_ACCEPT)
> -/*  Default receive mode - individually addressed, broadcast, and error free */
> -#define DEF_RX_ACCEPT (RX_IA_ACCEPT | RX_BROADCAST_ACCEPT | RX_OK_ACCEPT)
> -
> -/* PP_TxCFG - Transmit Configuration Interrupt Mask bit definition - Read/write */
> -#define TX_LOST_CRS_ENBL 0x0040
> -#define TX_SQE_ERROR_ENBL 0x0080
> -#define TX_OK_ENBL 0x0100
> -#define TX_LATE_COL_ENBL 0x0200
> -#define TX_JBR_ENBL 0x0400
> -#define TX_ANY_COL_ENBL 0x0800
> -#define TX_16_COL_ENBL 0x8000
> -
> -/* PP_TxCMD - Transmit Command bit definition - Read-only */
> -#define TX_START_4_BYTES 0x0000
> -#define TX_START_64_BYTES 0x0040
> -#define TX_START_128_BYTES 0x0080
> -#define TX_START_ALL_BYTES 0x00C0
> -#define TX_FORCE 0x0100
> -#define TX_ONE_COL 0x0200
> -#define TX_TWO_PART_DEFF_DISABLE 0x0400
> -#define TX_NO_CRC 0x1000
> -#define TX_RUNT 0x2000
> -
> -/* PP_BufCFG - Buffer Configuration Interrupt Mask bit definition - Read/write */
> -#define GENERATE_SW_INTERRUPT 0x0040
> -#define RX_DMA_ENBL 0x0080
> -#define READY_FOR_TX_ENBL 0x0100
> -#define TX_UNDERRUN_ENBL 0x0200
> -#define RX_MISS_ENBL 0x0400
> -#define RX_128_BYTE_ENBL 0x0800
> -#define TX_COL_COUNT_OVRFLOW_ENBL 0x1000
> -#define RX_MISS_COUNT_OVRFLOW_ENBL 0x2000
> -#define RX_DEST_MATCH_ENBL 0x8000
> -
> -/* PP_LineCTL - Line Control bit definition - Read/write */
> -#define SERIAL_RX_ON 0x0040
> -#define SERIAL_TX_ON 0x0080
> -#define AUI_ONLY 0x0100
> -#define AUTO_AUI_10BASET 0x0200
> -#define MODIFIED_BACKOFF 0x0800
> -#define NO_AUTO_POLARITY 0x1000
> -#define TWO_PART_DEFDIS 0x2000
> -#define LOW_RX_SQUELCH 0x4000
> -
> -/* PP_SelfCTL - Software Self Control bit definition - Read/write */
> -#define POWER_ON_RESET 0x0040
> -#define SW_STOP 0x0100
> -#define SLEEP_ON 0x0200
> -#define AUTO_WAKEUP 0x0400
> -#define HCB0_ENBL 0x1000
> -#define HCB1_ENBL 0x2000
> -#define HCB0 0x4000
> -#define HCB1 0x8000
> -
> -/* PP_BusCTL - ISA Bus Control bit definition - Read/write */
> -#define RESET_RX_DMA 0x0040
> -#define MEMORY_ON 0x0400
> -#define DMA_BURST_MODE 0x0800
> -#define IO_CHANNEL_READY_ON 0x1000
> -#define RX_DMA_SIZE_64K 0x2000
> -#define ENABLE_IRQ 0x8000
> -
> -/* PP_TestCTL - Test Control bit definition - Read/write */
> -#define LINK_OFF 0x0080
> -#define ENDEC_LOOPBACK 0x0200
> -#define AUI_LOOPBACK 0x0400
> -#define BACKOFF_OFF 0x0800
> -#define FDX_8900 0x4000
> -#define FAST_TEST 0x8000
> -
> -/* PP_RxEvent - Receive Event Bit definition - Read-only */
> -#define RX_IA_HASHED 0x0040
> -#define RX_DRIBBLE 0x0080
> -#define RX_OK 0x0100
> -#define RX_HASHED 0x0200
> -#define RX_IA 0x0400
> -#define RX_BROADCAST 0x0800
> -#define RX_CRC_ERROR 0x1000
> -#define RX_RUNT 0x2000
> -#define RX_EXTRA_DATA 0x4000
> -
> -#define HASH_INDEX_MASK 0x0FC00
> -
> -/* PP_TxEvent - Transmit Event Bit definition - Read-only */
> -#define TX_LOST_CRS 0x0040
> -#define TX_SQE_ERROR 0x0080
> -#define TX_OK 0x0100
> -#define TX_LATE_COL 0x0200
> -#define TX_JBR 0x0400
> -#define TX_16_COL 0x8000
> -#define TX_SEND_OK_BITS (TX_OK|TX_LOST_CRS)
> -#define TX_COL_COUNT_MASK 0x7800
> -
> -/* PP_BufEvent - Buffer Event Bit definition - Read-only */
> -#define SW_INTERRUPT 0x0040
> -#define RX_DMA 0x0080
> -#define READY_FOR_TX 0x0100
> -#define TX_UNDERRUN 0x0200
> -#define RX_MISS 0x0400
> -#define RX_128_BYTE 0x0800
> -#define TX_COL_OVRFLW 0x1000
> -#define RX_MISS_OVRFLW 0x2000
> -#define RX_DEST_MATCH 0x8000
> -
> -/* PP_LineST - Ethernet Line Status bit definition - Read-only */
> -#define LINK_OK 0x0080
> -#define AUI_ON 0x0100
> -#define TENBASET_ON 0x0200
> -#define POLARITY_OK 0x1000
> -#define CRS_OK 0x4000
> -
> -/* PP_SelfST - Chip Software Status bit definition */
> -#define ACTIVE_33V 0x0040
> -#define INIT_DONE 0x0080
> -#define SI_BUSY 0x0100
> -#define EEPROM_PRESENT 0x0200
> -#define EEPROM_OK 0x0400
> -#define EL_PRESENT 0x0800
> -#define EE_SIZE_64 0x1000
> -
> -/* PP_BusST - ISA Bus Status bit definition */
> -#define TX_BID_ERROR 0x0080
> -#define READY_FOR_TX_NOW 0x0100
> -
> -/* PP_AutoNegCTL - Auto Negotiation Control bit definition */
> -#define RE_NEG_NOW 0x0040
> -#define ALLOW_FDX 0x0080
> -#define AUTO_NEG_ENABLE 0x0100
> -#define NLP_ENABLE 0x0200
> -#define FORCE_FDX 0x8000
> -#define AUTO_NEG_BITS (FORCE_FDX|NLP_ENABLE|AUTO_NEG_ENABLE)
> -#define AUTO_NEG_MASK (FORCE_FDX|NLP_ENABLE|AUTO_NEG_ENABLE|ALLOW_FDX|RE_NEG_NOW)
> -
> -/* PP_AutoNegST - Auto Negotiation Status bit definition */
> -#define AUTO_NEG_BUSY 0x0080
> -#define FLP_LINK 0x0100
> -#define FLP_LINK_GOOD 0x0800
> -#define LINK_FAULT 0x1000
> -#define HDX_ACTIVE 0x4000
> -#define FDX_ACTIVE 0x8000
> -
> -/*  The following block defines the ISQ event types */
> -#define ISQ_RECEIVER_EVENT 0x04
> -#define ISQ_TRANSMITTER_EVENT 0x08
> -#define ISQ_BUFFER_EVENT 0x0c
> -#define ISQ_RX_MISS_EVENT 0x10
> -#define ISQ_TX_COL_EVENT 0x12
> -
> -#define ISQ_EVENT_MASK 0x003F   /*  ISQ mask to find out type of event */
> -#define ISQ_HIST 16            /*  small history buffer */
> -#define AUTOINCREMENT 0x8000   /*  Bit mask to set bit-15 for autoincrement */
> -
> -#define TXRXBUFSIZE 0x0600
> -#define RXDMABUFSIZE 0x8000
> -#define RXDMASIZE 0x4000
> -#define TXRX_LENGTH_MASK 0x07FF
> -
> -/*  rx options bits */
> -#define RCV_WITH_RXON  1       /*  Set SerRx ON */
> -#define RCV_COUNTS     2       /*  Use Framecnt1 */
> -#define RCV_PONG       4       /*  Pong respondent */
> -#define RCV_DONG       8       /*  Dong operation */
> -#define RCV_POLLING    0x10    /*  Poll RxEvent */
> -#define RCV_ISQ                0x20    /*  Use ISQ, int */
> -#define RCV_AUTO_DMA   0x100   /*  Set AutoRxDMAE */
> -#define RCV_DMA                0x200   /*  Set RxDMA only */
> -#define RCV_DMA_ALL    0x400   /*  Copy all DMA'ed */
> -#define RCV_FIXED_DATA 0x800   /*  Every frame same */
> -#define RCV_IO         0x1000  /*  Use ISA IO only */
> -#define RCV_MEMORY     0x2000  /*  Use ISA Memory */
> -
> -#define RAM_SIZE       0x1000       /*  The card has 4k bytes or RAM */
> -#define PKT_START PP_TxFrame  /*  Start of packet RAM */
> -
> -#define RX_FRAME_PORT  0x0000
> -#define TX_FRAME_PORT RX_FRAME_PORT
> -#define TX_CMD_PORT    0x0004
> -#define TX_NOW         0x0000       /*  Tx packet after   5 bytes copied */
> -#define TX_AFTER_381   0x0040       /*  Tx packet after 381 bytes copied */
> -#define TX_AFTER_ALL   0x00c0       /*  Tx packet after all bytes copied */
> -#define TX_LEN_PORT    0x0006
> -#define ISQ_PORT       0x0008
> -#define ADD_PORT       0x000A
> -#define DATA_PORT      0x000C
> -
> -#define EEPROM_WRITE_EN                0x00F0
> -#define EEPROM_WRITE_DIS       0x0000
> -#define EEPROM_WRITE_CMD       0x0100
> -#define EEPROM_READ_CMD                0x0200
> -
> -/*  Receive Header */
> -/*  Description of header of each packet in receive area of memory */
> -#define RBUF_EVENT_LOW 0   /*  Low byte of RxEvent - status of received frame */
> -#define RBUF_EVENT_HIGH        1   /*  High byte of RxEvent - status of received frame */
> -#define RBUF_LEN_LOW   2   /*  Length of received data - low byte */
> -#define RBUF_LEN_HI    3   /*  Length of received data - high byte */
> -#define RBUF_HEAD_LEN  4   /*  Length of this header */
> -
> -#define CHIP_READ 0x1   /*  Used to mark state of the repins code (chip or dma) */
> -#define DMA_READ 0x2   /*  Used to mark state of the repins code (chip or dma) */
> -
> -/*  for bios scan */
> -/*  */
> -#ifdef CSDEBUG
> -/*  use these values for debugging bios scan */
> -#define BIOS_START_SEG 0x00000
> -#define BIOS_OFFSET_INC 0x0010
> -#else
> -#define BIOS_START_SEG 0x0c000
> -#define BIOS_OFFSET_INC 0x0200
> -#endif
> -
> -#define BIOS_LAST_OFFSET 0x0fc00
> -
> -/*  Byte offsets into the EEPROM configuration buffer */
> -#define ISA_CNF_OFFSET 0x6
> -#define TX_CTL_OFFSET (ISA_CNF_OFFSET + 8)                     /*  8900 eeprom */
> -#define AUTO_NEG_CNF_OFFSET (ISA_CNF_OFFSET + 8)               /*  8920 eeprom */
> -
> -  /*  the assumption here is that the bits in the eeprom are generally  */
> -  /*  in the same position as those in the autonegctl register. */
> -  /*  Of course the IMM bit is not in that register so it must be  */
> -  /*  masked out */
> -#define EE_FORCE_FDX  0x8000
> -#define EE_NLP_ENABLE 0x0200
> -#define EE_AUTO_NEG_ENABLE 0x0100
> -#define EE_ALLOW_FDX 0x0080
> -#define EE_AUTO_NEG_CNF_MASK (EE_FORCE_FDX|EE_NLP_ENABLE|EE_AUTO_NEG_ENABLE|EE_ALLOW_FDX)
> -
> -#define IMM_BIT 0x0040         /*  ignore missing media         */
> -
> -#define ADAPTER_CNF_OFFSET (AUTO_NEG_CNF_OFFSET + 2)
> -#define A_CNF_10B_T 0x0001
> -#define A_CNF_AUI 0x0002
> -#define A_CNF_10B_2 0x0004
> -#define A_CNF_MEDIA_TYPE 0x0070
> -#define A_CNF_MEDIA_AUTO 0x0070
> -#define A_CNF_MEDIA_10B_T 0x0020
> -#define A_CNF_MEDIA_AUI 0x0040
> -#define A_CNF_MEDIA_10B_2 0x0010
> -#define A_CNF_DC_DC_POLARITY 0x0080
> -#define A_CNF_NO_AUTO_POLARITY 0x2000
> -#define A_CNF_LOW_RX_SQUELCH 0x4000
> -#define A_CNF_EXTND_10B_2 0x8000
> -
> -#define PACKET_PAGE_OFFSET 0x8
> -
> -/*  Bit definitions for the ISA configuration word from the EEPROM */
> -#define INT_NO_MASK 0x000F
> -#define DMA_NO_MASK 0x0070
> -#define ISA_DMA_SIZE 0x0200
> -#define ISA_AUTO_RxDMA 0x0400
> -#define ISA_RxDMA 0x0800
> -#define DMA_BURST 0x1000
> -#define STREAM_TRANSFER 0x2000
> -#define ANY_ISA_DMA (ISA_AUTO_RxDMA | ISA_RxDMA)
> -
> -/*  DMA controller registers */
> -#define DMA_BASE 0x00     /*  DMA controller base */
> -#define DMA_BASE_2 0x0C0    /*  DMA controller base */
> -
> -#define DMA_STAT 0x0D0    /*  DMA controller status register */
> -#define DMA_MASK 0x0D4    /*  DMA controller mask register */
> -#define DMA_MODE 0x0D6    /*  DMA controller mode register */
> -#define DMA_RESETFF 0x0D8    /*  DMA controller first/last flip flop */
> -
> -/*  DMA data */
> -#define DMA_DISABLE 0x04     /*  Disable channel n */
> -#define DMA_ENABLE 0x00     /*  Enable channel n */
> -/*  Demand transfers, incr. address, auto init, writes, ch. n */
> -#define DMA_RX_MODE 0x14
> -/*  Demand transfers, incr. address, auto init, reads, ch. n */
> -#define DMA_TX_MODE 0x18
> -
> -#define DMA_SIZE (16*1024) /*  Size of dma buffer - 16k */
> -
> -#define CS8900 0x0000
> -#define CS8920 0x4000
> -#define CS8920M 0x6000
> -#define REVISON_BITS 0x1F00
> -#define EEVER_NUMBER 0x12
> -#define CHKSUM_LEN 0x14
> -#define CHKSUM_VAL 0x0000
> -#define START_EEPROM_DATA 0x001c /*  Offset into eeprom for start of data */
> -#define IRQ_MAP_EEPROM_DATA 0x0046 /*  Offset into eeprom for the IRQ map */
> -#define IRQ_MAP_LEN 0x0004 /*  No of bytes to read for the IRQ map */
> -#define PNP_IRQ_FRMT 0x0022 /*  PNP small item IRQ format */
> -#define CS8900_IRQ_MAP 0x1c20 /*  This IRQ map is fixed */
> -
> -#define CS8920_NO_INTS 0x0F   /*  Max CS8920 interrupt select # */
> -
> -#define PNP_ADD_PORT 0x0279
> -#define PNP_WRITE_PORT 0x0A79
> -
> -#define GET_PNP_ISA_STRUCT 0x40
> -#define PNP_ISA_STRUCT_LEN 0x06
> -#define PNP_CSN_CNT_OFF 0x01
> -#define PNP_RD_PORT_OFF 0x02
> -#define PNP_FUNCTION_OK 0x00
> -#define PNP_WAKE 0x03
> -#define PNP_RSRC_DATA 0x04
> -#define PNP_RSRC_READY 0x01
> -#define PNP_STATUS 0x05
> -#define PNP_ACTIVATE 0x30
> -#define PNP_CNF_IO_H 0x60
> -#define PNP_CNF_IO_L 0x61
> -#define PNP_CNF_INT 0x70
> -#define PNP_CNF_DMA 0x74
> -#define PNP_CNF_MEM 0x48
> diff --git a/drivers/net/ethernet/cirrus/mac89x0.c b/drivers/net/ethernet/cirrus/mac89x0.c
> deleted file mode 100644
> index 6723df9b65d9..000000000000
> --- a/drivers/net/ethernet/cirrus/mac89x0.c
> +++ /dev/null
> @@ -1,577 +0,0 @@
> -/* mac89x0.c: A Crystal Semiconductor CS89[02]0 driver for linux. */
> -/*
> -       Written 1996 by Russell Nelson, with reference to skeleton.c
> -       written 1993-1994 by Donald Becker.
> -
> -       This software may be used and distributed according to the terms
> -       of the GNU General Public License, incorporated herein by reference.
> -
> -       The author may be reached at nelson@crynwr.com, Crynwr
> -       Software, 11 Grant St., Potsdam, NY 13676
> -
> -  Changelog:
> -
> -  Mike Cruse        : mcruse@cti-ltd.com
> -                    : Changes for Linux 2.0 compatibility.
> -                    : Added dev_id parameter in net_interrupt(),
> -                    : request_irq() and free_irq(). Just NULL for now.
> -
> -  Mike Cruse        : Added MOD_INC_USE_COUNT and MOD_DEC_USE_COUNT macros
> -                    : in net_open() and net_close() so kerneld would know
> -                    : that the module is in use and wouldn't eject the
> -                    : driver prematurely.
> -
> -  Mike Cruse        : Rewrote init_module() and cleanup_module using 8390.c
> -                    : as an example. Disabled autoprobing in init_module(),
> -                    : not a good thing to do to other devices while Linux
> -                    : is running from all accounts.
> -
> -  Alan Cox          : Removed 1.2 support, added 2.1 extra counters.
> -
> -  David Huggins-Daines <dhd@debian.org>
> -
> -  Split this off into mac89x0.c, and gutted it of all parts which are
> -  not relevant to the existing CS8900 cards on the Macintosh
> -  (i.e. basically the Daynaport CS and LC cards).  To be precise:
> -
> -    * Removed all the media-detection stuff, because these cards are
> -    TP-only.
> -
> -    * Lobotomized the ISA interrupt bogosity, because these cards use
> -    a hardwired NuBus interrupt and a magic ISAIRQ value in the card.
> -
> -    * Basically eliminated everything not relevant to getting the
> -    cards minimally functioning on the Macintosh.
> -
> -  I might add that these cards are badly designed even from the Mac
> -  standpoint, in that Dayna, in their infinite wisdom, used NuBus slot
> -  I/O space and NuBus interrupts for these cards, but neglected to
> -  provide anything even remotely resembling a NuBus ROM.  Therefore we
> -  have to probe for them in a brain-damaged ISA-like fashion.
> -
> -  Arnaldo Carvalho de Melo <acme@conectiva.com.br> - 11/01/2001
> -  check kmalloc and release the allocated memory on failure in
> -  mac89x0_probe and in init_module
> -  use local_irq_{save,restore}(flags) in net_get_stat, not just
> -  local_irq_{dis,en}able()
> -*/
> -
> -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> -
> -static const char version[] =
> -"cs89x0.c:v1.02 11/26/96 Russell Nelson <nelson@crynwr.com>\n";
> -
> -#include <linux/module.h>
> -
> -/*
> -  Sources:
> -
> -       Crynwr packet driver epktisa.
> -
> -       Crystal Semiconductor data sheets.
> -
> -*/
> -
> -#include <linux/kernel.h>
> -#include <linux/types.h>
> -#include <linux/fcntl.h>
> -#include <linux/interrupt.h>
> -#include <linux/ioport.h>
> -#include <linux/in.h>
> -#include <linux/string.h>
> -#include <linux/nubus.h>
> -#include <linux/errno.h>
> -#include <linux/init.h>
> -#include <linux/netdevice.h>
> -#include <linux/platform_device.h>
> -#include <linux/etherdevice.h>
> -#include <linux/skbuff.h>
> -#include <linux/delay.h>
> -#include <linux/bitops.h>
> -#include <linux/gfp.h>
> -
> -#include <asm/io.h>
> -#include <asm/hwtest.h>
> -#include <asm/macints.h>
> -
> -#include "cs89x0.h"
> -
> -static int debug = -1;
> -module_param(debug, int, 0);
> -MODULE_PARM_DESC(debug, "debug message level");
> -
> -/* Information that need to be kept for each board. */
> -struct net_local {
> -       int msg_enable;
> -       int chip_type;          /* one of: CS8900, CS8920, CS8920M */
> -       char chip_revision;     /* revision letter of the chip ('A'...) */
> -       int send_cmd;           /* the propercommand used to send a packet. */
> -       int rx_mode;
> -       int curr_rx_cfg;
> -        int send_underrun;      /* keep track of how many underruns in a row we get */
> -};
> -
> -/* Index to functions, as function prototypes. */
> -static int net_open(struct net_device *dev);
> -static netdev_tx_t net_send_packet(struct sk_buff *skb, struct net_device *dev);
> -static irqreturn_t net_interrupt(int irq, void *dev_id);
> -static void set_multicast_list(struct net_device *dev);
> -static void net_rx(struct net_device *dev);
> -static int net_close(struct net_device *dev);
> -static struct net_device_stats *net_get_stats(struct net_device *dev);
> -static int set_mac_address(struct net_device *dev, void *addr);
> -
> -/* For reading/writing registers ISA-style */
> -static inline int
> -readreg_io(struct net_device *dev, int portno)
> -{
> -       nubus_writew(swab16(portno), dev->base_addr + ADD_PORT);
> -       return swab16(nubus_readw(dev->base_addr + DATA_PORT));
> -}
> -
> -static inline void
> -writereg_io(struct net_device *dev, int portno, int value)
> -{
> -       nubus_writew(swab16(portno), dev->base_addr + ADD_PORT);
> -       nubus_writew(swab16(value), dev->base_addr + DATA_PORT);
> -}
> -
> -/* These are for reading/writing registers in shared memory */
> -static inline int
> -readreg(struct net_device *dev, int portno)
> -{
> -       return swab16(nubus_readw(dev->mem_start + portno));
> -}
> -
> -static inline void
> -writereg(struct net_device *dev, int portno, int value)
> -{
> -       nubus_writew(swab16(value), dev->mem_start + portno);
> -}
> -
> -static const struct net_device_ops mac89x0_netdev_ops = {
> -       .ndo_open               = net_open,
> -       .ndo_stop               = net_close,
> -       .ndo_start_xmit         = net_send_packet,
> -       .ndo_get_stats          = net_get_stats,
> -       .ndo_set_rx_mode        = set_multicast_list,
> -       .ndo_set_mac_address    = set_mac_address,
> -       .ndo_validate_addr      = eth_validate_addr,
> -};
> -
> -/* Probe for the CS8900 card in slot E.  We won't bother looking
> -   anywhere else until we have a really good reason to do so. */
> -static int mac89x0_device_probe(struct platform_device *pdev)
> -{
> -       struct net_device *dev;
> -       struct net_local *lp;
> -       int i, slot;
> -       unsigned rev_type = 0;
> -       unsigned long ioaddr;
> -       unsigned short sig;
> -       int err = -ENODEV;
> -       struct nubus_rsrc *fres;
> -
> -       dev = alloc_etherdev(sizeof(struct net_local));
> -       if (!dev)
> -               return -ENOMEM;
> -
> -       /* We might have to parameterize this later */
> -       slot = 0xE;
> -       /* Get out now if there's a real NuBus card in slot E */
> -       for_each_func_rsrc(fres)
> -               if (fres->board->slot == slot)
> -                       goto out;
> -
> -       /* The pseudo-ISA bits always live at offset 0x300 (gee,
> -           wonder why...) */
> -       ioaddr = (unsigned long)
> -               nubus_slot_addr(slot) | (((slot&0xf) << 20) + DEFAULTIOBASE);
> -       {
> -               int card_present;
> -
> -               card_present = (hwreg_present((void *)ioaddr + 4) &&
> -                               hwreg_present((void *)ioaddr + DATA_PORT));
> -               if (!card_present)
> -                       goto out;
> -       }
> -
> -       nubus_writew(0, ioaddr + ADD_PORT);
> -       sig = nubus_readw(ioaddr + DATA_PORT);
> -       if (sig != swab16(CHIP_EISA_ID_SIG))
> -               goto out;
> -
> -       SET_NETDEV_DEV(dev, &pdev->dev);
> -
> -       /* Initialize the net_device structure. */
> -       lp = netdev_priv(dev);
> -
> -       lp->msg_enable = netif_msg_init(debug, 0);
> -
> -       /* Fill in the 'dev' fields. */
> -       dev->base_addr = ioaddr;
> -       dev->mem_start = (unsigned long)
> -               nubus_slot_addr(slot) | (((slot&0xf) << 20) + MMIOBASE);
> -       dev->mem_end = dev->mem_start + 0x1000;
> -
> -       /* Turn on shared memory */
> -       writereg_io(dev, PP_BusCTL, MEMORY_ON);
> -
> -       /* get the chip type */
> -       rev_type = readreg(dev, PRODUCT_ID_ADD);
> -       lp->chip_type = rev_type &~ REVISON_BITS;
> -       lp->chip_revision = ((rev_type & REVISON_BITS) >> 8) + 'A';
> -
> -       /* Check the chip type and revision in order to set the correct send command
> -       CS8920 revision C and CS8900 revision F can use the faster send. */
> -       lp->send_cmd = TX_AFTER_381;
> -       if (lp->chip_type == CS8900 && lp->chip_revision >= 'F')
> -               lp->send_cmd = TX_NOW;
> -       if (lp->chip_type != CS8900 && lp->chip_revision >= 'C')
> -               lp->send_cmd = TX_NOW;
> -
> -       netif_dbg(lp, drv, dev, "%s", version);
> -
> -       pr_info("cs89%c0%s rev %c found at %#8lx\n",
> -               lp->chip_type == CS8900 ? '0' : '2',
> -               lp->chip_type == CS8920M ? "M" : "",
> -               lp->chip_revision, dev->base_addr);
> -
> -       /* Try to read the MAC address */
> -       if ((readreg(dev, PP_SelfST) & (EEPROM_PRESENT | EEPROM_OK)) == 0) {
> -               pr_info("No EEPROM, giving up now.\n");
> -               goto out1;
> -        } else {
> -               u8 addr[ETH_ALEN];
> -
> -                for (i = 0; i < ETH_ALEN; i += 2) {
> -                       /* Big-endian (why??!) */
> -                       unsigned short s = readreg(dev, PP_IA + i);
> -                       addr[i] = s >> 8;
> -                       addr[i+1] = s & 0xff;
> -                }
> -               eth_hw_addr_set(dev, addr);
> -        }
> -
> -       dev->irq = SLOT2IRQ(slot);
> -
> -       /* print the IRQ and ethernet address. */
> -
> -       pr_info("MAC %pM, IRQ %d\n", dev->dev_addr, dev->irq);
> -
> -       dev->netdev_ops         = &mac89x0_netdev_ops;
> -
> -       err = register_netdev(dev);
> -       if (err)
> -               goto out1;
> -
> -       platform_set_drvdata(pdev, dev);
> -       return 0;
> -out1:
> -       nubus_writew(0, dev->base_addr + ADD_PORT);
> -out:
> -       free_netdev(dev);
> -       return err;
> -}
> -
> -/* Open/initialize the board.  This is called (in the current kernel)
> -   sometime after booting when the 'ifconfig' program is run.
> -
> -   This routine should set everything up anew at each open, even
> -   registers that "should" only need to be set once at boot, so that
> -   there is non-reboot way to recover if something goes wrong.
> -   */
> -static int
> -net_open(struct net_device *dev)
> -{
> -       struct net_local *lp = netdev_priv(dev);
> -       int i;
> -
> -       /* Disable the interrupt for now */
> -       writereg(dev, PP_BusCTL, readreg(dev, PP_BusCTL) & ~ENABLE_IRQ);
> -
> -       /* Grab the interrupt */
> -       if (request_irq(dev->irq, net_interrupt, 0, "cs89x0", dev))
> -               return -EAGAIN;
> -
> -       /* Set up the IRQ - Apparently magic */
> -       if (lp->chip_type == CS8900)
> -               writereg(dev, PP_CS8900_ISAINT, 0);
> -       else
> -               writereg(dev, PP_CS8920_ISAINT, 0);
> -
> -       /* set the Ethernet address */
> -       for (i=0; i < ETH_ALEN/2; i++)
> -               writereg(dev, PP_IA+i*2, dev->dev_addr[i*2] | (dev->dev_addr[i*2+1] << 8));
> -
> -       /* Turn on both receive and transmit operations */
> -       writereg(dev, PP_LineCTL, readreg(dev, PP_LineCTL) | SERIAL_RX_ON | SERIAL_TX_ON);
> -
> -       /* Receive only error free packets addressed to this card */
> -       lp->rx_mode = 0;
> -       writereg(dev, PP_RxCTL, DEF_RX_ACCEPT);
> -
> -       lp->curr_rx_cfg = RX_OK_ENBL | RX_CRC_ERROR_ENBL;
> -
> -       writereg(dev, PP_RxCFG, lp->curr_rx_cfg);
> -
> -       writereg(dev, PP_TxCFG, TX_LOST_CRS_ENBL | TX_SQE_ERROR_ENBL | TX_OK_ENBL |
> -              TX_LATE_COL_ENBL | TX_JBR_ENBL | TX_ANY_COL_ENBL | TX_16_COL_ENBL);
> -
> -       writereg(dev, PP_BufCFG, READY_FOR_TX_ENBL | RX_MISS_COUNT_OVRFLOW_ENBL |
> -                TX_COL_COUNT_OVRFLOW_ENBL | TX_UNDERRUN_ENBL);
> -
> -       /* now that we've got our act together, enable everything */
> -       writereg(dev, PP_BusCTL, readreg(dev, PP_BusCTL) | ENABLE_IRQ);
> -       netif_start_queue(dev);
> -       return 0;
> -}
> -
> -static netdev_tx_t
> -net_send_packet(struct sk_buff *skb, struct net_device *dev)
> -{
> -       struct net_local *lp = netdev_priv(dev);
> -       unsigned long flags;
> -
> -       netif_dbg(lp, tx_queued, dev, "sent %d byte packet of type %x\n",
> -                 skb->len, skb->data[ETH_ALEN + ETH_ALEN] << 8 |
> -                 skb->data[ETH_ALEN + ETH_ALEN + 1]);
> -
> -       /* keep the upload from being interrupted, since we
> -          ask the chip to start transmitting before the
> -          whole packet has been completely uploaded. */
> -       local_irq_save(flags);
> -       netif_stop_queue(dev);
> -
> -       /* initiate a transmit sequence */
> -       writereg(dev, PP_TxCMD, lp->send_cmd);
> -       writereg(dev, PP_TxLength, skb->len);
> -
> -       /* Test to see if the chip has allocated memory for the packet */
> -       if ((readreg(dev, PP_BusST) & READY_FOR_TX_NOW) == 0) {
> -               /* Gasp!  It hasn't.  But that shouldn't happen since
> -                  we're waiting for TxOk, so return 1 and requeue this packet. */
> -               local_irq_restore(flags);
> -               return NETDEV_TX_BUSY;
> -       }
> -
> -       /* Write the contents of the packet */
> -       skb_copy_from_linear_data(skb, (void *)(dev->mem_start + PP_TxFrame),
> -                                 skb->len+1);
> -
> -       local_irq_restore(flags);
> -       dev_kfree_skb (skb);
> -
> -       return NETDEV_TX_OK;
> -}
> -
> -/* The typical workload of the driver:
> -   Handle the network interface interrupts. */
> -static irqreturn_t net_interrupt(int irq, void *dev_id)
> -{
> -       struct net_device *dev = dev_id;
> -       struct net_local *lp;
> -       int ioaddr, status;
> -
> -       ioaddr = dev->base_addr;
> -       lp = netdev_priv(dev);
> -
> -       /* we MUST read all the events out of the ISQ, otherwise we'll never
> -           get interrupted again.  As a consequence, we can't have any limit
> -           on the number of times we loop in the interrupt handler.  The
> -           hardware guarantees that eventually we'll run out of events.  Of
> -           course, if you're on a slow machine, and packets are arriving
> -           faster than you can read them off, you're screwed.  Hasta la
> -           vista, baby!  */
> -       while ((status = swab16(nubus_readw(dev->base_addr + ISQ_PORT)))) {
> -               netif_dbg(lp, intr, dev, "status=%04x\n", status);
> -               switch(status & ISQ_EVENT_MASK) {
> -               case ISQ_RECEIVER_EVENT:
> -                       /* Got a packet(s). */
> -                       net_rx(dev);
> -                       break;
> -               case ISQ_TRANSMITTER_EVENT:
> -                       dev->stats.tx_packets++;
> -                       netif_wake_queue(dev);
> -                       if ((status & TX_OK) == 0)
> -                               dev->stats.tx_errors++;
> -                       if (status & TX_LOST_CRS)
> -                               dev->stats.tx_carrier_errors++;
> -                       if (status & TX_SQE_ERROR)
> -                               dev->stats.tx_heartbeat_errors++;
> -                       if (status & TX_LATE_COL)
> -                               dev->stats.tx_window_errors++;
> -                       if (status & TX_16_COL)
> -                               dev->stats.tx_aborted_errors++;
> -                       break;
> -               case ISQ_BUFFER_EVENT:
> -                       if (status & READY_FOR_TX) {
> -                               /* we tried to transmit a packet earlier,
> -                                   but inexplicably ran out of buffers.
> -                                   That shouldn't happen since we only ever
> -                                   load one packet.  Shrug.  Do the right
> -                                   thing anyway. */
> -                               netif_wake_queue(dev);
> -                       }
> -                       if (status & TX_UNDERRUN) {
> -                               netif_dbg(lp, tx_err, dev, "transmit underrun\n");
> -                                lp->send_underrun++;
> -                                if (lp->send_underrun == 3) lp->send_cmd = TX_AFTER_381;
> -                                else if (lp->send_underrun == 6) lp->send_cmd = TX_AFTER_ALL;
> -                        }
> -                       break;
> -               case ISQ_RX_MISS_EVENT:
> -                       dev->stats.rx_missed_errors += (status >> 6);
> -                       break;
> -               case ISQ_TX_COL_EVENT:
> -                       dev->stats.collisions += (status >> 6);
> -                       break;
> -               }
> -       }
> -       return IRQ_HANDLED;
> -}
> -
> -/* We have a good packet(s), get it/them out of the buffers. */
> -static void
> -net_rx(struct net_device *dev)
> -{
> -       struct net_local *lp = netdev_priv(dev);
> -       struct sk_buff *skb;
> -       int status, length;
> -
> -       status = readreg(dev, PP_RxStatus);
> -       if ((status & RX_OK) == 0) {
> -               dev->stats.rx_errors++;
> -               if (status & RX_RUNT)
> -                               dev->stats.rx_length_errors++;
> -               if (status & RX_EXTRA_DATA)
> -                               dev->stats.rx_length_errors++;
> -               if ((status & RX_CRC_ERROR) &&
> -                   !(status & (RX_EXTRA_DATA|RX_RUNT)))
> -                       /* per str 172 */
> -                       dev->stats.rx_crc_errors++;
> -               if (status & RX_DRIBBLE)
> -                               dev->stats.rx_frame_errors++;
> -               return;
> -       }
> -
> -       length = readreg(dev, PP_RxLength);
> -       /* Malloc up new buffer. */
> -       skb = alloc_skb(length, GFP_ATOMIC);
> -       if (skb == NULL) {
> -               dev->stats.rx_dropped++;
> -               return;
> -       }
> -       skb_put(skb, length);
> -
> -       skb_copy_to_linear_data(skb, (void *)(dev->mem_start + PP_RxFrame),
> -                               length);
> -
> -       netif_dbg(lp, rx_status, dev, "received %d byte packet of type %x\n",
> -                 length, skb->data[ETH_ALEN + ETH_ALEN] << 8 |
> -                 skb->data[ETH_ALEN + ETH_ALEN + 1]);
> -
> -        skb->protocol=eth_type_trans(skb,dev);
> -       netif_rx(skb);
> -       dev->stats.rx_packets++;
> -       dev->stats.rx_bytes += length;
> -}
> -
> -/* The inverse routine to net_open(). */
> -static int
> -net_close(struct net_device *dev)
> -{
> -
> -       writereg(dev, PP_RxCFG, 0);
> -       writereg(dev, PP_TxCFG, 0);
> -       writereg(dev, PP_BufCFG, 0);
> -       writereg(dev, PP_BusCTL, 0);
> -
> -       netif_stop_queue(dev);
> -
> -       free_irq(dev->irq, dev);
> -
> -       /* Update the statistics here. */
> -
> -       return 0;
> -
> -}
> -
> -/* Get the current statistics. This may be called with the card open or
> -   closed. */
> -static struct net_device_stats *
> -net_get_stats(struct net_device *dev)
> -{
> -       unsigned long flags;
> -
> -       local_irq_save(flags);
> -       /* Update the statistics from the device registers. */
> -       dev->stats.rx_missed_errors += (readreg(dev, PP_RxMiss) >> 6);
> -       dev->stats.collisions += (readreg(dev, PP_TxCol) >> 6);
> -       local_irq_restore(flags);
> -
> -       return &dev->stats;
> -}
> -
> -static void set_multicast_list(struct net_device *dev)
> -{
> -       struct net_local *lp = netdev_priv(dev);
> -
> -       if(dev->flags&IFF_PROMISC)
> -       {
> -               lp->rx_mode = RX_ALL_ACCEPT;
> -       } else if ((dev->flags & IFF_ALLMULTI) || !netdev_mc_empty(dev)) {
> -               /* The multicast-accept list is initialized to accept-all, and we
> -                  rely on higher-level filtering for now. */
> -               lp->rx_mode = RX_MULTCAST_ACCEPT;
> -       }
> -       else
> -               lp->rx_mode = 0;
> -
> -       writereg(dev, PP_RxCTL, DEF_RX_ACCEPT | lp->rx_mode);
> -
> -       /* in promiscuous mode, we accept errored packets, so we have to enable interrupts on them also */
> -       writereg(dev, PP_RxCFG, lp->curr_rx_cfg |
> -            (lp->rx_mode == RX_ALL_ACCEPT? (RX_CRC_ERROR_ENBL|RX_RUNT_ENBL|RX_EXTRA_DATA_ENBL) : 0));
> -}
> -
> -
> -static int set_mac_address(struct net_device *dev, void *addr)
> -{
> -       struct sockaddr *saddr = addr;
> -       int i;
> -
> -       if (!is_valid_ether_addr(saddr->sa_data))
> -               return -EADDRNOTAVAIL;
> -
> -       eth_hw_addr_set(dev, saddr->sa_data);
> -       netdev_info(dev, "Setting MAC address to %pM\n", dev->dev_addr);
> -
> -       /* set the Ethernet address */
> -       for (i=0; i < ETH_ALEN/2; i++)
> -               writereg(dev, PP_IA+i*2, dev->dev_addr[i*2] | (dev->dev_addr[i*2+1] << 8));
> -
> -       return 0;
> -}
> -
> -MODULE_DESCRIPTION("Macintosh CS89x0-based Ethernet driver");
> -MODULE_LICENSE("GPL");
> -
> -static void mac89x0_device_remove(struct platform_device *pdev)
> -{
> -       struct net_device *dev = platform_get_drvdata(pdev);
> -
> -       unregister_netdev(dev);
> -       nubus_writew(0, dev->base_addr + ADD_PORT);
> -       free_netdev(dev);
> -}
> -
> -static struct platform_driver mac89x0_platform_driver = {
> -       .probe = mac89x0_device_probe,
> -       .remove = mac89x0_device_remove,
> -       .driver = {
> -               .name = "mac89x0",
> -       },
> -};
> -
> -module_platform_driver(mac89x0_platform_driver);
>
> --
> 2.53.0

^ permalink raw reply

* Re: [PATCH net] virtio_net: sync rss_trailer.max_tx_vq on queue_pairs change via VQ_PAIRS_SET
From: Paolo Abeni @ 2026-04-23  7:01 UTC (permalink / raw)
  To: Brett Creeley, mst, jasowang, andrew+netdev, davem, edumazet,
	kuba, netdev, Xuan Zhuo, Eugenio Pérez
In-Reply-To: <20260416212121.29073-1-brett.creeley@amd.com>

On 4/16/26 11:21 PM, Brett Creeley wrote:
> When netif_is_rxfh_configured() is true (i.e., the user has explicitly
> configured the RSS indirection table), virtnet_set_queues() skips the
> RSS update path and falls through to the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
> command to change the number of queue pairs. However, it does not update
> vi->rss_trailer.max_tx_vq to reflect the new queue_pairs value.
> 
> This causes a mismatch between vi->curr_queue_pairs and
> vi->rss_trailer.max_tx_vq. Any subsequent RSS reconfiguration (e.g.,
> via ethtool -X) calls virtnet_commit_rss_command(), which sends the
> stale max_tx_vq to the device, silently reverting the queue count.
> 
> Reproduction:
> 1. User configured RSS
>   ethtool -X eth0 equal 8
> 2. VQ_PAIRS_SET path; max_tx_vq stays 16
>   ethtool -L eth0 combined 12
> 3. RSS commit uses max_tx_vq=16 instead of 12
>   ethtool -X eth0 equal 4
> 
> Fix this by updating vi->rss_trailer.max_tx_vq after a successful
> VQ_PAIRS_SET command when RSS is enabled, keeping it in sync with
> curr_queue_pairs.
> 
> Fixes: 50bfcaedd78e ("virtio_net: Update rss when set queue")
> Assisted-by: Claude: claude-opus-4.6
> Signed-off-by: Brett Creeley <brett.creeley@amd.com>

The patch LGTM, but waiting a little longer just in case the virtio crew
has some comments.

/P


^ permalink raw reply

* Re: [PATCH] mctp i2c: check packet length before marking flow active
From: William A. Kennington III @ 2026-04-23  6:55 UTC (permalink / raw)
  To: Jeremy Kerr, Matt Johnston, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Wolfram Sang
  Cc: netdev, linux-kernel
In-Reply-To: <61bb1b3838609996600f46ccb2c4ff89d085ee6f.camel@codeconstruct.com.au>

On 4/22/26 20:47, Jeremy Kerr wrote:

> Hi William,
>
>> Move the mctp_i2c_get_tx_flow_state() call to after the length sanity
>> check to ensure we only transition the flow state if we are actually
>> going to proceed with the transmission and locking.
> Good catch, thanks!
>
>> Subject: [PATCH] mctp i2c: check packet length before marking flow active
> You'll want to indicate that this is for the net tree, rather than
> net-next, so something like:
>
>     Subject: [PATCH net] net: mctp i2c: check packet length [...]

Thanks, forgot the convention...

>
> With that change:
>
> Acked-by: Jeremy Kerr <jk@codeconstruct.com.au>
>
> Out of curiosity though, how did you hit the hdr_byte_count mismatch in
> the first place?

Our current theory is that we have known buggy firmware on our NVME MCTP 
devices and we are seeing some kind of corruption on the bus that we are 
going to fix in on the firmware side. We started also seeing kernel 
crashes along with the bad firmware symptoms, walked through ~110 kdumps 
and found i2c locks that were held by 2 owners (eeprom reading and the 
MCTP TX queue). This ended up causing deadlocks in the i2c stack that 
result in panics. We noticed in many of these cases that we had 10K+ 
BERs and 800+ NACKs in the i2c driver stats struct at the time of crash.

>
> Cheers,
>
>
> Jeremy

^ permalink raw reply

* RE: [EXTERNAL] Re: [PATCH net] hv_sock: Return -EIO for malformed/short packets
From: Dexuan Cui @ 2026-04-23  6:50 UTC (permalink / raw)
  To: Dexuan Cui, Stefano Garzarella
  Cc: KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Long Li,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org, niuxuewei.nxw@antgroup.com,
	linux-hyperv@vger.kernel.org, virtualization@lists.linux.dev,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
In-Reply-To: <SA1PR21MB6921EDE5E6530B09931ABD93BF2D2@SA1PR21MB6921.namprd21.prod.outlook.com>

> From: Dexuan Cui
> Sent: Wednesday, April 22, 2026 11:15 AM
>  ...
> Thank you Stefano! I'll post v2 later today.

Posted v2:
https://lore.kernel.org/linux-hyperv/20260423064811.1371749-1-decui@microsoft.com/T/#u

^ permalink raw reply

* [PATCH net v2] hv_sock: Return -EIO for malformed/short packets
From: Dexuan Cui @ 2026-04-23  6:48 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, sgarzare, davem, edumazet,
	kuba, pabeni, horms, niuxuewei.nxw, linux-hyperv, virtualization,
	netdev, linux-kernel
  Cc: stable

Commit f63152958994 fixes a regression, however it fails to report an
error for malformed/short packets -- normally we should never see such
packets, but let's report an error for them just in case.

Fixes: f63152958994 ("hv_sock: Report EOF instead of -EIO for FIN")
Cc: stable@vger.kernel.org
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Commit f63152958994 is currently only in net.git's master branch.

Changes since v1:
    Integrated comments from Stefano Garzarella:

        1) access 'vsk' directly:
           s/hvs->vsk->peer_shutdown/vsk->peer_shutdown/

        2) test the error condition first and return -EIO for that.

    NO other changes.


 net/vmw_vsock/hyperv_transport.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 76e78c83fdbc..f862988c1e86 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -704,17 +704,26 @@ static s64 hvs_stream_has_data(struct vsock_sock *vsk)
 		if (hvs->recv_desc) {
 			/* Here hvs->recv_data_len is 0, so hvs->recv_desc must
 			 * be NULL unless it points to the 0-byte-payload FIN
-			 * packet: see hvs_update_recv_data().
+			 * packet or a malformed/short packet: see
+			 * hvs_update_recv_data().
 			 *
-			 * Here all the payload has been dequeued, but
-			 * hvs_channel_readable_payload() still returns 1,
-			 * because the VMBus ringbuffer's read_index is not
-			 * updated for the FIN packet: hvs_stream_dequeue() ->
-			 * hv_pkt_iter_next() updates the cached priv_read_index
-			 * but has no opportunity to update the read_index in
-			 * hv_pkt_iter_close() as hvs_stream_has_data() returns
-			 * 0 for the FIN packet, so it won't get dequeued.
+			 * If hvs->recv_desc points to the FIN packet, here all
+			 * the payload has been dequeued and the peer_shutdown
+			 * flag is set, but hvs_channel_readable_payload() still
+			 * returns 1, because the VMBus ringbuffer's read_index
+			 * is not updated for the FIN packet:
+			 * hvs_stream_dequeue() -> hv_pkt_iter_next() updates
+			 * the cached priv_read_index but has no opportunity to
+			 * update the read_index in hv_pkt_iter_close() as
+			 * hvs_stream_has_data() returns 0 for the FIN packet,
+			 * so it won't get dequeued.
+			 *
+			 * In case hvs->recv_desc points to a malformed/short
+			 * packet, return -EIO.
 			 */
+			if (!(vsk->peer_shutdown & SEND_SHUTDOWN))
+				return -EIO;
+
 			return 0;
 		}
 
-- 
2.49.0


^ permalink raw reply related

* [PATCH] rdma: display resource usage rate in resource show
From: Tao Cui @ 2026-04-23  6:47 UTC (permalink / raw)
  To: dsahern, leon; +Cc: netdev, linux-rdma, Tao Cui

Parse the new RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_MAX netlink attribute
to show resource usage percentage alongside current counts:

  Before: 0: mlx5_0: qp 123  cq 45  mr 200  pd 10
  After:  0: mlx5_0: qp 123 (0.0%)  cq 45 (0.0%)  mr 200 (0.0%)  pd 10 (0.0%)

JSON output gains "max" and "usage" fields per resource type.
Backward compatible: no output change when kernel lacks the new attribute.

Link: https://lore.kernel.org/all/20260423061352.359749-1-cuitao@kylinos.cn/

Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
 rdma/include/uapi/rdma/rdma_netlink.h |  5 +++++
 rdma/res.c                            | 15 +++++++++++++++
 rdma/utils.c                          |  1 +
 3 files changed, 21 insertions(+)

diff --git a/rdma/include/uapi/rdma/rdma_netlink.h b/rdma/include/uapi/rdma/rdma_netlink.h
index 8709e558..12ca07ef 100644
--- a/rdma/include/uapi/rdma/rdma_netlink.h
+++ b/rdma/include/uapi/rdma/rdma_netlink.h
@@ -604,6 +604,11 @@ enum rdma_nldev_attr {
 	RDMA_NLDEV_ATTR_RES_FRMR_POOL_PINNED,		/* u32 */
 	RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_KERNEL_VENDOR_KEY, /* u64 */
 
+	/*
+	 * Resource summary entry maximum value, used to calculate usage rate.
+	 */
+	RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_MAX,		/* u64 */
+
 	/*
 	 * Always the end
 	 */
diff --git a/rdma/res.c b/rdma/res.c
index 062f0007..cf9c79a3 100644
--- a/rdma/res.c
+++ b/rdma/res.c
@@ -56,6 +56,21 @@ static int res_print_summary(struct nlattr **tb)
 		name = mnl_attr_get_str(nla_line[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_NAME]);
 		curr = mnl_attr_get_u64(nla_line[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_CURR]);
 		res_print_u64(name, curr, nla_line[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_CURR]);
+
+		if (nla_line[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_MAX]) {
+			uint64_t max;
+
+			max = mnl_attr_get_u64(
+				nla_line[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_MAX]);
+			print_u64(PRINT_JSON, "max", NULL, max);
+			if (max > 0) {
+				char usage_str[32];
+
+				snprintf(usage_str, sizeof(usage_str), "(%.1f%%)",
+					 (double)curr * 100.0 / (double)max);
+				print_string(PRINT_ANY, "usage", "%s ", usage_str);
+			}
+		}
 	}
 	return 0;
 }
diff --git a/rdma/utils.c b/rdma/utils.c
index 87003b2c..90ea1c55 100644
--- a/rdma/utils.c
+++ b/rdma/utils.c
@@ -480,6 +480,7 @@ static const enum mnl_attr_data_type nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
 	[RDMA_NLDEV_ATTR_EVENT_TYPE] = MNL_TYPE_U8,
 	[RDMA_NLDEV_SYS_ATTR_MONITOR_MODE] = MNL_TYPE_U8,
 	[RDMA_NLDEV_ATTR_STAT_OPCOUNTER_ENABLED] = MNL_TYPE_U8,
+	[RDMA_NLDEV_ATTR_RES_SUMMARY_ENTRY_MAX] = MNL_TYPE_U64,
 };
 
 static int rd_attr_check(const struct nlattr *attr, int *typep)
-- 
2.43.0


^ permalink raw reply related

* [PATCH net] vrf: Fix a potential NPD when removing a port from a VRF
From: Ido Schimmel @ 2026-04-23  6:36 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, pabeni, edumazet, dsahern, jiri, andrew+netdev,
	royenheart, yifanwucs, tomapufckgml, yuantan098, Ido Schimmel

RCU readers that identified a net device as a VRF port using
netif_is_l3_slave() assume that a subsequent call to
netdev_master_upper_dev_get_rcu() will return a VRF device. They then
continue to dereference its l3mdev operations.

This assumption is not always correct and can result in a NPD [1]. There
is no RCU synchronization when removing a port from a VRF, so it is
possible for an RCU reader to see a new master device (e.g., a bridge)
that does not have l3mdev operations.

Fix by adding RCU synchronization after clearing the IFF_L3MDEV_SLAVE
flag. Skip this synchronization when a net device is removed from a VRF
as part of its deletion and when the VRF device itself is deleted. In
the latter case an RCU grace period will pass by the time RTNL is
released.

[1]
BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
RIP: 0010:l3mdev_fib_table_rcu (net/l3mdev/l3mdev.c:181)
[...]
Call Trace:
<TASK>
l3mdev_fib_table_by_index (net/l3mdev/l3mdev.c:201 net/l3mdev/l3mdev.c:189)
__inet_bind (net/ipv4/af_inet.c:499 (discriminator 3))
inet_bind_sk (net/ipv4/af_inet.c:469)
__sys_bind (./include/linux/file.h:62 (discriminator 1) ./include/linux/file.h:83 (discriminator 1) net/socket.c:1951 (discriminator 1))
__x64_sys_bind (net/socket.c:1969 (discriminator 1) net/socket.c:1967 (discriminator 1) net/socket.c:1967 (discriminator 1))
do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Fixes: fdeea7be88b1 ("net: vrf: Set slave's private flag before linking")
Reported-by: Haoze Xie <royenheart@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Yuan Tan <yuantan098@gmail.com>
Closes: https://lore.kernel.org/netdev/20260419145332.3988923-1-n05ec@lzu.edu.cn/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 drivers/net/vrf.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 2cf2dbd1c12f..46209917ae4d 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1034,6 +1034,7 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev,
 
 err:
 	port_dev->priv_flags &= ~IFF_L3MDEV_SLAVE;
+	synchronize_net();
 	return ret;
 }
 
@@ -1053,10 +1054,16 @@ static int vrf_add_slave(struct net_device *dev, struct net_device *port_dev,
 }
 
 /* inverse of do_vrf_add_slave */
-static int do_vrf_del_slave(struct net_device *dev, struct net_device *port_dev)
+static int do_vrf_del_slave(struct net_device *dev, struct net_device *port_dev,
+			    bool needs_sync)
 {
 	netdev_upper_dev_unlink(port_dev, dev);
 	port_dev->priv_flags &= ~IFF_L3MDEV_SLAVE;
+	/* Make sure that concurrent RCU readers that identified the device
+	 * as a VRF port see a VRF master or no master at all.
+	 */
+	if (needs_sync)
+		synchronize_net();
 
 	cycle_netdev(port_dev, NULL);
 
@@ -1065,7 +1072,7 @@ static int do_vrf_del_slave(struct net_device *dev, struct net_device *port_dev)
 
 static int vrf_del_slave(struct net_device *dev, struct net_device *port_dev)
 {
-	return do_vrf_del_slave(dev, port_dev);
+	return do_vrf_del_slave(dev, port_dev, true);
 }
 
 static void vrf_dev_uninit(struct net_device *dev)
@@ -1619,7 +1626,7 @@ static void vrf_dellink(struct net_device *dev, struct list_head *head)
 	struct list_head *iter;
 
 	netdev_for_each_lower_dev(dev, port_dev, iter)
-		vrf_del_slave(dev, port_dev);
+		do_vrf_del_slave(dev, port_dev, false);
 
 	vrf_map_unregister_dev(dev);
 
@@ -1751,7 +1758,7 @@ static int vrf_device_event(struct notifier_block *unused,
 			goto out;
 
 		vrf_dev = netdev_master_upper_dev_get(dev);
-		vrf_del_slave(vrf_dev, dev);
+		do_vrf_del_slave(vrf_dev, dev, false);
 	}
 out:
 	return NOTIFY_DONE;
-- 
2.53.0


^ permalink raw reply related

* [PATCH net] net/sched: sch_fq_pie: annotate data-races in fq_pie_dump_stats()
From: Eric Dumazet @ 2026-04-23  6:35 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet

fq_codel_dump_stats() acquires the qdisc spinlock a bit too late.

Move this acquisition before we fill tc_fq_pie_xstats with live data.

Alternative would be to add READ_ONCE() and WRITE_ONCE() annotations,
but the spinlock is needed anyway to scan q->new_flows and q->old_flows.

Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 net/sched/sch_fq_pie.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index 154c70f489f289066db5d61bb51e58aaf328f16e..7becbf5362b3165bac4517f32887386b01301612 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -509,18 +509,19 @@ static int fq_pie_dump(struct Qdisc *sch, struct sk_buff *skb)
 static int fq_pie_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
 {
 	struct fq_pie_sched_data *q = qdisc_priv(sch);
-	struct tc_fq_pie_xstats st = {
-		.packets_in	= q->stats.packets_in,
-		.overlimit	= q->stats.overlimit,
-		.overmemory	= q->overmemory,
-		.dropped	= q->stats.dropped,
-		.ecn_mark	= q->stats.ecn_mark,
-		.new_flow_count = q->new_flow_count,
-		.memory_usage   = q->memory_usage,
-	};
+	struct tc_fq_pie_xstats st = { 0 };
 	struct list_head *pos;
 
 	sch_tree_lock(sch);
+
+	st.packets_in	= q->stats.packets_in;
+	st.overlimit	= q->stats.overlimit;
+	st.overmemory	= q->overmemory;
+	st.dropped	= q->stats.dropped;
+	st.ecn_mark	= q->stats.ecn_mark;
+	st.new_flow_count = q->new_flow_count;
+	st.memory_usage   = q->memory_usage;
+
 	list_for_each(pos, &q->new_flows)
 		st.new_flows_len++;
 
-- 
2.54.0.rc2.544.gc7ae2d5bb8-goog


^ permalink raw reply related

* [PATCH net] net/sched: sch_choke: annotate data-races in choke_dump_stats()
From: Eric Dumazet @ 2026-04-23  6:28 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet

choke_dump_stats() only runs with RTNL held.
It reads fields that can be changed in qdisc fast path.
Add READ_ONCE()/WRITE_ONCE() annotations.

Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 net/sched/sch_choke.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 94df8e741a979191a06885ad3ee813f12650ff3c..2875bcdb18a413075c795665e95f9dbbaac45962 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -229,7 +229,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 		/* Draw a packet at random from queue and compare flow */
 		if (choke_match_random(q, skb, &idx)) {
-			q->stats.matched++;
+			WRITE_ONCE(q->stats.matched, q->stats.matched + 1);
 			choke_drop_by_idx(sch, idx, to_free);
 			goto congestion_drop;
 		}
@@ -241,11 +241,13 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			qdisc_qstats_overlimit(sch);
 			if (use_harddrop(q) || !use_ecn(q) ||
 			    !INET_ECN_set_ce(skb)) {
-				q->stats.forced_drop++;
+				WRITE_ONCE(q->stats.forced_drop,
+					   q->stats.forced_drop + 1);
 				goto congestion_drop;
 			}
 
-			q->stats.forced_mark++;
+			WRITE_ONCE(q->stats.forced_mark,
+				   q->stats.forced_mark + 1);
 		} else if (++q->vars.qcount) {
 			if (red_mark_probability(p, &q->vars, q->vars.qavg)) {
 				q->vars.qcount = 0;
@@ -253,11 +255,13 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 				qdisc_qstats_overlimit(sch);
 				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
-					q->stats.prob_drop++;
+					WRITE_ONCE(q->stats.prob_drop,
+					           q->stats.prob_drop + 1);
 					goto congestion_drop;
 				}
 
-				q->stats.prob_mark++;
+				WRITE_ONCE(q->stats.prob_mark,
+					   q->stats.prob_mark + 1);
 			}
 		} else
 			q->vars.qR = red_random(p);
@@ -272,7 +276,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		return NET_XMIT_SUCCESS;
 	}
 
-	q->stats.pdrop++;
+	WRITE_ONCE(q->stats.pdrop, q->stats.pdrop + 1);
 	return qdisc_drop(skb, sch, to_free);
 
 congestion_drop:
@@ -461,10 +465,12 @@ static int choke_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
 {
 	struct choke_sched_data *q = qdisc_priv(sch);
 	struct tc_choke_xstats st = {
-		.early	= q->stats.prob_drop + q->stats.forced_drop,
-		.marked	= q->stats.prob_mark + q->stats.forced_mark,
-		.pdrop	= q->stats.pdrop,
-		.matched = q->stats.matched,
+		.early	= READ_ONCE(q->stats.prob_drop) +
+			  READ_ONCE(q->stats.forced_drop),
+		.marked	= READ_ONCE(q->stats.prob_mark) +
+			  READ_ONCE(q->stats.forced_mark),
+		.pdrop	= READ_ONCE(q->stats.pdrop),
+		.matched = READ_ONCE(q->stats.matched),
 	};
 
 	return gnet_stats_copy_app(d, &st, sizeof(st));
-- 
2.54.0.rc2.544.gc7ae2d5bb8-goog


^ permalink raw reply related

* [PATCH v11 12/12] x86/vmscape: Add cmdline vmscape=on to override attack vector controls
From: Pawan Gupta @ 2026-04-23  6:17 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, Jason Baron, Alice Ryhl,
	Steven Rostedt, Ard Biesheuvel, Shuah Khan
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc
In-Reply-To: <20260422-vmscape-bhb-v11-0-b18e0cf32af4@linux.intel.com>

In general, individual mitigation knobs override the attack vector
controls. For VMSCAPE, =ibpb exists but nothing to select BHB clearing
mitigation. The =force option would select BHB clearing when supported, but
with a side-effect of also forcing the bug, hence deploying the mitigation
on unaffected parts too.

Add a new cmdline option vmscape=on to enable the mitigation based on the
VMSCAPE variant the CPU is affected by.

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 Documentation/admin-guide/hw-vuln/vmscape.rst   | 4 ++++
 Documentation/admin-guide/kernel-parameters.txt | 2 ++
 arch/x86/kernel/cpu/bugs.c                      | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/Documentation/admin-guide/hw-vuln/vmscape.rst b/Documentation/admin-guide/hw-vuln/vmscape.rst
index 7c40cf70ad7a..2558a5c3d956 100644
--- a/Documentation/admin-guide/hw-vuln/vmscape.rst
+++ b/Documentation/admin-guide/hw-vuln/vmscape.rst
@@ -117,3 +117,7 @@ The mitigation can be controlled via the ``vmscape=`` command line parameter:
 
    Choose the mitigation based on the VMSCAPE variant the CPU is affected by.
    (default when CONFIG_MITIGATION_VMSCAPE=y)
+
+ * ``vmscape=on``:
+
+   Same as ``auto``, except that it overrides attack vector controls.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3853c7109419..98204d464477 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -8383,6 +8383,8 @@ Kernel parameters
 					  unaffected processors
 			auto		- (default) use IBPB or BHB clear
 					  mitigation based on CPU
+			on		- same as "auto", but override attack
+					  vector control
 
 	vsyscall=	[X86-64,EARLY]
 			Controls the behavior of vsyscalls (i.e. calls to
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index fbdb137720c4..4e0b77fb21dd 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -3088,6 +3088,8 @@ static int __init vmscape_parse_cmdline(char *str)
 	} else if (!strcmp(str, "force")) {
 		setup_force_cpu_bug(X86_BUG_VMSCAPE);
 		vmscape_mitigation = VMSCAPE_MITIGATION_ON;
+	} else if (!strcmp(str, "on")) {
+		vmscape_mitigation = VMSCAPE_MITIGATION_ON;
 	} else if (!strcmp(str, "auto")) {
 		vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
 	} else {

-- 
2.34.1



^ permalink raw reply related

* [PATCH v11 11/12] x86/vmscape: Resolve conflict between attack-vectors and vmscape=force
From: Pawan Gupta @ 2026-04-23  6:17 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, Jason Baron, Alice Ryhl,
	Steven Rostedt, Ard Biesheuvel, Shuah Khan
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc
In-Reply-To: <20260422-vmscape-bhb-v11-0-b18e0cf32af4@linux.intel.com>

vmscape=force option currently defaults to AUTO mitigation. This lets
attack-vector controls to override the vmscape mitigation. Preventing the
user from being able to force VMSCAPE mitigation.

When vmscape mitigation is forced, allow it be deployed irrespective of
attack vectors. Introduce VMSCAPE_MITIGATION_ON that wins over
attack-vector controls.

Tested-by: Jon Kohler <jon@nutanix.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/kernel/cpu/bugs.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 1082ed1fb2e6..fbdb137720c4 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -3058,6 +3058,7 @@ static void __init srso_apply_mitigation(void)
 enum vmscape_mitigations {
 	VMSCAPE_MITIGATION_NONE,
 	VMSCAPE_MITIGATION_AUTO,
+	VMSCAPE_MITIGATION_ON,
 	VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER,
 	VMSCAPE_MITIGATION_IBPB_ON_VMEXIT,
 	VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER,
@@ -3066,6 +3067,7 @@ enum vmscape_mitigations {
 static const char * const vmscape_strings[] = {
 	[VMSCAPE_MITIGATION_NONE]			= "Vulnerable",
 	/* [VMSCAPE_MITIGATION_AUTO] */
+	/* [VMSCAPE_MITIGATION_ON] */
 	[VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER]		= "Mitigation: IBPB before exit to userspace",
 	[VMSCAPE_MITIGATION_IBPB_ON_VMEXIT]		= "Mitigation: IBPB on VMEXIT",
 	[VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER]	= "Mitigation: Clear BHB before exit to userspace",
@@ -3085,7 +3087,7 @@ static int __init vmscape_parse_cmdline(char *str)
 		vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
 	} else if (!strcmp(str, "force")) {
 		setup_force_cpu_bug(X86_BUG_VMSCAPE);
-		vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
+		vmscape_mitigation = VMSCAPE_MITIGATION_ON;
 	} else if (!strcmp(str, "auto")) {
 		vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
 	} else {
@@ -3117,6 +3119,7 @@ static void __init vmscape_select_mitigation(void)
 		break;
 
 	case VMSCAPE_MITIGATION_AUTO:
+	case VMSCAPE_MITIGATION_ON:
 		/*
 		 * CPUs with BHI_CTRL(ADL and newer) can avoid the IBPB and use
 		 * BHB clear sequence. These CPUs are only vulnerable to the BHI
@@ -3244,6 +3247,7 @@ void cpu_bugs_smt_update(void)
 	switch (vmscape_mitigation) {
 	case VMSCAPE_MITIGATION_NONE:
 	case VMSCAPE_MITIGATION_AUTO:
+	case VMSCAPE_MITIGATION_ON:
 		break;
 	case VMSCAPE_MITIGATION_IBPB_ON_VMEXIT:
 	case VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER:

-- 
2.34.1



^ permalink raw reply related

* [PATCH v11 10/12] x86/vmscape: Deploy BHB clearing mitigation
From: Pawan Gupta @ 2026-04-23  6:17 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, Jason Baron, Alice Ryhl,
	Steven Rostedt, Ard Biesheuvel, Shuah Khan
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc
In-Reply-To: <20260422-vmscape-bhb-v11-0-b18e0cf32af4@linux.intel.com>

IBPB mitigation for VMSCAPE is an overkill on CPUs that are only affected
by the BHI variant of VMSCAPE. On such CPUs, eIBRS already provides
indirect branch isolation between guest and host userspace. However, branch
history from guest may also influence the indirect branches in host
userspace.

To mitigate the BHI aspect, use the BHB clearing sequence. Since now, IBPB
is not the only mitigation for VMSCAPE, update the documentation to reflect
that =auto could select either IBPB or BHB clear mitigation based on the
CPU.

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 Documentation/admin-guide/hw-vuln/vmscape.rst   | 11 ++++++++-
 Documentation/admin-guide/kernel-parameters.txt |  4 +++-
 arch/x86/include/asm/entry-common.h             |  4 ++++
 arch/x86/include/asm/nospec-branch.h            |  2 ++
 arch/x86/kernel/cpu/bugs.c                      | 30 +++++++++++++++++++------
 5 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/vmscape.rst b/Documentation/admin-guide/hw-vuln/vmscape.rst
index d9b9a2b6c114..7c40cf70ad7a 100644
--- a/Documentation/admin-guide/hw-vuln/vmscape.rst
+++ b/Documentation/admin-guide/hw-vuln/vmscape.rst
@@ -86,6 +86,10 @@ The possible values in this file are:
    run a potentially malicious guest and issues an IBPB before the first
    exit to userspace after VM-exit.
 
+ * 'Mitigation: Clear BHB before exit to userspace':
+
+   As above, conditional BHB clearing mitigation is enabled.
+
  * 'Mitigation: IBPB on VMEXIT':
 
    IBPB is issued on every VM-exit. This occurs when other mitigations like
@@ -102,9 +106,14 @@ The mitigation can be controlled via the ``vmscape=`` command line parameter:
 
  * ``vmscape=ibpb``:
 
-   Enable conditional IBPB mitigation (default when CONFIG_MITIGATION_VMSCAPE=y).
+   Enable conditional IBPB mitigation.
 
  * ``vmscape=force``:
 
    Force vulnerability detection and mitigation even on processors that are
    not known to be affected.
+
+ * ``vmscape=auto``:
+
+   Choose the mitigation based on the VMSCAPE variant the CPU is affected by.
+   (default when CONFIG_MITIGATION_VMSCAPE=y)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 03a550630644..3853c7109419 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -8378,9 +8378,11 @@ Kernel parameters
 
 			off		- disable the mitigation
 			ibpb		- use Indirect Branch Prediction Barrier
-					  (IBPB) mitigation (default)
+					  (IBPB) mitigation
 			force		- force vulnerability detection even on
 					  unaffected processors
+			auto		- (default) use IBPB or BHB clear
+					  mitigation based on CPU
 
 	vsyscall=	[X86-64,EARLY]
 			Controls the behavior of vsyscalls (i.e. calls to
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index 783e7cb50cae..13db31472f3a 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -96,6 +96,10 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 	choose_random_kstack_offset(rdtsc());
 
 	if (unlikely(this_cpu_read(x86_predictor_flush_exit_to_user))) {
+		/*
+		 * Since the mitigation is for userspace, an explicit
+		 * speculation barrier is not required after flush.
+		 */
 		static_call_cond(vmscape_predictor_flush)();
 		this_cpu_write(x86_predictor_flush_exit_to_user, false);
 	}
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 066fd8095200..38478383139b 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -390,6 +390,8 @@ extern void write_ibpb(void);
 
 #ifdef CONFIG_X86_64
 extern void clear_bhb_loop_nofence(void);
+#else
+static inline void clear_bhb_loop_nofence(void) {}
 #endif
 
 extern void (*x86_return_thunk)(void);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index bfc0e41697f6..1082ed1fb2e6 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -61,9 +61,8 @@ DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
 EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
 
 /*
- * Set when the CPU has run a potentially malicious guest. An IBPB will
- * be needed to before running userspace. That IBPB will flush the branch
- * predictor content.
+ * Set when the CPU has run a potentially malicious guest. Indicates that a
+ * branch predictor flush is needed before running userspace.
  */
 DEFINE_PER_CPU(bool, x86_predictor_flush_exit_to_user);
 EXPORT_PER_CPU_SYMBOL_GPL(x86_predictor_flush_exit_to_user);
@@ -3061,13 +3060,15 @@ enum vmscape_mitigations {
 	VMSCAPE_MITIGATION_AUTO,
 	VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER,
 	VMSCAPE_MITIGATION_IBPB_ON_VMEXIT,
+	VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER,
 };
 
 static const char * const vmscape_strings[] = {
-	[VMSCAPE_MITIGATION_NONE]		= "Vulnerable",
+	[VMSCAPE_MITIGATION_NONE]			= "Vulnerable",
 	/* [VMSCAPE_MITIGATION_AUTO] */
-	[VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER]	= "Mitigation: IBPB before exit to userspace",
-	[VMSCAPE_MITIGATION_IBPB_ON_VMEXIT]	= "Mitigation: IBPB on VMEXIT",
+	[VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER]		= "Mitigation: IBPB before exit to userspace",
+	[VMSCAPE_MITIGATION_IBPB_ON_VMEXIT]		= "Mitigation: IBPB on VMEXIT",
+	[VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER]	= "Mitigation: Clear BHB before exit to userspace",
 };
 
 static enum vmscape_mitigations vmscape_mitigation __ro_after_init =
@@ -3085,6 +3086,8 @@ static int __init vmscape_parse_cmdline(char *str)
 	} else if (!strcmp(str, "force")) {
 		setup_force_cpu_bug(X86_BUG_VMSCAPE);
 		vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
+	} else if (!strcmp(str, "auto")) {
+		vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
 	} else {
 		pr_err("Ignoring unknown vmscape=%s option.\n", str);
 	}
@@ -3114,7 +3117,17 @@ static void __init vmscape_select_mitigation(void)
 		break;
 
 	case VMSCAPE_MITIGATION_AUTO:
-		if (boot_cpu_has(X86_FEATURE_IBPB))
+		/*
+		 * CPUs with BHI_CTRL(ADL and newer) can avoid the IBPB and use
+		 * BHB clear sequence. These CPUs are only vulnerable to the BHI
+		 * variant of the VMSCAPE attack, and thus they do not require a
+		 * full predictor flush.
+		 *
+		 * Note, in 32-bit mode BHB clear sequence is not supported.
+		 */
+		if (boot_cpu_has(X86_FEATURE_BHI_CTRL) && IS_ENABLED(CONFIG_X86_64))
+			vmscape_mitigation = VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER;
+		else if (boot_cpu_has(X86_FEATURE_IBPB))
 			vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
 		else
 			vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
@@ -3141,6 +3154,8 @@ static void __init vmscape_apply_mitigation(void)
 {
 	if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER)
 		static_call_update(vmscape_predictor_flush, write_ibpb);
+	else if (vmscape_mitigation == VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER)
+		static_call_update(vmscape_predictor_flush, clear_bhb_loop_nofence);
 }
 
 #undef pr_fmt
@@ -3232,6 +3247,7 @@ void cpu_bugs_smt_update(void)
 		break;
 	case VMSCAPE_MITIGATION_IBPB_ON_VMEXIT:
 	case VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER:
+	case VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER:
 		/*
 		 * Hypervisors can be attacked across-threads, warn for SMT when
 		 * STIBP is not already enabled system-wide.

-- 
2.34.1



^ permalink raw reply related

* [PATCH v11 09/12] x86/vmscape: Use static_call() for predictor flush
From: Pawan Gupta @ 2026-04-23  6:17 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, Jason Baron, Alice Ryhl,
	Steven Rostedt, Ard Biesheuvel, Shuah Khan
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc
In-Reply-To: <20260422-vmscape-bhb-v11-0-b18e0cf32af4@linux.intel.com>

Adding more mitigation options at exit-to-userspace for VMSCAPE would
usually require a series of checks to decide which mitigation to use. In
this case, the mitigation is done by calling a function, which is decided
at boot. So, adding more feature flags and multiple checks can be avoided
by using static_call() to the mitigating function.

Replace the flag-based mitigation selector with a static_call(). This also
frees the existing X86_FEATURE_IBPB_EXIT_TO_USER.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/Kconfig                     | 1 +
 arch/x86/include/asm/cpufeatures.h   | 2 +-
 arch/x86/include/asm/entry-common.h  | 7 +++----
 arch/x86/include/asm/nospec-branch.h | 3 +++
 arch/x86/kernel/cpu/bugs.c           | 9 ++++++++-
 arch/x86/kvm/x86.c                   | 2 +-
 6 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e2df1b147184..5b8def9ddb98 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2720,6 +2720,7 @@ config MITIGATION_TSA
 config MITIGATION_VMSCAPE
 	bool "Mitigate VMSCAPE"
 	depends on KVM
+	depends on HAVE_STATIC_CALL
 	default y
 	help
 	  Enable mitigation for VMSCAPE attacks. VMSCAPE is a hardware security
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index dbe104df339b..b4d529dd6d30 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -503,7 +503,7 @@
 #define X86_FEATURE_TSA_SQ_NO		(21*32+11) /* AMD CPU not vulnerable to TSA-SQ */
 #define X86_FEATURE_TSA_L1_NO		(21*32+12) /* AMD CPU not vulnerable to TSA-L1 */
 #define X86_FEATURE_CLEAR_CPU_BUF_VM	(21*32+13) /* Clear CPU buffers using VERW before VMRUN */
-#define X86_FEATURE_IBPB_EXIT_TO_USER	(21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
+/* Free */
 #define X86_FEATURE_ABMC		(21*32+15) /* Assignable Bandwidth Monitoring Counters */
 #define X86_FEATURE_MSR_IMM		(21*32+16) /* MSR immediate form instructions */
 #define X86_FEATURE_SGX_EUPDATESVN	(21*32+17) /* Support for ENCLS[EUPDATESVN] instruction */
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index 78b143673ca7..783e7cb50cae 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -4,6 +4,7 @@
 
 #include <linux/randomize_kstack.h>
 #include <linux/user-return-notifier.h>
+#include <linux/static_call_types.h>
 
 #include <asm/nospec-branch.h>
 #include <asm/io_bitmap.h>
@@ -94,10 +95,8 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 	 */
 	choose_random_kstack_offset(rdtsc());
 
-	/* Avoid unnecessary reads of 'x86_predictor_flush_exit_to_user' */
-	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
-	    this_cpu_read(x86_predictor_flush_exit_to_user)) {
-		write_ibpb();
+	if (unlikely(this_cpu_read(x86_predictor_flush_exit_to_user))) {
+		static_call_cond(vmscape_predictor_flush)();
 		this_cpu_write(x86_predictor_flush_exit_to_user, false);
 	}
 }
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 0381db59c39d..066fd8095200 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -542,6 +542,9 @@ static inline void indirect_branch_prediction_barrier(void)
 			    :: "rax", "rcx", "rdx", "memory");
 }
 
+#include <linux/static_call_types.h>
+DECLARE_STATIC_CALL(vmscape_predictor_flush, write_ibpb);
+
 /* The Intel SPEC CTRL MSR base value cache */
 extern u64 x86_spec_ctrl_base;
 DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 636280c612f0..bfc0e41697f6 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -144,6 +144,13 @@ EXPORT_SYMBOL_GPL(cpu_buf_idle_clear);
  */
 DEFINE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);
 
+/*
+ * Controls how vmscape is mitigated e.g. via IBPB or BHB-clear
+ * sequence. This defaults to no mitigation.
+ */
+DEFINE_STATIC_CALL_NULL(vmscape_predictor_flush, write_ibpb);
+EXPORT_STATIC_CALL_FOR_KVM(vmscape_predictor_flush);
+
 #undef pr_fmt
 #define pr_fmt(fmt)	"mitigations: " fmt
 
@@ -3133,7 +3140,7 @@ static void __init vmscape_update_mitigation(void)
 static void __init vmscape_apply_mitigation(void)
 {
 	if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER)
-		setup_force_cpu_cap(X86_FEATURE_IBPB_EXIT_TO_USER);
+		static_call_update(vmscape_predictor_flush, write_ibpb);
 }
 
 #undef pr_fmt
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 45d7cfedc507..5582056b2fa1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11463,7 +11463,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * set for the CPU that actually ran the guest, and not the CPU that it
 	 * may migrate to.
 	 */
-	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
+	if (static_call_query(vmscape_predictor_flush))
 		this_cpu_write(x86_predictor_flush_exit_to_user, true);
 
 	/*

-- 
2.34.1



^ permalink raw reply related

* [PATCH v11 08/12] kvm: Define EXPORT_STATIC_CALL_FOR_KVM()
From: Pawan Gupta @ 2026-04-23  6:16 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, Jason Baron, Alice Ryhl,
	Steven Rostedt, Ard Biesheuvel, Shuah Khan
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc
In-Reply-To: <20260422-vmscape-bhb-v11-0-b18e0cf32af4@linux.intel.com>

EXPORT_SYMBOL_FOR_KVM() exists to export symbols to KVM modules. Static
calls need the same treatment when the core kernel defines a static_call
that KVM needs access to (e.g. from a VM-exit path).

Define EXPORT_STATIC_CALL_FOR_KVM() as the static_call analogue of
EXPORT_SYMBOL_FOR_KVM(). The same three-way logic applies:

  - KVM_SUB_MODULES defined: export to "kvm," plus all sub-modules
  - KVM=m, no sub-modules: export to "kvm" only
  - KVM built-in: no export needed (noop)

  As with EXPORT_SYMBOL_FOR_KVM(), allow architectures to override both
  macros (e.g. to suppress the export when kvm.ko itself will not be
  built despite CONFIG_KVM=m). Add the x86 no-op overrides in
  arch/x86/include/asm/kvm_types.h for that case. To keep the pair in
  sync, EXPORT_STATIC_CALL_FOR_KVM() is defined inside the
  EXPORT_SYMBOL_FOR_KVM #ifndef block; an arch that defines
  EXPORT_SYMBOL_FOR_KVM must also define EXPORT_STATIC_CALL_FOR_KVM or the
  build will fail with a compile-time error.

As with EXPORT_SYMBOL_FOR_KVM(), allow architectures to override
EXPORT_STATIC_CALL_FOR_KVM definition (e.g. to suppress the export when
kvm.ko itself will not be built despite CONFIG_KVM=m). Add the x86 no-op
override in arch/x86/include/asm/kvm_types.h for that case.

Architectures must also define EXPORT_STATIC_CALL_FOR_KVM when they define
EXPORT_SYMBOL_FOR_KVM.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/include/asm/kvm_types.h |  1 +
 include/linux/kvm_types.h        | 10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_types.h b/arch/x86/include/asm/kvm_types.h
index d7c704ed1be9..bceeaed2940e 100644
--- a/arch/x86/include/asm/kvm_types.h
+++ b/arch/x86/include/asm/kvm_types.h
@@ -15,6 +15,7 @@
  * at least one vendor module is enabled.
  */
 #define EXPORT_SYMBOL_FOR_KVM(symbol)
+#define EXPORT_STATIC_CALL_FOR_KVM(symbol)
 #endif
 
 #define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 40
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index a568d8e6f4e8..be602d3f287e 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -13,6 +13,8 @@
 	EXPORT_SYMBOL_FOR_MODULES(symbol, __stringify(KVM_SUB_MODULES))
 #define EXPORT_SYMBOL_FOR_KVM(symbol) \
 	EXPORT_SYMBOL_FOR_MODULES(symbol, "kvm," __stringify(KVM_SUB_MODULES))
+#define EXPORT_STATIC_CALL_FOR_KVM(symbol) \
+	EXPORT_STATIC_CALL_FOR_MODULES(symbol, "kvm," __stringify(KVM_SUB_MODULES))
 #else
 #define EXPORT_SYMBOL_FOR_KVM_INTERNAL(symbol)
 /*
@@ -23,11 +25,17 @@
 #ifndef EXPORT_SYMBOL_FOR_KVM
 #if IS_MODULE(CONFIG_KVM)
 #define EXPORT_SYMBOL_FOR_KVM(symbol) EXPORT_SYMBOL_FOR_MODULES(symbol, "kvm")
+#define EXPORT_STATIC_CALL_FOR_KVM(symbol) EXPORT_STATIC_CALL_FOR_MODULES(symbol, "kvm")
 #else
 #define EXPORT_SYMBOL_FOR_KVM(symbol)
+#define EXPORT_STATIC_CALL_FOR_KVM(symbol)
 #endif /* IS_MODULE(CONFIG_KVM) */
-#endif /* EXPORT_SYMBOL_FOR_KVM */
+#else
+#ifndef EXPORT_STATIC_CALL_FOR_KVM
+#error Must #define EXPORT_STATIC_CALL_FOR_KVM if #defining EXPORT_SYMBOL_FOR_KVM
 #endif
+#endif /* EXPORT_SYMBOL_FOR_KVM */
+#endif /* KVM_SUB_MODULES */
 
 #ifndef __ASSEMBLER__
 

-- 
2.34.1



^ permalink raw reply related

* [PATCH v11 07/12] static_call: Define EXPORT_STATIC_CALL_FOR_MODULES()
From: Pawan Gupta @ 2026-04-23  6:16 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, Jason Baron, Alice Ryhl,
	Steven Rostedt, Ard Biesheuvel, Shuah Khan
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc
In-Reply-To: <20260422-vmscape-bhb-v11-0-b18e0cf32af4@linux.intel.com>

There is EXPORT_STATIC_CALL_TRAMP() that hides the static key from all
modules. But there is no equivalent of EXPORT_SYMBOL_FOR_MODULES() to
restrict symbol visibility to only certain modules.

Add EXPORT_STATIC_CALL_FOR_MODULES(name, mods) that wraps both the key and
the trampoline with EXPORT_SYMBOL_FOR_MODULES(), allowing only a limited
set of modules to see and update the static key.

The immediate user is KVM, in the following commit.

checkpatch reported below warnings with this change that I believe don't
apply in this case:

  include/linux/static_call.h:219: WARNING: Non-declarative macros with multiple statements should be enclosed in a do - while loop
  include/linux/static_call.h:220: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 include/linux/static_call.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index 78a77a4ae0ea..b610afd1ed55 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -216,6 +216,9 @@ extern long __static_call_return0(void);
 #define EXPORT_STATIC_CALL_GPL(name)					\
 	EXPORT_SYMBOL_GPL(STATIC_CALL_KEY(name));			\
 	EXPORT_SYMBOL_GPL(STATIC_CALL_TRAMP(name))
+#define EXPORT_STATIC_CALL_FOR_MODULES(name, mods)			\
+	EXPORT_SYMBOL_FOR_MODULES(STATIC_CALL_KEY(name), mods);		\
+	EXPORT_SYMBOL_FOR_MODULES(STATIC_CALL_TRAMP(name), mods)
 
 /* Leave the key unexported, so modules can't change static call targets: */
 #define EXPORT_STATIC_CALL_TRAMP(name)					\
@@ -276,6 +279,9 @@ extern long __static_call_return0(void);
 #define EXPORT_STATIC_CALL_GPL(name)					\
 	EXPORT_SYMBOL_GPL(STATIC_CALL_KEY(name));			\
 	EXPORT_SYMBOL_GPL(STATIC_CALL_TRAMP(name))
+#define EXPORT_STATIC_CALL_FOR_MODULES(name, mods)			\
+	EXPORT_SYMBOL_FOR_MODULES(STATIC_CALL_KEY(name), mods);		\
+	EXPORT_SYMBOL_FOR_MODULES(STATIC_CALL_TRAMP(name), mods)
 
 /* Leave the key unexported, so modules can't change static call targets: */
 #define EXPORT_STATIC_CALL_TRAMP(name)					\
@@ -346,6 +352,8 @@ static inline int static_call_text_reserved(void *start, void *end)
 
 #define EXPORT_STATIC_CALL(name)	EXPORT_SYMBOL(STATIC_CALL_KEY(name))
 #define EXPORT_STATIC_CALL_GPL(name)	EXPORT_SYMBOL_GPL(STATIC_CALL_KEY(name))
+#define EXPORT_STATIC_CALL_FOR_MODULES(name, mods)			\
+	EXPORT_SYMBOL_FOR_MODULES(STATIC_CALL_KEY(name), mods)
 
 #endif /* CONFIG_HAVE_STATIC_CALL */
 

-- 
2.34.1



^ permalink raw reply related

* [PATCH v11 06/12] x86/vmscape: Use write_ibpb() instead of indirect_branch_prediction_barrier()
From: Pawan Gupta @ 2026-04-23  6:16 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, Jason Baron, Alice Ryhl,
	Steven Rostedt, Ard Biesheuvel, Shuah Khan
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc
In-Reply-To: <20260422-vmscape-bhb-v11-0-b18e0cf32af4@linux.intel.com>

indirect_branch_prediction_barrier() is a wrapper to write_ibpb(), which
also checks if the CPU supports IBPB. For VMSCAPE, call to
indirect_branch_prediction_barrier() is only possible when CPU supports
IBPB.

Simply call write_ibpb() directly to avoid unnecessary alternative
patching.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/include/asm/entry-common.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index c45858db16c9..78b143673ca7 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -97,7 +97,7 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 	/* Avoid unnecessary reads of 'x86_predictor_flush_exit_to_user' */
 	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
 	    this_cpu_read(x86_predictor_flush_exit_to_user)) {
-		indirect_branch_prediction_barrier();
+		write_ibpb();
 		this_cpu_write(x86_predictor_flush_exit_to_user, false);
 	}
 }

-- 
2.34.1



^ permalink raw reply related

* [PATCH v11 05/12] x86/vmscape: Move mitigation selection to a switch()
From: Pawan Gupta @ 2026-04-23  6:16 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, Jason Baron, Alice Ryhl,
	Steven Rostedt, Ard Biesheuvel, Shuah Khan
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc
In-Reply-To: <20260422-vmscape-bhb-v11-0-b18e0cf32af4@linux.intel.com>

This ensures that all mitigation modes are explicitly handled, while
keeping the mitigation selection for each mode together. This also prepares
for adding BHB-clearing mitigation mode for VMSCAPE.

Tested-by: Jon Kohler <jon@nutanix.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/kernel/cpu/bugs.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 002bf4adccc3..636280c612f0 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -3088,17 +3088,33 @@ early_param("vmscape", vmscape_parse_cmdline);
 
 static void __init vmscape_select_mitigation(void)
 {
-	if (!boot_cpu_has_bug(X86_BUG_VMSCAPE) ||
-	    !boot_cpu_has(X86_FEATURE_IBPB)) {
+	if (!boot_cpu_has_bug(X86_BUG_VMSCAPE)) {
 		vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
 		return;
 	}
 
-	if (vmscape_mitigation == VMSCAPE_MITIGATION_AUTO) {
-		if (should_mitigate_vuln(X86_BUG_VMSCAPE))
+	if ((vmscape_mitigation == VMSCAPE_MITIGATION_AUTO) &&
+	    !should_mitigate_vuln(X86_BUG_VMSCAPE))
+		vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
+
+	switch (vmscape_mitigation) {
+	case VMSCAPE_MITIGATION_NONE:
+		break;
+
+	case VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER:
+		if (!boot_cpu_has(X86_FEATURE_IBPB))
+			vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
+		break;
+
+	case VMSCAPE_MITIGATION_AUTO:
+		if (boot_cpu_has(X86_FEATURE_IBPB))
 			vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
 		else
 			vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
+		break;
+
+	default:
+		break;
 	}
 }
 

-- 
2.34.1



^ permalink raw reply related

* [PATCH v11 04/12] x86/vmscape: Rename x86_ibpb_exit_to_user to x86_predictor_flush_exit_to_user
From: Pawan Gupta @ 2026-04-23  6:15 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, Jason Baron, Alice Ryhl,
	Steven Rostedt, Ard Biesheuvel, Shuah Khan
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc
In-Reply-To: <20260422-vmscape-bhb-v11-0-b18e0cf32af4@linux.intel.com>

With the upcoming changes x86_ibpb_exit_to_user will also be used when BHB
clearing sequence is used. Rename it cover both the cases.

No functional change.

Suggested-by: Sean Christopherson <seanjc@google.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/include/asm/entry-common.h  | 6 +++---
 arch/x86/include/asm/nospec-branch.h | 2 +-
 arch/x86/kernel/cpu/bugs.c           | 4 ++--
 arch/x86/kvm/x86.c                   | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index ce3eb6d5fdf9..c45858db16c9 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -94,11 +94,11 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 	 */
 	choose_random_kstack_offset(rdtsc());
 
-	/* Avoid unnecessary reads of 'x86_ibpb_exit_to_user' */
+	/* Avoid unnecessary reads of 'x86_predictor_flush_exit_to_user' */
 	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
-	    this_cpu_read(x86_ibpb_exit_to_user)) {
+	    this_cpu_read(x86_predictor_flush_exit_to_user)) {
 		indirect_branch_prediction_barrier();
-		this_cpu_write(x86_ibpb_exit_to_user, false);
+		this_cpu_write(x86_predictor_flush_exit_to_user, false);
 	}
 }
 #define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 157eb69c7f0f..0381db59c39d 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -533,7 +533,7 @@ void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature)
 		: "memory");
 }
 
-DECLARE_PER_CPU(bool, x86_ibpb_exit_to_user);
+DECLARE_PER_CPU(bool, x86_predictor_flush_exit_to_user);
 
 static inline void indirect_branch_prediction_barrier(void)
 {
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 2cb4a96247d8..002bf4adccc3 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -65,8 +65,8 @@ EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
  * be needed to before running userspace. That IBPB will flush the branch
  * predictor content.
  */
-DEFINE_PER_CPU(bool, x86_ibpb_exit_to_user);
-EXPORT_PER_CPU_SYMBOL_GPL(x86_ibpb_exit_to_user);
+DEFINE_PER_CPU(bool, x86_predictor_flush_exit_to_user);
+EXPORT_PER_CPU_SYMBOL_GPL(x86_predictor_flush_exit_to_user);
 
 u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd1c4a36b593..45d7cfedc507 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11464,7 +11464,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * may migrate to.
 	 */
 	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
-		this_cpu_write(x86_ibpb_exit_to_user, true);
+		this_cpu_write(x86_predictor_flush_exit_to_user, true);
 
 	/*
 	 * Consume any pending interrupts, including the possible source of

-- 
2.34.1



^ permalink raw reply related

* [PATCH v11 03/12] x86/bhi: Rename clear_bhb_loop() to clear_bhb_loop_nofence()
From: Pawan Gupta @ 2026-04-23  6:15 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, Jason Baron, Alice Ryhl,
	Steven Rostedt, Ard Biesheuvel, Shuah Khan
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc
In-Reply-To: <20260422-vmscape-bhb-v11-0-b18e0cf32af4@linux.intel.com>

To reflect the recent change that moved LFENCE to the caller side.

Suggested-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/entry/entry_64.S            | 8 ++++----
 arch/x86/include/asm/nospec-branch.h | 6 +++---
 arch/x86/net/bpf_jit_comp.c          | 2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index bbd4b1c7ec04..1f56d086d312 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1532,7 +1532,7 @@ SYM_CODE_END(rewind_stack_and_make_dead)
  * Note, callers should use a speculation barrier like LFENCE immediately after
  * a call to this function to ensure BHB is cleared before indirect branches.
  */
-SYM_FUNC_START(clear_bhb_loop)
+SYM_FUNC_START(clear_bhb_loop_nofence)
 	ANNOTATE_NOENDBR
 	push	%rbp
 	mov	%rsp, %rbp
@@ -1570,6 +1570,6 @@ SYM_FUNC_START(clear_bhb_loop)
 5:
 	pop	%rbp
 	RET
-SYM_FUNC_END(clear_bhb_loop)
-EXPORT_SYMBOL_FOR_KVM(clear_bhb_loop)
-STACK_FRAME_NON_STANDARD(clear_bhb_loop)
+SYM_FUNC_END(clear_bhb_loop_nofence)
+EXPORT_SYMBOL_FOR_KVM(clear_bhb_loop_nofence)
+STACK_FRAME_NON_STANDARD(clear_bhb_loop_nofence)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 87b83ae7c97f..157eb69c7f0f 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -331,11 +331,11 @@
 
 #ifdef CONFIG_X86_64
 .macro CLEAR_BRANCH_HISTORY
-	ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_LOOP
+	ALTERNATIVE "", "call clear_bhb_loop_nofence; lfence", X86_FEATURE_CLEAR_BHB_LOOP
 .endm
 
 .macro CLEAR_BRANCH_HISTORY_VMEXIT
-	ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_VMEXIT
+	ALTERNATIVE "", "call clear_bhb_loop_nofence; lfence", X86_FEATURE_CLEAR_BHB_VMEXIT
 .endm
 #else
 #define CLEAR_BRANCH_HISTORY
@@ -389,7 +389,7 @@ extern void entry_untrain_ret(void);
 extern void write_ibpb(void);
 
 #ifdef CONFIG_X86_64
-extern void clear_bhb_loop(void);
+extern void clear_bhb_loop_nofence(void);
 #endif
 
 extern void (*x86_return_thunk)(void);
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 63d6c9fa5e80..f40e88f87273 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1619,7 +1619,7 @@ static int emit_spectre_bhb_barrier(u8 **pprog, u8 *ip,
 		EMIT1(0x51); /* push rcx */
 		ip += 2;
 
-		func = (u8 *)clear_bhb_loop;
+		func = (u8 *)clear_bhb_loop_nofence;
 		ip += x86_call_depth_emit_accounting(&prog, func, ip);
 
 		if (emit_call(&prog, func, ip))

-- 
2.34.1



^ permalink raw reply related

* [PATCH v11 02/12] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
From: Pawan Gupta @ 2026-04-23  6:15 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, Jason Baron, Alice Ryhl,
	Steven Rostedt, Ard Biesheuvel, Shuah Khan
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc
In-Reply-To: <20260422-vmscape-bhb-v11-0-b18e0cf32af4@linux.intel.com>

As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
the Branch History Buffer (BHB). On Alder Lake and newer parts this
sequence is not sufficient because it doesn't clear enough entries. This
was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
in the kernel.

Now with VMSCAPE (BHI variant) it is also required to isolate branch
history between guests and userspace. Since BHI_DIS_S only protects the
kernel, the newer CPUs also use IBPB.

A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
But it currently does not clear enough BHB entries to be effective on newer
CPUs with larger BHB. At boot, dynamically set the loop count of
clear_bhb_loop() such that it is effective on newer CPUs too.

Introduce global loop counts, initializing them with appropriate value
based on the hardware feature X86_FEATURE_BHI_CTRL.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/entry/entry_64.S            |  8 +++++---
 arch/x86/include/asm/nospec-branch.h |  2 ++
 arch/x86/kernel/cpu/bugs.c           | 13 +++++++++++++
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3a180a36ca0e..bbd4b1c7ec04 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1536,7 +1536,9 @@ SYM_FUNC_START(clear_bhb_loop)
 	ANNOTATE_NOENDBR
 	push	%rbp
 	mov	%rsp, %rbp
-	movl	$5, %ecx
+
+	movzbl    bhb_seq_outer_loop(%rip), %ecx
+
 	ANNOTATE_INTRA_FUNCTION_CALL
 	call	1f
 	jmp	5f
@@ -1556,8 +1558,8 @@ SYM_FUNC_START(clear_bhb_loop)
 	 * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
 	 * but some Clang versions (e.g. 18) don't like this.
 	 */
-	.skip 32 - 18, 0xcc
-2:	movl	$5, %eax
+	.skip 32 - 20, 0xcc
+2:	movzbl  bhb_seq_inner_loop(%rip), %eax
 3:	jmp	4f
 	nop
 4:	sub	$1, %eax
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 70b377fcbc1c..87b83ae7c97f 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -548,6 +548,8 @@ DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
 extern void update_spec_ctrl_cond(u64 val);
 extern u64 spec_ctrl_current(void);
 
+extern u8 bhb_seq_inner_loop, bhb_seq_outer_loop;
+
 /*
  * With retpoline, we must use IBRS to restrict branch prediction
  * before calling into firmware.
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 83f51cab0b1e..2cb4a96247d8 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2047,6 +2047,10 @@ enum bhi_mitigations {
 static enum bhi_mitigations bhi_mitigation __ro_after_init =
 	IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_AUTO : BHI_MITIGATION_OFF;
 
+/* Default to short BHB sequence values */
+u8 bhb_seq_outer_loop __ro_after_init = 5;
+u8 bhb_seq_inner_loop __ro_after_init = 5;
+
 static int __init spectre_bhi_parse_cmdline(char *str)
 {
 	if (!str)
@@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
 		x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
 	}
 
+	/*
+	 * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
+	 * support), see Intel's BHI guidance.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
+		bhb_seq_outer_loop = 12;
+		bhb_seq_inner_loop = 7;
+	}
+
 	x86_arch_cap_msr = x86_read_arch_cap_msr();
 
 	cpu_print_attack_vectors();

-- 
2.34.1



^ permalink raw reply related

* [PATCH v11 01/12] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop()
From: Pawan Gupta @ 2026-04-23  6:14 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, Jason Baron, Alice Ryhl,
	Steven Rostedt, Ard Biesheuvel, Shuah Khan
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc
In-Reply-To: <20260422-vmscape-bhb-v11-0-b18e0cf32af4@linux.intel.com>

Currently, the BHB clearing sequence is followed by an LFENCE to prevent
transient execution of subsequent indirect branches prematurely. However,
the LFENCE barrier could be unnecessary in certain cases. For example, when
the kernel is using the BHI_DIS_S mitigation, and BHB clearing is only
needed for userspace. In such cases, the LFENCE is redundant because ring
transitions would provide the necessary serialization.

Below is a quick recap of BHI mitigation options:

On Alder Lake and newer

    BHI_DIS_S: Hardware control to mitigate BHI in ring0. This has low
    performance overhead.

    Long loop: Alternatively, a longer version of the BHB clearing sequence
    can be used to mitigate BHI. It can also be used to mitigate the BHI
    variant of VMSCAPE. This is not yet implemented in Linux.

On older CPUs

    Short loop: Clears BHB at kernel entry and VMexit. The "Long loop" is
    effective on older CPUs as well, but should be avoided because of
    unnecessary overhead.

On Alder Lake and newer CPUs, eIBRS isolates the indirect targets between
guest and host. But when affected by the BHI variant of VMSCAPE, a guest's
branch history may still influence indirect branches in userspace. This
also means the big hammer IBPB could be replaced with a cheaper option that
clears the BHB at exit-to-userspace after a VMexit.

In preparation for adding the support for the BHB sequence (without LFENCE)
on newer CPUs, move the LFENCE to the caller side after clear_bhb_loop() is
executed. Allow callers to decide whether they need the LFENCE or not. This
adds a few extra bytes to the call sites, but it obviates the need for
multiple variants of clear_bhb_loop().

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/entry/entry_64.S            | 5 ++++-
 arch/x86/include/asm/nospec-branch.h | 4 ++--
 arch/x86/net/bpf_jit_comp.c          | 2 ++
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 42447b1e1dff..3a180a36ca0e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1528,6 +1528,9 @@ SYM_CODE_END(rewind_stack_and_make_dead)
  * refactored in the future if needed. The .skips are for safety, to ensure
  * that all RETs are in the second half of a cacheline to mitigate Indirect
  * Target Selection, rather than taking the slowpath via its_return_thunk.
+ *
+ * Note, callers should use a speculation barrier like LFENCE immediately after
+ * a call to this function to ensure BHB is cleared before indirect branches.
  */
 SYM_FUNC_START(clear_bhb_loop)
 	ANNOTATE_NOENDBR
@@ -1562,7 +1565,7 @@ SYM_FUNC_START(clear_bhb_loop)
 	sub	$1, %ecx
 	jnz	1b
 .Lret2:	RET
-5:	lfence
+5:
 	pop	%rbp
 	RET
 SYM_FUNC_END(clear_bhb_loop)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 4f4b5e8a1574..70b377fcbc1c 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -331,11 +331,11 @@
 
 #ifdef CONFIG_X86_64
 .macro CLEAR_BRANCH_HISTORY
-	ALTERNATIVE "", "call clear_bhb_loop", X86_FEATURE_CLEAR_BHB_LOOP
+	ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_LOOP
 .endm
 
 .macro CLEAR_BRANCH_HISTORY_VMEXIT
-	ALTERNATIVE "", "call clear_bhb_loop", X86_FEATURE_CLEAR_BHB_VMEXIT
+	ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_VMEXIT
 .endm
 #else
 #define CLEAR_BRANCH_HISTORY
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index e9b78040d703..63d6c9fa5e80 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1624,6 +1624,8 @@ static int emit_spectre_bhb_barrier(u8 **pprog, u8 *ip,
 
 		if (emit_call(&prog, func, ip))
 			return -EINVAL;
+		/* Don't speculate past this until BHB is cleared */
+		EMIT_LFENCE();
 		EMIT1(0x59); /* pop rcx */
 		EMIT1(0x58); /* pop rax */
 	}

-- 
2.34.1



^ permalink raw reply related

* [PATCH v11 00/12] VMSCAPE optimization for BHI variant
From: Pawan Gupta @ 2026-04-23  6:14 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, Jason Baron, Alice Ryhl,
	Steven Rostedt, Ard Biesheuvel, Shuah Khan
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

v11:
- Use ifdef EXPORT_SYMBOL_FOR_KVM guard for EXPORT_STATIC_CALL_FOR_KVM()
  definition. It is not practical to define one but not the other. (Sean)
- Collected tags.

v10: https://lore.kernel.org/r/20260414-vmscape-bhb-v10-0-efa924abae5f@linux.intel.com
- Add patches to define EXPORT_STATIC_CALL_FOR_MODULES() and
  EXPORT_STATIC_CALL_FOR_KVM(), so that vmscape_predictor_flush static key
  is only accessible to KVM and not to other kernel modules. (PeterZ)
  (Borisov earlier objected to exporting the static key to all modules, but
  now the static key is only exported to KVM. I guess that resolves the
  concern.)
- Avoid an explicit call to vmscape_mitigation_enabled() and instead use
  static_call_query() in VMexit hot path. (Sean)
- Drop vmscape_mitigation_enabled(), as it is no longer needed.
- Rebased to v7.0

v9: https://lore.kernel.org/r/20260402-vmscape-bhb-v9-0-94d16bc29774@linux.intel.com
- Use global variables for BHB loop counters instead of ALTERNATIVE-based
  approach. (Dave & others)
- Use 32-bit registers (%eax/%ecx) for loop counters, loaded via movzbl
  from 8-bit globals. 8-bit registers (e.g. %ah in the inner loop) caused
  performance regression on certain CPUs due to partial-register stalls. (David Laight)
- Let BPF save/restore %rax/%rcx as in the original implementation, since
  it is the only caller that needs these registers preserved across the
  BHB clearing sequence.
- Drop Reviewed-by from patch 2/10 as the implementation changed significantly.
- Apply Tested-by from Jon Kohler to the series (except patch 2/10).
- Fix commit message grammar. (Borislav)
- Rebased to v7.0-rc6.

v8: https://lore.kernel.org/r/20260324-vmscape-bhb-v8-0-68bb524b3ab9@linux.intel.com
- Use helper in KVM to convey the mitigation status. (PeterZ/Borisov)
- Fix the documentation for default vmscape mitigation. (BPF bot)
- Remove the stray lines in bug.c (BPF bot).
- Updated commit messages and comments.
- Rebased to v7.0-rc5.

v7: https://lore.kernel.org/r/20260319-vmscape-bhb-v7-0-b76a777a98af@linux.intel.com
- s/This allows/Allow/ and s/This does adds/This adds/ in patch 1/10 commit
  message (Borislav).
- Minimize register usage in BHB clearing seq. (David Laight)
  - Instead of separate ecx/eax counters, use al/ah.
  - Adjust the alignment of RET due to register size change.
  - save/restore rax in the seq itself.
  - Remove the save/restore of rax/rcx for BPF callers.
- Rename clear_bhb_loop() to clear_bhb_loop_nofence() to make it
  obvious that the LFENCE is not part of the sequence (Borislav).
- Fix Kconfig: s/select/depends on/ HAVE_STATIC_CALL (PeterZ).
- Rebased to v7.0-rc4.

v6: https://lore.kernel.org/r/20251201-vmscape-bhb-v6-0-d610dd515714@linux.intel.com
- Remove semicolon at the end of asm in ALTERNATIVE (Uros).
- Fix build warning in vmscape_select_mitigation() (LKP).
- Rebased to v6.18.

v5: https://lore.kernel.org/r/20251126-vmscape-bhb-v5-2-02d66e423b00@linux.intel.com
- For BHI seq, limit runtime-patching to loop counts only (Dave).
  Dropped 2 patches that moved the BHB seq to a macro.
- Remove redundant switch cases in vmscape_select_mitigation() (Nikolay).
- Improve commit message (Nikolay).
- Collected tags.

v4: https://lore.kernel.org/r/20251119-vmscape-bhb-v4-0-1adad4e69ddc@linux.intel.com
- Move LFENCE to the callsite, out of clear_bhb_loop(). (Dave)
- Make clear_bhb_loop() work for larger BHB. (Dave)
  This now uses hardware enumeration to determine the BHB size to clear.
- Use write_ibpb() instead of indirect_branch_prediction_barrier() when
  IBPB is known to be available. (Dave)
- Use static_call() to simplify mitigation at exit-to-userspace. (Dave)
- Refactor vmscape_select_mitigation(). (Dave)
- Fix vmscape=on which was wrongly behaving as AUTO. (Dave)
- Split the patches. (Dave)
  - Patch 1-4 prepares for making the sequence flexible for VMSCAPE use.
  - Patch 5 trivial rename of variable.
  - Patch 6-8 prepares for deploying BHB mitigation for VMSCAPE.
  - Patch 9 deploys the mitigation.
  - Patch 10-11 fixes ON Vs AUTO mode.

v3: https://lore.kernel.org/r/20251027-vmscape-bhb-v3-0-5793c2534e93@linux.intel.com
- s/x86_pred_flush_pending/x86_predictor_flush_exit_to_user/ (Sean).
- Removed IBPB & BHB-clear mutual exclusion at exit-to-userspace.
- Collected tags.

v2: https://lore.kernel.org/r/20251015-vmscape-bhb-v2-0-91cbdd9c3a96@linux.intel.com
- Added check for IBPB feature in vmscape_select_mitigation(). (David)
- s/vmscape=auto/vmscape=on/ (David)
- Added patch to remove LFENCE from VMSCAPE BHB-clear sequence.
- Rebased to v6.18-rc1.

v1: https://lore.kernel.org/r/20250924-vmscape-bhb-v1-0-da51f0e1934d@linux.intel.com

Hi All,

These patches aim to improve the performance of a recent mitigation for
VMSCAPE[1] vulnerability. This improvement is relevant for BHI variant of
VMSCAPE that affect Alder Lake and newer processors.

The current mitigation approach uses IBPB on kvm-exit-to-userspace for all
affected range of CPUs. This is an overkill for CPUs that are only affected
by the BHI variant. On such CPUs clearing the branch history is sufficient
for VMSCAPE, and also more apt as the underlying issue is due to poisoned
branch history.

Below is the iPerf data for transfer between guest and host, comparing IBPB
and BHB-clear mitigation. BHB-clear shows performance improvement over IBPB
in most cases.

Platform: Emerald Rapids
Baseline: vmscape=off
Target: IBPB at VMexit-to-userspace Vs the new BHB-clear at
	VMexit-to-userspace mitigation (both compared against baseline).

(pN = N parallel connections)

| iPerf user-net | IBPB    | BHB Clear |
|----------------|---------|-----------|
| UDP 1-vCPU_p1  | -12.5%  |   1.3%    |
| TCP 1-vCPU_p1  | -10.4%  |  -1.5%    |
| TCP 1-vCPU_p1  | -7.5%   |  -3.0%    |
| UDP 4-vCPU_p16 | -3.7%   |  -3.7%    |
| TCP 4-vCPU_p4  | -2.9%   |  -1.4%    |
| UDP 4-vCPU_p4  | -0.6%   |   0.0%    |
| TCP 4-vCPU_p4  |  3.5%   |   0.0%    |

| iPerf bridge-net | IBPB    | BHB Clear |
|------------------|---------|-----------|
| UDP 1-vCPU_p1    | -9.4%   |  -0.4%    |
| TCP 1-vCPU_p1    | -3.9%   |  -0.5%    |
| UDP 4-vCPU_p16   | -2.2%   |  -3.8%    |
| TCP 4-vCPU_p4    | -1.0%   |  -1.0%    |
| TCP 4-vCPU_p4    |  0.5%   |   0.5%    |
| UDP 4-vCPU_p4    |  0.0%   |   0.9%    |
| TCP 1-vCPU_p1    |  0.0%   |   0.9%    |

| iPerf vhost-net | IBPB    | BHB Clear |
|-----------------|---------|-----------|
| UDP 1-vCPU_p1   | -4.3%   |   1.0%    |
| TCP 1-vCPU_p1   | -3.8%   |  -0.5%    |
| TCP 1-vCPU_p1   | -2.7%   |  -0.7%    |
| UDP 4-vCPU_p16  | -0.7%   |  -2.2%    |
| TCP 4-vCPU_p4   | -0.4%   |   0.8%    |
| UDP 4-vCPU_p4   |  0.4%   |  -0.7%    |
| TCP 4-vCPU_p4   |  0.0%   |   0.6%    |

[1] https://comsec.ethz.ch/research/microarch/vmscape-exposing-and-exploiting-incomplete-branch-predictor-isolation-in-cloud-environments/

---
Pawan Gupta (12):
      x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop()
      x86/bhi: Make clear_bhb_loop() effective on newer CPUs
      x86/bhi: Rename clear_bhb_loop() to clear_bhb_loop_nofence()
      x86/vmscape: Rename x86_ibpb_exit_to_user to x86_predictor_flush_exit_to_user
      x86/vmscape: Move mitigation selection to a switch()
      x86/vmscape: Use write_ibpb() instead of indirect_branch_prediction_barrier()
      static_call: Define EXPORT_STATIC_CALL_FOR_MODULES()
      kvm: Define EXPORT_STATIC_CALL_FOR_KVM()
      x86/vmscape: Use static_call() for predictor flush
      x86/vmscape: Deploy BHB clearing mitigation
      x86/vmscape: Resolve conflict between attack-vectors and vmscape=force
      x86/vmscape: Add cmdline vmscape=on to override attack vector controls

 Documentation/admin-guide/hw-vuln/vmscape.rst   | 15 ++++-
 Documentation/admin-guide/kernel-parameters.txt |  6 +-
 arch/x86/Kconfig                                |  1 +
 arch/x86/entry/entry_64.S                       | 21 ++++---
 arch/x86/include/asm/cpufeatures.h              |  2 +-
 arch/x86/include/asm/entry-common.h             | 13 ++--
 arch/x86/include/asm/kvm_types.h                |  1 +
 arch/x86/include/asm/nospec-branch.h            | 15 +++--
 arch/x86/kernel/cpu/bugs.c                      | 84 +++++++++++++++++++++----
 arch/x86/kvm/x86.c                              |  4 +-
 arch/x86/net/bpf_jit_comp.c                     |  4 +-
 include/linux/kvm_types.h                       | 10 ++-
 include/linux/static_call.h                     |  8 +++
 13 files changed, 147 insertions(+), 37 deletions(-)
---
base-commit: 028ef9c96e96197026887c0f092424679298aae8
change-id: 20250916-vmscape-bhb-d7d469977f2f

Best regards,
--  
Thanks,
Pawan



^ permalink raw reply

* Re: [PATCH net v2 13/15] drivers: net: 8390: pcnet: Remove this driver
From: Dominik Brodowski @ 2026-04-23  5:49 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Geert Uytterhoeven, Michael Fritscher, Byron Stanoszek,
	Daniel Palmer, linux-kernel, netdev, linux-doc
In-Reply-To: <20260422-v7-0-0-net-next-driver-removal-v1-v2-13-08a5b59784d5@lunn.ch>

On Wed, Apr 22, 2026 at 01:01:56PM -0500, Andrew Lunn wrote:
> The pcnet was written by David A. Hindsh in 1999. It is an PCMCIA
> device, so unlikely to be used with modern kernels.
> 
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> ---
>  drivers/net/ethernet/8390/Kconfig    |   11 -
>  drivers/net/ethernet/8390/Makefile   |    1 -
>  drivers/net/ethernet/8390/pcnet_cs.c | 1717 ----------------------------------

NACK for this one. This is the most common PCMCIA networking device, so
should be removed only once the rest of the PCMCIA subsystem is removed.
For the other removals of PCMCIA device drivers in this series:

	Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>

Thanks,
	Dominik

^ permalink raw reply

* Re: [PATCH 3/4] drm/drm_ras: Add DRM RAS netlink error event notification
From: Tauro, Riana @ 2026-04-23  5:52 UTC (permalink / raw)
  To: Raag Jadav, aravind.iddamsetty, Jakub Kicinski, rodrigo.vivi
  Cc: dri-devel, netdev, anshuman.gupta, joonas.lahtinen, simona.vetter,
	airlied, pratik.bari, joshua.santosh.ranjan,
	ashwin.kumar.kulkarni, shubham.kumar, ravi.kishore.koppuravuri,
	anvesh.bakwad, maarten.lankhorst, Zack McKevitt, Lijo Lazar,
	Hawking Zhang, David S. Miller, Paolo Abeni, Eric Dumazet,
	intel-xe
In-Reply-To: <69c061b4-eb40-4e2d-aeaa-5dfa47b2db8c@intel.com>

Hi Jakub

We had some questions regarding events in netlink

1) According to netlink spec, "Events are considered less idiomatic for 
netlink and notifications should be preferred."
Since we currently don't have a response that matches a get operation to 
use notify. Would using an event be acceptable?

2) Is there a way to check if there are subscribers to a group before 
creating the event message?
    Currently the subscriber check happens in 
netlink_broadcast_filtered, but a reviewer suggested we could
    optimize by skipping message creation if there are no subscribers.

Thanks
Riana

On 4/10/2026 11:50 AM, Tauro, Riana wrote:
>
> On 4/9/2026 11:05 AM, Raag Jadav wrote:
>> On Wed, Apr 08, 2026 at 07:59:33PM +0530, Tauro, Riana wrote:
>>> On 3/25/2026 7:01 PM, Raag Jadav wrote:
>>>> On Wed, Mar 11, 2026 at 03:59:17PM +0530, Riana Tauro wrote:
>> ...
>>
>>>>> +Example: Listen to error events
>>>>> +
>>>>> +.. code-block:: bash
>>>>> +
>>>>> +    sudo ynl --family drm_ras --subscribe error-notify
>>>>> +    {'msg': {'error-id': 1, 'node-id': 1}, 'name': 'error-event'}
>>>> Can we also have error-name and node-name? I'd be pulling my hair off
>>>> if I need to remember all the ids.
>>> Yeah makes sense. We can add the node_name, error_name.
>>> Adding device_name would also be more useful in the event.
>>>
>>> @Rodrigo/@aravind thoughts?
>>>
> I tried adding all parameters, but the event response seems overloaded.
> I think node-name, error-name is not necessary since this will be 
> mostly used
>
> by tools and scripts that get the nodes and error-ids prior to 
> subscribing.
>
> Let me know your thoughts
>
> $ sudo ./tools/net/ynl/pyynl/cli.py --family drm_ras --subscribe 
> error-notify
>
> {'msg': {'device-name': '0000:03:00.0', 'error-id': 1, 'error-name': 
> 'core-compute', 'node-id': 3, 'node-name': 'uncorrectable-errors'},
>
> 'name': 'error-event'}
>
> {'msg': {'device-name': '0000:04:00.0', 'error-id': 1, 'error-name': 
> 'core-compute', 'node-id': 1, 'node-name': 'uncorrectable-errors'},
>
> 'name': 'error-event'}
>
> Thanks
> Riana
>
>>>
>>>> On that note, I think it'll be good to have them as part of request
>>>> attributes as an alternative to ids (also for existing commands) but
>>>> that can done as a follow up.
>>>>
>>> We cannot use names as alternative because it won't work for 
>>> multiple cards.
>>> example in xe: Suppose there are 2 cards and each has 2 nodes. We 
>>> cannot
>>> query using node_name+error_name.
>>> Also most of the netlink implementations use id's as unique 
>>> identifiers.
>>>
>>> $ sudo ./cli.py --family drm_ras  --dump list-nodes
>>> [{'device-name': 'bdf_1', 'node-id': 0, 'node-name': 
>>> 'correctable-errors',
>>> 'node-type': 'error-counter'},
>>>   {'device-name': 'bdf_1, 'node-id': 1, 'node-name': 
>>> 'uncorrectable-errors',
>>> 'node-type': 'error-counter'},
>>>   {'device-name': 'bdf_2', 'node-id': 2, 'node-name': 
>>> 'correctable-errors',
>>> 'node-type': 'error-counter'},
>>>   {'device-name': 'bdf_2', 'node-id': 3, 'node-name': 
>>> 'uncorrectable-errors',
>>> 'node-type': 'error-counter'}]
>> This means they don't persist the user needs to figures out all the 
>> ids before
>> anything can happen. In device node world we have 
>> /dev/dri/by-path/<bdf> which
>> makes it much easier.
>>
>> Also, I'm not much informed about the history and it's still unclear 
>> to me what
>> problem did netlink solve here that cannot be solved by anything 
>> else? But we're
>> too late for that discussion, and again, not my call.
>>
>>>> Also, what if I have multiple devices with multiple nodes. Do they 
>>>> need
>>>> separate subscription?
>>>>
>>> No, we subscribe only to the group not the nodes. In this case the 
>>> group is
>>> 'error-notify'
>>>
>>> $ sudo ./cli.py --family drm_ras --subscribe error-notify
>>> {'msg': {'error-id': 1, 'node-id': 1}, 'name': 'error-event'}
>>> {'msg': {'error-id': 1, 'node-id': 3}, 'name': 'error-event'}
>> Hm, perhaps I need to spend some time wrapping my head around the new 
>> concept.
>> Let's catch up sometime this week.
>>
>> Raag

^ permalink raw reply

* Re: [PATCH] smb: smbdirect: move fs/smb/common/smbdirect/ to fs/smb/smbdirect/
From: Christoph Hellwig @ 2026-04-23  5:52 UTC (permalink / raw)
  To: Stefan Metzmacher
  Cc: Christoph Hellwig, linux-cifs, linux-rdma, netdev,
	samba-technical, Tom Talpey, Steve French, Linus Torvalds,
	Namjae Jeon, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	ceph-devel, Jeff Layton, linux-nfs
In-Reply-To: <9cb0901c-18c5-4858-941c-3b37ee112af9@samba.org>

On Wed, Apr 22, 2026 at 10:16:41AM +0200, Stefan Metzmacher wrote:
> > Why is this not in net/smbdirect/ or driver/infiniband/ulp/smdirect?
> 
> Yes, I also thought about net/smbdirect.
> 
> As IPPROTO_SMBDIRECT or PF_SMBDIRECT will be the next step,
> see the open discussion here:
> https://lore.kernel.org/linux-cifs/cover.1775571957.git.metze@samba.org/
> (I'll follow with that discussion soon)

Seems like it is the right fit then.

> I was just unsure about the consequences, e.g. would
> the maintainer/pull request flow have to change in that case?
> Or would Steve be able to take the changes via his trees?
> Any I also didn't want to offend anybody, so I just took
> what Linus proposed.

You might want to ask the sunrpc or ceph maintainers as they have a
similar split.


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox