Linux Documentation


* Re: [PATCH v2] usbcore: Add quirk for 255-bytes initial config read
From: Alan Stern @ 2026-06-23 20:24 UTC
  To: Nikhil Solanke
  Cc: linux-usb, gregkh, linux-kernel, michal.pecio, stable, corbet,
	skhan, linux-doc
In-Reply-To: <20260623161035.5792-1-nikhilsolanke5@gmail.com>

This is looking better.  A few more things to mention below, but 
otherwise okay.

On Tue, Jun 23, 2026 at 09:40:35PM +0530, Nikhil Solanke wrote:
> Certain third-party USB game controllers exposing (or spoofing) an Xbox
> 360-compatible interface (VID:PID 045e:028e) fail to enumerate under Linux.
> The device disconnects from the bus without responding to the initial
> GET_DESCRIPTOR(CONFIGURATION) request, and the kernel logs 'unable to read
> config index 0 descriptor/start: -71'.
> 
> The device then falls back to a secondary Android HID mode (with a
> different VID:PID), losing XInput functionality including rumble support.
> The failure reproduces across multiple machines, host controller types, and
> kernel versions including current mainline and LTS. The device enumerates
> correctly and remains in XInput mode under Windows. Notably, the device
> enumerates correctly in Android mode when the same 9-byte request
> is issued for that mode's configuration descriptor, confirming the firmware
> bug is specific to the XInput mode.
> 
> usbmon traces from Linux and Wireshark/USBPcap traces from Windows are
> identical up to the point of failure, with no visible protocol-level
> difference explaining the divergence. The root cause was identified when
> Michal Pecio discovered via a QEMU bus-level capture that Windows does not
> use wLength=9 for the initial config descriptor request; it uses
> wLength=255. Alan Stern subsequently confirmed this with a bus
> analyzer on a different USB 2.0 device, and Michal verified the behavior
> goes back to Windows 95 OSR2.1.
> 
> So, add a new quirk flag USB_QUIRK_CONFIG_SIZE which causes
> usb_get_configuration() to issue a 255 byte sized configuration request
> instead of USB_DT_CONFIG_SIZE (9) for the initial
> GET_DESCRIPTOR(CONFIGURATION) request, mimicking long-standing Windows
> behavior.
> 
> Suggested-by: Alan Stern <stern@rowland.harvard.edu>
> Suggested-by: Michal Pecio <michal.pecio@gmail.com>
> Closes: https://lore.kernel.org/linux-usb/CAFgddh+JWdT4LLwMc5qjM8q_pBu-fRo2qADR5ovAKoGHWMQrRw@mail.gmail.com/
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Cc: stable@vger.kernel.org
> 
> Signed-off-by: Nikhil Solanke <nikhilsolanke5@gmail.com>

Normally people don't leave a blank line before their Signed-off-by: 
(unless it's the first tag to appear).  But I don't think it makes any 
difference, especially if the checkpatch.pl script doesn't object.

> ---
> Changes in v2:
> - Add Documentation
> - Naming changes
> - Refactored to have a better flow with existing code.
> 
>  .../admin-guide/kernel-parameters.txt         |  9 +++
>  drivers/usb/core/config.c                     | 61 ++++++++++++++-----
>  drivers/usb/core/hub.c                        |  6 +-
>  drivers/usb/core/quirks.c                     |  4 ++
>  include/linux/usb/quirks.h                    |  3 +
>  5 files changed, 67 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 97007f4f69d4..af4bf0ef2c7b 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -8158,6 +8158,15 @@ Kernel parameters
>  				q = USB_QUIRK_FORCE_ONE_CONFIG (Device
>  					claims zero configurations,
>  					forcing to 1);
> +                r = USB_QUIRK_WINDOWS_CONFIG_REQ_SIZE (Device
> +                    fails during initialization when asked for
> +                    9-bytes configuration desciptor request. Ask
> +                    for 255-bytes request instead to mirror
> +                    Windows' behavior. This quirk is originally
> +                    meant to fix some quirky gamepads that refuse
> +                    to connect in their XInput mode. But it can also
> +                    potentially fix issues with other USB devices
> +                    that work on Windows but not on Linux)
>  			Example: quirks=0781:5580:bk,0a5c:5834:gij

As Randy said, use tabs instead of spaces.  And this new entry should be 
aligned with all the preceding entries, and it should end with a ';' 
like they do.

I would leave out the two sentences of explanation, but that's more of a 
personal choice.  We'll see if Greg KH objects.

>  
>  	usbhid.mousepoll=
> diff --git a/drivers/usb/core/config.c b/drivers/usb/core/config.c
> index 45e20c6d76c0..4fc3145404d6 100644
> --- a/drivers/usb/core/config.c
> +++ b/drivers/usb/core/config.c
> @@ -19,6 +19,9 @@
>  
>  #define USB_MAXCONFIG			8	/* Arbitrary limit */
>  
> +/* config req size if USB_QUIRK_WINDOWS_CONFIG_REQ_SIZE is set */
> +#define USB_CONFIG_WINDOWS_REQ_SIZE	255
> +
>  static int find_next_descriptor(unsigned char *buffer, int size,
>      int dt1, int dt2, int *num_skipped)
>  {
> @@ -912,6 +915,13 @@ int usb_get_configuration(struct usb_device *dev)
>  	unsigned char *bigbuffer;
>  	struct usb_config_descriptor *desc;
>  	int result;
> +	/*
> +	 * Devices with quirky firmware will stall or reset when asked only for
> +	 * the configuration header. This variable decides which size to use in
> +	 * that case, if the quirk for that device was set.
> +	 */
> +	size_t usb_config_req_size = (dev->quirks & USB_QUIRK_WINDOWS_CONFIG_REQ_SIZE)
> +		? USB_CONFIG_WINDOWS_REQ_SIZE : USB_DT_CONFIG_SIZE;

It's a little unusual to have a multiline comment in the middle of a 
bunch of variable definitions.  The alternative is to do the assignment 
later on and move the comment there.

>  
>  	if (ncfg > USB_MAXCONFIG) {
>  		dev_notice(ddev, "too many configurations: %d, "
> @@ -938,18 +948,27 @@ int usb_get_configuration(struct usb_device *dev)
>  	if (!dev->rawdescriptors)
>  		return -ENOMEM;
>  
> -	desc = kmalloc(USB_DT_CONFIG_SIZE, GFP_KERNEL);
> +	desc = kmalloc(usb_config_req_size, GFP_KERNEL);
> +
>  	if (!desc)
>  		return -ENOMEM;
>  
>  	for (cfgno = 0; cfgno < ncfg; cfgno++) {
> -		/* We grab just the first descriptor so we know how long
> -		 * the whole configuration is */
> +
> +		if (dev->quirks & USB_QUIRK_DELAY_INIT)
> +			msleep(200);

Moving this delay up here changes the behavior when the quirk flag isn't 
set.  While it agrees with the intention of the USB_QUIRK_DELAY_INIT 
flag, such a change should be mentioned in the patch description.

> +
> +		/*
> +		 * Grab just the first descriptor so we know how long the whole
> +		 * configuration is. In case of quirky firmware, try to grab the
> +		 * whole thing in one go by asking for a 255-bytes sized buffer
> +		 * mirroring Windows behavior.
> +		 */

This needs to be rewritten, as it is self-contradictory.  When the quirk 
flag is set we issue a 255-byte request to mimic the Windows behavior, 
and only when the flag isn't set do we grab just the first descriptor.

>  		result = usb_get_descriptor(dev, USB_DT_CONFIG, cfgno,
> -		    desc, USB_DT_CONFIG_SIZE);
> +						desc, usb_config_req_size);

Don't make extraneous changes to the existing indentation (or whitespace 
in general), here and below.

>  		if (result < 0) {
>  			dev_err(ddev, "unable to read config index %d "
> -			    "descriptor/%s: %d\n", cfgno, "start", result);
> +				"descriptor/%s: %d\n", cfgno, "start", result);

At the time this code was originally written, the policy for kernel code 
was to break lines before 80 columns.  Since then the policy has changed 
to avoid splitting long literal strings into pieces, even when that 
means exceeding 80 columns.  So your new string here should all go on 
one line.

>  			if (result != -EPIPE)
>  				goto err;
>  			dev_notice(ddev, "chopping to %d config(s)\n", cfgno);
> @@ -957,13 +976,25 @@ int usb_get_configuration(struct usb_device *dev)
>  			break;
>  		} else if (result < 4) {
>  			dev_err(ddev, "config index %d descriptor too short "
> -			    "(expected %i, got %i)\n", cfgno,
> -			    USB_DT_CONFIG_SIZE, result);
> +				"(asked for %zu, got %i, expected at least %i)\n",
> +				cfgno, usb_config_req_size, result, 4);
>  			result = -EINVAL;
>  			goto err;
>  		}
> +
>  		length = max_t(int, le16_to_cpu(desc->wTotalLength),
> -		    USB_DT_CONFIG_SIZE);
> +				USB_DT_CONFIG_SIZE);

This is another example of a change that has nothing to do with the 
purpose of the patch.

> +
> +		/*
> +		 * If the device returns the full length configuration
> +		 * descriptor, skip the second read. Otherwise, send a second

Strictly speaking, the configuration descriptor is only 9 bytes long.  
What you mean here is the entire configuration descriptor set.

> +		 * request asking for the full length.
> +		 */
> +		if (result >= le16_to_cpu(desc->wTotalLength)) {

Shouldn't this be: result >= length?  No point in repeating the 
le16_to_cpu calculation.
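
To make the intended control flow concrete, here is a minimal userspace sketch of the request-size selection and the skip-the-second-read test. It is not kernel code: every SK_/sk_ name is a hypothetical stand-in, and the buffer is plain bytes rather than a struct usb_config_descriptor. The only USB fact it relies on is that wTotalLength occupies bytes 2 and 3 of the configuration descriptor, little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_DT_CONFIG_SIZE	9		/* the configuration descriptor proper */
#define SK_WINDOWS_REQ_SIZE	255		/* initial wLength attributed to Windows */
#define SK_QUIRK_WINDOWS_REQ	(1u << 19)	/* stand-in for the proposed quirk bit */

/* Pick the wLength of the first GET_DESCRIPTOR(CONFIGURATION) request. */
static size_t sk_initial_req_size(uint32_t quirks)
{
	return (quirks & SK_QUIRK_WINDOWS_REQ) ?
		SK_WINDOWS_REQ_SIZE : SK_DT_CONFIG_SIZE;
}

/* Decode wTotalLength once and clamp it to at least the 9-byte header. */
static int sk_total_length(const uint8_t *buf)
{
	int len;

	assert(buf != NULL);
	len = buf[2] | (buf[3] << 8);
	return len < SK_DT_CONFIG_SIZE ? SK_DT_CONFIG_SIZE : len;
}

/* A second read is needed only if the first one returned less than length. */
static int sk_need_second_read(const uint8_t *buf, int result)
{
	return result < sk_total_length(buf);
}
```

Decoding the length once and comparing result against it avoids repeating the byte-order conversion at the point of the test.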

> +			bigbuffer = (unsigned char *) desc;
> +			desc = NULL;
> +			goto store_and_parse;
> +		}
>  
>  		/* Now that we know the length, get the whole thing */
>  		bigbuffer = kmalloc(length, GFP_KERNEL);
> @@ -972,23 +1003,25 @@ int usb_get_configuration(struct usb_device *dev)
>  			goto err;
>  		}
>  
> -		if (dev->quirks & USB_QUIRK_DELAY_INIT)
> -			msleep(200);
> -
>  		result = usb_get_descriptor(dev, USB_DT_CONFIG, cfgno,
> -		    bigbuffer, length);
> +						bigbuffer, length);
> +
>  		if (result < 0) {
>  			dev_err(ddev, "unable to read config index %d "
> -			    "descriptor/%s\n", cfgno, "all");
> +				"descriptor/%s\n", cfgno, "all");
>  			kfree(bigbuffer);
>  			goto err;
>  		}
> +

More examples of unnecessary whitespace changes.

>  		if (result < length) {
>  			dev_notice(ddev, "config index %d descriptor too short "
> -			    "(expected %i, got %i)\n", cfgno, length, result);
> +				"(asked for %i, got %i)\n",
> +				cfgno, length, result);
>  			length = result;
>  		}
>  
> +store_and_parse:
> +		krealloc(bigbuffer, length, GFP_KERNEL);
>  		dev->rawdescriptors[cfgno] = bigbuffer;
>  
>  		result = usb_parse_configuration(dev, cfgno,
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index 24960ba9caa9..9acd278666fc 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -2527,8 +2527,10 @@ static int usb_enumerate_device(struct usb_device *udev)
>  		err = usb_get_configuration(udev);
>  		if (err < 0) {
>  			if (err != -ENODEV)
> -				dev_err(&udev->dev, "can't read configurations, error %d\n",
> -						err);
> +				dev_err(&udev->dev, "can't read configurations, "
> +					"for device %04x:%04x, error %d\n",

Like above, this string should all be on one line.

Alan Stern

> +					le16_to_cpu(udev->descriptor.idVendor),
> +					le16_to_cpu(udev->descriptor.idProduct), err);
>  			return err;
>  		}
>  	}
> diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
> index 87810eff974e..df670b0b66fe 100644
> --- a/drivers/usb/core/quirks.c
> +++ b/drivers/usb/core/quirks.c
> @@ -142,6 +142,10 @@ static int quirks_param_set(const char *value, const struct kernel_param *kp)
>  				break;
>  			case 'q':
>  				flags |= USB_QUIRK_FORCE_ONE_CONFIG;
> +				break;
> +			case 'r':
> +				flags |= USB_QUIRK_WINDOWS_CONFIG_REQ_SIZE;
> +				break;
>  			/* Ignore unrecognized flag characters */
>  			}
>  		}
> diff --git a/include/linux/usb/quirks.h b/include/linux/usb/quirks.h
> index b3cc7beab4a3..a4043b33c2c2 100644
> --- a/include/linux/usb/quirks.h
> +++ b/include/linux/usb/quirks.h
> @@ -81,4 +81,7 @@
>  /* Device claims zero configurations, forcing to 1 */
>  #define USB_QUIRK_FORCE_ONE_CONFIG		BIT(18)
>  
> +/* Use a 255 bytes config descriptor request mirroring windows behavior */
> +#define USB_QUIRK_WINDOWS_CONFIG_REQ_SIZE	BIT(19)
> +
>  #endif /* __LINUX_USB_QUIRKS_H */
> -- 
> 2.54.0
> 


* Re: [PATCH] kernel-doc: xforms: support __SYSFS_FUNCTION_ALTERNATIVE()
From: Thomas Weißschuh @ 2026-06-23 20:10 UTC
  To: Randy Dunlap
  Cc: linux-kernel, Rafael J. Wysocki, Danilo Krummrich, driver-core,
	Greg Kroah-Hartman, Jonathan Corbet, Shuah Khan, linux-doc,
	Mauro Carvalho Chehab, stable
In-Reply-To: <20260623190006.406571-1-rdunlap@infradead.org>

On 2026-06-23 12:00:04-0700, Randy Dunlap wrote:
> Add support for __SYSFS_FUNCTION_ALTERNATIVE() to create a union of its
> members (as though CONFIG_CFI is unset).
> 
> Fixes these docs build warnings:
> 
> WARNING: include/linux/device.h:117 Invalid param: __SYSFS_FUNCTION_ALTERNATIVE( ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf)
> WARNING: include/linux/device.h:117 struct member '__SYSFS_FUNCTION_ALTERNATIVE( ssize_t (*show' not described in 'device_attribute'
> WARNING: include/linux/device.h:117 Invalid param: __SYSFS_FUNCTION_ALTERNATIVE( ssize_t (*store)(struct device *dev, struct device_attribute *attr, const char *buf, size_t count)
> WARNING: include/linux/device.h:117 struct member '__SYSFS_FUNCTION_ALTERNATIVE( ssize_t (*store' not described in 'device_attribute'
> 
> Fixes: 434506b86a6c ("driver core: Allow the constification of device attributes")
> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>

Thanks for the fix. I would have expected 0day to catch this :-(

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>

> ---
> Cc: Thomas Weißschuh <linux@weissschuh.net>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: driver-core@lists.linux.dev
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Shuah Khan <skhan@linuxfoundation.org>
> Cc: linux-doc@vger.kernel.org
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: stable@vger.kernel.org
> 
>  tools/lib/python/kdoc/xforms_lists.py |    1 +
>  1 file changed, 1 insertion(+)
> 
> --- linux-next-20260619.orig/tools/lib/python/kdoc/xforms_lists.py
> +++ linux-next-20260619/tools/lib/python/kdoc/xforms_lists.py
> @@ -49,6 +49,7 @@ class CTransforms:
>          (CMatch("DEFINE_DMA_UNMAP_ADDR"), r"dma_addr_t \1"),
>          (CMatch("DEFINE_DMA_UNMAP_LEN"), r"__u32 \1"),
>          (CMatch("VIRTIO_DECLARE_FEATURES"), r"union { u64 \1; u64 \1_array[VIRTIO_FEATURES_U64S]; }"),
> +        (CMatch("__SYSFS_FUNCTION_ALTERNATIVE"), r"union { \1+ }"),
>          (CMatch("__attribute__"), ""),
>  
>          #


* Re: [RFC PATCH v2 06/10] kvm: guest_memfd: Add support for freezing and unfreezing mappings
From: tarunsahu @ 2026-06-23 20:06 UTC
  To: Pratyush Yadav
  Cc: Ackerley Tng, Jonathan Corbet, vannapurve, fvdl, Pasha Tatashin,
	Shuah Khan, sagis, aneesh.kumar, skhawaja, vipinsh,
	Pratyush Yadav, david, dmatlack, mark.rutland, Paolo Bonzini,
	Mike Rapoport, Alexander Graf, seanjc, axelrasmussen,
	linux-kselftest, kexec, linux-kernel, linux-doc, kvm, linux-mm
In-Reply-To: <2vxz8q85mdyh.fsf@kernel.org>

Pratyush Yadav <pratyush@kernel.org> writes:

> On Tue, Jun 23 2026, tarunsahu@google.com wrote:
>
>> Ackerley Tng <ackerleytng@google.com> writes:
>>
>>> Tarun Sahu <tarunsahu@google.com> writes:
>>>
>>>>  static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset,
>>>>  			       loff_t len)
>>>>  {
>>>> +	struct inode *inode = file_inode(file);
>>>>  	int ret;
>>>> +	int idx;
>>>>
>>>> -	if (!(mode & FALLOC_FL_KEEP_SIZE))
>>>> -		return -EOPNOTSUPP;
>>>> +	idx = srcu_read_lock(&kvm_gmem_freeze_srcu);
>>>> +	if (kvm_gmem_is_frozen(inode)) {
>>>> +		srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
>>>> +		return -EPERM;
>>>> +	}
>>>
>>> fallocate may eventually go to kvm_gmem_get_folio(), so that would check
>>> kvm_gmem_is_frozen() twice. Is this meant to catch the punch hole case?
>
> Yeah, I reckon you can get away with doing this check only in
> kvm_gmem_get_folio(). Normally you'd like to fail early, but as of now I
> don't see much of a problem. If you drop the check here and fail in
> kvm_gmem_get_folio() you'd end up taking and releasing the mapping
> invalidate_lock, but this isn't a fast path anyway so I don't think it
> should matter much.

No, I don't agree.
kvm_gmem_get_folio() already has the is_frozen check, which blocks
kvm_gmem_allocate() but not kvm_gmem_punch_hole(). Your argument is correct
for kvm_gmem_allocate() only. So the is_frozen check in fallocate is there
to block the punch hole as well. What Ackerley said is correct.
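
A small userspace model of the ordering under discussion, as a sketch only: it is not the kernel function, the sk_ names are hypothetical, the SRCU read lock is not modeled, and the page size is assumed to be 4096. It shows that a frozen check at the top rejects both the punch-hole and the allocate path before any mode validation happens.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Same values as the Linux UAPI flags; redefined so the sketch builds anywhere. */
#define SK_FALLOC_FL_KEEP_SIZE	0x01
#define SK_FALLOC_FL_PUNCH_HOLE	0x02
#define SK_PAGE_SIZE		4096L	/* assumed page size */

enum sk_op { SK_OP_ALLOCATE = 1, SK_OP_PUNCH_HOLE = 2 };

/* Returns a negative errno, or which operation would have been dispatched. */
static int sk_fallocate(bool frozen, int mode, long offset, long len)
{
	/* Sketch precondition: callers pass non-negative ranges. */
	assert(offset >= 0 && len >= 0);

	/* Checked first, so it covers punch hole as well as allocate. */
	if (frozen)
		return -EPERM;

	if (!(mode & SK_FALLOC_FL_KEEP_SIZE))
		return -EOPNOTSUPP;

	if (mode & ~(SK_FALLOC_FL_KEEP_SIZE | SK_FALLOC_FL_PUNCH_HOLE))
		return -EOPNOTSUPP;

	if (offset % SK_PAGE_SIZE || len % SK_PAGE_SIZE)
		return -EINVAL;

	return (mode & SK_FALLOC_FL_PUNCH_HOLE) ?
		SK_OP_PUNCH_HOLE : SK_OP_ALLOCATE;
}
```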

>
> I think either way can work just as fine...
>
>>>
>>>>
>>>> -	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
>>>> -		return -EOPNOTSUPP;
>>>> +	if (!(mode & FALLOC_FL_KEEP_SIZE)) {
>>>> +		ret = -EOPNOTSUPP;
>>>> +		goto out;
>>>> +	}
>>>>
>>>> -	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
>>>> -		return -EINVAL;
>>>> +	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) {
>>>> +		ret = -EOPNOTSUPP;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len)) {
>>>> +		ret = -EINVAL;
>>>> +		goto out;
>>>> +	}
>>>
>>> There's some reordering here. Why not let the validation happen like
>>> before, then check kvm_gmem_is_frozen()?
>
> There is no reordering, if I am reading the diff correctly. The diff is
> somewhat misleading. The kvm_gmem_is_frozen() call is added at the top
> of the function, and then all the later checks are in the same place but
> get a goto out (and hence a full body to the if block). So the diff
> reads like reordering, but there is none.
>

That is right. I thought Ackerley was asking why not add the check after
all the if conditions and before the kvm_gmem_punch_hole/allocate calls.

> It would be very neat if srcu had a cleanup.h style scope-based locking
> function, but on a quick glance I can't see one.
>
>>
>> To align with design. "stop the fallocate call if inode is frozen, No
>> need to go further". I dont have strict opinion on this. I am fine with
>> taking it across punch hole as well to make it more fine grained. But it
>> will no longer claims stop the fallocate call (allocation one is stopped
>> in separate path: fault path) , though functionally it does the same
>> thing.
>>
>> WDYT?
>>
>> ~Tarun
>
> -- 
> Regards,
> Pratyush Yadav


* Re: [RFC PATCH v2 10/10] selftests: kvm: Add guest_memfd_preservation_test
From: tarunsahu @ 2026-06-23 19:50 UTC
  To: Ackerley Tng, Jonathan Corbet, vannapurve, fvdl, Pasha Tatashin,
	Shuah Khan, sagis, aneesh.kumar, skhawaja, vipinsh,
	Pratyush Yadav, david, dmatlack, mark.rutland, Paolo Bonzini,
	Mike Rapoport, Alexander Graf, seanjc, axelrasmussen
  Cc: linux-kselftest, kexec, linux-kernel, linux-doc, kvm, linux-mm
In-Reply-To: <CAEvNRgE2GZNiDg_g6SP_H9CDsDDAnpN7KTRWJEK18wxpTZFJZw@mail.gmail.com>

Ackerley Tng <ackerleytng@google.com> writes:

> Tarun Sahu <tarunsahu@google.com> writes:
>
>> Add a new KVM selftest `guest_memfd_preservation_test` to verify that
>> guest memory backed by guest_memfd is preserved properly.
>>
>
> Don't think using backticks in commit messages is a common practice but
> I might be wrong here.
>
>> The test leverages the Live Update Orchestrator (LUO) infrastructure
>> to validate that memory folios and configuration layouts are
>> successfully saved and then restored during kernel live updates,
>> preventing any memory loss for the guest.
>>
>> Here, I have used the kvm selftests framework by creating a new
>> vm and mapping two memory slots to it. One is the code that is executed
>> inside the vm and other is the guest_memfd whose memory is being
>> written by the guest code.
>>
>
> Don't think commit messages with "I" are common either


Will take care of the commit messages everywhere as per the guidelines.
Sorry about that.
>
>> In Phase 1: Once data is written the vm exits and wait for the user
>> to trigger the kexec.
>>
>> In Phase 2: A new vm is created with retrieved kvm and again two
>> memory slots are assigned. Once for guest code, and another is for
>> retrieved guest_memfd where guest_memfd memory is verified by the
>> executed guest code. If verification succeeds, The test passes.
>>
>>
>> [...snip...]
>>
>> +#define SESSION_NAME "gmem_vm_preservation_session"
>> +#define VM_TOKEN 0x1001
>> +#define GMEM_TOKEN 0x1002
>> +
>> +#define GMEM_SIZE (16ULL * 1024 * 1024)
>> +#define DATA_SIZE (5ULL * 1024 * 1024)
>> +
>> +static size_t page_size;
>> +
>> +/* Deterministic byte pattern generation based on offset */
>> +static inline uint8_t get_pattern_byte(size_t offset)
>> +{
>> +	return (uint8_t)(offset ^ 0x5A);
>> +}
>> +
>> +static void guest_code_phase1(uint64_t gpa, uint64_t size, uint64_t data_size)
>> +{
>> +	uint8_t *mem = (uint8_t *)gpa;
>> +	size_t i;
>> +
>> +	for (i = 0; i < data_size; i++)
>> +		mem[i] = get_pattern_byte(i);
>> +
>> +	GUEST_DONE();
>> +}
>> +
>> +static void guest_code_phase2(uint64_t gpa, uint64_t size, uint64_t data_size)
>> +{
>> +	uint8_t *mem = (uint8_t *)gpa;
>> +	size_t i;
>> +
>> +	for (i = 0; i < data_size; i++) {
>> +		uint8_t val = get_pattern_byte(i);
>> +
>> +		__GUEST_ASSERT(mem[i] == val,
>> +			       "Data mismatch at offset %lu! Expected 0x%x, got 0x%x",
>> +			       i, val, mem[i]);
>> +	}
>> +
>> +	GUEST_DONE();
>> +}
>> +
>> +static void do_phase1(void)
>> +{
>> +	uint64_t flags = GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED;
>
> Is there a reason to set GUEST_MEMFD_FLAG_MMAP? We're not really
> accessing that memory from the host in this test.

Right, we can skip it.

>
>> +	int gmem_fd, dev_luo_fd, session_fd, ret;
>> +	const uint64_t gpa = SZ_4G;
>> +	struct kvm_vcpu *vcpu;
>> +	const int slot = 1;
>> +	struct kvm_vm *vm;
>> +
>> +	vm = __vm_create_shape_with_one_vcpu(VM_SHAPE_DEFAULT, &vcpu, 1,
>> +					guest_code_phase1);
>> +	gmem_fd = vm_create_guest_memfd(vm, GMEM_SIZE, flags);
>> +	vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, GMEM_SIZE, NULL,
>> +				 gmem_fd, 0);
>> +
>> +	for (size_t i = 0; i < GMEM_SIZE; i += page_size)
>> +		virt_pg_map(vm, gpa + i, gpa + i);
>> +
>> +	vcpu_args_set(vcpu, 3, gpa, GMEM_SIZE, DATA_SIZE);
>
> If GMEM_SIZE and DATA_SIZE are static I think we don't have to set those
> as vcpu_args_set(), they can be used as macros from within the guest.

Yes, there are multiple places where we can skip it, like passing them as
arguments to guest_code_phase1/2. Will update it.
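
A rough sketch of what that could look like, written as plain userspace C so it can be built and run on its own: the write and verify loops read the size from a macro instead of taking it as an argument. SK_DATA_SIZE, sk_fill_pattern() and sk_first_mismatch() are hypothetical names, and the size is far smaller than the test's DATA_SIZE; only get_pattern_byte() is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_DATA_SIZE 4096u	/* sketch only, the test uses 5 MiB */

/* Deterministic byte pattern generation based on offset (from the patch). */
static inline uint8_t get_pattern_byte(size_t offset)
{
	return (uint8_t)(offset ^ 0x5A);
}

/* Phase 1 equivalent: no size argument, the macro is used directly. */
static void sk_fill_pattern(uint8_t *mem)
{
	assert(mem != NULL);
	for (size_t i = 0; i < SK_DATA_SIZE; i++)
		mem[i] = get_pattern_byte(i);
}

/* Phase 2 equivalent: first mismatching offset, or -1 if the data is intact. */
static long sk_first_mismatch(const uint8_t *mem)
{
	assert(mem != NULL);
	for (size_t i = 0; i < SK_DATA_SIZE; i++)
		if (mem[i] != get_pattern_byte(i))
			return (long)i;
	return -1;
}
```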

>
>> +
>> +	vcpu_run(vcpu);
>> +	TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_DONE);
>> +
>> +	dev_luo_fd = luo_open_device();
>> +	TEST_ASSERT(dev_luo_fd >= 0, "Failed to open /dev/liveupdate");
>> +
>> +	session_fd = luo_create_session(dev_luo_fd, SESSION_NAME);
>> +	TEST_ASSERT(session_fd >= 0, "Failed to create LUO session");
>> +
>> +	ret = luo_session_preserve_fd(session_fd, vm->fd, VM_TOKEN);
>> +	TEST_ASSERT(ret == 0, "Failed to preserve VM file descriptor");
>> +
>> +	ret = luo_session_preserve_fd(session_fd, gmem_fd, GMEM_TOKEN);
>> +	TEST_ASSERT(ret == 0, "Failed to preserve guest_memfd file descriptor");
>> +
>
> Thanks for showing how this works :)

Glad to know it helped.
. .
 v


>
>> +	printf("\n============================================================\n");
>> +	printf("Phase 1 Complete Successfully!\n");
>> +	printf("VM file and guest_memfd file have been preserved via LUO.\n");
>> +	printf("Tokens: VM_TOKEN=0x%x, GMEM_TOKEN=0x%x\n", VM_TOKEN, GMEM_TOKEN);
>> +	printf("Machine Size: %llu MB, Data Size: %llu MB\n", GMEM_SIZE / SZ_1M,
>> +				 DATA_SIZE / SZ_1M);
>> +	printf("------------------------------------------------------------\n");
>> +
>> +	daemonize_and_wait();
>> +}
>> +
>> +static struct kvm_vm *vm_create_from_fd(int resurrected_vm_fd,
>> +					struct vm_shape shape)
>> +{
>> +	struct kvm_vm *vm;
>> +
>> +	vm = calloc(1, sizeof(*vm));
>> +	TEST_ASSERT(vm != NULL, "Insufficient Memory");
>> +
>> +	vm_init_fields(vm, shape);
>
> What would happen if the shape was changed between preserving and
> restoring?
The shape must be consistent across kexec; otherwise it may break. struct
vm_shape includes two fields:
1. mode: 4k, 16k or 64k mapping, how many page table bits, etc.,
         which are used to set up the mapping with guest code and memory.
         guest_memfd does not support mappings other than PAGE_SIZE.
2. type: This is the userspace side of the VM type. It will not have
         information about the preserved vm_type via vm_file (kernel).
         If userspace changes this in stage 2, some actions might not
         work. For example, if userspace is expecting vm_type to be a
         COCO VM and the preserved vm_type is shared, there will be a
         conflict when userspace tries to perform an operation that
         only works with a COCO VM.


My question is: why would someone change the shape, unless they are
                planning to make the VM fail?

>
>> +
>> +	vm->kvm_fd = open_path_or_exit(KVM_DEV_PATH, O_RDWR);
>> +	vm->fd = resurrected_vm_fd;
>> +
>> +	if (kvm_has_cap(KVM_CAP_BINARY_STATS_FD))
>> +		vm->stats.fd = vm_get_stats_fd(vm);
>> +	else
>> +		vm->stats.fd = -1;
>> +
>> +	vm_init_memory_properties(vm);
>> +
>> +	return vm;
>> +}
>> +
>
> I think vm_create_from_fd() could be introduced in an earlier patch to
> reduce the amount of new code in this patch. Also, I think it could
> perhaps be moved to kvm_util.c assuming that other test will use it too.
>
Makes sense, will take care of it in the next revision, v4.


>> +static void do_phase2(void)
>> +{
>> +	int retrieved_vm_fd, retrieved_gmem_fd, dev_luo_fd, session_fd;
>> +	struct vm_shape shape = VM_SHAPE_DEFAULT;
>> +	const uint64_t gpa = SZ_4G;
>> +	struct kvm_vcpu *vcpu;
>> +	const int slot = 1;
>> +	struct kvm_vm *vm;
>> +
>> +	dev_luo_fd = luo_open_device();
>> +	TEST_ASSERT(dev_luo_fd >= 0, "Failed to open /dev/liveupdate");
>> +
>> +	session_fd = luo_retrieve_session(dev_luo_fd, SESSION_NAME);
>> +	TEST_ASSERT(session_fd >= 0, "Failed to retrieve LUO session");
>> +
>> +	retrieved_vm_fd = luo_session_retrieve_fd(session_fd, VM_TOKEN);
>> +	TEST_ASSERT(retrieved_vm_fd >= 0, "Failed to retrieve VM file descriptor");
>> +
>> +	retrieved_gmem_fd = luo_session_retrieve_fd(session_fd, GMEM_TOKEN);
>> +	TEST_ASSERT(retrieved_gmem_fd >= 0, "Failed to retrieve guest_memfd file descriptor");
>> +
>> +	vm = vm_create_from_fd(retrieved_vm_fd, shape);
>> +
>> +	u64 nr_pages = 2048; /* 8MB is plenty for slot0 pages */
>> +
>
> I don't think declarations are usually mixed with regular code.
Okay, will update that.

>
>> +	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, 0);
>> +	kvm_vm_elf_load(vm, program_invocation_name);
>> +
>> +	for (int i = 0; i < NR_MEM_REGIONS; i++)
>> +		vm->memslots[i] = 0;
>> +
>> +	struct userspace_mem_region *slot0 = memslot2region(vm, 0);
>> +
>> +	ucall_init(vm, slot0->region.guest_phys_addr + slot0->region.memory_size);
>> +
>> +	vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, GMEM_SIZE, NULL,
>> +				   retrieved_gmem_fd, 0);
>> +
>> +	for (size_t i = 0; i < GMEM_SIZE; i += page_size)
>> +		virt_pg_map(vm, gpa + i, gpa + i);
>> +
>> +	vcpu = vm_vcpu_add(vm, 0, guest_code_phase2);
>> +	kvm_arch_vm_finalize_vcpus(vm);
>> +
>> +	vcpu_args_set(vcpu, 3, gpa, GMEM_SIZE, DATA_SIZE);
>> +
>> +	printf("Resuming / Running VM in Phase 2...\n");
>> +	vcpu_run(vcpu);
>> +	TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_DONE);
>> +
>> +	printf("\nSUCCESS: Phase 2 Complete! All 5MB complex data verified intact!\n");
>> +
>> +	luo_session_finish(session_fd);
>> +	close(session_fd);
>> +	close(dev_luo_fd);
>> +	/* This will also close the vm_fd */
>> +	kvm_vm_free(vm);
>> +	close(retrieved_gmem_fd);
>> +}
>> +
>> +int main(int argc, char *argv[])
>> +{
>> +	bool phase2 = false;
>> +
>> +	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
>> +	page_size = getpagesize();
>> +
>> +	for (int i = 1; i < argc; i++) {
>> +		if (strcmp(argv[i], "--phase2") == 0)
>> +			phase2 = true;
>> +	}
>> +
>
> Maybe use getopt() here?

In v3, it is updated to use the liveupdate library.
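
For reference, and independent of how the later revision handles it, the getopt() suggestion could look roughly like the sketch below for a standalone program. It uses getopt_long() from <getopt.h> (a GNU/BSD extension); sk_parse_phase2() is a hypothetical helper name, not something from the patch.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Returns 1 if --phase2 was given, 0 if not, -1 on an unrecognized option. */
static int sk_parse_phase2(int argc, char *argv[])
{
	static const struct option opts[] = {
		{ "phase2", no_argument, NULL, '2' },
		{ NULL, 0, NULL, 0 },
	};
	int phase2 = 0;
	int c;

	assert(argc >= 1 && argv != NULL);
	optind = 1;	/* restart scanning so the helper can be called again */
	opterr = 0;	/* report errors via the return value, not stderr */

	while ((c = getopt_long(argc, argv, "", opts, NULL)) != -1) {
		if (c == '2')
			phase2 = 1;
		else
			return -1;	/* '?' for anything not in opts[] */
	}

	return phase2;
}
```

Compared with the strcmp() loop, this rejects misspelled options instead of silently ignoring them.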
>
>> +	if (phase2)
>> +		do_phase2();
>> +	else
>> +		do_phase1();
>> +
>> +	return 0;
>> +}
>> --
>> 2.54.0.1032.g2f8565e1d1-goog
>
> I think we also need tests for trying to allocate while frozen, and
> conversion while frozen, and trying to preserve while preservation is
> not allowed.

Yes, we need those tests. For this series, I wanted to focus on the design.
Now that we are aligned, I will send more tests with the next revision.

~Tarun


* Re: [PATCH] crypto: af_alg - Add af_alg_restrict sysctl, defaulting to 1
From: Eric Biggers @ 2026-06-23 19:49 UTC
  To: Kees Cook
  Cc: linux-crypto, Herbert Xu, linux-kernel, linux-doc,
	linux-bluetooth, iwd, linux-hardening, Milan Broz,
	Demi Marie Obenour, Andy Lutomirski
In-Reply-To: <202606231216.14A774833@keescook>

On Tue, Jun 23, 2026 at 12:24:28PM -0700, Kees Cook wrote:
> On Mon, Jun 22, 2026 at 04:48:03PM -0700, Eric Biggers wrote:
> > AF_ALG is a frequent source of vulnerabilities and a maintenance
> > nightmare.  It exposes far more functionality to userspace than ever
> > should have been exposed, especially to unprivileged processes.  Recent
> > exploits have targeted kernel internal implementation details like
> > "authencesn" that have zero use case for userspace access.
> 
> I absolutely want to see this attack surface reduction.
> 
> > Add a sysctl /proc/sys/crypto/af_alg_restrict with meaning:
> > [...]
> > Note that the list may be tweaked in the future.  However, the common
> > use cases such as iwd and bluez are taken into account already.  I've
> > tested that iwd still works with the default value of 1.
> 
> I wince at this bit, though. This is a "security policy in the kernel"
> which we try to avoid, and it could be done already in userspace with
> modprobe blacklist.
> 
> But, as you say, AF_ALG is deprecated. I understand that to mean that
> the alg list is only ever going to *shrink* in the future.
> 
> Using a sysctl means monolithic kernels are protected, but wouldn't
> those systems just compile AF_ALG out?
> 
> So, I guess, I would want a more clear rationale for why we do it this
> way instead of via modprobe blacklist. I see a few reasons, but they
> don't really convince me that we should ignore the "no security policy
> in the kernel" rule to do it this way.

As we saw when distros tried to mitigate copy.fail, a lot of distros
have CONFIG_CRYPTO_USER_API_* set to 'y', so algif_aead.ko couldn't be
blacklisted.  (Ironically because of FIPS 140, which is yet another
example of how FIPS 140 harms real-world security.)

But even when 'm', the module blacklist is just a binary choice for each
algorithm type: aead, skcipher, hash, and rng.  Loading algif_aead.ko
allows not just "ccm(aes)" that bluez needs, but also bizarre things
like "authencesn(hmac(sha256),cbc(aes))" that are used only in exploits.

And sure, userspace could theoretically gather the complete list of
algorithm modules (e.g. authencesn.ko) and blacklist them individually.
But no one does that, and many are built-in anyway -- and this time not
just because of FIPS.

So we need an allowlist at the algorithm level, not just the algorithm
type level.  Putting the allowlist in the kernel, taking into account
the real use cases like iwd and bluez, and having a simple tristate
sysctl similar to some of the existing ones, is the simplest and most
practical way to achieve this by default across Linux distros.
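
As an illustration of the shape of such a check, here is a minimal userspace sketch of a tristate mode plus per-algorithm allowlists. It is not the patch's code: the sk_ names are hypothetical, and the two lists are placeholders chosen only so the example runs; the real lists and which algorithm belongs to which list are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The three values of the proposed sysctl. */
enum sk_mode { SK_UNRESTRICTED = 0, SK_LIMITED = 1, SK_DISABLED = 2 };

/* Placeholder lists for illustration only. */
static const char *const sk_unpriv_allowed[] = { "sha256", NULL };
static const char *const sk_priv_extra[] = { "ccm(aes)", NULL };

/* Exact match on the full algorithm name, not on substrings. */
static bool sk_in_list(const char *const *list, const char *name)
{
	for (; *list; list++)
		if (strcmp(*list, name) == 0)
			return true;
	return false;
}

static bool sk_alg_allowed(int mode, bool privileged, const char *name)
{
	assert(name != NULL);

	switch (mode) {
	case SK_UNRESTRICTED:
		return true;
	case SK_LIMITED:
		/* Privileged callers get the base list plus a few extras. */
		return sk_in_list(sk_unpriv_allowed, name) ||
		       (privileged && sk_in_list(sk_priv_extra, name));
	default:
		return false;	/* SK_DISABLED and anything unknown */
	}
}
```

Matching on the full name is what lets a template such as "authencesn(hmac(sha256),cbc(aes))" be refused even though it contains an allowed algorithm as a component.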

If we did something like delegate the algorithm allowlist to LSMs, I
think that in practice it's just going to be almost never used.

- Eric


* Re: [PATCH] crypto: af_alg - Add af_alg_restrict sysctl, defaulting to 1
From: Eric Biggers @ 2026-06-23 19:27 UTC
  To: Andy Lutomirski
  Cc: linux-crypto, Herbert Xu, linux-kernel, linux-doc,
	linux-bluetooth, iwd, linux-hardening, Milan Broz,
	Demi Marie Obenour
In-Reply-To: <CALCETrXPj0u=FZ=aFcZAHk3fFZa7rCuPEjx6cOMXmT3sdkC7SA@mail.gmail.com>

On Tue, Jun 23, 2026 at 12:12:24PM -0700, Andy Lutomirski wrote:
> On Mon, Jun 22, 2026 at 4:49 PM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > AF_ALG is a frequent source of vulnerabilities and a maintenance
> > nightmare.  It exposes far more functionality to userspace than ever
> > should have been exposed, especially to unprivileged processes.  Recent
> > exploits have targeted kernel internal implementation details like
> > "authencesn" that have zero use case for userspace access.
> >
> > Fortunately, AF_ALG is rarely used in practice, as userspace crypto
> > libraries exist.  And when it is used, only some functionality is known
> > to be used, and many users are known to hold capabilities already.
> > iwd for example requires CAP_NET_ADMIN and has a known algorithm list
> > (https://lore.kernel.org/linux-crypto/bcbbef00-5881-421b-8892-7be6c04b832d@gmail.com/).
> >
> > Thus, let's restrict the set of allowed algorithms by default, depending
> > on the capabilities held.
> >
> > Add a sysctl /proc/sys/crypto/af_alg_restrict with meaning:
> >
> >     0: unrestricted
> >     1: limited functionality
> >     2: completely disabled
> >
> > Set the default value to 1, which enables an algorithm allowlist for
> > unprivileged processes and a slightly longer allowlist for privileged
> > processes.
> 
> In our brave new world of containers, this is a bit awkward.  The
> admin is sort of asking two separate questions:
> 
> 1. Is the actual running distro and its privileged components capable
> of working without AF_ALG or with only the parts marked as being
> unprivileged?
> 
> 2. Is the system running containers that need the unprivileged parts?
> (Which is maybe just sha1 for ip?  I really don't know.)
> 
> Should there maybe be two separate options so that all options are
> available?  Or maybe something between 1 and 2 that means "limited
> functionality and privileged modes are completely disabled"?

If we want to offer more settings we could.  I could see this getting
quite complex pretty quickly once everyone weighs in, though.  There's
quite a bit of value in keeping things simple, even if the offered
settings won't be optimal for every case.

- Eric


* Re: [PATCH] crypto: af_alg - Add af_alg_restrict sysctl, defaulting to 1
From: Kees Cook @ 2026-06-23 19:24 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, Herbert Xu, linux-kernel, linux-doc,
	linux-bluetooth, iwd, linux-hardening, Milan Broz,
	Demi Marie Obenour, Andy Lutomirski
In-Reply-To: <20260622234803.6982-1-ebiggers@kernel.org>

On Mon, Jun 22, 2026 at 04:48:03PM -0700, Eric Biggers wrote:
> AF_ALG is a frequent source of vulnerabilities and a maintenance
> nightmare.  It exposes far more functionality to userspace than ever
> should have been exposed, especially to unprivileged processes.  Recent
> exploits have targeted kernel internal implementation details like
> "authencesn" that have zero use case for userspace access.

I absolutely want to see this attack surface reduction.

> Add a sysctl /proc/sys/crypto/af_alg_restrict with meaning:
> [...]
> Note that the list may be tweaked in the future.  However, the common
> use cases such as iwd and bluez are taken into account already.  I've
> tested that iwd still works with the default value of 1.

I wince at this bit, though. This is a "security policy in the kernel"
which we try to avoid, and it could already be done in userspace with
modprobe blacklist.

But, as you say, AF_ALG is deprecated. I understand that to mean that
the alg list is only ever going to *shrink* in the future.

Using a sysctl means monolithic kernels are protected, but wouldn't
those systems just compile AF_ALG out?

So, I guess, I would want a more clear rationale for why we do it this
way instead of via modprobe blacklist. I see a few reasons, but they
don't really convince me that we should ignore the "no security policy
in the kernel" rule to do it this way.

-Kees

-- 
Kees Cook


* Re: [PATCH 1/2] dt-bindings: hwmon: chipcap2: Add label property
From: Flaviu Nistor @ 2026-06-23 19:22 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Javier Carrasco, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Jonathan Corbet, Shuah Khan, Flaviu Nistor, linux-hwmon,
	linux-kernel, devicetree, linux-doc
In-Reply-To: <b6d34af3-0c4a-4870-a240-f7873621d2ce@roeck-us.net>

On 6/23/26 9:58 PM CET, Guenter Roeck wrote:
>On 6/23/26 11:16, Flaviu Nistor wrote:
>> On Mon Jun 22, 2026 at 7:29 PM CEST, Javier Carrasco wrote:
>>> On Mon Jun 22, 2026 at 2:21 PM CEST, Flaviu Nistor wrote:
>>>> Add support for an optional label property similar to other hwmon devices.
>>>> This allows distinct names to be assigned to each instance on boards
>>>> with multiple CHIPCAP2 sensors.
>>>>
>>>> Signed-off-by: Flaviu Nistor <flaviu.nistor@gmail.com>
>>>> ---
>>>>   .../devicetree/bindings/hwmon/amphenol,chipcap2.yaml         | 5 +++++
>>>>   1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml b/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml
>>>> index 17351fdbefce..f00b5a4b14dd 100644
>>>> --- a/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml
>>>> +++ b/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml
>>>> @@ -33,6 +33,10 @@ properties:
>>>>     reg:
>>>>       maxItems: 1
>>>>
>>>> +  label:
>>>> +    description:
>>>> +      A descriptive name for this channel, like "ambient" or "psu".
>>>> +
>>>>     interrupts:
>>>>       items:
>>>>         - description: measurement ready indicator
>>>> @@ -72,6 +76,7 @@ examples:
>>>>                            <5 IRQ_TYPE_EDGE_RISING>,
>>>>                            <6 IRQ_TYPE_EDGE_RISING>;
>>>>               interrupt-names = "ready", "low", "high";
>>>> +            label = "somelabel";
>>>>               vdd-supply =3D <&reg_vdd>;
>>>>           };
>>>       };
>>>
>>> Hello Flaviu, thank you for your patch.
>>>
>>
>> Hello Javier, thanks for your reply.
>>
>>> Should we not add a reference to hwmon-common.yaml (with
>>> unevaluatedProperties instead of additionalProperties), as label is
>>> defined there? I believe that Krzysztof Kozlowski did something similar
>>> for the shunt-resistor-micro-ohms property. Could we follow suit here?
>>>
>>
>> This is a good question and I am happy you asked. I also thought a lot
>> about this and the reason I decided to go for this approach is that by using
>> $ref: hwmon-common.yaml#, I would have to change additionalProperties: false
>> to unevaluatedProperties: false, which will evaluate in case it is used, also
>> shunt-resistor-micro-ohms property which does not apply to this sensor. At
>> least this is my understanding, but of course I can be wrong (I see lm75 binding
>> also uses $ref: hwmon-common.yaml# but shunt-resistor-micro-ohms does not apply).
>>
>
>Where does the idea come from that shunt-resistor-micro-ohms would be mandatory ?
>That would make hwmon-common.yaml unusable for most chips.

I think this is a misunderstanding since I never had the intention to imply that
shunt-resistor-micro-ohms would be mandatory, but rather I observed that if I used
$ref: hwmon-common.yaml#, property shunt-resistor-micro-ohms can be added (no need to,
but still possible) in the example section and the dt_binding_check will pass.
Since hwmon-common.yaml is already there I will change the binding in a v2 and use it.

Best regards,
Flaviu


* Re: [PATCH] crypto: af_alg - Document the deprecation of AF_ALG
From: Eric Biggers @ 2026-06-23 19:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bastien Nocera, linux-crypto, Herbert Xu, Marcel Holtmann,
	Luiz Augusto von Dentz, linux-doc, linux-api, linux-kernel,
	netdev, linux-bluetooth, ell
In-Reply-To: <CAHk-=wgNG=F3xO9PjL0RcKy3UWvq0Np9uZu+nFUQBAA8So9xdA@mail.gmail.com>

On Tue, Jun 23, 2026 at 11:56:10AM -0700, Linus Torvalds wrote:
> On Tue, 23 Jun 2026 at 09:51, Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > We're aware of that and are taking it into account in the allowlist:
> 
> Note that if we can just unconditionally make it depend on
> CAP_NET_ADMIN, that would be good - independently of any allowlist.
> 
> Because if iwd and bluetoothd are the main two users, and both of
> those already require CAP_NET_ADMIN anyway...

There's also cryptsetup, including unprivileged benchmarking and also
(in theory) formatting support, and pre-7.0 versions of iproute2 which
used it for computing SHA-1 hashes of BPF programs.

If we broke unprivileged 'cryptsetup benchmark', some people would
definitely notice.  However, since it's just a manually-run benchmark
anyway, users could just run it with sudo.

I don't know about the iproute2 case.

It depends how aggressive we want to be.  My current proposal
(https://lore.kernel.org/linux-crypto/20260622234803.6982-1-ebiggers@kernel.org/)
has the entries in the allowlist marked as either privileged or
unprivileged.  There are just a few unprivileged ones, for cryptsetup
and iproute2 as mentioned.  But we could try doing away with the
unprivileged ones entirely and see who complains.

- Eric


* Re: [PATCH] crypto: af_alg - Add af_alg_restrict sysctl, defaulting to 1
From: Andy Lutomirski @ 2026-06-23 19:12 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, Herbert Xu, linux-kernel, linux-doc,
	linux-bluetooth, iwd, linux-hardening, Milan Broz,
	Demi Marie Obenour
In-Reply-To: <20260622234803.6982-1-ebiggers@kernel.org>

On Mon, Jun 22, 2026 at 4:49 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> AF_ALG is a frequent source of vulnerabilities and a maintenance
> nightmare.  It exposes far more functionality to userspace than ever
> should have been exposed, especially to unprivileged processes.  Recent
> exploits have targeted kernel internal implementation details like
> "authencesn" that have zero use case for userspace access.
>
> Fortunately, AF_ALG is rarely used in practice, as userspace crypto
> libraries exist.  And when it is used, only some functionality is known
> to be used, and many users are known to hold capabilities already.
> iwd for example requires CAP_NET_ADMIN and has a known algorithm list
> (https://lore.kernel.org/linux-crypto/bcbbef00-5881-421b-8892-7be6c04b832d@gmail.com/).
>
> Thus, let's restrict the set of allowed algorithms by default, depending
> on the capabilities held.
>
> Add a sysctl /proc/sys/crypto/af_alg_restrict with meaning:
>
>     0: unrestricted
>     1: limited functionality
>     2: completely disabled
>
> Set the default value to 1, which enables an algorithm allowlist for
> unprivileged processes and a slightly longer allowlist for privileged
> processes.

In our brave new world of containers, this is a bit awkward.  The
admin is sort of asking two separate questions:

1. Is the actual running distro and its privileged components capable
of working without AF_ALG or with only the parts marked as being
unprivileged?

2. Is the system running containers that need the unprivileged parts?
(Which is maybe just sha1 for ip?  I really don't know.)

Should there maybe be two separate options so that all options are
available?  Or maybe something between 1 and 2 that means "limited
functionality and privileged modes are completely disabled"?


* Re: [PATCH v2] usbcore: Add quirk for 255-bytes initial config read
From: Nikhil Solanke @ 2026-06-23 19:08 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-usb, gregkh, linux-kernel, stern, michal.pecio, stable,
	corbet, skhan, linux-doc
In-Reply-To: <75822857-473d-4067-a378-aae2cdab4176@infradead.org>

> add ending '.'
>
> For all lines added here, use tabs instead of spaces for indentation.

Done! Waiting for any other changes before submitting v3


* [PATCH] kernel-doc: xforms: support __SYSFS_FUNCTION_ALTERNATIVE()
From: Randy Dunlap @ 2026-06-23 19:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Randy Dunlap, Thomas Weißschuh, Rafael J. Wysocki,
	Danilo Krummrich, driver-core, Greg Kroah-Hartman,
	Jonathan Corbet, Shuah Khan, linux-doc, Mauro Carvalho Chehab,
	stable

Add support for __SYSFS_FUNCTION_ALTERNATIVE() to create a union of its
members (as though CONFIG_CFI is unset).

Fixes these docs build warnings:

WARNING: include/linux/device.h:117 Invalid param: __SYSFS_FUNCTION_ALTERNATIVE( ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf)
WARNING: include/linux/device.h:117 struct member '__SYSFS_FUNCTION_ALTERNATIVE( ssize_t (*show' not described in 'device_attribute'
WARNING: include/linux/device.h:117 Invalid param: __SYSFS_FUNCTION_ALTERNATIVE( ssize_t (*store)(struct device *dev, struct device_attribute *attr, const char *buf, size_t count)
WARNING: include/linux/device.h:117 struct member '__SYSFS_FUNCTION_ALTERNATIVE( ssize_t (*store' not described in 'device_attribute'

Fixes: 434506b86a6c ("driver core: Allow the constification of device attributes")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
---
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: driver-core@lists.linux.dev
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-doc@vger.kernel.org
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: stable@vger.kernel.org

 tools/lib/python/kdoc/xforms_lists.py |    1 +
 1 file changed, 1 insertion(+)

--- linux-next-20260619.orig/tools/lib/python/kdoc/xforms_lists.py
+++ linux-next-20260619/tools/lib/python/kdoc/xforms_lists.py
@@ -49,6 +49,7 @@ class CTransforms:
         (CMatch("DEFINE_DMA_UNMAP_ADDR"), r"dma_addr_t \1"),
         (CMatch("DEFINE_DMA_UNMAP_LEN"), r"__u32 \1"),
         (CMatch("VIRTIO_DECLARE_FEATURES"), r"union { u64 \1; u64 \1_array[VIRTIO_FEATURES_U64S]; }"),
+        (CMatch("__SYSFS_FUNCTION_ALTERNATIVE"), r"union { \1+ }"),
         (CMatch("__attribute__"), ""),
 
         #
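
The effect of the added transform can be illustrated outside kernel-doc.
The sketch below is a stand-in built on plain string scanning; it does not
use the real CMatch class, and the helper name expand_macro_as_union is
hypothetical.  It assumes the macro's argument is a sequence of member
declarations, as the warnings quoted above suggest.

```python
def expand_macro_as_union(source: str,
                          macro: str = "__SYSFS_FUNCTION_ALTERNATIVE") -> str:
    """Replace each `macro( ... )` call with `union { ... }`.

    Parentheses are matched by depth because the argument contains
    function-pointer declarations with nested parentheses.
    """
    out = []
    pos = 0
    opener = macro + "("
    while True:
        start = source.find(opener, pos)
        if start == -1:
            out.append(source[pos:])
            break
        out.append(source[pos:start])
        i = start + len(opener)
        depth = 1
        while i < len(source) and depth:
            if source[i] == "(":
                depth += 1
            elif source[i] == ")":
                depth -= 1
            i += 1
        if depth:
            # Unbalanced input: leave the remainder untouched.
            out.append(source[start:])
            break
        body = source[start + len(opener):i - 1].strip()
        out.append("union { " + body + " }")
        pos = i
    return "".join(out)


member = ("__SYSFS_FUNCTION_ALTERNATIVE( "
          "ssize_t (*show)(struct device *dev, char *buf); );")
print(expand_macro_as_union(member))
```

After the rewrite, a parser sees an ordinary anonymous union whose members
can be documented individually, which is what removes the "Invalid param"
warnings.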


* Re: [PATCH 1/2] dt-bindings: hwmon: chipcap2: Add label property
From: Guenter Roeck @ 2026-06-23 18:58 UTC (permalink / raw)
  To: Flaviu Nistor, Javier Carrasco
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet,
	Shuah Khan, linux-hwmon, linux-kernel, devicetree, linux-doc
In-Reply-To: <20260623181625.5697-1-flaviu.nistor@gmail.com>

On 6/23/26 11:16, Flaviu Nistor wrote:
> On Mon Jun 22, 2026 at 7:29 PM CEST, Javier Carrasco wrote:
>> On Mon Jun 22, 2026 at 2:21 PM CEST, Flaviu Nistor wrote:
>>> Add support for an optional label property similar to other hwmon devices.
>>> This allows distinct names to be assigned to each instance on boards
>>> with multiple CHIPCAP2 sensors.
>>>
>>> Signed-off-by: Flaviu Nistor <flaviu.nistor@gmail.com>
>>> ---
>>>   .../devicetree/bindings/hwmon/amphenol,chipcap2.yaml         | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml b/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml
>>> index 17351fdbefce..f00b5a4b14dd 100644
>>> --- a/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml
>>> +++ b/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml
>>> @@ -33,6 +33,10 @@ properties:
>>>     reg:
>>>       maxItems: 1
>>>
>>> +  label:
>>> +    description:
>>> +      A descriptive name for this channel, like "ambient" or "psu".
>>> +
>>>     interrupts:
>>>       items:
>>>         - description: measurement ready indicator
>>> @@ -72,6 +76,7 @@ examples:
>>>                            <5 IRQ_TYPE_EDGE_RISING>,
>>>                            <6 IRQ_TYPE_EDGE_RISING>;
>>>               interrupt-names = "ready", "low", "high";
>>> +            label = "somelabel";
>>>               vdd-supply =3D <&reg_vdd>;
>>>           };
>>       };
>>
>> Hello Flaviu, thank you for your patch.
>>
> 
> Hello Javier, thanks for your reply.
> 
>> Should we not add a reference to hwmon-common.yaml (with
>> unevaluatedProperties instead of additionalProperties), as label is
>> defined there? I believe that Krzysztof Kozlowski did something similar
>> for the shunt-resistor-micro-ohms property. Could we follow suit here?
>>
> 
> This is a good question and I am happy you asked. I also thought a lot
> about this and the reason I decided to go for this approach is that by using
> $ref: hwmon-common.yaml#, I would have to change additionalProperties: false
> to unevaluatedProperties: false, which will evaluate in case it is used, also
> shunt-resistor-micro-ohms property which does not apply to this sensor. At
> least this is my understanding, but of course I can be wrong (I see lm75 binding
> also uses $ref: hwmon-common.yaml# but shunt-resistor-micro-ohms does not apply).
> 

Where does the idea come from that shunt-resistor-micro-ohms would be mandatory ?
That would make hwmon-common.yaml unusable for most chips.

Guenter

>> I am also not a big fan of a name like "somelabel", and a more
>> meaningful name from a "real" example would look better. I know that
>> some examples have already used "somelabel" as an example, but others
>> have used more meaningful names too.
>>
> 
> I will have to send a v2 since for the label property description I used
> "channel" instead of "sensor" (detected by Sashiko AI review), so I can
> use in the example section a more meaningful name like "Room" if no other
> suggestion.
> 
>> Best regards,
>> Javier Carrasco
> 
> Best regards,
> Flaviu Nistor
> 



* Re: [PATCH] crypto: af_alg - Document the deprecation of AF_ALG
From: Linus Torvalds @ 2026-06-23 18:56 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Bastien Nocera, linux-crypto, Herbert Xu, Marcel Holtmann,
	Luiz Augusto von Dentz, linux-doc, linux-api, linux-kernel,
	netdev, linux-bluetooth, ell
In-Reply-To: <20260623164932.GA1793@sol>

On Tue, 23 Jun 2026 at 09:51, Eric Biggers <ebiggers@kernel.org> wrote:
>
> We're aware of that and are taking it into account in the allowlist:

Note that if we can just unconditionally make it depend on
CAP_NET_ADMIN, that would be good - independently of any allowlist.

Because if iwd and bluetoothd are the main two users, and both of
those already require CAP_NET_ADMIN anyway...

                Linus


* Re: [PATCH] crypto: af_alg - Add af_alg_restrict sysctl, defaulting to 1
From: Luiz Augusto von Dentz @ 2026-06-23 18:51 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, Herbert Xu, linux-kernel, linux-doc,
	linux-bluetooth, iwd, linux-hardening, Milan Broz,
	Demi Marie Obenour, Andy Lutomirski
In-Reply-To: <20260623180502.GC1850517@google.com>

Hi Eric,

On Tue, Jun 23, 2026 at 2:05 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Tue, Jun 23, 2026 at 09:52:08AM -0700, Eric Biggers wrote:
> > On Tue, Jun 23, 2026 at 11:04:14AM -0400, Luiz Augusto von Dentz wrote:
> > > > +===  ==================================================================
> > > > +0    AF_ALG is unrestricted.
> > > > +
> > > > +1    AF_ALG is supported with a limited list of algorithms. The list
> > > > +     is designed for compatibility with known users such as iwd and
> > > > +     bluez that haven't yet been fixed to use userspace crypto code.
> > >
> > > Is the expectation that we go shopping for userspace crypto here?
> >
> > Yes, same as what 99% of userspace already does.  Probably you'll just
> > want to link to OpenSSL, but it could be something else if you want.
> >
> > - Eric
>
> By the way you do know that bluez already has a local implementation of
> ECDH, right?  See src/shared/ecc.c.

It's never been audited; it's only used for hardware emulation, I
didn't even remember we had that thingy. What we really use is
src/shared/crypto.c, and I'm not looking forward to having it changed.
With something like Zephyr, changing crypto libraries every so often
just because one didn't fit on a platform wasn't a great experience,
and that is a much bigger project. In the end it seems they are using
a forked mbedtls:

https://github.com/zephyrproject-rtos/mbedtls

I'm quite sure whatever choice we make will be the wrong choice for
someone. Then someone will have the brilliant idea to add some sort of
backend support to let everyone plug in their preferred crypto
library, possibly adding even more code to audit.

> - Eric

-- 
Luiz Augusto von Dentz


* Re: [PATCH v2] usbcore: Add quirk for 255-bytes initial config read
From: Randy Dunlap @ 2026-06-23 18:35 UTC (permalink / raw)
  To: Nikhil Solanke, linux-usb
  Cc: gregkh, linux-kernel, stern, michal.pecio, stable, corbet, skhan,
	linux-doc
In-Reply-To: <20260623161035.5792-1-nikhilsolanke5@gmail.com>



On 6/23/26 9:10 AM, Nikhil Solanke wrote:
> Certain third-party USB game controllers exposing (or spoofing) an Xbox
> 360-compatible interface (VID:PID 045e:028e) fail to enumerate under Linux.
> The device disconnects from the bus without responding to the initial
> GET_DESCRIPTOR(CONFIGURATION) request, and the kernel logs 'unable to read
> config index 0 descriptor/start: -71'.
> 
> The device then falls back to a secondary Android HID mode (with a
> different VID:PID), losing XInput functionality including rumble support.
> The failure reproduces across multiple machines, host controller types, and
> kernel versions including current mainline and LTS. The device enumerates
> correctly and remains in XInput mode under Windows. Notably, the device
> enumerates correctly in Android mode when the same 9-byte request
> is issued for that mode's configuration descriptor, confirming the firmware
> bug is specific to the XInput mode.
> 
> usbmon traces from Linux and Wireshark/USBPcap traces from Windows are
> identical up to the point of failure, with no visible protocol-level
> difference explaining the divergence. The root cause was identified when
> Michal Pecio discovered via a QEMU bus-level capture that Windows does not
> use wLength=9 for the initial config descriptor request; it uses
> wLength=255. Alan Stern subsequently confirmed this with a bus
> analyzer on a different USB 2.0 device, and Michal verified the behavior
> goes back to Windows 95 OSR2.1.
> 
> So, add a new quirk flag USB_QUIRK_CONFIG_SIZE which causes
> usb_get_configuration() to issue a 255 byte sized configuration request
> instead of USB_DT_CONFIG_SIZE (9) for the initial
> GET_DESCRIPTOR(CONFIGURATION) request, mimicking long-standing Windows
> behavior.
> 
> Suggested-by: Alan Stern <stern@rowland.harvard.edu>
> Suggested-by: Michal Pecio <michal.pecio@gmail.com>
> Closes: https://lore.kernel.org/linux-usb/CAFgddh+JWdT4LLwMc5qjM8q_pBu-fRo2qADR5ovAKoGHWMQrRw@mail.gmail.com/
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Cc: stable@vger.kernel.org
> 
> Signed-off-by: Nikhil Solanke <nikhilsolanke5@gmail.com>
> ---
> Changes in v2:
> - Add Documentation
> - Naming changes
> - Refactored to have a better flow with existing code.
> 
>  .../admin-guide/kernel-parameters.txt         |  9 +++
>  drivers/usb/core/config.c                     | 61 ++++++++++++++-----
>  drivers/usb/core/hub.c                        |  6 +-
>  drivers/usb/core/quirks.c                     |  4 ++
>  include/linux/usb/quirks.h                    |  3 +
>  5 files changed, 67 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 97007f4f69d4..af4bf0ef2c7b 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -8158,6 +8158,15 @@ Kernel parameters
>  				q = USB_QUIRK_FORCE_ONE_CONFIG (Device
>  					claims zero configurations,
>  					forcing to 1);
> +                r = USB_QUIRK_WINDOWS_CONFIG_REQ_SIZE (Device
> +                    fails during initialization when asked for
> +                    9-bytes configuration desciptor request. Ask

		                             descriptor

> +                    for 255-bytes request instead to mirror
> +                    Windows' behavior. This quirk is originally
> +                    meant to fix some quirky gamepads that refuse
> +                    to connect in their XInput mode. But it can also
> +                    potentially fix issues with other USB devices
> +                    that work on Windows but not on Linux)

add ending '.'

For all lines added here, use tabs instead of spaces for indentation.


>  			Example: quirks=0781:5580:bk,0a5c:5834:gij
>  
>  	usbhid.mousepoll=


-- 
~Randy
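
The initial read described in the quoted commit message can be sketched in
Python.  This is an illustration under stated assumptions, not the patch's
C code: the helper names are hypothetical, and only the two request sizes
(9 and 255) come from the commit message.  The header layout is the
standard 9-byte USB configuration descriptor.

```python
import struct

USB_DT_CONFIG = 0x02        # bDescriptorType of a configuration descriptor
USB_DT_CONFIG_SIZE = 9      # size of the configuration descriptor header
QUIRK_INITIAL_WLENGTH = 255 # wLength Windows uses, per the commit message


def initial_wlength(has_quirk: bool) -> int:
    """wLength for the first GET_DESCRIPTOR(CONFIGURATION) request."""
    return QUIRK_INITIAL_WLENGTH if has_quirk else USB_DT_CONFIG_SIZE


def parse_config_header(data: bytes) -> dict:
    """Decode the 9-byte header at the start of a configuration read."""
    if len(data) < USB_DT_CONFIG_SIZE:
        raise ValueError("short configuration descriptor read")
    (b_length, b_type, w_total_length, num_interfaces,
     config_value, i_config, attributes, max_power) = struct.unpack_from(
        "<BBHBBBBB", data)
    if b_type != USB_DT_CONFIG:
        raise ValueError("not a configuration descriptor")
    return {"bLength": b_length, "wTotalLength": w_total_length,
            "bNumInterfaces": num_interfaces}


def needs_second_read(data: bytes) -> bool:
    """True if the device has more descriptor bytes than were returned."""
    return parse_config_header(data)["wTotalLength"] > len(data)
```

Either way the host learns wTotalLength from the header; the quirk changes
only how many bytes the first request asks for.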



* Re: [PATCH v7 03/10] tracing/probes: Support dumping fetcharg program for debugging dynamic events
From: Julian Braha @ 2026-06-23 18:31 UTC (permalink / raw)
  To: Masami Hiramatsu (Google), Steven Rostedt, Mathieu Desnoyers
  Cc: Jonathan Corbet, Shuah Khan, linux-kernel, linux-trace-kernel,
	linux-doc, linux-kselftest
In-Reply-To: <178217907822.643090.14693478306190628970.stgit@devnote2>

Hi Masami,

On 6/23/26 02:44, Masami Hiramatsu (Google) wrote:

> +config PROBE_EVENTS_DUMP_FETCHARG
> +	depends on PROBE_EVENTS
> +	bool "Dump of dynamic probe event fetch-arguments"
> +	default n

Sorry, kconfig nitpick: could you match the style used by the rest of
the config options in this file? E.g. the type and prompt come first in
the list of attributes?

- Julian Braha


* Re: [PATCH v4 1/5] mm/zswap: Extend shrink_memcg() writeback capability
From: Yosry Ahmed @ 2026-06-23 18:17 UTC (permalink / raw)
  To: Hao Jia
  Cc: akpm, tj, hannes, shakeel.butt, mhocko, mkoutny, nphamcs,
	chengming.zhou, muchun.song, roman.gushchin, linux-mm,
	linux-kernel, linux-doc, Hao Jia
In-Reply-To: <d0f05c35-457a-4b2c-6faa-7a83d4bdec01@gmail.com>

> My initial thought was that if cold memory is evenly distributed across
> nodes and we are doing a large writeback, it would be better to balance
> the zswap entry writeback across all nodes rather than just draining
> node 0 first. However, since we currently lack a proper metric to
> represent hot/cold memory (such as age-based tracking), doing this
> probably doesn't make much sense right now.

Yeah let's start simple and go from there.

>
> So, perhaps we want something like this? Please correct me if I'm wrong.
>
> static long shrink_memcg(struct mem_cgroup *memcg,
>         unsigned long nr_to_scan)
> {
>    struct zswap_shrink_walk_arg walk_arg = {
>      .bytes_written = 0,
>      .encountered_page_in_swapcache = false,
>    };
>    unsigned long nr_remaining = nr_to_scan;
>    bool memcg_list_is_empty = true;
>    int nid;
>
>    if (!mem_cgroup_zswap_writeback_enabled(memcg))
>      return -ENOENT;
>
>    if (memcg && !mem_cgroup_online(memcg))
>      return -ENOENT;
>
>    for_each_node_state(nid, N_NORMAL_MEMORY) {
>      unsigned long nr_to_walk;
>
>      /*
>       * Cap the per-node scan by the current LRU length. A referenced
>       * entry is only rotated to the tail (second chance) and may be
>       * revisited within a single walk; without this cap those rotated
>       * entries could drain the shared scan budget on one node.
>       */

The comment here is a bit misleading. It's not just about draining one
node. One call to shrink_memcg() should only scan entries once. The
caller can then choose to scan the memcg again, or scan a different
one. In this case, the caller should iterate all memcgs first before
retrying memcgs again and reclaiming rotated entries.

>      nr_to_walk = min(nr_remaining,
>           list_lru_count_one(&zswap_list_lru, nid, memcg));
>      if (!nr_to_walk)
>        continue;
>      memcg_list_is_empty = false;
>
>      nr_remaining -= nr_to_walk;
>      list_lru_walk_one(&zswap_list_lru, nid, memcg,
>            &shrink_memcg_cb, &walk_arg, &nr_to_walk);
>      /* Return the unused share of the budget to the pool. */
>      nr_remaining += nr_to_walk;
>
>      /* Bail out once the whole scan budget has been spent. */

The comment is unnecessary.

>      if (!nr_remaining)
>        break;
>
>      cond_resched();

Did you observe a problem here or did you just add this due to an
abundance of caution?

>    }
>
>    if (memcg_list_is_empty)

Do we need memcg_list_is_empty? Can we just check if nr_remaining
matches nr_to_scan?

>      return -ENOENT;
>
>    return walk_arg.bytes_written;
> }
>
> Thanks,
> Hao
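
The scan-budget accounting discussed above can be modelled in a few lines
of Python.  This is a simplified sketch, not the kernel code: the function
and callback names are hypothetical, the callback returns the unused part
of its budget (mirroring how nr_to_walk is handed back), and the model
returns entries scanned rather than bytes written.

```python
def shrink_memcg_model(lru_lengths, nr_to_scan, walk_one):
    """Spread one scan budget across per-node LRU lists.

    lru_lengths: LRU length for each node, indexed by node id.
    walk_one(nid, budget): scans up to `budget` entries on node `nid`
        and returns how much of the budget it did not use.
    Returns the number of entries scanned, or None (standing in for
    -ENOENT) when none of the budget was consumed.
    """
    nr_remaining = nr_to_scan
    for nid, lru_len in enumerate(lru_lengths):
        # Cap the per-node walk by the current list length so that each
        # entry is visited at most once per call.
        nr_to_walk = min(nr_remaining, lru_len)
        if nr_to_walk == 0:
            continue
        nr_remaining -= nr_to_walk
        nr_remaining += walk_one(nid, nr_to_walk)
        if nr_remaining == 0:
            break
    # The suggested replacement for a separate memcg_list_is_empty flag.
    if nr_remaining == nr_to_scan:
        return None
    return nr_to_scan - nr_remaining


visited = []


def walk_everything(nid, budget):
    visited.append((nid, budget))
    return 0  # the whole per-node budget was consumed


print(shrink_memcg_model([3, 0, 5], 6, walk_everything), visited)
```

In this model the nr_remaining comparison and an explicit "all lists
empty" flag agree whenever a walk consumes at least one entry; they differ
only if every walk hands its full budget back.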


* Re: [PATCH 1/2] dt-bindings: hwmon: chipcap2: Add label property
From: Flaviu Nistor @ 2026-06-23 18:16 UTC (permalink / raw)
  To: Javier Carrasco
  Cc: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Jonathan Corbet, Shuah Khan, Flaviu Nistor, linux-hwmon,
	linux-kernel, devicetree, linux-doc
In-Reply-To: <DJFPYCV2FXW7.1BFG9DURPZRCC@gmail.com>

On Mon Jun 22, 2026 at 7:29 PM CEST, Javier Carrasco wrote:
>On Mon Jun 22, 2026 at 2:21 PM CEST, Flaviu Nistor wrote:
>> Add support for an optional label property similar to other hwmon devices.
>> This allows distinct names to be assigned to each instance on boards
>> with multiple CHIPCAP2 sensors.
>>
>> Signed-off-by: Flaviu Nistor <flaviu.nistor@gmail.com>
>> ---
>>  .../devicetree/bindings/hwmon/amphenol,chipcap2.yaml         | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml b/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml
>> index 17351fdbefce..f00b5a4b14dd 100644
>> --- a/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml
>> +++ b/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml
>> @@ -33,6 +33,10 @@ properties:
>>    reg:
>>      maxItems: 1
>>
>> +  label:
>> +    description:
>> +      A descriptive name for this channel, like "ambient" or "psu".
>> +
>>    interrupts:
>>      items:
>>        - description: measurement ready indicator
>> @@ -72,6 +76,7 @@ examples:
>>                           <5 IRQ_TYPE_EDGE_RISING>,
>>                           <6 IRQ_TYPE_EDGE_RISING>;
>>              interrupt-names = "ready", "low", "high";
>> +            label = "somelabel";
>>              vdd-supply =3D <&reg_vdd>;
>>          };
>      };
>
>Hello Flaviu, thank you for your patch.
>

Hello Javier, thanks for your reply.

>Should we not add a reference to hwmon-common.yaml (with
>unevaluatedProperties instead of additionalProperties), as label is
>defined there? I believe that Krzysztof Kozlowski did something similar
>for the shunt-resistor-micro-ohms property. Could we follow suit here?
>

This is a good question and I am happy you asked. I also thought a lot
about this and the reason I decided to go for this approach is that by using
$ref: hwmon-common.yaml#, I would have to change additionalProperties: false
to unevaluatedProperties: false, which will evaluate in case it is used, also
shunt-resistor-micro-ohms property which does not apply to this sensor. At
least this is my understanding, but of course I can be wrong (I see lm75 binding
also uses $ref: hwmon-common.yaml# but shunt-resistor-micro-ohms does not apply). 

>I am also not a big fan of a name like "somelabel", and a more
>meaningful name from a "real" example would look better. I know that
>some examples have already used "somelabel" as an example, but others
>have used more meaningful names too.
>

I will have to send a v2 since for the label property description I used
"channel" instead of "sensor" (detected by Sashiko AI review), so I can
use in the example section a more meaningful name like "Room" if no other
suggestion.

>Best regards,
>Javier Carrasco

Best regards,
Flaviu Nistor


* Re: [PATCH v12 11/12] x86/vmscape: Resolve conflict between attack-vectors and vmscape=force
From: bot+bpf-ci @ 2026-06-23 18:13 UTC (permalink / raw)
  To: pawan.kumar.gupta, x86, jon, nik.borisov, hpa, jpoimboe,
	david.kaplan, seanjc, bp, dave.hansen, peterz, ast, daniel,
	andrii, kpsingh, jolsa, davem, david.laight.linux, luto, tglx,
	mingo, dsahern, martin.lau, eddyz87, song, yonghong.song,
	john.fastabend, sdf, haoluo, pbonzini, corbet, jbaron, aliceryhl,
	rostedt, ardb, skhan
  Cc: linux-kernel, kvm, asit.k.mallick, tao1.zhang, bpf, netdev,
	linux-doc, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260622-vmscape-bhb-v12-11-76cbda0ae3e5@linux.intel.com>

[-- Attachment #1: Type: text/plain, Size: 1456 bytes --]

> x86/vmscape: Resolve conflict between attack-vectors and vmscape=force
>
> The vmscape=force option currently defaults to AUTO mitigation. This lets
> attack-vector controls override the vmscape mitigation, preventing the
> user from forcing the VMSCAPE mitigation.
>
> When vmscape mitigation is forced, allow it to be deployed irrespective of
> attack vectors. Introduce VMSCAPE_MITIGATION_ON that wins over
> attack-vector controls.
>
> Tested-by: Jon Kohler <jon@nutanix.com>
> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>

This appears to fix a behavioral bug where vmscape=force mapped to
VMSCAPE_MITIGATION_AUTO, which is gated by should_mitigate_vuln(), so
attack-vector controls could override a user-forced mitigation.
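
To make the described behavior concrete, here is a minimal userspace sketch.
It is not the code in bugs.c: the enum, the function name and the boolean
parameter are hypothetical stand-ins, with attack_vectors_allow playing the
role of the should_mitigate_vuln() result. It only models the point that an
AUTO mode defers to attack-vector controls while an ON mode does not.

```c
#include <stdbool.h>

/* Hypothetical model of the mitigation modes discussed above. */
enum vmscape_mode_model {
	MODEL_NONE,	/* mitigation disabled */
	MODEL_AUTO,	/* mitigate only if attack-vector controls agree */
	MODEL_ON,	/* user forced the mitigation */
};

/*
 * attack_vectors_allow stands in for the result of should_mitigate_vuln().
 * Returns true when the mitigation would be deployed in this model.
 */
static bool vmscape_model_deployed(enum vmscape_mode_model mode,
				   bool attack_vectors_allow)
{
	switch (mode) {
	case MODEL_ON:
		/* Forced: wins over attack-vector controls. */
		return true;
	case MODEL_AUTO:
		/* Gated: attack-vector controls can switch it off. */
		return attack_vectors_allow;
	default:
		return false;
	}
}
```

In this model, mapping vmscape=force to MODEL_AUTO yields no mitigation when
attack-vector controls say no, which is the override described above; mapping
it to MODEL_ON deploys the mitigation regardless.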

Should this carry a Fixes: tag?

The force -> VMSCAPE_MITIGATION_AUTO mapping that allowed the override
predates this series. The earlier refactor 68d94c61c09a ("x86/vmscape:
Move mitigation selection to a switch()") only moved the logic into a
switch() and did not introduce the conflict, so it does not look like the
right Fixes: target. A precise commit to point at could not be confirmed
from this tree.

---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/28046221590

^ permalink raw reply

* Re: [PATCH v12 02/12] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
From: bot+bpf-ci @ 2026-06-23 18:13 UTC (permalink / raw)
  To: pawan.kumar.gupta, x86, jon, nik.borisov, hpa, jpoimboe,
	david.kaplan, seanjc, bp, dave.hansen, peterz, ast, daniel,
	andrii, kpsingh, jolsa, davem, david.laight.linux, luto, tglx,
	mingo, dsahern, martin.lau, eddyz87, song, yonghong.song,
	john.fastabend, sdf, haoluo, pbonzini, corbet, jbaron, aliceryhl,
	rostedt, ardb, skhan
  Cc: linux-kernel, kvm, asit.k.mallick, tao1.zhang, bpf, netdev,
	linux-doc, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260622-vmscape-bhb-v12-2-76cbda0ae3e5@linux.intel.com>

[-- Attachment #1: Type: text/plain, Size: 3513 bytes --]

> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 83f51cab0b1e..2cb4a96247d8 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c

[ ... ]

> @@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
>  		x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
>  	}
>
> +	/*
> +	 * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
> +	 * support), see Intel's BHI guidance.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
> +		bhb_seq_outer_loop = 12;
> +		bhb_seq_inner_loop = 7;
> +	}

Should this also consider the case of a guest running under a hypervisor
that does not expose BHI_CTRL?

When BHI_CTRL is absent the loop counts keep their short defaults (5/5)
and the guest declares itself BHI-mitigated.  In a heterogeneous
migration pool a guest may present as Ice Lake (no BHI_CTRL) while
actually running on Alder Lake or newer hardware, where the short
sequence is not sufficient.

A default-safe approach would be to also select the long sequence when
running under a hypervisor without BHI_CTRL, for example:

	if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL) ||
	    cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
		bhb_seq_outer_loop = 12;
		bhb_seq_inner_loop = 7;
	}

There is precedent for assuming the worst under a hypervisor, such as
vulnerable_to_its(), cpu_set_bug_bits() and check_null_seg_clears_base().

This came up during review and the resolution was to let the series move
forward and revisit the hypervisor case afterwards.  Is that still the
intent, or should the hypervisor fallback land with this patch?


For the missing Fixes: tag, quoting the full commit message:

commit a2a0812335ee19ae1df81f88e736da5210748c2b
Author: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>

>     x86/bhi: Make clear_bhb_loop() effective on newer CPUs
>
>     As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
>     the Branch History Buffer (BHB). On Alder Lake and newer parts this
>     sequence is not sufficient because it doesn't clear enough entries. This
>     was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
>     in the kernel.
>     Now with VMSCAPE (BHI variant) it is also required to isolate branch
>     history between guests and userspace. Since BHI_DIS_S only protects the
>     kernel, the newer CPUs also use IBPB.
>     A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
>     But it currently does not clear enough BHB entries to be effective on newer
>     CPUs with larger BHB. At boot, dynamically set the loop count of
>     clear_bhb_loop() such that it is effective on newer CPUs too.
>     Introduce global loop counts, initializing them with appropriate value
>     based on the hardware feature X86_FEATURE_BHI_CTRL.
>     Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
>     Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
>     Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>

The short BHB clear sequence with the hardcoded loop count of 5 was
added by clear_bhb_loop() in entry_64.S, which is the code this patch
makes configurable.  Should this carry:

  Fixes: 7390db8aea0d ("x86/bhi: Add support for clearing branch history at syscall entry")


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/28046221590

^ permalink raw reply

* Re: [PATCH v12 08/12] KVM: Define EXPORT_STATIC_CALL_FOR_KVM()
From: bot+bpf-ci @ 2026-06-23 18:13 UTC (permalink / raw)
  To: pawan.kumar.gupta, x86, jon, nik.borisov, hpa, jpoimboe,
	david.kaplan, seanjc, bp, dave.hansen, peterz, ast, daniel,
	andrii, kpsingh, jolsa, davem, david.laight.linux, luto, tglx,
	mingo, dsahern, martin.lau, eddyz87, song, yonghong.song,
	john.fastabend, sdf, haoluo, pbonzini, corbet, jbaron, aliceryhl,
	rostedt, ardb, skhan
  Cc: linux-kernel, kvm, asit.k.mallick, tao1.zhang, bpf, netdev,
	linux-doc, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260622-vmscape-bhb-v12-8-76cbda0ae3e5@linux.intel.com>

[-- Attachment #1: Type: text/plain, Size: 2597 bytes --]

> EXPORT_SYMBOL_FOR_KVM() exists to export symbols to KVM modules. Static
> calls need the same treatment when the core kernel defines a static_call
> that KVM needs access to (e.g. from a VM-exit path).
>
> Define EXPORT_STATIC_CALL_FOR_KVM() as the static_call analogue of
> EXPORT_SYMBOL_FOR_KVM(). The same three-way logic applies:
>
>   - KVM_SUB_MODULES defined: export to "kvm," plus all sub-modules
>   - KVM=m, no sub-modules: export to "kvm" only
>   - KVM built-in: no export needed (noop)
>
>   As with EXPORT_SYMBOL_FOR_KVM(), allow architectures to override both
>   macros (e.g. to suppress the export when kvm.ko itself will not be
>   built despite CONFIG_KVM=m). Add the x86 no-op overrides in
>   arch/x86/include/asm/kvm_types.h for that case. To keep the pair in
>   sync, EXPORT_STATIC_CALL_FOR_KVM() is defined inside the
>   EXPORT_SYMBOL_FOR_KVM #ifndef block; an arch that defines
>   EXPORT_SYMBOL_FOR_KVM must also define EXPORT_STATIC_CALL_FOR_KVM or the
>   build will fail with a compile-time error.
>
> As with EXPORT_SYMBOL_FOR_KVM(), allow architectures to override
> EXPORT_STATIC_CALL_FOR_KVM definition (e.g. to suppress the export when
> kvm.ko itself will not be built despite CONFIG_KVM=m). Add the x86 no-op
> override in arch/x86/include/asm/kvm_types.h for that case.
>
> Architectures must also define EXPORT_STATIC_CALL_FOR_KVM when they define
> EXPORT_SYMBOL_FOR_KVM.

This isn't a bug, but the arch-override rationale appears twice in nearly
identical form.  The third paragraph (starting "As with
EXPORT_SYMBOL_FOR_KVM(), allow architectures to override both macros") and
the fourth paragraph (starting "As with EXPORT_SYMBOL_FOR_KVM(), allow
architectures to override EXPORT_STATIC_CALL_FOR_KVM definition") both
explain suppressing the export when kvm.ko will not be built and both
mention adding the x86 no-op override.

The requirement that an arch defining EXPORT_SYMBOL_FOR_KVM must also
define EXPORT_STATIC_CALL_FOR_KVM also shows up twice: once inside the
third paragraph ("an arch that defines EXPORT_SYMBOL_FOR_KVM must also
define EXPORT_STATIC_CALL_FOR_KVM or the build will fail") and again as the
standalone final sentence ("Architectures must also define
EXPORT_STATIC_CALL_FOR_KVM when they define EXPORT_SYMBOL_FOR_KVM").

Could the duplicate paragraphs be folded into a single statement?


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/28046221590

^ permalink raw reply

* Re: [PATCH] crypto: af_alg - Add af_alg_restrict sysctl, defaulting to 1
From: Eric Biggers @ 2026-06-23 18:05 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: linux-crypto, Herbert Xu, linux-kernel, linux-doc,
	linux-bluetooth, iwd, linux-hardening, Milan Broz,
	Demi Marie Obenour, Andy Lutomirski
In-Reply-To: <20260623165208.GB1793@sol>

On Tue, Jun 23, 2026 at 09:52:08AM -0700, Eric Biggers wrote:
> On Tue, Jun 23, 2026 at 11:04:14AM -0400, Luiz Augusto von Dentz wrote:
> > > +===  ==================================================================
> > > +0    AF_ALG is unrestricted.
> > > +
> > > +1    AF_ALG is supported with a limited list of algorithms. The list
> > > +     is designed for compatibility with known users such as iwd and
> > > +     bluez that haven't yet been fixed to use userspace crypto code.
> > 
> > Is the expectation that we go shopping for userspace crypto here?
> 
> Yes, same as what 99% of userspace already does.  Probably you'll just
> want to link to OpenSSL, but it could be something else if you want.
> 
> - Eric

By the way, you do know that bluez already has a local implementation of
ECDH, right?  See src/shared/ecc.c.

- Eric

^ permalink raw reply

* Re: [PATCH] crypto: af_alg - Add af_alg_restrict sysctl, defaulting to 1
From: Eric Biggers @ 2026-06-23 17:58 UTC (permalink / raw)
  To: Demi Marie Obenour
  Cc: linux-crypto, Herbert Xu, linux-kernel, linux-doc,
	linux-bluetooth, iwd, linux-hardening, Milan Broz,
	Andy Lutomirski
In-Reply-To: <34e20a64-17d3-4994-89f1-6084cd4fae36@gmail.com>

On Tue, Jun 23, 2026 at 01:24:01PM -0400, Demi Marie Obenour wrote:
> I think there is room for something in-between the allowlist provided
> here and "no restrictions".  For instance, I think it makes sense
> to have a mode that allows modern, widely-used algorithms (AES-GCM,
> ChaCha20-Poly1305, SHA-3, HMAC, etc) to all users.
> 
> This makes it less likely someone turns off all restrictions.

Would it be worth the complexity when AF_ALG as a whole is deprecated?

> XFRM allows providing an arbitrary algorithm name, and it appears to
> be accessible in unprivileged user namespaces.  That also needs an
> allowlist.

Doesn't XFRM have a fixed list already?  The algorithm comes in as a
string, but it has to be in the list.

> > +	{ "hmac(sha512)", true }, /* iwd, sha512hmac */
> 
> Should this entry have privileged = false?  sha512hmac doesn't
> need privileges.

It's for the FIPS 140-3 integrity check use case, which runs as root.

> > diff --git a/crypto/algif_rng.c b/crypto/algif_rng.c
> > index 4dfe7899f8fa..bd522915d56d 100644
> > --- a/crypto/algif_rng.c
> > +++ b/crypto/algif_rng.c
> > @@ -48,10 +48,14 @@
> >  
> >  MODULE_LICENSE("GPL");
> >  MODULE_AUTHOR("Stephan Mueller <smueller@chronox.de>");
> >  MODULE_DESCRIPTION("User-space interface for random number generators");
> >  
> > +static const struct af_alg_allowlist_entry rng_allowlist[] = {
> > +	{},
> > +};
> 
> Can this whole file be deleted?  You wrote that it isn't actually used.

That's not quite what I said.  There's still one known use case: CAVP
testing for FIPS 140-3 certifications.  But that only accesses the drbg
and jitterentropy RNGs and can run as root.

We could consider deleting it anyway and require that the FIPS people
use custom kernel modules.  But that is out of the scope of this patch.

> > +static const struct af_alg_allowlist_entry skcipher_allowlist[] = {
> > +	{ "adiantum(xchacha12,aes)", false }, /* cryptsetup */
> > +	{ "adiantum(xchacha20,aes)", false }, /* cryptsetup */
> > +	{ "cbc(aes)", true }, /* iwd */
> > +	{ "cbc(des)", true }, /* iwd */
> > +	{ "cbc(des3_ede)", true }, /* iwd */
> > +	{ "ctr(aes)", true }, /* iwd */
> > +	{ "ecb(aes)", true }, /* iwd, bluez */
> > +	{ "ecb(des)", true }, /* iwd */
> > +	{ "hctr2(aes)", false }, /* cryptsetup */
> > +	{ "xts(aes)", false }, /* cryptsetup benchmark */
> > +	{},
> > +};
> 
> Do the cryptsetup ones really need to be accessible to unprivileged users?

If we don't allow them, people running 'cryptsetup benchmark' will
notice the change.  Formatting could also be affected.

Requiring sudo in these cases might be feasible, but we need to strike
the right balance.  Maybe Milan has an opinion.
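
For readers following the allowlist discussion, the check being debated can
be sketched in userspace C roughly as follows. This is an illustration under
assumptions, not the patch's code: the struct layout, the function name, and
the reading of the second field as "requires a privileged caller" are
inferred from the quoted tables and the "privileged = false" question above,
and restrict_mode models the af_alg_restrict sysctl values 0 and 1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of an allowlist entry: name plus privilege flag. */
struct allowlist_entry_model {
	const char *name;	/* algorithm name; NULL terminates the table */
	bool privileged;	/* true: only privileged callers may use it */
};

/* Two entries copied from the quoted skcipher table, for illustration. */
static const struct allowlist_entry_model demo_allowlist[] = {
	{ "cbc(aes)", true },	/* iwd */
	{ "xts(aes)", false },	/* cryptsetup benchmark */
	{ NULL, false },
};

/*
 * restrict_mode 0: unrestricted, everything is allowed.
 * restrict_mode 1: the name must be listed, and privileged entries
 * additionally require a privileged caller.
 */
static bool alg_allowed(const struct allowlist_entry_model *list,
			const char *name, bool caller_privileged,
			int restrict_mode)
{
	if (restrict_mode == 0)
		return true;

	for (; list->name; list++) {
		if (strcmp(list->name, name) == 0)
			return !list->privileged || caller_privileged;
	}
	return false;	/* not on the list */
}
```

Under these assumptions, an unprivileged 'cryptsetup benchmark' can still
open "xts(aes)", while "cbc(aes)" needs a privileged caller; flipping a
cryptsetup entry to true is what would make such users need sudo.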

- Eric

^ permalink raw reply

* Re: [PATCH] crypto: af_alg - Add af_alg_restrict sysctl, defaulting to 1
From: Eric Biggers @ 2026-06-23 17:45 UTC (permalink / raw)
  To: Demi Marie Obenour
  Cc: Luiz Augusto von Dentz, linux-crypto, Herbert Xu, linux-kernel,
	linux-doc, linux-bluetooth, iwd, linux-hardening, Milan Broz,
	Andy Lutomirski
In-Reply-To: <01b73988-b35c-4ce8-8463-4589fe18b0b0@gmail.com>

On Tue, Jun 23, 2026 at 01:29:20PM -0400, Demi Marie Obenour wrote:
> On 6/23/26 12:52, Eric Biggers wrote:
> > On Tue, Jun 23, 2026 at 11:04:14AM -0400, Luiz Augusto von Dentz wrote:
> >>> +===  ==================================================================
> >>> +0    AF_ALG is unrestricted.
> >>> +
> >>> +1    AF_ALG is supported with a limited list of algorithms. The list
> >>> +     is designed for compatibility with known users such as iwd and
> >>> +     bluez that haven't yet been fixed to use userspace crypto code.
> >>
> >> Is the expectation that we go shopping for userspace crypto here?
> > 
> > Yes, same as what 99% of userspace already does.  Probably you'll just
> > want to link to OpenSSL, but it could be something else if you want.
> 
> Hard disagree on OpenSSL.  It's not a good library.
> 
> See <https://cryptography.io/en/latest/statements/state-of-openssl/>.
> 
> Distributions should ship AWS-LC and either rebuild reverse
> dependencies when needed, or work with upstream to catch ABI breaks.

I don't like OpenSSL either, but it's the de facto standard on most
distros.  While perhaps distros should make that switch, there's no need
to wait for that to move away from AF_ALG.

- Eric

^ permalink raw reply
