Linux kernel -stable discussions
* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
       [not found] <7dc143fa-4a48-440b-b624-ac57a361ac74@oracle.com>
@ 2025-01-29  8:33 ` Harshvardhan Jha
  2025-01-29  8:35   ` Greg KH
  0 siblings, 1 reply; 19+ messages in thread
From: Harshvardhan Jha @ 2025-01-29  8:33 UTC (permalink / raw)
  To: Konrad Wilk, Boris Ostrovsky, jgross@suse.com
  Cc: sstabellini@kernel.org, xen-devel@lists.xenproject.org,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	Harshit Mogalapalli, stable

[-- Attachment #1: Type: text/plain, Size: 4570 bytes --]

Hi All,

+stable

There seem to be some formatting issues in my log output, so I have
attached the log as a file.

Thanks & Regards,
Harshvardhan

On 29/01/25 1:57 PM, Harshvardhan Jha wrote:
> Hello there,
>
> The stable tag v5.4.289 seems to fail to boot with the following trace:
>
> [ OK ] Created slice system-serial\x2dgetty.slice. [ OK ] Listening on
> udev Control Socket. [ OK ] Reached target Local Encrypted Volumes. [ OK
> ] Listening on /dev/initctl Compatibility Named Pipe. [ OK ] Listening
> on Delayed Shutdown Socket. [ OK ] Created slice
> system-selinux\x2dpol...grate\x2dlocal\x2dchanges.slice. [ OK ] Stopped
> target Initrd File Systems. [ OK ] Listening on LVM2 metadata daemon
> socket. Mounting Debug File System... [ OK ] Listening on networkd
> rtnetlink socket. [ OK ] Listening on Device-mapper event daemon FIFOs.
> Starting Monitoring of LVM2 mirrors... dmeventd or progress polling... [
> OK ] Created slice User and Session Slice. Starting Read and set NIS
> domainname from /etc/sysconfig/network... Starting Load legacy module
> configuration... [ OK ] Set up automount Arbitrary Executab...ats File
> System Automount Point. [ OK ] Created slice
> system-rdma\x2dload\x2dmodules.slice. [ OK ] Started Forward Password
> Requests to Wall Directory Watch. [ OK ] Reached target Paths. Starting
> Remount Root and Kernel File Systems... [ OK ] Created slice
> system-getty.slice. Starting Create list of required st... nodes for the
> current kernel... Starting Set Up Additional Binary Formats... [ OK ]
> Stopped target Initrd Root File System. [ OK ] Reached target Slices. [
> OK ] Created slice system-systemd\x2dfsck.slice. [ OK ] Mounted POSIX
> Message Queue File System. [ OK ] Mounted Debug File System. [ OK ]
> Started Read and set NIS domainname from /etc/sysconfig/network.
> Mounting Arbitrary Executable File Formats File System... [ OK ] Started
> Journal Service. [ OK ] Started Create list of required sta...ce nodes
> for the current kernel. Starting Create Static Device Nodes in /dev... [
> OK ] Started LVM2 metadata daemon. [ OK ] Started Remount Root and
> Kernel File Systems. Starting Load/Save Random Seed... Starting udev
> Coldplug all Devices... Starting Flush Journal to Persistent Storage...
> [FAILED] Failed to start Load Kernel Modules. See 'systemctl status
> systemd-modules-load.service' for details. Starting Apply Kernel
> Variables... [ OK ] Started Load/Save Random Seed. [ OK ] Mounted
> Arbitrary Executable File Formats File System. [ OK ] Started Set Up
> Additional Binary Formats. [ OK ] Started Create Static Device Nodes in
> /dev. Starting udev Kernel Device Manager... [ OK ] Started Apply Kernel
> Variables. [ OK ] Started Load legacy module configuration. [ OK ]
> Started udev Coldplug all Devices. Starting udev Wait for Complete
> Device Initialization... [ 24.427217] megaraid_sas 0000:65:00.0:
> megasas_build_io_fusion 3273 sge_count (-12) is out of range. Range is:
> 0-256
>
> [   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion 3273
> sge_count (-12) is out of range. Range is:  0-256
>
> This message kept repeating indefinitely until I reset the system.
>
> Reverting the following patch seems to fix the issue:
>
> commit 07c9cccc4c3fecba175a7e5aafba6370758f5ce2
> Author: Juergen Gross <jgross@suse.com>
> Date:   Fri Sep 13 12:05:02 2024 +0200
>
>     xen/swiotlb: add alignment check for dma buffers
>
>     [ Upstream commit 9f40ec84a7976d95c34e7cc070939deb103652b0 ]
>
>     When checking a memory buffer to be consecutive in machine memory,
>     the alignment needs to be checked, too. Failing to do so might result
>     in DMA memory not being aligned according to its requested size,
>     leading to error messages like:
>
>       4xxx 0000:2b:00.0: enabling device (0140 -> 0142)
>       4xxx 0000:2b:00.0: Ring address not aligned
>       4xxx 0000:2b:00.0: Failed to initialise service qat_crypto
>       4xxx 0000:2b:00.0: Resetting device qat_dev0
>       4xxx: probe of 0000:2b:00.0 failed with error -14
>
>     Fixes: 9435cce87950 ("xen/swiotlb: Add support for 64KB page
> granularity")
>     Signed-off-by: Juergen Gross <jgross@suse.com>
>     Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>     Signed-off-by: Juergen Gross <jgross@suse.com>
>     Signed-off-by: Sasha Levin <sashal@kernel.org>
>
> I tried changing the swiotlb kernel command-line arguments in GRUB, but
> unfortunately that did not help and the error appeared again.
>
> Thanks & Regards,
> Harshvardhan
>

[-- Attachment #2: history.txt --]
[-- Type: text/plain, Size: 2812 bytes --]

[  OK  ] Created slice system-serial\x2dgetty.slice.
[  OK  ] Listening on udev Control Socket.
[  OK  ] Reached target Local Encrypted Volumes.
[  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
[  OK  ] Listening on Delayed Shutdown Socket.
[  OK  ] Created slice system-selinux\x2dpol...grate\x2dlocal\x2dchanges.slice.
[  OK  ] Stopped target Initrd File Systems.
[  OK  ] Listening on LVM2 metadata daemon socket.
         Mounting Debug File System...
[  OK  ] Listening on networkd rtnetlink socket.
[  OK  ] Listening on Device-mapper event daemon FIFOs.
         Starting Monitoring of LVM2 mirrors... dmeventd or progress polling...
[  OK  ] Created slice User and Session Slice.
         Starting Read and set NIS domainname from /etc/sysconfig/network...
         Starting Load legacy module configuration...
[  OK  ] Set up automount Arbitrary Executab...ats File System Automount Point.
[  OK  ] Created slice system-rdma\x2dload\x2dmodules.slice.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
[  OK  ] Reached target Paths.
         Starting Remount Root and Kernel File Systems...
[  OK  ] Created slice system-getty.slice.
         Starting Create list of required st... nodes for the current kernel...
         Starting Set Up Additional Binary Formats...
[  OK  ] Stopped target Initrd Root File System.
[  OK  ] Reached target Slices.
[  OK  ] Created slice system-systemd\x2dfsck.slice.
[  OK  ] Mounted POSIX Message Queue File System.
[  OK  ] Mounted Debug File System.
[  OK  ] Started Read and set NIS domainname from /etc/sysconfig/network.
         Mounting Arbitrary Executable File Formats File System...
[  OK  ] Started Journal Service.
[  OK  ] Started Create list of required sta...ce nodes for the current kernel.
         Starting Create Static Device Nodes in /dev...
[  OK  ] Started LVM2 metadata daemon.
[  OK  ] Started Remount Root and Kernel File Systems.
         Starting Load/Save Random Seed...
         Starting udev Coldplug all Devices...
         Starting Flush Journal to Persistent Storage...
[FAILED] Failed to start Load Kernel Modules.
See 'systemctl status systemd-modules-load.service' for details.
         Starting Apply Kernel Variables...
[  OK  ] Started Load/Save Random Seed.
[  OK  ] Mounted Arbitrary Executable File Formats File System.
[  OK  ] Started Set Up Additional Binary Formats.
[  OK  ] Started Create Static Device Nodes in /dev.
         Starting udev Kernel Device Manager...
[  OK  ] Started Apply Kernel Variables.
[  OK  ] Started Load legacy module configuration.
[  OK  ] Started udev Coldplug all Devices.
         Starting udev Wait for Complete Device Initialization...
[   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion 3273 sge_count (-12) is out of range. Range is:  0-256


* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-29  8:33 ` v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range Harshvardhan Jha
@ 2025-01-29  8:35   ` Greg KH
  2025-01-29  8:43     ` Harshvardhan Jha
  0 siblings, 1 reply; 19+ messages in thread
From: Greg KH @ 2025-01-29  8:35 UTC (permalink / raw)
  To: Harshvardhan Jha
  Cc: Konrad Wilk, Boris Ostrovsky, jgross@suse.com,
	sstabellini@kernel.org, xen-devel@lists.xenproject.org,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	Harshit Mogalapalli, stable

On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha wrote:
> Hi All,
> 
> +stable
> 
> There seems to be some formatting issues in my log output. I have
> attached it as a file.

Confused, what are you wanting us to do here in the stable tree?

thanks,

greg k-h


* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-29  8:35   ` Greg KH
@ 2025-01-29  8:43     ` Harshvardhan Jha
  2025-01-29  8:48       ` Greg KH
  0 siblings, 1 reply; 19+ messages in thread
From: Harshvardhan Jha @ 2025-01-29  8:43 UTC (permalink / raw)
  To: Greg KH
  Cc: Konrad Wilk, Boris Ostrovsky, jgross@suse.com,
	sstabellini@kernel.org, xen-devel@lists.xenproject.org,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	Harshit Mogalapalli, stable

Hi there,

On 29/01/25 2:05 PM, Greg KH wrote:
> On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha wrote:
>> Hi All,
>>
>> +stable
>>
>> There seems to be some formatting issues in my log output. I have
>> attached it as a file.
> Confused, what are you wanting us to do here in the stable tree?
>
> thanks,
>
> greg k-h

Since this is reproducible on 5.4.y, I have added stable. The culprit
commit, whose revert fixes this issue, is also present in 5.4.y stable.

Thanks & Regards,
Harshvardhan



* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-29  8:43     ` Harshvardhan Jha
@ 2025-01-29  8:48       ` Greg KH
  2025-01-29  8:59         ` Harshvardhan Jha
  0 siblings, 1 reply; 19+ messages in thread
From: Greg KH @ 2025-01-29  8:48 UTC (permalink / raw)
  To: Harshvardhan Jha
  Cc: Konrad Wilk, Boris Ostrovsky, jgross@suse.com,
	sstabellini@kernel.org, xen-devel@lists.xenproject.org,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	Harshit Mogalapalli, stable

On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
> Hi there,
> 
> On 29/01/25 2:05 PM, Greg KH wrote:
> > On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha wrote:
> >> Hi All,
> >>
> >> +stable
> >>
> >> There seems to be some formatting issues in my log output. I have
> >> attached it as a file.
> > Confused, what are you wanting us to do here in the stable tree?
> >
> > thanks,
> >
> > greg k-h
> 
> Since, this is reproducible on 5.4.y I have added stable. The culprit
> commit which upon getting reverted fixes this issue is also present in
> 5.4.y stable.

What culprit commit?  I see no information here :(

Remember, top-posting is evil...


* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-29  8:48       ` Greg KH
@ 2025-01-29  8:59         ` Harshvardhan Jha
  2025-01-29  9:04           ` Greg KH
  0 siblings, 1 reply; 19+ messages in thread
From: Harshvardhan Jha @ 2025-01-29  8:59 UTC (permalink / raw)
  To: Greg KH
  Cc: Konrad Wilk, Boris Ostrovsky, jgross@suse.com,
	sstabellini@kernel.org, xen-devel@lists.xenproject.org,
	linux-kernel@vger.kernel.org, Harshit Mogalapalli, stable

Hi Greg,

On 29/01/25 2:18 PM, Greg KH wrote:
> On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
>> Hi there,
>>
>> On 29/01/25 2:05 PM, Greg KH wrote:
>>> On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha wrote:
>>>> Hi All,
>>>>
>>>> +stable
>>>>
>>>> There seems to be some formatting issues in my log output. I have
>>>> attached it as a file.
>>> Confused, what are you wanting us to do here in the stable tree?
>>>
>>> thanks,
>>>
>>> greg k-h
>> Since, this is reproducible on 5.4.y I have added stable. The culprit
>> commit which upon getting reverted fixes this issue is also present in
>> 5.4.y stable.
> What culprit commit?  I see no information here :(
>
> Remember, top-posting is evil...

My apologies,

The stable tag v5.4.289 seems to fail to boot, printing the following message in an infinite loop:
[   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion 3273 sge_count (-12) is out of range. Range is:  0-256

Reverting the following patch seems to fix the issue:

stable-5.4      : v5.4.285             - 5df29a445f3a xen/swiotlb: add alignment check for dma buffers

I tried changing the swiotlb kernel command-line arguments in GRUB, but
unfortunately that did not help and the error appeared again.

Thanks & Regards,
Harshvardhan



* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-29  8:59         ` Harshvardhan Jha
@ 2025-01-29  9:04           ` Greg KH
  2025-01-29  9:15             ` Harshvardhan Jha
  0 siblings, 1 reply; 19+ messages in thread
From: Greg KH @ 2025-01-29  9:04 UTC (permalink / raw)
  To: Harshvardhan Jha
  Cc: Konrad Wilk, Boris Ostrovsky, jgross@suse.com,
	sstabellini@kernel.org, xen-devel@lists.xenproject.org,
	linux-kernel@vger.kernel.org, Harshit Mogalapalli, stable

On Wed, Jan 29, 2025 at 02:29:48PM +0530, Harshvardhan Jha wrote:
> Hi Greg,
> 
> On 29/01/25 2:18 PM, Greg KH wrote:
> > On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
> >> Hi there,
> >>
> >> On 29/01/25 2:05 PM, Greg KH wrote:
> >>> On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha wrote:
> >>>> Hi All,
> >>>>
> >>>> +stable
> >>>>
> >>>> There seems to be some formatting issues in my log output. I have
> >>>> attached it as a file.
> >>> Confused, what are you wanting us to do here in the stable tree?
> >>>
> >>> thanks,
> >>>
> >>> greg k-h
> >> Since, this is reproducible on 5.4.y I have added stable. The culprit
> >> commit which upon getting reverted fixes this issue is also present in
> >> 5.4.y stable.
> > What culprit commit?  I see no information here :(
> >
> > Remember, top-posting is evil...
> 
> My apologies,
> 
> The stable tag v5.4.289 seems to fail to boot with the following prompt in an infinite loop:
> [   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion 3273 sge_count (-12) is out of range. Range is:  0-256
> 
> Reverting the following patch seems to fix the issue:
> 
> stable-5.4      : v5.4.285             - 5df29a445f3a xen/swiotlb: add
> alignment check for dma buffers
> 
> I tried changing swiotlb grub command line arguments but that didn't
> seem to help much unfortunately and the error was seen again.
> 

Ok, can you submit this revert with the information about why it should
not be included in the 5.4.y tree and cc: everyone involved and then we
will be glad to queue it up.

thanks,

greg k-h


* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-29  9:04           ` Greg KH
@ 2025-01-29  9:15             ` Harshvardhan Jha
  2025-01-29 11:22               ` Juergen Gross
  0 siblings, 1 reply; 19+ messages in thread
From: Harshvardhan Jha @ 2025-01-29  9:15 UTC (permalink / raw)
  To: Greg KH
  Cc: Konrad Wilk, Boris Ostrovsky, jgross@suse.com,
	sstabellini@kernel.org, xen-devel@lists.xenproject.org,
	linux-kernel@vger.kernel.org, Harshit Mogalapalli, stable


On 29/01/25 2:34 PM, Greg KH wrote:
> On Wed, Jan 29, 2025 at 02:29:48PM +0530, Harshvardhan Jha wrote:
>> Hi Greg,
>>
>> On 29/01/25 2:18 PM, Greg KH wrote:
>>> On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
>>>> Hi there,
>>>>
>>>> On 29/01/25 2:05 PM, Greg KH wrote:
>>>>> On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> +stable
>>>>>>
>>>>>> There seems to be some formatting issues in my log output. I have
>>>>>> attached it as a file.
>>>>> Confused, what are you wanting us to do here in the stable tree?
>>>>>
>>>>> thanks,
>>>>>
>>>>> greg k-h
>>>> Since, this is reproducible on 5.4.y I have added stable. The culprit
>>>> commit which upon getting reverted fixes this issue is also present in
>>>> 5.4.y stable.
>>> What culprit commit?  I see no information here :(
>>>
>>> Remember, top-posting is evil...
>> My apologies,
>>
>> The stable tag v5.4.289 seems to fail to boot with the following prompt in an infinite loop:
>> [   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion 3273 sge_count (-12) is out of range. Range is:  0-256
>>
>> Reverting the following patch seems to fix the issue:
>>
>> stable-5.4      : v5.4.285             - 5df29a445f3a xen/swiotlb: add
>> alignment check for dma buffers
>>
>> I tried changing swiotlb grub command line arguments but that didn't
>> seem to help much unfortunately and the error was seen again.
>>
> Ok, can you submit this revert with the information about why it should
> not be included in the 5.4.y tree and cc: everyone involved and then we
> will be glad to queue it up.
>
> thanks,
>
> greg k-h

This might be reproducible on other stable trees and on mainline as well,
so we will get it fixed there first. I will submit the necessary fix to
stable once everything is sorted out on mainline.

Thanks & Regards,
Harshvardhan



* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-29  9:15             ` Harshvardhan Jha
@ 2025-01-29 11:22               ` Juergen Gross
  2025-01-29 18:35                 ` Harshvardhan Jha
  0 siblings, 1 reply; 19+ messages in thread
From: Juergen Gross @ 2025-01-29 11:22 UTC (permalink / raw)
  To: Harshvardhan Jha, Greg KH
  Cc: Konrad Wilk, Boris Ostrovsky, sstabellini@kernel.org,
	xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org,
	Harshit Mogalapalli, stable


[-- Attachment #1.1.1: Type: text/plain, Size: 2624 bytes --]

On 29.01.25 10:15, Harshvardhan Jha wrote:
> 
> On 29/01/25 2:34 PM, Greg KH wrote:
>> On Wed, Jan 29, 2025 at 02:29:48PM +0530, Harshvardhan Jha wrote:
>>> Hi Greg,
>>>
>>> On 29/01/25 2:18 PM, Greg KH wrote:
>>>> On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
>>>>> Hi there,
>>>>>
>>>>> On 29/01/25 2:05 PM, Greg KH wrote:
>>>>>> On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> +stable
>>>>>>>
>>>>>>> There seems to be some formatting issues in my log output. I have
>>>>>>> attached it as a file.
>>>>>> Confused, what are you wanting us to do here in the stable tree?
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> greg k-h
>>>>> Since, this is reproducible on 5.4.y I have added stable. The culprit
>>>>> commit which upon getting reverted fixes this issue is also present in
>>>>> 5.4.y stable.
>>>> What culprit commit?  I see no information here :(
>>>>
>>>> Remember, top-posting is evil...
>>> My apologies,
>>>
>>> The stable tag v5.4.289 seems to fail to boot with the following prompt in an infinite loop:
>>> [   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion 3273 sge_count (-12) is out of range. Range is:  0-256
>>>
>>> Reverting the following patch seems to fix the issue:
>>>
>>> stable-5.4      : v5.4.285             - 5df29a445f3a xen/swiotlb: add
>>> alignment check for dma buffers
>>>
>>> I tried changing swiotlb grub command line arguments but that didn't
>>> seem to help much unfortunately and the error was seen again.
>>>
>> Ok, can you submit this revert with the information about why it should
>> not be included in the 5.4.y tree and cc: everyone involved and then we
>> will be glad to queue it up.
>>
>> thanks,
>>
>> greg k-h
> 
> This might be reproducible on other stable trees and mainline as well so
> we will get it fixed there and I will submit the necessary fix to stable
> when everything is sorted out on mainline.

Right. Just reverting my patch will trade one error for another (the
one which triggered me to write the patch).

There are two possible ways to fix the issue:

- allow larger DMA buffers in xen/swiotlb (today 2MB is the max. supported
   size; the megaraid_sas driver seems to effectively request 4MB)

- fix the megaraid_sas driver by splitting up the allocated DMA buffer (it is
   requesting 2.3MB, which will be rounded up to 4MB; it probably does not need
   to be in one chunk, so a split would result in a max. chunk size of 2MB)

Both variants have their pros and cons, though.
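
To make the sizes above concrete, here is a small userspace sketch of the
rounding. It is not the kernel code: it assumes 4 KiB pages and that a
request is rounded up to a power-of-two number of pages, and all sketch_*
names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT       12u                 /* assumption: 4 KiB pages */
#define SKETCH_MAX_CONTIG_BYTES ((size_t)2 << 20)   /* the 2MB limit mentioned above */

/* Smallest order such that (page size << order) >= size. */
unsigned int sketch_order(size_t size)
{
    unsigned int order = 0;
    size_t span = (size_t)1 << SKETCH_PAGE_SHIFT;

    assert(size > 0);
    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}

/* Size of the power-of-two region that would back a request of `size` bytes. */
size_t sketch_rounded_size(size_t size)
{
    return ((size_t)1 << SKETCH_PAGE_SHIFT) << sketch_order(size);
}

/* Returns 1 if the rounded-up request fits into the 2MB limit, 0 otherwise. */
int sketch_fits_limit(size_t size)
{
    return sketch_rounded_size(size) <= SKETCH_MAX_CONTIG_BYTES;
}
```

With a request of roughly 2.3MB (for example 2411724 bytes, an illustrative
value), sketch_rounded_size() gives 4194304 bytes (4MB), which is above the
2MB limit, while any chunk of up to 2MB would still fit.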


Juergen


* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-29 11:22               ` Juergen Gross
@ 2025-01-29 18:35                 ` Harshvardhan Jha
  2025-01-29 18:43                   ` Jürgen Groß
  0 siblings, 1 reply; 19+ messages in thread
From: Harshvardhan Jha @ 2025-01-29 18:35 UTC (permalink / raw)
  To: Juergen Gross, Greg KH
  Cc: Konrad Wilk, Boris Ostrovsky, sstabellini@kernel.org,
	xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org,
	Harshit Mogalapalli, stable


On 29/01/25 4:52 PM, Juergen Gross wrote:
> On 29.01.25 10:15, Harshvardhan Jha wrote:
>>
>> On 29/01/25 2:34 PM, Greg KH wrote:
>>> On Wed, Jan 29, 2025 at 02:29:48PM +0530, Harshvardhan Jha wrote:
>>>> Hi Greg,
>>>>
>>>> On 29/01/25 2:18 PM, Greg KH wrote:
>>>>> On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
>>>>>> Hi there,
>>>>>>
>>>>>> On 29/01/25 2:05 PM, Greg KH wrote:
>>>>>>> On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> +stable
>>>>>>>>
>>>>>>>> There seems to be some formatting issues in my log output. I have
>>>>>>>> attached it as a file.
>>>>>>> Confused, what are you wanting us to do here in the stable tree?
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> greg k-h
>>>>>> Since, this is reproducible on 5.4.y I have added stable. The
>>>>>> culprit
>>>>>> commit which upon getting reverted fixes this issue is also
>>>>>> present in
>>>>>> 5.4.y stable.
>>>>> What culprit commit?  I see no information here :(
>>>>>
>>>>> Remember, top-posting is evil...
>>>> My apologies,
>>>>
>>>> The stable tag v5.4.289 seems to fail to boot with the following
>>>> prompt in an infinite loop:
>>>> [   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion
>>>> 3273 sge_count (-12) is out of range. Range is:  0-256
>>>>
>>>> Reverting the following patch seems to fix the issue:
>>>>
>>>> stable-5.4      : v5.4.285             - 5df29a445f3a xen/swiotlb: add
>>>> alignment check for dma buffers
>>>>
>>>> I tried changing swiotlb grub command line arguments but that didn't
>>>> seem to help much unfortunately and the error was seen again.
>>>>
>>> Ok, can you submit this revert with the information about why it should
>>> not be included in the 5.4.y tree and cc: everyone involved and then we
>>> will be glad to queue it up.
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> This might be reproducible on other stable trees and mainline as well so
>> we will get it fixed there and I will submit the necessary fix to stable
>> when everything is sorted out on mainline.
>
> Right. Just reverting my patch will trade one error with another one (the
> one which triggered me to write the patch).
>
> There are two possible ways to fix the issue:
>
> - allow larger DMA buffers in xen/swiotlb (today 2MB are the max.
> supported
>   size, the megaraid_sas driver seems to effectively request 4MB)

This seems simpler to implement, but I'm not sure whether it's the best
approach.

>
> - fix the megaraid_sas driver by splitting up the allocated DMA buffer
> (it is
>   requesting 2.3MB, which will be rounded up to 4MB - it is probably
> not needed
>   to be in one chunk, so a split would result in max. 2MB chunk size)
>
> Both variants have their pros and cons, though.
>
>
> Juergen
Harshvardhan


* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-29 18:35                 ` Harshvardhan Jha
@ 2025-01-29 18:43                   ` Jürgen Groß
  2025-01-29 18:46                     ` Harshvardhan Jha
  2025-01-29 22:01                     ` Stefano Stabellini
  0 siblings, 2 replies; 19+ messages in thread
From: Jürgen Groß @ 2025-01-29 18:43 UTC (permalink / raw)
  To: Harshvardhan Jha, Greg KH
  Cc: Konrad Wilk, Boris Ostrovsky, sstabellini@kernel.org,
	xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org,
	Harshit Mogalapalli, stable


[-- Attachment #1.1.1: Type: text/plain, Size: 2979 bytes --]

On 29.01.25 19:35, Harshvardhan Jha wrote:
> 
> On 29/01/25 4:52 PM, Juergen Gross wrote:
>> On 29.01.25 10:15, Harshvardhan Jha wrote:
>>>
>>> On 29/01/25 2:34 PM, Greg KH wrote:
>>>> On Wed, Jan 29, 2025 at 02:29:48PM +0530, Harshvardhan Jha wrote:
>>>>> Hi Greg,
>>>>>
>>>>> On 29/01/25 2:18 PM, Greg KH wrote:
>>>>>> On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
>>>>>>> Hi there,
>>>>>>>
>>>>>>> On 29/01/25 2:05 PM, Greg KH wrote:
>>>>>>>> On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha wrote:
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> +stable
>>>>>>>>>
>>>>>>>>> There seems to be some formatting issues in my log output. I have
>>>>>>>>> attached it as a file.
>>>>>>>> Confused, what are you wanting us to do here in the stable tree?
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> greg k-h
>>>>>>> Since, this is reproducible on 5.4.y I have added stable. The
>>>>>>> culprit
>>>>>>> commit which upon getting reverted fixes this issue is also
>>>>>>> present in
>>>>>>> 5.4.y stable.
>>>>>> What culprit commit?  I see no information here :(
>>>>>>
>>>>>> Remember, top-posting is evil...
>>>>> My apologies,
>>>>>
>>>>> The stable tag v5.4.289 seems to fail to boot with the following
>>>>> prompt in an infinite loop:
>>>>> [   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion
>>>>> 3273 sge_count (-12) is out of range. Range is:  0-256
>>>>>
>>>>> Reverting the following patch seems to fix the issue:
>>>>>
>>>>> stable-5.4      : v5.4.285             - 5df29a445f3a xen/swiotlb: add
>>>>> alignment check for dma buffers
>>>>>
>>>>> I tried changing swiotlb grub command line arguments but that didn't
>>>>> seem to help much unfortunately and the error was seen again.
>>>>>
>>>> Ok, can you submit this revert with the information about why it should
>>>> not be included in the 5.4.y tree and cc: everyone involved and then we
>>>> will be glad to queue it up.
>>>>
>>>> thanks,
>>>>
>>>> greg k-h
>>>
>>> This might be reproducible on other stable trees and mainline as well so
>>> we will get it fixed there and I will submit the necessary fix to stable
>>> when everything is sorted out on mainline.
>>
>> Right. Just reverting my patch will trade one error with another one (the
>> one which triggered me to write the patch).
>>
>> There are two possible ways to fix the issue:
>>
>> - allow larger DMA buffers in xen/swiotlb (today 2MB are the max.
>> supported
>>    size, the megaraid_sas driver seems to effectively request 4MB)
> 
> This seems relatively simpler to implement but I'm not sure whether it's
> the most optimal approach

Just enlarging the static array that holds the frame numbers for the
buffer seems to be a waste of memory for most configurations.

I'm thinking of a dynamically allocated array sized for the largest
request seen so far (replacing the former array with a larger one when
needed).
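
Roughly, the idea could look like the following userspace sketch. This is
only an illustration under assumptions, not a proposed patch: struct
frame_array and frame_array_reserve() are hypothetical names, calloc()/free()
stand in for whatever allocator the kernel code would use, and the array is
assumed to be scratch space that is refilled for every request, so the old
contents are not copied.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical holder for the frame numbers of one DMA buffer. */
struct frame_array {
    unsigned long *frames;   /* NULL until first use */
    size_t capacity;         /* number of entries currently allocated */
};

/*
 * Make sure the array can hold at least `needed` entries.
 * Returns 0 on success and -1 if the larger allocation failed; on
 * failure the existing array is left untouched.
 */
int frame_array_reserve(struct frame_array *fa, size_t needed)
{
    unsigned long *bigger;

    assert(fa != NULL);
    if (needed <= fa->capacity)
        return 0;                   /* the existing array is large enough */

    bigger = calloc(needed, sizeof(*bigger));
    if (!bigger)
        return -1;

    free(fa->frames);               /* replace the former, smaller array */
    fa->frames = bigger;
    fa->capacity = needed;
    return 0;
}
```

Most configurations would then only ever pay for the small array, and only a
driver asking for a larger buffer would trigger the one-time reallocation.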


Juergen


* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-29 18:43                   ` Jürgen Groß
@ 2025-01-29 18:46                     ` Harshvardhan Jha
  2025-01-30 12:35                       ` Jürgen Groß
  2025-01-29 22:01                     ` Stefano Stabellini
  1 sibling, 1 reply; 19+ messages in thread
From: Harshvardhan Jha @ 2025-01-29 18:46 UTC (permalink / raw)
  To: Jürgen Groß, Greg KH
  Cc: Konrad Wilk, Boris Ostrovsky, sstabellini@kernel.org,
	xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org,
	Harshit Mogalapalli, stable


On 30/01/25 12:13 AM, Jürgen Groß wrote:
> On 29.01.25 19:35, Harshvardhan Jha wrote:
>>
>> On 29/01/25 4:52 PM, Juergen Gross wrote:
>>> On 29.01.25 10:15, Harshvardhan Jha wrote:
>>>>
>>>> On 29/01/25 2:34 PM, Greg KH wrote:
>>>>> On Wed, Jan 29, 2025 at 02:29:48PM +0530, Harshvardhan Jha wrote:
>>>>>> Hi Greg,
>>>>>>
>>>>>> On 29/01/25 2:18 PM, Greg KH wrote:
>>>>>>> On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
>>>>>>>> Hi there,
>>>>>>>>
>>>>>>>> On 29/01/25 2:05 PM, Greg KH wrote:
>>>>>>>>> On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> +stable
>>>>>>>>>>
>>>>>>>>>> There seems to be some formatting issues in my log output. I
>>>>>>>>>> have
>>>>>>>>>> attached it as a file.
>>>>>>>>> Confused, what are you wanting us to do here in the stable tree?
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>>
>>>>>>>>> greg k-h
>>>>>>>> Since, this is reproducible on 5.4.y I have added stable. The
>>>>>>>> culprit
>>>>>>>> commit which upon getting reverted fixes this issue is also
>>>>>>>> present in
>>>>>>>> 5.4.y stable.
>>>>>>> What culprit commit?  I see no information here :(
>>>>>>>
>>>>>>> Remember, top-posting is evil...
>>>>>> My apologies,
>>>>>>
>>>>>> The stable tag v5.4.289 seems to fail to boot with the following
>>>>>> prompt in an infinite loop:
>>>>>> [   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion
>>>>>> 3273 sge_count (-12) is out of range. Range is:  0-256
>>>>>>
>>>>>> Reverting the following patch seems to fix the issue:
>>>>>>
>>>>>> stable-5.4      : v5.4.285             - 5df29a445f3a
>>>>>> xen/swiotlb: add
>>>>>> alignment check for dma buffers
>>>>>>
>>>>>> I tried changing swiotlb grub command line arguments but that didn't
>>>>>> seem to help much unfortunately and the error was seen again.
>>>>>>
>>>>> Ok, can you submit this revert with the information about why it
>>>>> should
>>>>> not be included in the 5.4.y tree and cc: everyone involved and
>>>>> then we
>>>>> will be glad to queue it up.
>>>>>
>>>>> thanks,
>>>>>
>>>>> greg k-h
>>>>
>>>> This might be reproducible on other stable trees and mainline as
>>>> well so
>>>> we will get it fixed there and I will submit the necessary fix to
>>>> stable
>>>> when everything is sorted out on mainline.
>>>
>>> Right. Just reverting my patch will trade one error with another one
>>> (the
>>> one which triggered me to write the patch).
>>>
>>> There are two possible ways to fix the issue:
>>>
>>> - allow larger DMA buffers in xen/swiotlb (today 2MB are the max.
>>> supported
>>>    size, the megaraid_sas driver seems to effectively request 4MB)
>>
>> This seems relatively simpler to implement but I'm not sure whether it's
>> the most optimal approach
>
> Just making the static array larger used to hold the frame numbers for
> the
> buffer seems to be a waste of memory for most configurations.
Yep definitely not required in most cases.
>
> I'm thinking of an allocated array using the max needed size (replace a
> former buffer with a larger one if needed).

This seems like the right way to go.

Harshvardhan

>
>
> Juergen

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-29 18:43                   ` Jürgen Groß
  2025-01-29 18:46                     ` Harshvardhan Jha
@ 2025-01-29 22:01                     ` Stefano Stabellini
  2025-01-30  5:27                       ` Harshvardhan Jha
  1 sibling, 1 reply; 19+ messages in thread
From: Stefano Stabellini @ 2025-01-29 22:01 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Harshvardhan Jha, Greg KH, Konrad Wilk, Boris Ostrovsky,
	sstabellini@kernel.org, xen-devel@lists.xenproject.org,
	linux-kernel@vger.kernel.org, Harshit Mogalapalli, stable

[-- Attachment #1: Type: text/plain, Size: 3625 bytes --]

On Wed, 29 Jan 2025, Jürgen Groß wrote:
> On 29.01.25 19:35, Harshvardhan Jha wrote:
> > 
> > On 29/01/25 4:52 PM, Juergen Gross wrote:
> > > On 29.01.25 10:15, Harshvardhan Jha wrote:
> > > > 
> > > > On 29/01/25 2:34 PM, Greg KH wrote:
> > > > > On Wed, Jan 29, 2025 at 02:29:48PM +0530, Harshvardhan Jha wrote:
> > > > > > Hi Greg,
> > > > > > 
> > > > > > On 29/01/25 2:18 PM, Greg KH wrote:
> > > > > > > On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
> > > > > > > > Hi there,
> > > > > > > > 
> > > > > > > > On 29/01/25 2:05 PM, Greg KH wrote:
> > > > > > > > > On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha
> > > > > > > > > wrote:
> > > > > > > > > > Hi All,
> > > > > > > > > > 
> > > > > > > > > > +stable
> > > > > > > > > > 
> > > > > > > > > > There seems to be some formatting issues in my log output. I
> > > > > > > > > > have
> > > > > > > > > > attached it as a file.
> > > > > > > > > Confused, what are you wanting us to do here in the stable
> > > > > > > > > tree?
> > > > > > > > > 
> > > > > > > > > thanks,
> > > > > > > > > 
> > > > > > > > > greg k-h
> > > > > > > > Since, this is reproducible on 5.4.y I have added stable. The
> > > > > > > > culprit
> > > > > > > > commit which upon getting reverted fixes this issue is also
> > > > > > > > present in
> > > > > > > > 5.4.y stable.
> > > > > > > What culprit commit?  I see no information here :(
> > > > > > > 
> > > > > > > Remember, top-posting is evil...
> > > > > > My apologies,
> > > > > > 
> > > > > > The stable tag v5.4.289 seems to fail to boot with the following
> > > > > > prompt in an infinite loop:
> > > > > > [   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion
> > > > > > 3273 sge_count (-12) is out of range. Range is:  0-256
> > > > > > 
> > > > > > Reverting the following patch seems to fix the issue:
> > > > > > 
> > > > > > stable-5.4      : v5.4.285             - 5df29a445f3a xen/swiotlb:
> > > > > > add
> > > > > > alignment check for dma buffers
> > > > > > 
> > > > > > I tried changing swiotlb grub command line arguments but that didn't
> > > > > > seem to help much unfortunately and the error was seen again.
> > > > > > 
> > > > > Ok, can you submit this revert with the information about why it
> > > > > should
> > > > > not be included in the 5.4.y tree and cc: everyone involved and then
> > > > > we
> > > > > will be glad to queue it up.
> > > > > 
> > > > > thanks,
> > > > > 
> > > > > greg k-h
> > > > 
> > > > This might be reproducible on other stable trees and mainline as well so
> > > > we will get it fixed there and I will submit the necessary fix to stable
> > > > when everything is sorted out on mainline.
> > > 
> > > Right. Just reverting my patch will trade one error with another one (the
> > > one which triggered me to write the patch).
> > > 
> > > There are two possible ways to fix the issue:
> > > 
> > > - allow larger DMA buffers in xen/swiotlb (today 2MB are the max.
> > > supported
> > >    size, the megaraid_sas driver seems to effectively request 4MB)
> > 
> > This seems relatively simpler to implement but I'm not sure whether it's
> > the most optimal approach
> 
> Just making the static array larger used to hold the frame numbers for the
> buffer seems to be a waste of memory for most configurations.
> 
> I'm thinking of an allocated array using the max needed size (replace a
> former buffer with a larger one if needed).

You are referring to discontig_frames and MAX_CONTIG_ORDER in
arch/x86/xen/mmu_pv.c, right? I am not super familiar with that code but
it looks like a good way to go.
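
For anyone following along, the sizes under discussion can be worked out
with a small userspace sketch. It assumes 4 KiB pages (as on x86); the
sketch_* names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Smallest order such that (page size << order) covers 'size' bytes. */
static unsigned int sketch_order_for_size(unsigned long size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Number of frame-list entries needed for a region of the given order. */
static unsigned long sketch_frames_for_order(unsigned int order)
{
	return 1UL << order;
}

/* Bytes needed for a frame list that holds one unsigned long per frame. */
static size_t sketch_frame_list_bytes(unsigned int order)
{
	return sizeof(unsigned long) * sketch_frames_for_order(order);
}
```

Under these assumptions order 9 means 512 frames, i.e. a 2 MiB region.
Any request even slightly above 2 MiB rounds up to order 10, i.e. 1024
frames and a 4 MiB region, which would be consistent with the driver
"effectively" requesting 4MB.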

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-29 22:01                     ` Stefano Stabellini
@ 2025-01-30  5:27                       ` Harshvardhan Jha
  2025-01-30  6:59                         ` Jürgen Groß
  0 siblings, 1 reply; 19+ messages in thread
From: Harshvardhan Jha @ 2025-01-30  5:27 UTC (permalink / raw)
  To: Stefano Stabellini, Jürgen Groß
  Cc: Greg KH, Konrad Wilk, Boris Ostrovsky,
	xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org,
	Harshit Mogalapalli, stable


On 30/01/25 3:31 AM, Stefano Stabellini wrote:
> On Wed, 29 Jan 2025, Jürgen Groß wrote:
>> On 29.01.25 19:35, Harshvardhan Jha wrote:
>>> On 29/01/25 4:52 PM, Juergen Gross wrote:
>>>> On 29.01.25 10:15, Harshvardhan Jha wrote:
>>>>> On 29/01/25 2:34 PM, Greg KH wrote:
>>>>>> On Wed, Jan 29, 2025 at 02:29:48PM +0530, Harshvardhan Jha wrote:
>>>>>>> Hi Greg,
>>>>>>>
>>>>>>> On 29/01/25 2:18 PM, Greg KH wrote:
>>>>>>>> On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
>>>>>>>>> Hi there,
>>>>>>>>>
>>>>>>>>> On 29/01/25 2:05 PM, Greg KH wrote:
>>>>>>>>>> On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha
>>>>>>>>>> wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> +stable
>>>>>>>>>>>
>>>>>>>>>>> There seems to be some formatting issues in my log output. I
>>>>>>>>>>> have
>>>>>>>>>>> attached it as a file.
>>>>>>>>>> Confused, what are you wanting us to do here in the stable
>>>>>>>>>> tree?
>>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>>
>>>>>>>>>> greg k-h
>>>>>>>>> Since, this is reproducible on 5.4.y I have added stable. The
>>>>>>>>> culprit
>>>>>>>>> commit which upon getting reverted fixes this issue is also
>>>>>>>>> present in
>>>>>>>>> 5.4.y stable.
>>>>>>>> What culprit commit?  I see no information here :(
>>>>>>>>
>>>>>>>> Remember, top-posting is evil...
>>>>>>> My apologies,
>>>>>>>
>>>>>>> The stable tag v5.4.289 seems to fail to boot with the following
>>>>>>> prompt in an infinite loop:
>>>>>>> [   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion
>>>>>>> 3273 sge_count (-12) is out of range. Range is:  0-256
>>>>>>>
>>>>>>> Reverting the following patch seems to fix the issue:
>>>>>>>
>>>>>>> stable-5.4      : v5.4.285             - 5df29a445f3a xen/swiotlb:
>>>>>>> add
>>>>>>> alignment check for dma buffers
>>>>>>>
>>>>>>> I tried changing swiotlb grub command line arguments but that didn't
>>>>>>> seem to help much unfortunately and the error was seen again.
>>>>>>>
>>>>>> Ok, can you submit this revert with the information about why it
>>>>>> should
>>>>>> not be included in the 5.4.y tree and cc: everyone involved and then
>>>>>> we
>>>>>> will be glad to queue it up.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> greg k-h
>>>>> This might be reproducible on other stable trees and mainline as well so
>>>>> we will get it fixed there and I will submit the necessary fix to stable
>>>>> when everything is sorted out on mainline.
>>>> Right. Just reverting my patch will trade one error with another one (the
>>>> one which triggered me to write the patch).
>>>>
>>>> There are two possible ways to fix the issue:
>>>>
>>>> - allow larger DMA buffers in xen/swiotlb (today 2MB are the max.
>>>> supported
>>>>    size, the megaraid_sas driver seems to effectively request 4MB)
>>> This seems relatively simpler to implement but I'm not sure whether it's
>>> the most optimal approach
>> Just making the static array larger used to hold the frame numbers for the
>> buffer seems to be a waste of memory for most configurations.
>>
>> I'm thinking of an allocated array using the max needed size (replace a
>> former buffer with a larger one if needed).
> You are referring to discontig_frames and MAX_CONTIG_ORDER in
> arch/x86/xen/mmu_pv.c, right? I am not super familiar with that code but
> it looks like a good way to go.

This rejected patch works on MAX_CONTIG_ORDER and doubles the buffer
size but that is undesirable in most situations:

https://lore.kernel.org/lkml/28947d4f-ab32-4a57-8dbb-e37fa4183a69@suse.com/t/

What needs to be done is to double the buffer size only when needed.


Harshvardhan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-30  5:27                       ` Harshvardhan Jha
@ 2025-01-30  6:59                         ` Jürgen Groß
  0 siblings, 0 replies; 19+ messages in thread
From: Jürgen Groß @ 2025-01-30  6:59 UTC (permalink / raw)
  To: Harshvardhan Jha, Stefano Stabellini
  Cc: Greg KH, Konrad Wilk, Boris Ostrovsky,
	xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org,
	Harshit Mogalapalli, stable


[-- Attachment #1.1.1: Type: text/plain, Size: 3875 bytes --]

On 30.01.25 06:27, Harshvardhan Jha wrote:
> 
> On 30/01/25 3:31 AM, Stefano Stabellini wrote:
>> On Wed, 29 Jan 2025, Jürgen Groß wrote:
>>> On 29.01.25 19:35, Harshvardhan Jha wrote:
>>>> On 29/01/25 4:52 PM, Juergen Gross wrote:
>>>>> On 29.01.25 10:15, Harshvardhan Jha wrote:
>>>>>> On 29/01/25 2:34 PM, Greg KH wrote:
>>>>>>> On Wed, Jan 29, 2025 at 02:29:48PM +0530, Harshvardhan Jha wrote:
>>>>>>>> Hi Greg,
>>>>>>>>
>>>>>>>> On 29/01/25 2:18 PM, Greg KH wrote:
>>>>>>>>> On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
>>>>>>>>>> Hi there,
>>>>>>>>>>
>>>>>>>>>> On 29/01/25 2:05 PM, Greg KH wrote:
>>>>>>>>>>> On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha
>>>>>>>>>>> wrote:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> +stable
>>>>>>>>>>>>
>>>>>>>>>>>> There seems to be some formatting issues in my log output. I
>>>>>>>>>>>> have
>>>>>>>>>>>> attached it as a file.
>>>>>>>>>>> Confused, what are you wanting us to do here in the stable
>>>>>>>>>>> tree?
>>>>>>>>>>>
>>>>>>>>>>> thanks,
>>>>>>>>>>>
>>>>>>>>>>> greg k-h
>>>>>>>>>> Since, this is reproducible on 5.4.y I have added stable. The
>>>>>>>>>> culprit
>>>>>>>>>> commit which upon getting reverted fixes this issue is also
>>>>>>>>>> present in
>>>>>>>>>> 5.4.y stable.
>>>>>>>>> What culprit commit?  I see no information here :(
>>>>>>>>>
>>>>>>>>> Remember, top-posting is evil...
>>>>>>>> My apologies,
>>>>>>>>
>>>>>>>> The stable tag v5.4.289 seems to fail to boot with the following
>>>>>>>> prompt in an infinite loop:
>>>>>>>> [   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion
>>>>>>>> 3273 sge_count (-12) is out of range. Range is:  0-256
>>>>>>>>
>>>>>>>> Reverting the following patch seems to fix the issue:
>>>>>>>>
>>>>>>>> stable-5.4      : v5.4.285             - 5df29a445f3a xen/swiotlb:
>>>>>>>> add
>>>>>>>> alignment check for dma buffers
>>>>>>>>
>>>>>>>> I tried changing swiotlb grub command line arguments but that didn't
>>>>>>>> seem to help much unfortunately and the error was seen again.
>>>>>>>>
>>>>>>> Ok, can you submit this revert with the information about why it
>>>>>>> should
>>>>>>> not be included in the 5.4.y tree and cc: everyone involved and then
>>>>>>> we
>>>>>>> will be glad to queue it up.
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> greg k-h
>>>>>> This might be reproducible on other stable trees and mainline as well so
>>>>>> we will get it fixed there and I will submit the necessary fix to stable
>>>>>> when everything is sorted out on mainline.
>>>>> Right. Just reverting my patch will trade one error with another one (the
>>>>> one which triggered me to write the patch).
>>>>>
>>>>> There are two possible ways to fix the issue:
>>>>>
>>>>> - allow larger DMA buffers in xen/swiotlb (today 2MB are the max.
>>>>> supported
>>>>>     size, the megaraid_sas driver seems to effectively request 4MB)
>>>> This seems relatively simpler to implement but I'm not sure whether it's
>>>> the most optimal approach
>>> Just making the static array larger used to hold the frame numbers for the
>>> buffer seems to be a waste of memory for most configurations.
>>>
>>> I'm thinking of an allocated array using the max needed size (replace a
>>> former buffer with a larger one if needed).
>> You are referring to discontig_frames and MAX_CONTIG_ORDER in
>> arch/x86/xen/mmu_pv.c, right? I am not super familiar with that code but
>> it looks like a good way to go.
> 
> This rejected patch works on MAX_CONTIG_ORDER and doubles the buffer
> size but that is undesirable in most situations:
> 
> https://lore.kernel.org/lkml/28947d4f-ab32-4a57-8dbb-e37fa4183a69@suse.com/t/
> 
> What needs to be done is to double the buffer size only when needed.

I'll write a patch.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-29 18:46                     ` Harshvardhan Jha
@ 2025-01-30 12:35                       ` Jürgen Groß
  2025-01-30 20:28                         ` Stefano Stabellini
  2025-01-31 12:05                         ` Harshvardhan Jha
  0 siblings, 2 replies; 19+ messages in thread
From: Jürgen Groß @ 2025-01-30 12:35 UTC (permalink / raw)
  To: Harshvardhan Jha, Greg KH
  Cc: Konrad Wilk, Boris Ostrovsky, sstabellini@kernel.org,
	xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org,
	Harshit Mogalapalli, stable


[-- Attachment #1.1.1: Type: text/plain, Size: 3470 bytes --]

On 29.01.25 19:46, Harshvardhan Jha wrote:
> 
> On 30/01/25 12:13 AM, Jürgen Groß wrote:
>> On 29.01.25 19:35, Harshvardhan Jha wrote:
>>>
>>> On 29/01/25 4:52 PM, Juergen Gross wrote:
>>>> On 29.01.25 10:15, Harshvardhan Jha wrote:
>>>>>
>>>>> On 29/01/25 2:34 PM, Greg KH wrote:
>>>>>> On Wed, Jan 29, 2025 at 02:29:48PM +0530, Harshvardhan Jha wrote:
>>>>>>> Hi Greg,
>>>>>>>
>>>>>>> On 29/01/25 2:18 PM, Greg KH wrote:
>>>>>>>> On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
>>>>>>>>> Hi there,
>>>>>>>>>
>>>>>>>>> On 29/01/25 2:05 PM, Greg KH wrote:
>>>>>>>>>> On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> +stable
>>>>>>>>>>>
>>>>>>>>>>> There seems to be some formatting issues in my log output. I
>>>>>>>>>>> have
>>>>>>>>>>> attached it as a file.
>>>>>>>>>> Confused, what are you wanting us to do here in the stable tree?
>>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>>
>>>>>>>>>> greg k-h
>>>>>>>>> Since, this is reproducible on 5.4.y I have added stable. The
>>>>>>>>> culprit
>>>>>>>>> commit which upon getting reverted fixes this issue is also
>>>>>>>>> present in
>>>>>>>>> 5.4.y stable.
>>>>>>>> What culprit commit?  I see no information here :(
>>>>>>>>
>>>>>>>> Remember, top-posting is evil...
>>>>>>> My apologies,
>>>>>>>
>>>>>>> The stable tag v5.4.289 seems to fail to boot with the following
>>>>>>> prompt in an infinite loop:
>>>>>>> [   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion
>>>>>>> 3273 sge_count (-12) is out of range. Range is:  0-256
>>>>>>>
>>>>>>> Reverting the following patch seems to fix the issue:
>>>>>>>
>>>>>>> stable-5.4      : v5.4.285             - 5df29a445f3a
>>>>>>> xen/swiotlb: add
>>>>>>> alignment check for dma buffers
>>>>>>>
>>>>>>> I tried changing swiotlb grub command line arguments but that didn't
>>>>>>> seem to help much unfortunately and the error was seen again.
>>>>>>>
>>>>>> Ok, can you submit this revert with the information about why it
>>>>>> should
>>>>>> not be included in the 5.4.y tree and cc: everyone involved and
>>>>>> then we
>>>>>> will be glad to queue it up.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> greg k-h
>>>>>
>>>>> This might be reproducible on other stable trees and mainline as
>>>>> well so
>>>>> we will get it fixed there and I will submit the necessary fix to
>>>>> stable
>>>>> when everything is sorted out on mainline.
>>>>
>>>> Right. Just reverting my patch will trade one error with another one
>>>> (the
>>>> one which triggered me to write the patch).
>>>>
>>>> There are two possible ways to fix the issue:
>>>>
>>>> - allow larger DMA buffers in xen/swiotlb (today 2MB are the max.
>>>> supported
>>>>     size, the megaraid_sas driver seems to effectively request 4MB)
>>>
>>> This seems relatively simpler to implement but I'm not sure whether it's
>>> the most optimal approach
>>
>> Just making the static array larger used to hold the frame numbers for
>> the
>> buffer seems to be a waste of memory for most configurations.
> Yep definitely not required in most cases.
>>
>> I'm thinking of an allocated array using the max needed size (replace a
>> former buffer with a larger one if needed).
> 
> This seems like the right way to go.

Can you try the attached patch, please? I don't have a system at hand
showing the problem.


Juergen

[-- Attachment #1.1.2: 0001-x86-xen-allow-larger-contiguous-memory-regions-in-PV.patch --]
[-- Type: text/x-patch, Size: 4191 bytes --]

From cff43e997f79a95dc44e02debaeafe5f127f40bb Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Thu, 30 Jan 2025 09:56:57 +0100
Subject: [PATCH] x86/xen: allow larger contiguous memory regions in PV guests

Today a PV guest (including dom0) can create 2MB contiguous memory
regions for DMA buffers at max. This has led to problems at least
with the megaraid_sas driver, which wants to allocate a 2.3MB DMA
buffer.

The limiting factor is the frame array used to do the hypercall for
making the memory contiguous, which has 512 entries and is just a
static array in mmu_pv.c.

In case a contiguous memory area larger than the initially supported
2MB is requested, allocate a larger buffer for the frame list. Note
that such an allocation is tried only after memory management has been
initialized properly, which is tested via the early_boot_irqs_disabled
flag.

Fixes: 9f40ec84a797 ("xen/swiotlb: add alignment check for dma buffers")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
Note that the "Fixes:" tag is not really correct, as that patch didn't
introduce the problem, but rather made it visible. OTOH it is the best
indicator we have to identify kernel versions this patch should be
backported to.
---
 arch/x86/xen/mmu_pv.c | 44 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 55a4996d0c04..62aec29b8174 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -2200,8 +2200,10 @@ void __init xen_init_mmu_ops(void)
 }
 
 /* Protected by xen_reservation_lock. */
-#define MAX_CONTIG_ORDER 9 /* 2MB */
-static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER];
+#define MIN_CONTIG_ORDER 9 /* 2MB */
+static unsigned int discontig_frames_order = MIN_CONTIG_ORDER;
+static unsigned long discontig_frames_early[1UL << MIN_CONTIG_ORDER];
+static unsigned long *discontig_frames = discontig_frames_early;
 
 #define VOID_PTE (mfn_pte(0, __pgprot(0)))
 static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
@@ -2319,18 +2321,44 @@ int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
 				 unsigned int address_bits,
 				 dma_addr_t *dma_handle)
 {
-	unsigned long *in_frames = discontig_frames, out_frame;
+	unsigned long *in_frames, out_frame;
+	unsigned long *new_array, *old_array;
 	unsigned long  flags;
 	int            success;
 	unsigned long vstart = (unsigned long)phys_to_virt(pstart);
 
-	if (unlikely(order > MAX_CONTIG_ORDER))
-		return -ENOMEM;
+	if (unlikely(order > discontig_frames_order)) {
+		if (early_boot_irqs_disabled)
+			return -ENOMEM;
+
+		new_array = vmalloc(sizeof(unsigned long) * (1UL << order));
+
+		if (!new_array)
+			return -ENOMEM;
+
+		spin_lock_irqsave(&xen_reservation_lock, flags);
+
+		if (order > discontig_frames_order) {
+			if (discontig_frames == discontig_frames_early)
+				old_array = NULL;
+			else
+				old_array = discontig_frames;
+			discontig_frames = new_array;
+			discontig_frames_order = order;
+		} else
+			old_array = new_array;
+
+		spin_unlock_irqrestore(&xen_reservation_lock, flags);
+
+		vfree(old_array);
+	}
 
 	memset((void *) vstart, 0, PAGE_SIZE << order);
 
 	spin_lock_irqsave(&xen_reservation_lock, flags);
 
+	in_frames = discontig_frames;
+
 	/* 1. Zap current PTEs, remembering MFNs. */
 	xen_zap_pfn_range(vstart, order, in_frames, NULL);
 
@@ -2354,12 +2382,12 @@ int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
 
 void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
 {
-	unsigned long *out_frames = discontig_frames, in_frame;
+	unsigned long *out_frames, in_frame;
 	unsigned long  flags;
 	int success;
 	unsigned long vstart;
 
-	if (unlikely(order > MAX_CONTIG_ORDER))
+	if (unlikely(order > discontig_frames_order))
 		return;
 
 	vstart = (unsigned long)phys_to_virt(pstart);
@@ -2367,6 +2395,8 @@ void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
 
 	spin_lock_irqsave(&xen_reservation_lock, flags);
 
+	out_frames = discontig_frames;
+
 	/* 1. Find start MFN of contiguous extent. */
 	in_frame = virt_to_mfn((void *)vstart);
 
-- 
2.43.0


[-- Attachment #1.1.3: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-30 12:35                       ` Jürgen Groß
@ 2025-01-30 20:28                         ` Stefano Stabellini
  2025-01-31  6:38                           ` Jürgen Groß
  2025-01-31 12:05                         ` Harshvardhan Jha
  1 sibling, 1 reply; 19+ messages in thread
From: Stefano Stabellini @ 2025-01-30 20:28 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Harshvardhan Jha, Greg KH, Konrad Wilk, Boris Ostrovsky,
	sstabellini@kernel.org, xen-devel@lists.xenproject.org,
	linux-kernel@vger.kernel.org, Harshit Mogalapalli, stable

[-- Attachment #1: Type: text/plain, Size: 4668 bytes --]

On Thu, 30 Jan 2025, Jürgen Groß wrote:
> Can you try the attached patch, please? I don't have a system at hand
> showing the problem.
>
> From cff43e997f79a95dc44e02debaeafe5f127f40bb Mon Sep 17 00:00:00 2001
> From: Juergen Gross <jgross@suse.com>
> Date: Thu, 30 Jan 2025 09:56:57 +0100
> Subject: [PATCH] x86/xen: allow larger contiguous memory regions in PV guests
> 
> Today a PV guest (including dom0) can create 2MB contiguous memory
> regions for DMA buffers at max. This has led to problems at least
> with the megaraid_sas driver, which wants to allocate a 2.3MB DMA
> buffer.
> 
> The limiting factor is the frame array used to do the hypercall for
> making the memory contiguous, which has 512 entries and is just a
> static array in mmu_pv.c.
> 
> In case a contiguous memory area larger than the initially supported
> 2MB is requested, allocate a larger buffer for the frame list. Note
> that such an allocation is tried only after memory management has been
> initialized properly, which is tested via the early_boot_irqs_disabled
> flag.
> 
> Fixes: 9f40ec84a797 ("xen/swiotlb: add alignment check for dma buffers")
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> Note that the "Fixes:" tag is not really correct, as that patch didn't
> introduce the problem, but rather made it visible. OTOH it is the best
> indicator we have to identify kernel versions this patch should be
> backported to.
> ---
>  arch/x86/xen/mmu_pv.c | 44 ++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 37 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
> index 55a4996d0c04..62aec29b8174 100644
> --- a/arch/x86/xen/mmu_pv.c
> +++ b/arch/x86/xen/mmu_pv.c
> @@ -2200,8 +2200,10 @@ void __init xen_init_mmu_ops(void)
>  }
>  
>  /* Protected by xen_reservation_lock. */
> -#define MAX_CONTIG_ORDER 9 /* 2MB */
> -static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER];
> +#define MIN_CONTIG_ORDER 9 /* 2MB */
> +static unsigned int discontig_frames_order = MIN_CONTIG_ORDER;
> +static unsigned long discontig_frames_early[1UL << MIN_CONTIG_ORDER];
> +static unsigned long *discontig_frames = discontig_frames_early;
>  
>  #define VOID_PTE (mfn_pte(0, __pgprot(0)))
>  static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
> @@ -2319,18 +2321,44 @@ int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
>  				 unsigned int address_bits,
>  				 dma_addr_t *dma_handle)
>  {
> -	unsigned long *in_frames = discontig_frames, out_frame;
> +	unsigned long *in_frames, out_frame;
> +	unsigned long *new_array, *old_array;
>  	unsigned long  flags;
>  	int            success;
>  	unsigned long vstart = (unsigned long)phys_to_virt(pstart);
>  
> -	if (unlikely(order > MAX_CONTIG_ORDER))
> -		return -ENOMEM;
> +	if (unlikely(order > discontig_frames_order)) {
> +		if (early_boot_irqs_disabled)
> +			return -ENOMEM;
> +
> +		new_array = vmalloc(sizeof(unsigned long) * (1UL << order));
> +
> +		if (!new_array)
> +			return -ENOMEM;
> +
> +		spin_lock_irqsave(&xen_reservation_lock, flags);
> +
> +		if (order > discontig_frames_order) {


This second if check should not be needed because it is the same as the
outer if check.



> +			if (discontig_frames == discontig_frames_early)
> +				old_array = NULL;
> +			else
> +				old_array = discontig_frames;
> +			discontig_frames = new_array;
> +			discontig_frames_order = order;
> +		} else
> +			old_array = new_array;
> +
> +		spin_unlock_irqrestore(&xen_reservation_lock, flags);
> +
> +		vfree(old_array);
> +	}
>  
>  	memset((void *) vstart, 0, PAGE_SIZE << order);
>  
>  	spin_lock_irqsave(&xen_reservation_lock, flags);
>  
> +	in_frames = discontig_frames;
> +
>  	/* 1. Zap current PTEs, remembering MFNs. */
>  	xen_zap_pfn_range(vstart, order, in_frames, NULL);
>  
> @@ -2354,12 +2382,12 @@ int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
>  
>  void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
>  {
> -	unsigned long *out_frames = discontig_frames, in_frame;
> +	unsigned long *out_frames, in_frame;
>  	unsigned long  flags;
>  	int success;
>  	unsigned long vstart;
>  
> -	if (unlikely(order > MAX_CONTIG_ORDER))
> +	if (unlikely(order > discontig_frames_order))
>  		return;
>  
>  	vstart = (unsigned long)phys_to_virt(pstart);
> @@ -2367,6 +2395,8 @@ void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
>  
>  	spin_lock_irqsave(&xen_reservation_lock, flags);
>  
> +	out_frames = discontig_frames;
> +
>  	/* 1. Find start MFN of contiguous extent. */
>  	in_frame = virt_to_mfn((void *)vstart);
>  
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-30 20:28                         ` Stefano Stabellini
@ 2025-01-31  6:38                           ` Jürgen Groß
  0 siblings, 0 replies; 19+ messages in thread
From: Jürgen Groß @ 2025-01-31  6:38 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Harshvardhan Jha, Greg KH, Konrad Wilk, Boris Ostrovsky,
	xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org,
	Harshit Mogalapalli, stable


[-- Attachment #1.1.1: Type: text/plain, Size: 3617 bytes --]

On 30.01.25 21:28, Stefano Stabellini wrote:
> On Thu, 30 Jan 2025, Jürgen Groß wrote:
>> Can you try the attached patch, please? I don't have a system at hand
>> showing the problem.
>>
>>  From cff43e997f79a95dc44e02debaeafe5f127f40bb Mon Sep 17 00:00:00 2001
>> From: Juergen Gross <jgross@suse.com>
>> Date: Thu, 30 Jan 2025 09:56:57 +0100
>> Subject: [PATCH] x86/xen: allow larger contiguous memory regions in PV guests
>>
>> Today a PV guest (including dom0) can create 2MB contiguous memory
>> regions for DMA buffers at max. This has led to problems at least
>> with the megaraid_sas driver, which wants to allocate a 2.3MB DMA
>> buffer.
>>
>> The limiting factor is the frame array used to do the hypercall for
>> making the memory contiguous, which has 512 entries and is just a
>> static array in mmu_pv.c.
>>
>> In case a contiguous memory area larger than the initially supported
>> 2MB is requested, allocate a larger buffer for the frame list. Note
>> that such an allocation is tried only after memory management has been
>> initialized properly, which is tested via the early_boot_irqs_disabled
>> flag.
>>
>> Fixes: 9f40ec84a797 ("xen/swiotlb: add alignment check for dma buffers")
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> Note that the "Fixes:" tag is not really correct, as that patch didn't
>> introduce the problem, but rather made it visible. OTOH it is the best
>> indicator we have to identify kernel versions this patch should be
>> backported to.
>> ---
>>   arch/x86/xen/mmu_pv.c | 44 ++++++++++++++++++++++++++++++++++++-------
>>   1 file changed, 37 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
>> index 55a4996d0c04..62aec29b8174 100644
>> --- a/arch/x86/xen/mmu_pv.c
>> +++ b/arch/x86/xen/mmu_pv.c
>> @@ -2200,8 +2200,10 @@ void __init xen_init_mmu_ops(void)
>>   }
>>   
>>   /* Protected by xen_reservation_lock. */
>> -#define MAX_CONTIG_ORDER 9 /* 2MB */
>> -static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER];
>> +#define MIN_CONTIG_ORDER 9 /* 2MB */
>> +static unsigned int discontig_frames_order = MIN_CONTIG_ORDER;
>> +static unsigned long discontig_frames_early[1UL << MIN_CONTIG_ORDER];
>> +static unsigned long *discontig_frames = discontig_frames_early;
>>   
>>   #define VOID_PTE (mfn_pte(0, __pgprot(0)))
>>   static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
>> @@ -2319,18 +2321,44 @@ int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
>>   				 unsigned int address_bits,
>>   				 dma_addr_t *dma_handle)
>>   {
>> -	unsigned long *in_frames = discontig_frames, out_frame;
>> +	unsigned long *in_frames, out_frame;
>> +	unsigned long *new_array, *old_array;
>>   	unsigned long  flags;
>>   	int            success;
>>   	unsigned long vstart = (unsigned long)phys_to_virt(pstart);
>>   
>> -	if (unlikely(order > MAX_CONTIG_ORDER))
>> -		return -ENOMEM;
>> +	if (unlikely(order > discontig_frames_order)) {
>> +		if (early_boot_irqs_disabled)
>> +			return -ENOMEM;
>> +
>> +		new_array = vmalloc(sizeof(unsigned long) * (1UL << order));
>> +
>> +		if (!new_array)
>> +			return -ENOMEM;
>> +
>> +		spin_lock_irqsave(&xen_reservation_lock, flags);
>> +
>> +		if (order > discontig_frames_order) {
> 
> 
> This second if check should not be needed because it is the same as the
> outer if check.

It is needed: the outer check runs without the lock held, so inside the
locked region I need to verify that no concurrent call has already
updated the buffer, maybe with an even larger size.
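
The pattern can be illustrated with a minimal userspace sketch. This is
not the kernel code: it assumes a pthread mutex in place of
xen_reservation_lock, malloc()/free() in place of vmalloc()/vfree(), and
an atomic order variable so that the unlocked first check is well defined
in plain C11. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define SKETCH_MIN_ORDER 9

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long sketch_early[1UL << SKETCH_MIN_ORDER];
static unsigned long *sketch_frames = sketch_early;	/* protected by sketch_lock */
static atomic_uint sketch_order = SKETCH_MIN_ORDER;	/* written under sketch_lock */

/* Make sure the frame list can hold 2^order entries. Returns 0 or -1. */
static int sketch_ensure_order(unsigned int order)
{
	unsigned long *new_array, *old_array;

	/* First check, without the lock: the answer may already be stale. */
	if (order <= atomic_load(&sketch_order))
		return 0;

	/* Allocate before taking the lock to keep the critical section short. */
	new_array = malloc(sizeof(unsigned long) * (1UL << order));
	if (!new_array)
		return -1;

	pthread_mutex_lock(&sketch_lock);

	/* Second check, with the lock: another caller may have grown it. */
	if (order > atomic_load(&sketch_order)) {
		/* The static early array must never be freed. */
		old_array = (sketch_frames == sketch_early) ? NULL : sketch_frames;
		sketch_frames = new_array;
		atomic_store(&sketch_order, order);
	} else {
		/* Lost the race: the current buffer is big enough, drop ours. */
		old_array = new_array;
	}

	pthread_mutex_unlock(&sketch_lock);

	free(old_array);	/* free(NULL) is a no-op */
	return 0;
}
```

Without the second check, a caller that lost the race could replace a
larger buffer installed by a concurrent caller with its own smaller one.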


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]


* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-30 12:35                       ` Jürgen Groß
  2025-01-30 20:28                         ` Stefano Stabellini
@ 2025-01-31 12:05                         ` Harshvardhan Jha
  2025-02-04 11:20                           ` Harshvardhan Jha
  1 sibling, 1 reply; 19+ messages in thread
From: Harshvardhan Jha @ 2025-01-31 12:05 UTC (permalink / raw)
  To: Jürgen Groß, Greg KH
  Cc: Konrad Wilk, Boris Ostrovsky, sstabellini@kernel.org,
	xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org,
	Harshit Mogalapalli, stable


On 30/01/25 6:05 PM, Jürgen Groß wrote:
> On 29.01.25 19:46, Harshvardhan Jha wrote:
>>
>> On 30/01/25 12:13 AM, Jürgen Groß wrote:
>>> On 29.01.25 19:35, Harshvardhan Jha wrote:
>>>>
>>>> On 29/01/25 4:52 PM, Juergen Gross wrote:
>>>>> On 29.01.25 10:15, Harshvardhan Jha wrote:
>>>>>>
>>>>>> On 29/01/25 2:34 PM, Greg KH wrote:
>>>>>>> On Wed, Jan 29, 2025 at 02:29:48PM +0530, Harshvardhan Jha wrote:
>>>>>>>> Hi Greg,
>>>>>>>>
>>>>>>>> On 29/01/25 2:18 PM, Greg KH wrote:
>>>>>>>>> On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
>>>>>>>>>> Hi there,
>>>>>>>>>>
>>>>>>>>>> On 29/01/25 2:05 PM, Greg KH wrote:
>>>>>>>>>>> On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha
>>>>>>>>>>> wrote:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> +stable
>>>>>>>>>>>>
>>>>>>>>>>>> There seems to be some formatting issues in my log output. I
>>>>>>>>>>>> have
>>>>>>>>>>>> attached it as a file.
>>>>>>>>>>> Confused, what are you wanting us to do here in the stable
>>>>>>>>>>> tree?
>>>>>>>>>>>
>>>>>>>>>>> thanks,
>>>>>>>>>>>
>>>>>>>>>>> greg k-h
>>>>>>>>>> Since, this is reproducible on 5.4.y I have added stable. The
>>>>>>>>>> culprit
>>>>>>>>>> commit which upon getting reverted fixes this issue is also
>>>>>>>>>> present in
>>>>>>>>>> 5.4.y stable.
>>>>>>>>> What culprit commit?  I see no information here :(
>>>>>>>>>
>>>>>>>>> Remember, top-posting is evil...
>>>>>>>> My apologies,
>>>>>>>>
>>>>>>>> The stable tag v5.4.289 seems to fail to boot with the following
>>>>>>>> prompt in an infinite loop:
>>>>>>>> [   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion
>>>>>>>> 3273 sge_count (-12) is out of range. Range is:  0-256
>>>>>>>>
>>>>>>>> Reverting the following patch seems to fix the issue:
>>>>>>>>
>>>>>>>> stable-5.4      : v5.4.285             - 5df29a445f3a
>>>>>>>> xen/swiotlb: add
>>>>>>>> alignment check for dma buffers
>>>>>>>>
>>>>>>>> I tried changing swiotlb grub command line arguments but that
>>>>>>>> didn't
>>>>>>>> seem to help much unfortunately and the error was seen again.
>>>>>>>>
>>>>>>> Ok, can you submit this revert with the information about why it
>>>>>>> should
>>>>>>> not be included in the 5.4.y tree and cc: everyone involved and
>>>>>>> then we
>>>>>>> will be glad to queue it up.
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> greg k-h
>>>>>>
>>>>>> This might be reproducible on other stable trees and mainline as
>>>>>> well so
>>>>>> we will get it fixed there and I will submit the necessary fix to
>>>>>> stable
>>>>>> when everything is sorted out on mainline.
>>>>>
>>>>> Right. Just reverting my patch will trade one error with another one
>>>>> (the
>>>>> one which triggered me to write the patch).
>>>>>
>>>>> There are two possible ways to fix the issue:
>>>>>
>>>>> - allow larger DMA buffers in xen/swiotlb (today 2MB are the max.
>>>>> supported
>>>>>     size, the megaraid_sas driver seems to effectively request 4MB)
>>>>
>>>> This seems relatively simpler to implement but I'm not sure whether
>>>> it's
>>>> the most optimal approach
>>>
>>> Just making the static array larger used to hold the frame numbers for
>>> the
>>> buffer seems to be a waste of memory for most configurations.
>> Yep definitely not required in most cases.
>>>
>>> I'm thinking of an allocated array using the max needed size (replace a
>>> former buffer with a larger one if needed).
>>
>> This seems like the right way to go.
>
> Can you try the attached patch, please? I don't have a system at hand
> showing the problem.
I tried the patch and got the same error in an infinite loop again:
[   25.827922] megaraid_sas 0000:65:00.0: megasas_build_io_fusion 3273 sge_count (-12) is out of range. Range is:  0-256
[   25.828447] megaraid_sas 0000:65:00.0: Error building command
>
>
> Juergen


* Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range
  2025-01-31 12:05                         ` Harshvardhan Jha
@ 2025-02-04 11:20                           ` Harshvardhan Jha
  0 siblings, 0 replies; 19+ messages in thread
From: Harshvardhan Jha @ 2025-02-04 11:20 UTC (permalink / raw)
  To: Jürgen Groß, Greg KH
  Cc: Konrad Wilk, Boris Ostrovsky, sstabellini@kernel.org,
	xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org,
	Harshit Mogalapalli, stable

Hi there,

On 31/01/25 5:35 PM, Harshvardhan Jha wrote:
> On 30/01/25 6:05 PM, Jürgen Groß wrote:
>> On 29.01.25 19:46, Harshvardhan Jha wrote:
>>> On 30/01/25 12:13 AM, Jürgen Groß wrote:
>>>> On 29.01.25 19:35, Harshvardhan Jha wrote:
>>>>> On 29/01/25 4:52 PM, Juergen Gross wrote:
>>>>>> On 29.01.25 10:15, Harshvardhan Jha wrote:
>>>>>>> On 29/01/25 2:34 PM, Greg KH wrote:
>>>>>>>> On Wed, Jan 29, 2025 at 02:29:48PM +0530, Harshvardhan Jha wrote:
>>>>>>>>> Hi Greg,
>>>>>>>>>
>>>>>>>>> On 29/01/25 2:18 PM, Greg KH wrote:
>>>>>>>>>> On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
>>>>>>>>>>> Hi there,
>>>>>>>>>>>
>>>>>>>>>>> On 29/01/25 2:05 PM, Greg KH wrote:
>>>>>>>>>>>> On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> +stable
>>>>>>>>>>>>>
>>>>>>>>>>>>> There seems to be some formatting issues in my log output. I
>>>>>>>>>>>>> have
>>>>>>>>>>>>> attached it as a file.
>>>>>>>>>>>> Confused, what are you wanting us to do here in the stable
>>>>>>>>>>>> tree?
>>>>>>>>>>>>
>>>>>>>>>>>> thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> greg k-h
>>>>>>>>>>> Since, this is reproducible on 5.4.y I have added stable. The
>>>>>>>>>>> culprit
>>>>>>>>>>> commit which upon getting reverted fixes this issue is also
>>>>>>>>>>> present in
>>>>>>>>>>> 5.4.y stable.
>>>>>>>>>> What culprit commit?  I see no information here :(
>>>>>>>>>>
>>>>>>>>>> Remember, top-posting is evil...
>>>>>>>>> My apologies,
>>>>>>>>>
>>>>>>>>> The stable tag v5.4.289 seems to fail to boot with the following
>>>>>>>>> prompt in an infinite loop:
>>>>>>>>> [   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion
>>>>>>>>> 3273 sge_count (-12) is out of range. Range is:  0-256
>>>>>>>>>
>>>>>>>>> Reverting the following patch seems to fix the issue:
>>>>>>>>>
>>>>>>>>> stable-5.4      : v5.4.285             - 5df29a445f3a
>>>>>>>>> xen/swiotlb: add
>>>>>>>>> alignment check for dma buffers
>>>>>>>>>
>>>>>>>>> I tried changing swiotlb grub command line arguments but that
>>>>>>>>> didn't
>>>>>>>>> seem to help much unfortunately and the error was seen again.
>>>>>>>>>
>>>>>>>> Ok, can you submit this revert with the information about why it
>>>>>>>> should
>>>>>>>> not be included in the 5.4.y tree and cc: everyone involved and
>>>>>>>> then we
>>>>>>>> will be glad to queue it up.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> greg k-h
>>>>>>> This might be reproducible on other stable trees and mainline as
>>>>>>> well so
>>>>>>> we will get it fixed there and I will submit the necessary fix to
>>>>>>> stable
>>>>>>> when everything is sorted out on mainline.
>>>>>> Right. Just reverting my patch will trade one error with another one
>>>>>> (the
>>>>>> one which triggered me to write the patch).
>>>>>>
>>>>>> There are two possible ways to fix the issue:
>>>>>>
>>>>>> - allow larger DMA buffers in xen/swiotlb (today 2MB are the max.
>>>>>> supported
>>>>>>     size, the megaraid_sas driver seems to effectively request 4MB)
>>>>> This seems relatively simpler to implement but I'm not sure whether
>>>>> it's
>>>>> the most optimal approach
>>>> Just making the static array larger used to hold the frame numbers for
>>>> the
>>>> buffer seems to be a waste of memory for most configurations.
>>> Yep definitely not required in most cases.
>>>> I'm thinking of an allocated array using the max needed size (replace a
>>>> former buffer with a larger one if needed).
>>> This seems like the right way to go.
>> Can you try the attached patch, please? I don't have a system at hand
>> showing the problem.
> I tried this and got this error in an infinite loop again:
> [   25.827922] megaraid_sas 0000:65:00.0: megasas_build_io_fusion 3273
> sge_count (-12) is out of range. Range is:  0-256
> [   25.828447] megaraid_sas 0000:65:00.0: Error building command


Would this also require a change in the megaraid_sas driver, since
changing the Xen code alone isn't fixing the issue?
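
For reference, the numbers discussed earlier in the thread can be checked with a small sketch. It assumes 4 KiB pages; `order_for_size` and `check_contig_order` are hypothetical helpers that only mirror the original `order > MAX_CONTIG_ORDER` test. It shows that a 2MB buffer is order 9, a 4MB buffer is order 10, and that -ENOMEM is -12 on Linux, the same value printed as sge_count. The thread does not establish which call path yields the -12 on the patched kernel, so treat the connection as an assumption.

```c
#include <errno.h>
#include <stddef.h>

#define PAGE_SHIFT_ASSUMED 12	/* 4 KiB pages, an assumption */
#define MAX_CONTIG_ORDER   9	/* 2MB, as in the original code */

/* Smallest order such that (1 << order) pages cover `bytes`. */
unsigned int order_for_size(size_t bytes)
{
	unsigned int order = 0;

	while (((size_t)1 << (order + PAGE_SHIFT_ASSUMED)) < bytes)
		order++;
	return order;
}

/* Mirrors the pre-patch limit: orders above 9 are rejected. */
int check_contig_order(unsigned int order)
{
	return order > MAX_CONTIG_ORDER ? -ENOMEM : 0;
}
```

With these helpers, `check_contig_order(order_for_size(4UL << 20))` evaluates to -ENOMEM, while a 2MB request passes.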

Harshvardhan


>>
>> Juergen

