* Re: Boot hang with SiFive PLIC when routing I2C-HID level-triggered interrupts [not found] <CA+eeCSPUDpUg76ZO8dszSbAGn+UHjcyv8F1J-CUPVARAzEtW9w@mail.gmail.com> @ 2024-03-14 21:46 ` Conor Dooley 2024-03-15 3:33 ` Eva Kurchatova 2024-03-17 21:27 ` Nam Cao 1 sibling, 1 reply; 7+ messages in thread From: Conor Dooley @ 2024-03-14 21:46 UTC (permalink / raw) To: Eva Kurchatova Cc: linux-riscv, bugs, linux-i2c, Thomas Gleixner, linux-kernel, Samuel Holland [-- Attachment #1: Type: text/plain, Size: 5183 bytes --] Hey, I'm not really all that familar with the plic driver itself, so adding Samuel and Thomas who will (hopefully) understand this better than me. On Thu, Mar 14, 2024 at 09:12:40AM +0200, Eva Kurchatova wrote: > If an I2C-HID controller level-triggered IRQ line is routed directly as > a PLIC IRQ, and we spam input early enough in kernel boot process > (Somewhere between initializing NET, ALSA subsystems and before > i2c-hid driver init), then there is a chance of kernel locking up > completely and not going any further. > > There are no kernel messages printed with all the IRQ, task hang > debugging enabled - other than (sometimes) it reports sched RT > throttling after a few seconds. Basic timer interrupt handling is > intact - fbdev tty cursor is still blinking. > > It appears that in such a case the I2C-HID IRQ line is raised; PLIC > notifies the (single) boot system hart, kernel claims the IRQ and > immediately completes it by writing to CLAIM/COMPLETE register. > No access to the I2C controller (OpenCores) or I2C-HID registers > is made, This immediately seemed odd to me, but I have no reason to disbelieve you, given you say this was discovered in RVVM which is an emulator and you should know whether or not registers are accessed. The very first action taken by the ocores i2c controller driver when it gets an interrupt though is to read a register: u8 stat = oc_getreg(i2c, OCI2C_STATUS); I would expect that this handler would be called, and therefore you'd see the register read, had the probe function of that driver run to completion. I'd also expect that the interrupt would not even be unmasked if that probe function had failed. In your case though, you can see that the interrupt is not masked, since it is being raised and handled repeatedly by the PLIC driver. Has the i2c controller driver probed in the period of boot that you say this problem manifests? > so the HID report is never consumed and IRQ line stays > raised forever. The kernel endlessly claims & completes IRQs > without doing any work with the device. It doesn't always end up this > way; sometimes boot process completes and there are no signs of > interrupt storm or stuck IRQ processing afterwards. > > There was a suspicion this has to do with SiFive PLIC being > not-so-explicit about level triggered interrupts. The research of this > issue led this way: There is another DT PLIC binding; a THead one, > and it has a flag `PLIC_QUIRK_EDGE_INTERRUPT` which allows > to define IRQ source behavior as 2-cells in DT; and has some other > changes to the logic (more on that below). > When attempting to mimic a THead PLIC in kernel DT, and rewriting > all DT interrupt sources to use 2-cell description, the hang ceases to > happen. Curious as to what are the kernel side implications of this, > I went to see what `PLIC_QUIRK_EDGE_INTERRUPT` actually does and > bit-by-bit disabled the actual differences this flag makes in the > driver logic. > > This return path in irq-sifive-plic.c@223 > (https://elixir.bootlin.com/linux/latest/source/drivers/irqchip/irq-sifive-plic.c#L223) > is only enabled for SiFive PLIC, but not for THead one. Removing > those 2 lines of code from the driver (whilst keeping the DT binding > properly reporting a SiFive PLIC) fixes the hang. I am not an expert > on the PLIC driver to debug further or determine what would be a > proper fix to this, but this probably gets more experienced devs > somewhere (I hope). I'm not really familiar with this code either, but just checking what the affect of your changes are, AFAICT it just sets the handler to be handle_fasteoi_irq(), I noticed that that function has some code that will mask the irq if there's no handler registered for that particular interrupt: https://elixir.bootlin.com/linux/latest/source/kernel/irq/chip.c#L710 It seems like in your case there might not be one registered (as the i2c controller's interrupt handler never performs it's first access), so I'm wondering if that masking of the interrupt when no action is registered is what solves the problem for you. That's mostly just speculation though, because I am not an expert on the PLIC driver either. > This is reproducible at least from Linux 6.4.1 to Linux 6.7.9 on RVVM; I clearly cannot make any definitive statements because I'm just speculating here after all based on this mail, as there's no logs and I have not tried to reproduce this, but this does seem like the interrupt is unmasked before the i2c controller driver has even requested it. Ordinarily (at least on the hardware I have done any testing of interrupts on) the interrupts are masked by default and only get unmasked when there's a user for it in the kernel. Are interrupts unmasked by default on RVVM? > Affects any hardware that would have SiFive PLIC + I2C-HID combination; Have you checked that this actually affects any actual hardware? Thanks, Conor. > Most likely this is reproducible on QEMU as well if it had i2c-hid emulation, > or if we passthrough physical I2C-HID device & inject PLIC IRQs from > it's IRQ line. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Boot hang with SiFive PLIC when routing I2C-HID level-triggered interrupts 2024-03-14 21:46 ` Boot hang with SiFive PLIC when routing I2C-HID level-triggered interrupts Conor Dooley @ 2024-03-15 3:33 ` Eva Kurchatova 2024-03-15 12:45 ` Conor Dooley 0 siblings, 1 reply; 7+ messages in thread From: Eva Kurchatova @ 2024-03-15 3:33 UTC (permalink / raw) Cc: linux-riscv, bugs, linux-i2c, linux-kernel On Thu, Mar 14, 2024 at 11:46 PM Conor Dooley <conor@kernel.org> wrote: > This immediately seemed odd to me, but I have no reason to disbelieve > you, given you say this was discovered in RVVM which is an emulator and > you should know whether or not registers are accessed. > The very first action taken by the ocores i2c controller driver when it > gets an interrupt though is to read a register: > > u8 stat = oc_getreg(i2c, OCI2C_STATUS); > > I would expect that this handler would be called, and therefore you'd > see the register read, had the probe function of that driver run to > completion. I'd also expect that the interrupt would not even be > unmasked if that probe function had failed. > In your case though, you can see that the interrupt is not masked, > since it is being raised and handled repeatedly by the PLIC driver. > Has the i2c controller driver probed in the period of boot that you say > this problem manifests? There is not a single problem with the ocores I2C driver. I2C-HID device itself has a separate IRQ line which is level-triggered. This is to signal the host that there is input available without polling, since I2C is a master-driven bus with no "data available" notifications. So in reality the I2C-HID driver should handle the interrupt; Then it uses the I2C controller to access I2C-HID slave registers (via I2C) to read the incoming HID input report. I2C controller interrupts are unrelated; it's the link between the HID device and the host and it doesn't seem to be touched at all inside the I2C-HID IRQ handler (So it's just a pair of Claim/Complete actions). I2C ocores interrupts are not generated (and shouldn't) at that point, because no I2C transfer was initiated at all. There is a way to make I2C-HID device edge-triggered, in RVVM emulation code, but it's not actually spec compliant. It gets rid of the hang too; However the same Claim/Complete actions without any handling inside the IRQ handler are observed at least once, which technically means a lost interrupt (Pressing a key somewhere early in boot thus doesn't propagate the keypress to the guest until you press another key later, after which both HID reports are read), so it's not a way how I'd like to mitigate this in the emulator code. I, and another developer from Haiku OS team (X512), are almost sure this is a kernel bug related to level triggered IRQs with PLIC or a specific incompatibility of PLIC/I2C-HID (Not the ocores I2C controller). This hang is not reproducible with a Haiku OS guest in any way and most of the drivers involved seem to be FreeBSD based or written by X512 (Specifically the PLIC and I2C-HID drivers are). This person also believes that this Claim/Complete behavior on PLIC side without any other actions made in between is erroneous kernel behavior too. I am open to discussions what specifically could be wrong with the VM since one of my end goals is to just make HID devices work without issues there; However if a simple 2-line patch (which I'm not entirely sure of it's implications) that removes return path at line 223 in PLIC driver resolves the issue (I kept a guest in a 24 hr reboot loop whilst spamming fake I2C-HID input and no hang was observed), then it does lead me to belief that it's at least not some blatant emulation issue. I came here to collect some kernel devs opinions since we are debugging this for some 2 weeks already. Your initial understanding that something is wrong with ocores I2C controller is not what I meant, sorry for lacking in my explanation. >Are interrupts unmasked by default on RVVM? By default all PLIC ENABLE registers are set to zero. All PRIO, THRESHOLD registers are zero on reset. So all PLIC state is simply zeroed on reset, as can be seen here: https://github.com/LekKit/RVVM/blob/f81df57a2af77cbae25fd3cc65d07106d9505e23/src/devices/plic.c#L265 > Have you checked that this actually affects any actual hardware? I might very soon if no one has immediate ideas what is wrong; Problem is that I don't have hardware that exposes PLIC IRQ lines to the user. It might be possible to use some FPGA or at least reproduce in other simulators. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Boot hang with SiFive PLIC when routing I2C-HID level-triggered interrupts 2024-03-15 3:33 ` Eva Kurchatova @ 2024-03-15 12:45 ` Conor Dooley 0 siblings, 0 replies; 7+ messages in thread From: Conor Dooley @ 2024-03-15 12:45 UTC (permalink / raw) To: Eva Kurchatova Cc: linux-riscv, bugs, linux-i2c, linux-kernel, Thomas Gleixner, Samuel Holland [-- Attachment #1: Type: text/plain, Size: 4883 bytes --] On Fri, Mar 15, 2024 at 05:33:06AM +0200, Eva Kurchatova wrote: > On Thu, Mar 14, 2024 at 11:46 PM Conor Dooley <conor@kernel.org> wrote: > > This immediately seemed odd to me, but I have no reason to disbelieve > > you, given you say this was discovered in RVVM which is an emulator and > > you should know whether or not registers are accessed. > > The very first action taken by the ocores i2c controller driver when it > > gets an interrupt though is to read a register: > > > > u8 stat = oc_getreg(i2c, OCI2C_STATUS); > > > > I would expect that this handler would be called, and therefore you'd > > see the register read, had the probe function of that driver run to > > completion. I'd also expect that the interrupt would not even be > > unmasked if that probe function had failed. > > In your case though, you can see that the interrupt is not masked, > > since it is being raised and handled repeatedly by the PLIC driver. > > Has the i2c controller driver probed in the period of boot that you say > > this problem manifests? > > There is not a single problem with the ocores I2C driver. I2C-HID device > itself has a separate IRQ line which is level-triggered. Ah, I see. I am still unsure as to how that interrupt is handled by the PLIC driver before the user for it (i2c-hid) requests it. > This is to signal the host that there is input available without polling, > since I2C is a master-driven bus with no "data available" notifications. > So in reality the I2C-HID driver should handle the interrupt; Then it > uses the I2C controller to access I2C-HID slave registers (via I2C) to > read the incoming HID input report. I2C controller interrupts are unrelated; > it's the link between the HID device and the host and it doesn't seem > to be touched at all inside the I2C-HID IRQ handler (So it's just a pair > of Claim/Complete actions). I2C ocores interrupts are not generated > (and shouldn't) at that point, because no I2C transfer was initiated at all. > > There is a way to make I2C-HID device edge-triggered, in RVVM > emulation code, but it's not actually spec compliant. It gets rid of the > hang too; However the same Claim/Complete actions without any > handling inside the IRQ handler are observed at least once, which > technically means a lost interrupt (Pressing a key somewhere early in > boot thus doesn't propagate the keypress to the guest until you press > another key later, after which both HID reports are read), so it's not > a way how I'd like to mitigate this in the emulator code. > I, and another developer from Haiku OS team (X512), are almost sure > this is a kernel bug related to level triggered IRQs with PLIC or a > specific incompatibility of PLIC/I2C-HID (Not the ocores I2C controller). > > This hang is not reproducible with a Haiku OS guest in any way and > most of the drivers involved seem to be FreeBSD based or written by > X512 (Specifically the PLIC and I2C-HID drivers are). This person also > believes that this Claim/Complete behavior on PLIC side without any > other actions made in between is erroneous kernel behavior too. Maybe it is, I've re-added Thomas and Samuel to CC as I said before that they know much more than I about the code in question. I was just pointing out that the behaviour you were seeing is inconsistent with what I have come across myself for interrupts that have not had a user request them. > I am open to discussions what specifically could be wrong with the VM > since one of my end goals is to just make HID devices work without > issues there; However if a simple 2-line patch (which I'm not entirely > sure of it's implications) that removes return path at line 223 in PLIC > driver resolves the issue (I kept a guest in a 24 hr reboot loop whilst > spamming fake I2C-HID input and no hang was observed), then it does > lead me to belief that it's at least not some blatant emulation issue. > > I came here to collect some kernel devs opinions since we are > debugging this for some 2 weeks already. Your initial understanding > that something is wrong with ocores I2C controller is not what I meant, > sorry for lacking in my explanation. > > >Are interrupts unmasked by default on RVVM? > > By default all PLIC ENABLE registers are set to zero. All PRIO, > THRESHOLD registers are zero on reset. So all PLIC state is > simply zeroed on reset, as can be seen here: > https://github.com/LekKit/RVVM/blob/f81df57a2af77cbae25fd3cc65d07106d9505e23/src/devices/plic.c#L265 > > > Have you checked that this actually affects any actual hardware? > > I might very soon if no one has immediate ideas what is wrong; > Problem is that I don't have hardware that exposes PLIC IRQ lines to > the user. It might be possible to use some FPGA or at least reproduce > in other simulators. > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Boot hang with SiFive PLIC when routing I2C-HID level-triggered interrupts [not found] <CA+eeCSPUDpUg76ZO8dszSbAGn+UHjcyv8F1J-CUPVARAzEtW9w@mail.gmail.com> 2024-03-14 21:46 ` Boot hang with SiFive PLIC when routing I2C-HID level-triggered interrupts Conor Dooley @ 2024-03-17 21:27 ` Nam Cao 2024-03-18 8:48 ` Eva Kurchatova 1 sibling, 1 reply; 7+ messages in thread From: Nam Cao @ 2024-03-17 21:27 UTC (permalink / raw) To: Eva Kurchatova Cc: linux-riscv, bugs, linux-i2c, jikos, benjamin.tissoires, dianders, mripard, johan+linaro, linux-input, linux-kernel Cc: HID folks On 14/Mar/2024 Eva Kurchatova wrote: > If an I2C-HID controller level-triggered IRQ line is routed directly as > a PLIC IRQ, and we spam input early enough in kernel boot process > (Somewhere between initializing NET, ALSA subsystems and before > i2c-hid driver init), then there is a chance of kernel locking up > completely and not going any further. > > There are no kernel messages printed with all the IRQ, task hang > debugging enabled - other than (sometimes) it reports sched RT > throttling after a few seconds. Basic timer interrupt handling is > intact - fbdev tty cursor is still blinking. > > It appears that in such a case the I2C-HID IRQ line is raised; PLIC > notifies the (single) boot system hart, kernel claims the IRQ and > immediately completes it by writing to CLAIM/COMPLETE register. > No access to the I2C controller (OpenCores) or I2C-HID registers > is made, so the HID report is never consumed and IRQ line stays > raised forever. The kernel endlessly claims & completes IRQs > without doing any work with the device. It doesn't always end up this > way; sometimes boot process completes and there are no signs of > interrupt storm or stuck IRQ processing afterwards. It seems I2C HID's interrupt handler (i2c_hid_irq) returns immediately if I2C_HID_READ_PENDING is set. This flag is supposed to be cleared in i2c_hid_xfer(), but since the (threaded) interrupt handler runs at higher priority, the flag is never cleared. So we have a lock-up: interrupt handler won't do anything unless the flag is cleared, but the clearing of this flag is done in a lower priority task which never gets scheduled while the interrupt handler is active. There is RT throttling to prevent RT tasks from locking up the system like this. I don't know much about scheduling stuffs, so I am not really sure why RT throttling does not work. I think because RT throttling triggers when RT tasks take too much CPU time, but in this case hard interrupt handlers take lots of CPU time too (~50% according to my measurement), so RT throttling doesn't trigger often enough (in this case, it triggers once and never again). Again, I don't know much about scheduler so I may be talking nonsense here. The flag I2C_HID_READ_PENDING seems to be used to make sure that only 1 I2C operation can happen at a time. But this seems pointless, because I2C subsystem already takes care of this. So I think we can just remove it. Can you give the below patch a try? diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c index 2735cd585af0..799ad0ef9c4a 100644 --- a/drivers/hid/i2c-hid/i2c-hid-core.c +++ b/drivers/hid/i2c-hid/i2c-hid-core.c @@ -64,7 +64,6 @@ /* flags */ #define I2C_HID_STARTED 0 #define I2C_HID_RESET_PENDING 1 -#define I2C_HID_READ_PENDING 2 #define I2C_HID_PWR_ON 0x00 #define I2C_HID_PWR_SLEEP 0x01 @@ -190,15 +189,10 @@ static int i2c_hid_xfer(struct i2c_hid *ihid, msgs[n].len = recv_len; msgs[n].buf = recv_buf; n++; - - set_bit(I2C_HID_READ_PENDING, &ihid->flags); } ret = i2c_transfer(client->adapter, msgs, n); - if (recv_len) - clear_bit(I2C_HID_READ_PENDING, &ihid->flags); - if (ret != n) return ret < 0 ? ret : -EIO; @@ -566,9 +560,6 @@ static irqreturn_t i2c_hid_irq(int irq, void *dev_id) { struct i2c_hid *ihid = dev_id; - if (test_bit(I2C_HID_READ_PENDING, &ihid->flags)) - return IRQ_HANDLED; - i2c_hid_get_input(ihid); return IRQ_HANDLED; ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: Boot hang with SiFive PLIC when routing I2C-HID level-triggered interrupts 2024-03-17 21:27 ` Nam Cao @ 2024-03-18 8:48 ` Eva Kurchatova 2024-03-18 9:01 ` Nam Cao 0 siblings, 1 reply; 7+ messages in thread From: Eva Kurchatova @ 2024-03-18 8:48 UTC (permalink / raw) To: Nam Cao Cc: linux-riscv, bugs, linux-i2c, jikos, benjamin.tissoires, dianders, mripard, johan+linaro, linux-input, linux-kernel On Sun, Mar 17, 2024 at 11:27 PM Nam Cao <namcao@linutronix.de> wrote: > > Cc: HID folks > > On 14/Mar/2024 Eva Kurchatova wrote: > > If an I2C-HID controller level-triggered IRQ line is routed directly as > > a PLIC IRQ, and we spam input early enough in kernel boot process > > (Somewhere between initializing NET, ALSA subsystems and before > > i2c-hid driver init), then there is a chance of kernel locking up > > completely and not going any further. > > > > There are no kernel messages printed with all the IRQ, task hang > > debugging enabled - other than (sometimes) it reports sched RT > > throttling after a few seconds. Basic timer interrupt handling is > > intact - fbdev tty cursor is still blinking. > > > > It appears that in such a case the I2C-HID IRQ line is raised; PLIC > > notifies the (single) boot system hart, kernel claims the IRQ and > > immediately completes it by writing to CLAIM/COMPLETE register. > > No access to the I2C controller (OpenCores) or I2C-HID registers > > is made, so the HID report is never consumed and IRQ line stays > > raised forever. The kernel endlessly claims & completes IRQs > > without doing any work with the device. It doesn't always end up this > > way; sometimes boot process completes and there are no signs of > > interrupt storm or stuck IRQ processing afterwards. > > It seems I2C HID's interrupt handler (i2c_hid_irq) returns immediately if > I2C_HID_READ_PENDING is set. This flag is supposed to be cleared in > i2c_hid_xfer(), but since the (threaded) interrupt handler runs at higher > priority, the flag is never cleared. So we have a lock-up: interrupt > handler won't do anything unless the flag is cleared, but the clearing of > this flag is done in a lower priority task which never gets scheduled while > the interrupt handler is active. > > There is RT throttling to prevent RT tasks from locking up the system like > this. I don't know much about scheduling stuffs, so I am not really sure > why RT throttling does not work. I think because RT throttling triggers > when RT tasks take too much CPU time, but in this case hard interrupt > handlers take lots of CPU time too (~50% according to my measurement), so > RT throttling doesn't trigger often enough (in this case, it triggers once > and never again). Again, I don't know much about scheduler so I may be > talking nonsense here. > > The flag I2C_HID_READ_PENDING seems to be used to make sure that only 1 > I2C operation can happen at a time. But this seems pointless, because I2C > subsystem already takes care of this. So I think we can just remove it. > > Can you give the below patch a try? > > diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c > index 2735cd585af0..799ad0ef9c4a 100644 > --- a/drivers/hid/i2c-hid/i2c-hid-core.c > +++ b/drivers/hid/i2c-hid/i2c-hid-core.c > @@ -64,7 +64,6 @@ > /* flags */ > #define I2C_HID_STARTED 0 > #define I2C_HID_RESET_PENDING 1 > -#define I2C_HID_READ_PENDING 2 > > #define I2C_HID_PWR_ON 0x00 > #define I2C_HID_PWR_SLEEP 0x01 > @@ -190,15 +189,10 @@ static int i2c_hid_xfer(struct i2c_hid *ihid, > msgs[n].len = recv_len; > msgs[n].buf = recv_buf; > n++; > - > - set_bit(I2C_HID_READ_PENDING, &ihid->flags); > } > > ret = i2c_transfer(client->adapter, msgs, n); > > - if (recv_len) > - clear_bit(I2C_HID_READ_PENDING, &ihid->flags); > - > if (ret != n) > return ret < 0 ? ret : -EIO; > > @@ -566,9 +560,6 @@ static irqreturn_t i2c_hid_irq(int irq, void *dev_id) > { > struct i2c_hid *ihid = dev_id; > > - if (test_bit(I2C_HID_READ_PENDING, &ihid->flags)) > - return IRQ_HANDLED; > - > i2c_hid_get_input(ihid); > > return IRQ_HANDLED; Patch applied cleanly on top of 6.7.9, builds OK (No warns, etc). This indeed fixes the hang completely. I modified RVVM to send millions of keystroke events per second, and put `reboot` as a service hook in the guest. It has been continuously rebooting without a hitch for the last 30 minutes or so (Full boot takes around 2 seconds), whereas unpatched 6.7.9 hangs almost immediately in such conditions (Reverted your patch & rebuilt to be sure). Thank you very much for this! Hope to see it upstreamed soon ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Boot hang with SiFive PLIC when routing I2C-HID level-triggered interrupts 2024-03-18 8:48 ` Eva Kurchatova @ 2024-03-18 9:01 ` Nam Cao 2024-03-18 10:24 ` Eva Kurchatova 0 siblings, 1 reply; 7+ messages in thread From: Nam Cao @ 2024-03-18 9:01 UTC (permalink / raw) To: Eva Kurchatova Cc: linux-riscv, bugs, linux-i2c, jikos, benjamin.tissoires, dianders, mripard, johan+linaro, linux-input, linux-kernel On 18/Mar/2024 Eva Kurchatova wrote: > On Sun, Mar 17, 2024 at 11:27 PM Nam Cao <namcao@linutronix.de> wrote: > > It seems I2C HID's interrupt handler (i2c_hid_irq) returns immediately if > > I2C_HID_READ_PENDING is set. This flag is supposed to be cleared in > > i2c_hid_xfer(), but since the (threaded) interrupt handler runs at higher > > priority, the flag is never cleared. So we have a lock-up: interrupt > > handler won't do anything unless the flag is cleared, but the clearing of > > this flag is done in a lower priority task which never gets scheduled while > > the interrupt handler is active. > > > > There is RT throttling to prevent RT tasks from locking up the system like > > this. I don't know much about scheduling stuffs, so I am not really sure > > why RT throttling does not work. I think because RT throttling triggers > > when RT tasks take too much CPU time, but in this case hard interrupt > > handlers take lots of CPU time too (~50% according to my measurement), so > > RT throttling doesn't trigger often enough (in this case, it triggers once > > and never again). Again, I don't know much about scheduler so I may be > > talking nonsense here. > > > > The flag I2C_HID_READ_PENDING seems to be used to make sure that only 1 > > I2C operation can happen at a time. But this seems pointless, because I2C > > subsystem already takes care of this. So I think we can just remove it. > > > > Can you give the below patch a try? > > > > diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c > > index 2735cd585af0..799ad0ef9c4a 100644 > > --- a/drivers/hid/i2c-hid/i2c-hid-core.c > > +++ b/drivers/hid/i2c-hid/i2c-hid-core.c > > @@ -64,7 +64,6 @@ > > /* flags */ > > #define I2C_HID_STARTED 0 > > #define I2C_HID_RESET_PENDING 1 > > -#define I2C_HID_READ_PENDING 2 > > > > #define I2C_HID_PWR_ON 0x00 > > #define I2C_HID_PWR_SLEEP 0x01 > > @@ -190,15 +189,10 @@ static int i2c_hid_xfer(struct i2c_hid *ihid, > > msgs[n].len = recv_len; > > msgs[n].buf = recv_buf; > > n++; > > - > > - set_bit(I2C_HID_READ_PENDING, &ihid->flags); > > } > > > > ret = i2c_transfer(client->adapter, msgs, n); > > > > - if (recv_len) > > - clear_bit(I2C_HID_READ_PENDING, &ihid->flags); > > - > > if (ret != n) > > return ret < 0 ? ret : -EIO; > > > > @@ -566,9 +560,6 @@ static irqreturn_t i2c_hid_irq(int irq, void *dev_id) > > { > > struct i2c_hid *ihid = dev_id; > > > > - if (test_bit(I2C_HID_READ_PENDING, &ihid->flags)) > > - return IRQ_HANDLED; > > - > > i2c_hid_get_input(ihid); > > > > return IRQ_HANDLED; > > Patch applied cleanly on top of 6.7.9, builds OK (No warns, etc). > > This indeed fixes the hang completely. Nice! I assume I can add Reported-and-tested-by: Eva Kurchatova <nyandarknessgirl@gmail.com> to the patch? > I modified RVVM to send millions of keystroke events per second, > and put `reboot` as a service hook in the guest. It has been continuously > rebooting without a hitch for the last 30 minutes or so (Full boot takes > around 2 seconds), whereas unpatched 6.7.9 hangs almost immediately > in such conditions (Reverted your patch & rebuilt to be sure). > > Thank you very much for this! Hope to see it upstreamed soon Thank you for the report, your investigation helped a lot. I am still confused why RT throttling doesn't unstuck the kernel in this case. I will consult some people and investigate more on this. But I think this patch is good on its own, so I will send a proper patch shortly. Best regards, Nam ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Boot hang with SiFive PLIC when routing I2C-HID level-triggered interrupts 2024-03-18 9:01 ` Nam Cao @ 2024-03-18 10:24 ` Eva Kurchatova 0 siblings, 0 replies; 7+ messages in thread From: Eva Kurchatova @ 2024-03-18 10:24 UTC (permalink / raw) To: Nam Cao Cc: linux-riscv, bugs, linux-i2c, jikos, benjamin.tissoires, dianders, mripard, johan+linaro, linux-input, linux-kernel On Mon, Mar 18, 2024 at 11:01 AM Nam Cao <namcao@linutronix.de> wrote: > Nice! I assume I can add > Reported-and-tested-by: Eva Kurchatova <nyandarknessgirl@gmail.com> > to the patch? > Yes. You can also add link to upstream RVVM repo if anyone is interested in reproduction. This RVVM patch applied to 0.6 makes a keystroke storm: (window_update() is called on each display redraw and has access to hid_kb) diff --git a/src/devices/fb_window.c b/src/devices/fb_window.c index f170e2d..17e2519 100644 --- a/src/devices/fb_window.c +++ b/src/devices/fb_window.c @@ -77,6 +77,11 @@ static const uint8_t rvvm_logo_pix[] = { static void window_update(rvvm_mmio_dev_t* device) { + fb_window_t* win = device->data; + for (size_t i=0; i<100000; ++i) { + hid_keyboard_press(win->keyboard, HID_KEY_LEFTCTRL); + hid_keyboard_release(win->keyboard, HID_KEY_LEFTCTRL); + } fb_window_update((fb_window_t*)device->data); } > I am still confused why RT throttling doesn't unstuck the kernel in this > case. I will consult some people and investigate more on this. But I think > this patch is good on its own, so I will send a proper patch shortly. > > Best regards, > Nam RT throttling kicked in *very* rarely, in most cases where the unpatched kernel was actually stuck RT throttling wasn't reported at all. It didn't hang every time either, so it's possible that RT throttling helped sometimes, but not enough to always recover from such a loop: [incoming IRQ]->[IRQ claimed]->[no handling]->[IRQ completion (which immediately triggers phase 1 again)] Best regards, Eva ^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-03-18 10:24 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CA+eeCSPUDpUg76ZO8dszSbAGn+UHjcyv8F1J-CUPVARAzEtW9w@mail.gmail.com>
2024-03-14 21:46 ` Boot hang with SiFive PLIC when routing I2C-HID level-triggered interrupts Conor Dooley
2024-03-15 3:33 ` Eva Kurchatova
2024-03-15 12:45 ` Conor Dooley
2024-03-17 21:27 ` Nam Cao
2024-03-18 8:48 ` Eva Kurchatova
2024-03-18 9:01 ` Nam Cao
2024-03-18 10:24 ` Eva Kurchatova
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox