* [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read() @ 2022-08-25 16:18 Shameer Kolothum via 2022-08-26 11:59 ` Laszlo Ersek 2022-09-07 8:52 ` Igor Mammedov 0 siblings, 2 replies; 8+ messages in thread From: Shameer Kolothum via @ 2022-08-25 16:18 UTC (permalink / raw) To: qemu-devel, qemu-arm Cc: imammedo, peter.maydell, lersek, linuxarm, chenxiang66 Hi On arm/virt platform, Chen Xiang reported a Guest crash while attempting the below steps, 1. Launch the Guest with nvdimm=on 2. Hot-add a NVDIMM dev 3. Reboot 4. Guest boots fine. 5. Reboot again. 6. Guest boot fails. QEMU_EFI reports the below error: ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables" OnRootBridgesConnected: InstallAcpiTables: Protocol Error Debugging shows that on first reboot(after hot-adding NVDIMM), Qemu updates the etc/table-loader len, qemu_ram_resize() fw_cfg_modify_file() fw_cfg_modify_bytes_read() And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for the "key" entry to NULL. Because of this, on the second reboot, virt_acpi_build_update() is called with a NULL "build_state" and returns without updating the ACPI tables. This seems to be upsetting the firmware. To fix this, don't change the callback_opaque in fw_cfg_modify_bytes_read(). Reported-by: chenxiang <chenxiang66@hisilicon.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> --- I am still not very convinced this is the root cause of the issue. Though it looks like setting callback_opaque to NULL while updating the file size is wrong, what puzzles me is that on the second reboot we don't have any ACPI table size changes and ideally firmware should see the updated tables from the first reboot itself. Please take a look and let me know. Thanks, Shameer --- hw/nvram/fw_cfg.c | 1 - 1 file changed, 1 deletion(-) diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c index d605f3f45a..dfe8404c01 100644 --- a/hw/nvram/fw_cfg.c +++ b/hw/nvram/fw_cfg.c @@ -728,7 +728,6 @@ static void *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key, ptr = s->entries[arch][key].data; s->entries[arch][key].data = data; s->entries[arch][key].len = len; - s->entries[arch][key].callback_opaque = NULL; s->entries[arch][key].allow_write = false; return ptr; -- 2.17.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read() 2022-08-25 16:18 [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read() Shameer Kolothum via @ 2022-08-26 11:59 ` Laszlo Ersek 2022-08-26 12:06 ` Laszlo Ersek 2022-09-07 8:52 ` Igor Mammedov 1 sibling, 1 reply; 8+ messages in thread From: Laszlo Ersek @ 2022-08-26 11:59 UTC (permalink / raw) To: Shameer Kolothum, qemu-devel, qemu-arm Cc: imammedo, peter.maydell, linuxarm, chenxiang66 On 08/25/22 18:18, Shameer Kolothum wrote: > Hi > > On arm/virt platform, Chen Xiang reported a Guest crash while > attempting the below steps, > > 1. Launch the Guest with nvdimm=on > 2. Hot-add a NVDIMM dev > 3. Reboot > 4. Guest boots fine. > 5. Reboot again. > 6. Guest boot fails. > > QEMU_EFI reports the below error: > ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables" > OnRootBridgesConnected: InstallAcpiTables: Protocol Error > > Debugging shows that on first reboot(after hot-adding NVDIMM), > Qemu updates the etc/table-loader len, > > qemu_ram_resize() > fw_cfg_modify_file() > fw_cfg_modify_bytes_read() > > And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for > the "key" entry to NULL. Because of this, on the second reboot, > virt_acpi_build_update() is called with a NULL "build_state" and > returns without updating the ACPI tables. This seems to be > upsetting the firmware. > > To fix this, don't change the callback_opaque in fw_cfg_modify_bytes_read(). > > Reported-by: chenxiang <chenxiang66@hisilicon.com> > Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> > --- > I am still not very convinced this is the root cause of the issue. > Though it looks like setting callback_opaque to NULL while updating > the file size is wrong, what puzzles me is that on the second reboot > we don't have any ACPI table size changes and ideally firmware should > see the updated tables from the first reboot itself. > > Please take a look and let me know. > > Thanks, > Shameer > > --- > hw/nvram/fw_cfg.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c > index d605f3f45a..dfe8404c01 100644 > --- a/hw/nvram/fw_cfg.c > +++ b/hw/nvram/fw_cfg.c > @@ -728,7 +728,6 @@ static void *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key, > ptr = s->entries[arch][key].data; > s->entries[arch][key].data = data; > s->entries[arch][key].len = len; > - s->entries[arch][key].callback_opaque = NULL; > s->entries[arch][key].allow_write = false; > > return ptr; > I vaguely recall seeing the same issue report years ago (also in relation to hot-adding NVDIMM). However, I have no capacity to participate in the discussion. Making this remark just for clarity. Laszlo ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read() 2022-08-26 11:59 ` Laszlo Ersek @ 2022-08-26 12:06 ` Laszlo Ersek 2022-08-26 12:15 ` Shameerali Kolothum Thodi via 2022-08-30 6:43 ` Shameerali Kolothum Thodi via 0 siblings, 2 replies; 8+ messages in thread From: Laszlo Ersek @ 2022-08-26 12:06 UTC (permalink / raw) To: Shameer Kolothum, qemu-devel, qemu-arm Cc: imammedo, peter.maydell, linuxarm, chenxiang66, Ard Biesheuvel (kernel.org address), Gerd Hoffmann +Ard +Gerd, one pointer at the bottom On 08/26/22 13:59, Laszlo Ersek wrote: > On 08/25/22 18:18, Shameer Kolothum wrote: >> Hi >> >> On arm/virt platform, Chen Xiang reported a Guest crash while >> attempting the below steps, >> >> 1. Launch the Guest with nvdimm=on >> 2. Hot-add a NVDIMM dev >> 3. Reboot >> 4. Guest boots fine. >> 5. Reboot again. >> 6. Guest boot fails. >> >> QEMU_EFI reports the below error: >> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables" >> OnRootBridgesConnected: InstallAcpiTables: Protocol Error >> >> Debugging shows that on first reboot(after hot-adding NVDIMM), >> Qemu updates the etc/table-loader len, >> >> qemu_ram_resize() >> fw_cfg_modify_file() >> fw_cfg_modify_bytes_read() >> >> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for >> the "key" entry to NULL. Because of this, on the second reboot, >> virt_acpi_build_update() is called with a NULL "build_state" and >> returns without updating the ACPI tables. This seems to be >> upsetting the firmware. >> >> To fix this, don't change the callback_opaque in fw_cfg_modify_bytes_read(). >> >> Reported-by: chenxiang <chenxiang66@hisilicon.com> >> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> >> --- >> I am still not very convinced this is the root cause of the issue. >> Though it looks like setting callback_opaque to NULL while updating >> the file size is wrong, what puzzles me is that on the second reboot >> we don't have any ACPI table size changes and ideally firmware should >> see the updated tables from the first reboot itself. >> >> Please take a look and let me know. >> >> Thanks, >> Shameer >> >> --- >> hw/nvram/fw_cfg.c | 1 - >> 1 file changed, 1 deletion(-) >> >> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c >> index d605f3f45a..dfe8404c01 100644 >> --- a/hw/nvram/fw_cfg.c >> +++ b/hw/nvram/fw_cfg.c >> @@ -728,7 +728,6 @@ static void *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key, >> ptr = s->entries[arch][key].data; >> s->entries[arch][key].data = data; >> s->entries[arch][key].len = len; >> - s->entries[arch][key].callback_opaque = NULL; >> s->entries[arch][key].allow_write = false; >> >> return ptr; >> > > I vaguely recall seeing the same issue report years ago (also in > relation to hot-adding NVDIMM). However, I have no capacity to > participate in the discussion. Making this remark just for clarity. The earlier report I've had in mind was from Shameer as well: http://mid.mail-archive.com/5FC3163CFD30C246ABAA99954A238FA83F3FB328@lhreml524-mbs.china.huawei.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read() 2022-08-26 12:06 ` Laszlo Ersek @ 2022-08-26 12:15 ` Shameerali Kolothum Thodi via 2022-08-30 6:43 ` Shameerali Kolothum Thodi via 1 sibling, 0 replies; 8+ messages in thread From: Shameerali Kolothum Thodi via @ 2022-08-26 12:15 UTC (permalink / raw) To: Laszlo Ersek, qemu-devel@nongnu.org, qemu-arm@nongnu.org Cc: imammedo@redhat.com, peter.maydell@linaro.org, Linuxarm, chenxiang (M), Ard Biesheuvel (kernel.org address), Gerd Hoffmann > -----Original Message----- > From: Laszlo Ersek [mailto:lersek@redhat.com] > Sent: 26 August 2022 13:07 > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; > qemu-devel@nongnu.org; qemu-arm@nongnu.org > Cc: imammedo@redhat.com; peter.maydell@linaro.org; Linuxarm > <linuxarm@huawei.com>; chenxiang (M) <chenxiang66@hisilicon.com>; Ard > Biesheuvel (kernel.org address) <ardb@kernel.org>; Gerd Hoffmann > <kraxel@redhat.com> > Subject: Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in > fw_cfg_modify_bytes_read() > > +Ard +Gerd, one pointer at the bottom > > On 08/26/22 13:59, Laszlo Ersek wrote: > > On 08/25/22 18:18, Shameer Kolothum wrote: > >> Hi > >> > >> On arm/virt platform, Chen Xiang reported a Guest crash while > >> attempting the below steps, > >> > >> 1. Launch the Guest with nvdimm=on > >> 2. Hot-add a NVDIMM dev > >> 3. Reboot > >> 4. Guest boots fine. > >> 5. Reboot again. > >> 6. Guest boot fails. > >> > >> QEMU_EFI reports the below error: > >> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables" > >> OnRootBridgesConnected: InstallAcpiTables: Protocol Error > >> > >> Debugging shows that on first reboot(after hot-adding NVDIMM), > >> Qemu updates the etc/table-loader len, > >> > >> qemu_ram_resize() > >> fw_cfg_modify_file() > >> fw_cfg_modify_bytes_read() > >> > >> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for > >> the "key" entry to NULL. Because of this, on the second reboot, > >> virt_acpi_build_update() is called with a NULL "build_state" and > >> returns without updating the ACPI tables. This seems to be > >> upsetting the firmware. > >> > >> To fix this, don't change the callback_opaque in > fw_cfg_modify_bytes_read(). > >> > >> Reported-by: chenxiang <chenxiang66@hisilicon.com> > >> Signed-off-by: Shameer Kolothum > <shameerali.kolothum.thodi@huawei.com> > >> --- > >> I am still not very convinced this is the root cause of the issue. > >> Though it looks like setting callback_opaque to NULL while updating > >> the file size is wrong, what puzzles me is that on the second reboot > >> we don't have any ACPI table size changes and ideally firmware should > >> see the updated tables from the first reboot itself. > >> > >> Please take a look and let me know. > >> > >> Thanks, > >> Shameer > >> > >> --- > >> hw/nvram/fw_cfg.c | 1 - > >> 1 file changed, 1 deletion(-) > >> > >> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c > >> index d605f3f45a..dfe8404c01 100644 > >> --- a/hw/nvram/fw_cfg.c > >> +++ b/hw/nvram/fw_cfg.c > >> @@ -728,7 +728,6 @@ static void > *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key, > >> ptr = s->entries[arch][key].data; > >> s->entries[arch][key].data = data; > >> s->entries[arch][key].len = len; > >> - s->entries[arch][key].callback_opaque = NULL; > >> s->entries[arch][key].allow_write = false; > >> > >> return ptr; > >> > > > > I vaguely recall seeing the same issue report years ago (also in > > relation to hot-adding NVDIMM). However, I have no capacity to > > participate in the discussion. Making this remark just for clarity. > > The earlier report I've had in mind was from Shameer as well: > > http://mid.mail-archive.com/5FC3163CFD30C246ABAA99954A238FA83F3F > B328@lhreml524-mbs.china.huawei.com Right. That was a slightly different issue though. It was basically ACPI table size not getting updated on the first reboot of Guest after we hot-add NVDIMM dev. The error from firmware was different in that case, ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables" OnRootBridgesConnected: InstallAcpiTables: Protocol Error And it was fixed with this series here, https://patchwork.kernel.org/project/qemu-devel/cover/20200403101827.30664-1-shameerali.kolothum.thodi@huawei.com/ The current issue only happens on the second reboot of the Guest as described in the steps above. Thanks, Shameer ^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read() 2022-08-26 12:06 ` Laszlo Ersek 2022-08-26 12:15 ` Shameerali Kolothum Thodi via @ 2022-08-30 6:43 ` Shameerali Kolothum Thodi via 2022-08-30 19:45 ` Christian A. Ehrhardt 1 sibling, 1 reply; 8+ messages in thread From: Shameerali Kolothum Thodi via @ 2022-08-30 6:43 UTC (permalink / raw) To: Laszlo Ersek, qemu-devel@nongnu.org, qemu-arm@nongnu.org Cc: imammedo@redhat.com, peter.maydell@linaro.org, Linuxarm, chenxiang (M), Ard Biesheuvel (kernel.org address), Gerd Hoffmann, Christian A. Ehrhardt > -----Original Message----- > From: Shameerali Kolothum Thodi > Sent: 26 August 2022 13:15 > To: 'Laszlo Ersek' <lersek@redhat.com>; qemu-devel@nongnu.org; > qemu-arm@nongnu.org > Cc: imammedo@redhat.com; peter.maydell@linaro.org; Linuxarm > <linuxarm@huawei.com>; chenxiang (M) <chenxiang66@hisilicon.com>; Ard > Biesheuvel (kernel.org address) <ardb@kernel.org>; Gerd Hoffmann > <kraxel@redhat.com> > Subject: RE: [PATCH] fw_cfg: Don't set callback_opaque NULL in > fw_cfg_modify_bytes_read() > > > > > -----Original Message----- > > From: Laszlo Ersek [mailto:lersek@redhat.com] > > Sent: 26 August 2022 13:07 > > To: Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com>; > > qemu-devel@nongnu.org; qemu-arm@nongnu.org > > Cc: imammedo@redhat.com; peter.maydell@linaro.org; Linuxarm > > <linuxarm@huawei.com>; chenxiang (M) <chenxiang66@hisilicon.com>; > Ard > > Biesheuvel (kernel.org address) <ardb@kernel.org>; Gerd Hoffmann > > <kraxel@redhat.com> > > Subject: Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in > > fw_cfg_modify_bytes_read() > > > > +Ard +Gerd, one pointer at the bottom > > > > On 08/26/22 13:59, Laszlo Ersek wrote: > > > On 08/25/22 18:18, Shameer Kolothum wrote: > > >> Hi > > >> > > >> On arm/virt platform, Chen Xiang reported a Guest crash while > > >> attempting the below steps, > > >> > > >> 1. Launch the Guest with nvdimm=on > > >> 2. Hot-add a NVDIMM dev > > >> 3. Reboot > > >> 4. Guest boots fine. > > >> 5. Reboot again. > > >> 6. Guest boot fails. > > >> > > >> QEMU_EFI reports the below error: > > >> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables" > > >> OnRootBridgesConnected: InstallAcpiTables: Protocol Error > > >> > > >> Debugging shows that on first reboot(after hot-adding NVDIMM), > > >> Qemu updates the etc/table-loader len, > > >> > > >> qemu_ram_resize() > > >> fw_cfg_modify_file() > > >> fw_cfg_modify_bytes_read() > > >> > > >> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for > > >> the "key" entry to NULL. Because of this, on the second reboot, > > >> virt_acpi_build_update() is called with a NULL "build_state" and > > >> returns without updating the ACPI tables. This seems to be > > >> upsetting the firmware. > > >> > > >> To fix this, don't change the callback_opaque in > > fw_cfg_modify_bytes_read(). > > >> > > >> Reported-by: chenxiang <chenxiang66@hisilicon.com> > > >> Signed-off-by: Shameer Kolothum > > <shameerali.kolothum.thodi@huawei.com> > > >> --- > > >> I am still not very convinced this is the root cause of the issue. > > >> Though it looks like setting callback_opaque to NULL while updating > > >> the file size is wrong, what puzzles me is that on the second reboot > > >> we don't have any ACPI table size changes and ideally firmware should > > >> see the updated tables from the first reboot itself. > > >> > > >> Please take a look and let me know. > > >> > > >> Thanks, > > >> Shameer > > >> > > >> --- > > >> hw/nvram/fw_cfg.c | 1 - > > >> 1 file changed, 1 deletion(-) > > >> > > >> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c > > >> index d605f3f45a..dfe8404c01 100644 > > >> --- a/hw/nvram/fw_cfg.c > > >> +++ b/hw/nvram/fw_cfg.c > > >> @@ -728,7 +728,6 @@ static void > > *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key, > > >> ptr = s->entries[arch][key].data; > > >> s->entries[arch][key].data = data; > > >> s->entries[arch][key].len = len; > > >> - s->entries[arch][key].callback_opaque = NULL; > > >> s->entries[arch][key].allow_write = false; > > >> > > >> return ptr; > > >> > > > > > > I vaguely recall seeing the same issue report years ago (also in > > > relation to hot-adding NVDIMM). However, I have no capacity to > > > participate in the discussion. Making this remark just for clarity. > > > > The earlier report I've had in mind was from Shameer as well: > > > > > http://mid.mail-archive.com/5FC3163CFD30C246ABAA99954A238FA83F3F > > B328@lhreml524-mbs.china.huawei.com > > Right. That was a slightly different issue though. It was basically ACPI table > size not > getting updated on the first reboot of Guest after we hot-add NVDIMM dev. > The error > from firmware was different in that case, > > ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables" > OnRootBridgesConnected: InstallAcpiTables: Protocol Error > > And it was fixed with this series here, > https://patchwork.kernel.org/project/qemu-devel/cover/20200403101827.3 > 0664-1-shameerali.kolothum.thodi@huawei.com/ > > The current issue only happens on the second reboot of the Guest as > described in > the steps above. > [+Christian] I just found that a similar issue was reported here sometime back on Q35/Windows setup, https://patchew.org/QEMU/YldFMTbFLUcdFIfa@cae.in-ulm.de/ But there are no further discussions on that thread. Thanks, Shameer ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read() 2022-08-30 6:43 ` Shameerali Kolothum Thodi via @ 2022-08-30 19:45 ` Christian A. Ehrhardt 0 siblings, 0 replies; 8+ messages in thread From: Christian A. Ehrhardt @ 2022-08-30 19:45 UTC (permalink / raw) To: Shameerali Kolothum Thodi Cc: Laszlo Ersek, qemu-devel@nongnu.org, qemu-arm@nongnu.org, imammedo@redhat.com, peter.maydell@linaro.org, Linuxarm, chenxiang (M), Ard Biesheuvel (kernel.org address), Gerd Hoffmann Hi, Shameer: Thanks for bringing this to my attention. Some comments inline. On Tue, Aug 30, 2022 at 06:43:56AM +0000, Shameerali Kolothum Thodi wrote: > > > > -----Original Message----- > > From: Shameerali Kolothum Thodi > > Sent: 26 August 2022 13:15 > > To: 'Laszlo Ersek' <lersek@redhat.com>; qemu-devel@nongnu.org; > > qemu-arm@nongnu.org > > Cc: imammedo@redhat.com; peter.maydell@linaro.org; Linuxarm > > <linuxarm@huawei.com>; chenxiang (M) <chenxiang66@hisilicon.com>; Ard > > Biesheuvel (kernel.org address) <ardb@kernel.org>; Gerd Hoffmann > > <kraxel@redhat.com> > > Subject: RE: [PATCH] fw_cfg: Don't set callback_opaque NULL in > > fw_cfg_modify_bytes_read() > > > > > > > > > -----Original Message----- > > > From: Laszlo Ersek [mailto:lersek@redhat.com] > > > Sent: 26 August 2022 13:07 > > > To: Shameerali Kolothum Thodi > > <shameerali.kolothum.thodi@huawei.com>; > > > qemu-devel@nongnu.org; qemu-arm@nongnu.org > > > Cc: imammedo@redhat.com; peter.maydell@linaro.org; Linuxarm > > > <linuxarm@huawei.com>; chenxiang (M) <chenxiang66@hisilicon.com>; > > Ard > > > Biesheuvel (kernel.org address) <ardb@kernel.org>; Gerd Hoffmann > > > <kraxel@redhat.com> > > > Subject: Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in > > > fw_cfg_modify_bytes_read() > > > > > > +Ard +Gerd, one pointer at the bottom > > > > > > On 08/26/22 13:59, Laszlo Ersek wrote: > > > > On 08/25/22 18:18, Shameer Kolothum wrote: > > > >> Hi > > > >> > > > >> On arm/virt platform, Chen Xiang reported a Guest crash while > > > >> attempting the below steps, > > > >> > > > >> 1. Launch the Guest with nvdimm=on > > > >> 2. Hot-add a NVDIMM dev > > > >> 3. Reboot > > > >> 4. Guest boots fine. > > > >> 5. Reboot again. > > > >> 6. Guest boot fails. > > > >> > > > >> QEMU_EFI reports the below error: > > > >> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables" > > > >> OnRootBridgesConnected: InstallAcpiTables: Protocol Error > > > >> > > > >> Debugging shows that on first reboot(after hot-adding NVDIMM), > > > >> Qemu updates the etc/table-loader len, > > > >> > > > >> qemu_ram_resize() > > > >> fw_cfg_modify_file() > > > >> fw_cfg_modify_bytes_read() > > > >> > > > >> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for > > > >> the "key" entry to NULL. Because of this, on the second reboot, > > > >> virt_acpi_build_update() is called with a NULL "build_state" and > > > >> returns without updating the ACPI tables. This seems to be > > > >> upsetting the firmware. > > > >> > > > >> To fix this, don't change the callback_opaque in > > > fw_cfg_modify_bytes_read(). > > > >> > > > >> Reported-by: chenxiang <chenxiang66@hisilicon.com> > > > >> Signed-off-by: Shameer Kolothum > > > <shameerali.kolothum.thodi@huawei.com> > > > >> --- > > > >> I am still not very convinced this is the root cause of the issue. > > > >> Though it looks like setting callback_opaque to NULL while updating > > > >> the file size is wrong, what puzzles me is that on the second reboot > > > >> we don't have any ACPI table size changes and ideally firmware should > > > >> see the updated tables from the first reboot itself. > > > >> > > > >> Please take a look and let me know. > > > >> > > > >> Thanks, > > > >> Shameer > > > >> > > > >> --- > > > >> hw/nvram/fw_cfg.c | 1 - > > > >> 1 file changed, 1 deletion(-) > > > >> > > > >> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c > > > >> index d605f3f45a..dfe8404c01 100644 > > > >> --- a/hw/nvram/fw_cfg.c > > > >> +++ b/hw/nvram/fw_cfg.c > > > >> @@ -728,7 +728,6 @@ static void > > > *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key, > > > >> ptr = s->entries[arch][key].data; > > > >> s->entries[arch][key].data = data; > > > >> s->entries[arch][key].len = len; > > > >> - s->entries[arch][key].callback_opaque = NULL; > > > >> s->entries[arch][key].allow_write = false; > > > >> > > > >> return ptr; > > > >> The code as it stands clears callback_opaque (the data pointer of the callbacks) while leaving the actual callbacks in place. I think it is obvious that this cannot be correct. IMHO the change to allow_write is wrong for similar reasons but I don't think that this matters in practice. If this path is hit for the table-loader file the ACPI tables in the guest will be corrupt. > > > > I vaguely recall seeing the same issue report years ago (also in > > > > relation to hot-adding NVDIMM). However, I have no capacity to > > > > participate in the discussion. Making this remark just for clarity. > > > > > > The earlier report I've had in mind was from Shameer as well: > > > > > > > > http://mid.mail-archive.com/5FC3163CFD30C246ABAA99954A238FA83F3F > > > B328@lhreml524-mbs.china.huawei.com > > > > Right. That was a slightly different issue though. It was basically ACPI table > > size not > > getting updated on the first reboot of Guest after we hot-add NVDIMM dev. > > The error > > from firmware was different in that case, > > > > ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables" > > OnRootBridgesConnected: InstallAcpiTables: Protocol Error > > > > And it was fixed with this series here, > > https://patchwork.kernel.org/project/qemu-devel/cover/20200403101827.3 > > 0664-1-shameerali.kolothum.thodi@huawei.com/ > > > > The current issue only happens on the second reboot of the Guest as > > described in > > the steps above. > > > > [+Christian] > > I just found that a similar issue was reported here sometime back on Q35/Windows > setup, > https://patchew.org/QEMU/YldFMTbFLUcdFIfa@cae.in-ulm.de/ > > But there are no further discussions on that thread. I convinced myself that this cannot happen upstream as the number of entries in the table-loader is always small. However, it does happen for us and the suggested patch fixes the issue for us. Given that Shameer independantly came to the same conclusion the patch should by considered for inclusion. thanks Christian ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read() 2022-08-25 16:18 [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read() Shameer Kolothum via 2022-08-26 11:59 ` Laszlo Ersek @ 2022-09-07 8:52 ` Igor Mammedov 2022-09-07 9:36 ` Gerd Hoffmann 1 sibling, 1 reply; 8+ messages in thread From: Igor Mammedov @ 2022-09-07 8:52 UTC (permalink / raw) To: Shameer Kolothum Cc: qemu-devel, qemu-arm, peter.maydell, lersek, linuxarm, chenxiang66, kraxel On Thu, 25 Aug 2022 17:18:42 +0100 Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote: > Hi > > On arm/virt platform, Chen Xiang reported a Guest crash while > attempting the below steps, > > 1. Launch the Guest with nvdimm=on > 2. Hot-add a NVDIMM dev > 3. Reboot > 4. Guest boots fine. > 5. Reboot again. > 6. Guest boot fails. > > QEMU_EFI reports the below error: > ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables" > OnRootBridgesConnected: InstallAcpiTables: Protocol Error > > Debugging shows that on first reboot(after hot-adding NVDIMM), > Qemu updates the etc/table-loader len, > > qemu_ram_resize() > fw_cfg_modify_file() > fw_cfg_modify_bytes_read() > > And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for > the "key" entry to NULL. Because of this, on the second reboot, > virt_acpi_build_update() is called with a NULL "build_state" and > returns without updating the ACPI tables. This seems to be > upsetting the firmware. > > To fix this, don't change the callback_opaque in fw_cfg_modify_bytes_read(). Fixes: bdbb5b1706d165 ("fw_cfg: add fw_cfg_machine_reset function") Acked-by: Igor Mammedov <imammedo@redhat.com> CCing Gerd to have a second set of eyes on it > Reported-by: chenxiang <chenxiang66@hisilicon.com> > Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> > --- > I am still not very convinced this is the root cause of the issue. > Though it looks like setting callback_opaque to NULL while updating > the file size is wrong, what puzzles me is that on the second reboot > we don't have any ACPI table size changes and ideally firmware should > see the updated tables from the first reboot itself. > > Please take a look and let me know. > > Thanks, > Shameer > > --- > hw/nvram/fw_cfg.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c > index d605f3f45a..dfe8404c01 100644 > --- a/hw/nvram/fw_cfg.c > +++ b/hw/nvram/fw_cfg.c > @@ -728,7 +728,6 @@ static void *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key, > ptr = s->entries[arch][key].data; > s->entries[arch][key].data = data; > s->entries[arch][key].len = len; > - s->entries[arch][key].callback_opaque = NULL; > s->entries[arch][key].allow_write = false; As Christian have mentioned, this also looks bogus. perhaps another patch to fix that as well. > return ptr; ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read() 2022-09-07 8:52 ` Igor Mammedov @ 2022-09-07 9:36 ` Gerd Hoffmann 0 siblings, 0 replies; 8+ messages in thread From: Gerd Hoffmann @ 2022-09-07 9:36 UTC (permalink / raw) To: Igor Mammedov Cc: Shameer Kolothum, qemu-devel, qemu-arm, peter.maydell, lersek, linuxarm, chenxiang66 > > QEMU_EFI reports the below error: > > ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables" > > OnRootBridgesConnected: InstallAcpiTables: Protocol Error > > > > Debugging shows that on first reboot(after hot-adding NVDIMM), > > Qemu updates the etc/table-loader len, > > > > qemu_ram_resize() > > fw_cfg_modify_file() > > fw_cfg_modify_bytes_read() > > > > And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for > > the "key" entry to NULL. Because of this, on the second reboot, > > virt_acpi_build_update() is called with a NULL "build_state" and > > returns without updating the ACPI tables. This seems to be > > upsetting the firmware. > > > > To fix this, don't change the callback_opaque in fw_cfg_modify_bytes_read(). > > Fixes: bdbb5b1706d165 ("fw_cfg: add fw_cfg_machine_reset function") > Acked-by: Igor Mammedov <imammedo@redhat.com> > > CCing Gerd to have a second set of eyes on it Hmm. Original patch clears both 'callback_opaque' and 'callback' (where 'callback' used to be what 'select_cb' is today I think). Not fully sure what the motivation was for that. Maybe because using both fw_cfg_modify*() calls and a callback for update-on-read for a given entry looks pointless. Should that be the case there are better ways to catch that, like having fw_cfg_modify_bytes_read() throw an error in case select_cb is not NULL instead of silently clearing the callback. In any case clearing callback_opaque only is obviously wrong, so Acked-by: Gerd Hoffmann <kraxel@redhat.com> take care, Gerd ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-09-07 9:40 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-08-25 16:18 [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read() Shameer Kolothum via 2022-08-26 11:59 ` Laszlo Ersek 2022-08-26 12:06 ` Laszlo Ersek 2022-08-26 12:15 ` Shameerali Kolothum Thodi via 2022-08-30 6:43 ` Shameerali Kolothum Thodi via 2022-08-30 19:45 ` Christian A. Ehrhardt 2022-09-07 8:52 ` Igor Mammedov 2022-09-07 9:36 ` Gerd Hoffmann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).