From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:34480) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XNRfW-000366-6e for qemu-devel@nongnu.org; Fri, 29 Aug 2014 15:17:11 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XNRfR-0005oF-HS for qemu-devel@nongnu.org; Fri, 29 Aug 2014 15:17:06 -0400 Received: from mx1.redhat.com ([209.132.183.28]:37601) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XNRfR-0005nH-9u for qemu-devel@nongnu.org; Fri, 29 Aug 2014 15:17:01 -0400 Message-ID: <5400D1A5.8020709@redhat.com> Date: Fri, 29 Aug 2014 15:16:53 -0400 From: John Snow MIME-Version: 1.0 References: <1406743889-2473-1-git-send-email-eniacz@hp.com> In-Reply-To: <1406743889-2473-1-git-send-email-eniacz@hp.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH] AHCI read/write corruption with int13h List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eniac Zhang Cc: Kevin Wolf , Paolo Bonzini , qemu-devel@nongnu.org, Stefan Hajnoczi On 07/30/2014 02:11 PM, Eniac Zhang wrote: > The AHCI controller code in Qemu has a bug that it will use the > wrong LBA address when Seabios tries to access LBA>128GB > (aka 127.5GB limit http://www.hardwaresecrets.com/printpage/Hard-Disk= -Drives-Capacity-Limits/482). > When we needs to access the LBA>0xfffffff, 28bit LBA is not sufficien= t > thus AHCI code needs to convert that into an LBA48 command, but it > didn=E2=80=99t set all the flags correctly, so low level code ends up= reading a sector at different address. > > how to duplicate: > turn off the workaround in ahci.c, leaving the debug logs in core.c, > compile your qemu-system-x86_64 and then run: > ./ qemu-system-x86_64 -fda dos622.img -drive if=3Dnone,file=3D./blank.q= cow2,id=3Dhdc,media=3Ddisk -device ide-hd,drive=3Dhdc,bus=3Dide.0 -M q35 = -m 256M -vnc :1 -boot a > > Blank.qcow2 is a 300GB virtual disk file I pre-created, you can leave i= t blank cause what=E2=80=99s on disk doesn=E2=80=99t matter in this test.= dos622.img is the dos622 floppy image with debug.com and a batch file: > a 100 > mov si, 0200 > mov ax, 4200 > mov dx, 0080 > int 13 > ret > > ; 0x3: length > e 200 10 00 7f 00 00 00 00 50 > ; lba > e 208 88 69 e2 11 00 00 00 00 > > r ip > 100 > g > r > d 5000:0 > q > > Connect vncviewer, once dos boot is completed, type debug --- > hw/ide/ahci.c | 15 +++++++++++++++ > hw/ide/core.c | 20 ++++++++++++++++++++ > 2 files changed, 35 insertions(+) > > diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c > index 604152a..3e86953 100644 > --- a/hw/ide/ahci.c > +++ b/hw/ide/ahci.c > @@ -938,6 +938,21 @@ static int handle_cmd(AHCIState *s, int port, int = slot) > * do, I simply assume non-used fields as reserved and OR= everything > * together, independent of the command. > */ > + // enable lba and lba48 mode, otherwise the bit won't get= set until the command is completed, cause read/write corruption > + ide_state->lba48 =3D (cmd_fis[2] =3D=3D WIN_READDMA_EXT > + || cmd_fis[2] =3D=3D WIN_READ_EXT > + || cmd_fis[2] =3D=3D WIN_READDMA_QUEUED_EXT > + || cmd_fis[2] =3D=3D WIN_READ_NATIVE_MAX_EXT > + || cmd_fis[2] =3D=3D WIN_MULTREAD_EXT > + || cmd_fis[2] =3D=3D WIN_WRITE_EXT > + || cmd_fis[2] =3D=3D WIN_WRITEDMA_EXT > + || cmd_fis[2] =3D=3D WIN_WRITEDMA_QUEUED_EXT > + || cmd_fis[2] =3D=3D WIN_SET_MAX_EXT > + || cmd_fis[2] =3D=3D WIN_MULTWRITE_EXT > + || cmd_fis[2] =3D=3D WIN_VERIFY_EXT > + || cmd_fis[2] =3D=3D WIN_FLUSH_CACHE_EXT > + ); > + ide_state->select |=3D 0x40; > ide_set_sector(ide_state, ((uint64_t)cmd_fis[10] << 40) > | ((uint64_t)cmd_fis[9] << 32) > /* This is used for LBA48 command= s */ I finally did my research on what is going on here and I understand the=20 problem now. LBA48, LBA28 and CHS commands all treat these registers=20 slightly differently, so our usage of "ide_set_sector" is a problem=20 because we are not providing enough information to the IDE core layer=20 for it to properly convert the sector number we give it back to the=20 proper registers. ide_set_sector cannot fill the registers properly without other guiding information to help it choose between LBA48, LBA28=20 and CHS modes of operation. The approach you have taken is to adjust the register values (manually)=20 by inspecting the CMD type and coercing ide_set_sector to behave. I think we can probably do this a different way: ide_set_sector takes an=20 LBA value and decomposes it into its constituent registers. The way we=20 use it here is to combine registers manually, then decompose them again=20 in ide_set_sector. What we should do here instead is to set all of the component IDE=20 registers directly from the incoming FIS packet and then we can skip=20 both this part of the patch /and/ delete the call to ide_set_sector=20 entirely, which should clean the code up a good deal. Normally the FIS packet decomposition would be a function of a SATA=20 compliance layer, but we don't have a SATA layer, so it falls on AHCI to=20 do so. Specifically, we should be able to set ide_state->select equal to=20 cmd_fis[7] which will handle enabling LBA over CHS for us (provided the=20 packet received from the guest is correct). We can also set the sector,=20 lcyl, hcyl, hob_sector, hob_lcyl and hob_hcyl fields. Then, the IDE core layer will handle most of the necessary=20 back-and-forth conversions for us automatically. Please let me know if you'd like to take over this fix for you or not; I=20 am adjusting a lot of the AHCI code and I could adopt your patch into a=20 forthcoming patchset fairly easily. Thank you very much for reporting and fixing this; if you find issues=20 with AHCI misbehaving in the future, please CC me on any future patches. > diff --git a/hw/ide/core.c b/hw/ide/core.c > index db191a6..988a935 100644 > --- a/hw/ide/core.c > +++ b/hw/ide/core.c > @@ -445,9 +445,23 @@ void ide_transfer_stop(IDEState *s) > s->status &=3D ~DRQ_STAT; > } > > +#define DEBUG_SECTOR 1 > + Please don't submit patches that turn on debug statements invariably.=20 You can just omit this definition if you feel the other prints are worth=20 keeping. > +#if DEBUG_SECTOR > +#define DPRINTF(fmt, ...) \ > + do { printf("debug_sector: " fmt , ## __VA_ARGS__); } while (0) > +#else > +#define DPRINTF(fmt, ...) > +#endif > + > +/* Leave a chunk of memory at the top of RAM for the BIOS ACPI tables.= */ > +#define ACPI_DATA_SIZE 0x10000 > +#define BIOS_CFG_IOPORT 0x510 Did this get pulled in from elsewhere? > + > int64_t ide_get_sector(IDEState *s) > { > int64_t sector_num; > + > if (s->select & 0x40) { > /* lba */ > if (!s->lba48) { > @@ -464,12 +478,18 @@ int64_t ide_get_sector(IDEState *s) > sector_num =3D ((s->hcyl << 8) | s->lcyl) * s->heads * s->sec= tors + > (s->select & 0x0f) * s->sectors + (s->sector - 1); > } > +#if DEBUG_SECTOR > + DPRINTF("get_sector: %lx\n", sector_num); > +#endif > return sector_num; > } > > void ide_set_sector(IDEState *s, int64_t sector_num) > { > unsigned int cyl, r; > +#if DEBUG_SECTOR > + DPRINTF("set_sector: %lx\n", sector_num); > +#endif > if (s->select & 0x40) { > if (!s->lba48) { > s->select =3D (s->select & 0xf0) | (sector_num >> 24); > Once we confirm the bug is fixed, most of these debug prints are perhaps=20 no longer necessary. Thanks! --=20 =E2=80=94js