From: Blue Swirl <blauwirbel@gmail.com>
To: Artyom Tarasenko <atar4qemu@googlemail.com>
Cc: qemu-devel@nongnu.org, Paul Brook <paul@codesourcery.com>
Subject: [Qemu-devel] Re: [PATCH 1/2] Pad iommu with an empty slot (necessary for SunOS 4.1.4)
Date: Tue, 25 May 2010 19:56:03 +0000
Message-ID: <AANLkTinD5gS4cSlxeR_-bxlmErNT3uvdEgibTtFCxtOo@mail.gmail.com>
In-Reply-To: <AANLkTilx4POmkd_KNL5PsOgW6MYbAtuYHpPf4LRye96D@mail.gmail.com>
On Tue, May 25, 2010 at 5:00 PM, Artyom Tarasenko
<atar4qemu@googlemail.com> wrote:
> 2010/5/21 Blue Swirl <blauwirbel@gmail.com>:
>> On Fri, May 21, 2010 at 5:23 PM, Artyom Tarasenko
>> <atar4qemu@googlemail.com> wrote:
>>> 2010/5/10 Blue Swirl <blauwirbel@gmail.com>:
>>>> On 5/10/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>>> 2010/5/10 Blue Swirl <blauwirbel@gmail.com>:
>>>>>
>>>>> > On 5/10/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>>> >> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>>>>> >> > On 5/9/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>>> >> >> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>>>>> >> >>
>>>>> >> >> > On 5/8/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>>> >> >> >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>>>>> >> >> >> Software shouldn't use aliased addresses, nor should it crash
>>>>> >> >> >> when it does (on the real hardware it wouldn't). Using empty_slot
>>>>> >> >> >> instead of aliasing can help with debugging such accesses.
>>>>> >> >> >
>>>>> >> >> > The TurboSPARC Microprocessor User's Manual shows that there are
>>>>> >> >> > additional pages after the main IOMMU for the AFX registers. So this is
>>>>> >> >> > not board-specific, but depends on the CPU/IOMMU version.
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses.
>>>>> >> >> SS-20 doesn't have any aliasing.
>>>>> >> >
>>>>> >> > But are your machines equipped with TurboSPARC or some other CPU?
>>>>> >>
>>>>> >>
>>>>> >> Good point; I must confess I missed the word "Turbo" in your first
>>>>> >> answer. The LX and SS-20 don't have one, but the SS-5 must have a
>>>>> >> TurboSPARC CPU:
>>>>> >>
>>>>> >> ok cd /FMI,MB86904
>>>>> >> ok .attributes
>>>>> >> context-table 00 00 00 00 03 ff f0 00 00 00 10 00
>>>>> >> psr-implementation 00000000
>>>>> >> psr-version 00000004
>>>>> >> implementation 00000000
>>>>> >> version 00000004
>>>>> >> cache-line-size 00000020
>>>>> >> cache-nlines 00000200
>>>>> >> page-size 00001000
>>>>> >> dcache-line-size 00000010
>>>>> >> dcache-nlines 00000200
>>>>> >> dcache-associativity 00000001
>>>>> >> icache-line-size 00000020
>>>>> >> icache-nlines 00000200
>>>>> >> icache-associativity 00000001
>>>>> >> ncaches 00000002
>>>>> >> mmu-nctx 00000100
>>>>> >> sparc-version 00000008
>>>>> >> mask_rev 00000026
>>>>> >> device_type cpu
>>>>> >> name FMI,MB86904
>>>>> >>
>>>>> >> and still it behaves the same as the TI,TMS390S10 in the LX. This is done on the SS-5:
>>>>> >>
>>>>> >> ok 10000000 20 spacel@ .
>>>>> >> 4000009
>>>>> >> ok 14000000 20 spacel@ .
>>>>> >> 4000009
>>>>> >> ok 14000004 20 spacel@ .
>>>>> >> 23000
>>>>> >> ok 1f000004 20 spacel@ .
>>>>> >> 23000
>>>>> >> ok 10000008 20 spacel@ .
>>>>> >> 4000009
>>>>> >> ok 14000028 20 spacel@ .
>>>>> >> 4000009
>>>>> >> ok 1000000c 20 spacel@ .
>>>>> >> 23000
>>>>> >> ok 10000010 20 spacel@ .
>>>>> >> 4000009
>>>>> >>
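
(If I remember the OBP word right, spacel@ takes ( addr asi -- l ), so
all of these probes read 32-bit words through ASI 0x20, the MMU
physical pass-through; the identical values coming back from widely
separated addresses are the aliasing.)
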
>>>>> >>
>>>>> >> The LX is the same except for the IOMMU version:
>>>>> >>
>>>>> >> ok 10000000 20 spacel@ .
>>>>> >> 4000005
>>>>> >> ok 14000000 20 spacel@ .
>>>>> >> 4000005
>>>>> >> ok 18000000 20 spacel@ .
>>>>> >> 4000005
>>>>> >> ok 1f000000 20 spacel@ .
>>>>> >> 4000005
>>>>> >> ok 1ff00000 20 spacel@ .
>>>>> >> 4000005
>>>>> >> ok 1fff0004 20 spacel@ .
>>>>> >> 1fe000
>>>>> >> ok 10000004 20 spacel@ .
>>>>> >> 1fe000
>>>>> >> ok 10000108 20 spacel@ .
>>>>> >> 41000005
>>>>> >> ok 10000040 20 spacel@ .
>>>>> >> 41000005
>>>>> >> ok 1fff0040 20 spacel@ .
>>>>> >> 41000005
>>>>> >> ok 1fff0044 20 spacel@ .
>>>>> >> 1fe000
>>>>> >> ok 1fff0024 20 spacel@ .
>>>>> >> 1fe000
>>>>> >>
>>>>> >>
>>>>> >> >> At what address are the additional AFX registers located?
>>>>> >> >
>>>>> >> > Here's the complete TurboSPARC IOMMU address map:
>>>>> >> > PA[30:0] Register Access
>>>>> >> > 1000_0000 IOMMU Control R/W
>>>>> >> > 1000_0004 IOMMU Base Address R/W
>>>>> >> > 1000_0014 Flush All IOTLB Entries W
>>>>> >> > 1000_0018 Address Flush W
>>>>> >> > 1000_1000 Asynchronous Fault Status R/W
>>>>> >> > 1000_1004 Asynchronous Fault Address R/W
>>>>> >> > 1000_1010 SBus Slot Configuration 0 R/W
>>>>> >> > 1000_1014 SBus Slot Configuration 1 R/W
>>>>> >> > 1000_1018 SBus Slot Configuration 2 R/W
>>>>> >> > 1000_101C SBus Slot Configuration 3 R/W
>>>>> >> > 1000_1020 SBus Slot Configuration 4 R/W
>>>>> >> > 1000_1050 Memory Fault Status R/W
>>>>> >> > 1000_1054 Memory Fault Address R/W
>>>>> >> > 1000_2000 Module Identification R/W
>>>>> >> > 1000_3018 Mask Identification R
>>>>> >> > 1000_4000 AFX Queue Level W
>>>>> >> > 1000_6000 AFX Queue Level R
>>>>> >> > 1000_7000 AFX Queue Status R
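
For reference, roughly how those offsets could be encoded in hw/iommu.c;
the names below are made up for illustration, not taken from the manual
or from the QEMU source:

/* Sketch: TurboSPARC IOMMU register offsets (relative to 0x1000_0000),
 * transcribed from the address map quoted above. */
enum {
    TS_IOMMU_CTRL        = 0x0000, /* IOMMU Control, R/W */
    TS_IOMMU_BASE        = 0x0004, /* IOMMU Base Address, R/W */
    TS_IOMMU_FLUSH_ALL   = 0x0014, /* Flush All IOTLB Entries, W */
    TS_IOMMU_ADDR_FLUSH  = 0x0018, /* Address Flush, W */
    TS_IOMMU_AFSR        = 0x1000, /* Asynchronous Fault Status, R/W */
    TS_IOMMU_AFAR        = 0x1004, /* Asynchronous Fault Address, R/W */
    TS_IOMMU_SBUS_CFG0   = 0x1010, /* SBus Slot Config 0; 1-4 follow, R/W */
    TS_IOMMU_MFSR        = 0x1050, /* Memory Fault Status, R/W */
    TS_IOMMU_MFAR        = 0x1054, /* Memory Fault Address, R/W */
    TS_IOMMU_MODULE_ID   = 0x2000, /* Module Identification, R/W */
    TS_IOMMU_MASK_ID     = 0x3018, /* Mask Identification, R */
    TS_IOMMU_AFX_QLVL_W  = 0x4000, /* AFX Queue Level, W */
    TS_IOMMU_AFX_QLVL_R  = 0x6000, /* AFX Queue Level, R */
    TS_IOMMU_AFX_QSTAT   = 0x7000, /* AFX Queue Status, R */
};
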
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> But if I read it correctly, 0x12fff294 (which makes SunOS crash with
>>>>> >> -m 32) is well above this limit.
>>>>> >
>>>>> > Oh, so I also misread something. You are not talking about the
>>>>> > adjacent pages, but about 16MB increments.
>>>>> >
>>>>> > Earlier I sent a patch for a generic address alias device; would it be
>>>>> > useful for this?
>>>>>
>>>>>
>>>>> It should do as well. But I thought empty_slot has less overhead and is
>>>>> easier to debug.
>>>>>
>>>
>>> Also, the aliasing patch would require one more parameter: the size of
>>> the area which has to be aliased. Unless we implement stubs for all
>>> missing devices and do aliasing of the connected port ranges. And
>>> then again, the SS-20 doesn't have aliasing in this area at all.
>>>
>>> What do you think of this (empty_slot) solution (apart from the missing
>>> SoB line)? Meanwhile it has been tested with SunOS 4.1.3U1 too.
>>
>> I'm slightly against it; of course it would help here, but I think
>> we may be missing a bigger problem.
>>
>>>>>> Maybe we have a general design problem; perhaps unassigned access
>>>>>> faults should only be triggered inside SBus slots and ignored
>>>>>> elsewhere. If this is true, the generic Sparc32 unassigned access
>>>>>> handler should just ignore the access, and special fault-generating
>>>>>> slots should be installed for empty SBus address ranges.
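
To make that concrete, an untested sketch, assuming the current
do_unassigned_access() hook in target-sparc and hard-coding the SS-5
slot window for illustration:

/* Sketch: fault only inside the SBus slot windows, ignore the rest.
 * The window is the SS-5 one (slots from 0x30000000 to 0x7fffffff);
 * a real patch would take it from the machine description. */
static int addr_in_sbus_slot(target_phys_addr_t addr)
{
    return addr >= 0x30000000ULL && addr <= 0x7fffffffULL;
}

void do_unassigned_access(target_phys_addr_t addr, int is_write,
                          int is_exec, int is_asi, int size)
{
    if (!addr_in_sbus_slot(addr)) {
        return; /* ignore accesses outside the SBus slots */
    }
    /* ... existing MFSR/MFAR update and trap generation goes here ... */
}
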
>>>
>>> Agreed that they should be special for SBus, because the SS-20 OBP is
>>> not happy with the fault we are currently generating. But otherwise I
>>> think qemu does it correctly. On the SS-5:
>>>
>>> ok f7ff0000 2f spacel@ .
>>> Data Access Error
>>> ok sfar@ .
>>> f7ff0000
>>> ok 20000000 2f spacel@ .
>>> Data Access Error
>>> ok sfar@ .
>>> 20000000
>>> ok 40000000 20 spacel@ .
>>> Data Access Error
>>> ok sfar@ .
>>> 40000000
>>>
>>> Neither f7ff0000 nor 20000000 nor 40000000 is in an SBus range, right?
>>
>> 40000000 is on SS-5.
>
> Ah. I was only aware of the control space. What ranges does SBus take?
On the SS-5, 30000000 to 7fffffff, each slot taking 10000000. There's the
AFX bus at 20000000.
The OBP property '/iommu/sbus/ranges' shows these (and other ranges).
>
>> So is the SBus Control Space in 0x10000000 to
>> 0x1fffffff the only area besides DRAM where the accesses won't trap?
>
> At least some area after the ROM is aliased too. Also, on an SS-10 with
> a non-active frame buffer, writing to the SX registers has no visible
> effect, and reading from them produces not a fault but an NMI.
Then we should cover the whole area after the IOMMU with an empty slot
device. The ROM probably doesn't matter.
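
Roughly like this (untested; IOMMU_REGS_END is a placeholder, the real
boundary and the AFX range would need checking):

/* Sketch: back the area between the IOMMU registers and the first SBus
 * slot with an empty slot, so stray accesses neither trap nor hit an
 * unrelated device. */
#define IOMMU_REGS_END  0x10004000ULL  /* placeholder */
#define SBUS_SLOT_BASE  0x30000000ULL  /* first SBus slot on the SS-5 */

empty_slot_init(IOMMU_REGS_END, SBUS_SLOT_BASE - IOMMU_REGS_END);
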
>>>>> My impression was that the SS-5 and SS-20 handle unassigned accesses a
>>>>> bit differently. The current IOMMU implementation fits the SS-20, which
>>>>> has no aliasing.
>>>>
>>>> It's probably the board design rather than just the IOMMU.
>>>
>>> Agreed. That's why I bound the patch to machine hwdef and not to iommu.
>>>
>>>>> >> >> > One approach would be to increase IOMMU_NREGS to cover
>>>>> >> >> > these registers (with a bump in the savevm version field);
>>>>> >> >> > iommu_init1() would then check the version field to see how much
>>>>> >> >> > MMIO to provide.
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> The problem I see here is that we already have too many registers: we
>>>>> >> >> emulate the SS-20 IOMMU (I guess), while the SS-5 and LX seem to have
>>>>> >> >> only 0x20 registers, which are aliased all the way.
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> > But in order to avoid the savevm version change, iommu_init1() could
>>>>> >> >> > just install dummy MMIO (in the TurboSPARC case), if OBP does not care
>>>>> >> >> > whether the read-back data matches what was written earlier. Since
>>>>> >> >> > from OBP's point of view this is identical to what your patch results
>>>>> >> >> > in, I'd suppose this approach would also work.
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> OBP doesn't seem to care about these addresses at all; only the "MUNIX"
>>>>> >> >> SunOS 4.1.4 kernel does. The "MUNIX" kernel is the only kernel available
>>>>> >> >> during the installation, so it is currently not possible to install
>>>>> >> >> 4.1.4. Surprisingly, the "GENERIC" kernel which ends up on the disk
>>>>> >> >> after the installation doesn't try to access these address ranges
>>>>> >> >> either, so a disk image taken from a live system works.
>>>>> >> >>
>>>>> >> >> Actually, the accesses to the non-connected/aliased addresses may also
>>>>> >> >> be a consequence of the phys_page_find bug I mentioned before. When I
>>>>> >> >> run the install with -m 64 and -m 256, it tries to access different
>>>>> >> >> non-connected addresses. It may also be a SunOS bug, of course; 256m
>>>>> >> >> used to be a lot back then.
>>>>> >> >
>>>>> >> > Perhaps with 256MB, memory probing advances blindly from memory to the
>>>>> >> > IOMMU registers. Proll (used before OpenBIOS) did that once, with bad
>>>>> >> > results :-). If this is true, 64M, 128M and 192M should show identical
>>>>> >> > results, and the accesses should happen only with sizes close or equal
>>>>> >> > to 256M.
>>>>> >>
>>>>> >>
>>>>> >> 32m:  0x12fff294
>>>>> >> 64m:  0x14fff294
>>>>> >> 192m: 0x1cfff294
>>>>> >> 256m: 0x20fff294
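
(In each case that is exactly 0x10000000 + RAM size + 0xfff294.)
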
>>>>> >>
>>>>> >> Memory probing? It would be strange for the OS to do it itself; it
>>>>> >> could just ask OBP how much memory it has. Here is the listing where
>>>>> >> it happens:
>>>>> >>
>>>>> >> _swift_vac_rgnflush: rd %psr, %g2
>>>>> >> _swift_vac_rgnflush+4: andn %g2, 0x20, %g5
>>>>> >> _swift_vac_rgnflush+8: mov %g5, %psr
>>>>> >> _swift_vac_rgnflush+0xc: nop
>>>>> >> _swift_vac_rgnflush+0x10: nop
>>>>> >> _swift_vac_rgnflush+0x14: mov 0x100, %g5
>>>>> >> _swift_vac_rgnflush+0x18: lda [%g5] 0x4, %g5
>>>>> >> _swift_vac_rgnflush+0x1c: sll %o2, 0x2, %g1
>>>>> >> _swift_vac_rgnflush+0x20: sll %g5, 0x4, %g5
>>>>> >> _swift_vac_rgnflush+0x24: add %g5, %g1, %g5
>>>>> >> _swift_vac_rgnflush+0x28: lda [%g5] 0x20, %g5
>>>>> >>
>>>>> >> _swift_vac_rgnflush+0x28: is the fatal one.
>>>>> >>
>>>>> >> kadb> $c
>>>>> >> _swift_vac_rgnflush(?)
>>>>> >> _vac_rgnflush() + 4
>>>>> >> _hat_setup_kas(0xc00,0xf0447000,0x43a000,0x400,0xf043a000,0x3c0) + 70
>>>>> >> _startup(0xfe000000,0x10000000,0xfa000000,0xf00e2bfc,0x10,0xdbc00) + 1414
>>>>> >> _main(0xf00e0fb4,0xf0007810,0x293ff49f,0xa805209c,0x200,0xf00d1d18) + 14
>>>>> >>
>>>>> >> Unfortunately (but not surprisingly) kadb doesn't allow debugging
>>>>> >> cache-flush code, so I can't check what is in
>>>>> >> [%g5] (aka sfar) on the real machine when this happens.
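
If I read the listing right, it fetches the SRMMU context table pointer
(register 0x100 via ASI 0x04), shifts it left by 4 into a physical
address, adds %o2 * 4 as the entry index, and reads that context table
entry back through ASI 0x20, the physical pass-through (the same ASI as
the spacel@ probes above). So the fatal instruction is a physical read
of a page table entry; if the computed table address is off, the fault
address would move with the memory size, which could match your numbers
above.
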
>>>>> >
>>>>> > Linux code for Swift/TurboSPARC VAC flush should be similar.
>>>
>>> Do you have an idea why anyone would try reading a value referenced in
>>> sfar, especially during flushing? I can't imagine a case where it
>>> wouldn't produce a fault.
>>
>> No idea; the fault should be inevitable. An explanation of how the VAC
>> (Virtually Addressed Cache?) works could help.
>
> Is it available somewhere? An explanation of how the PAC works would be
> interesting too, because when emulating the SS-20, the Solaris boot
> hangs where it normally says that the PAC is initialized.
>
>>>>> >> But the bug in phys_page_find would explain these accesses: sfar gets
>>>>> >> the wrong address, and then the secondary access happens at this wrong
>>>>> >> address instead of the original one.
>>>>> >
>>>>> > I doubt phys_page_find can be buggy; it is so vital for all architectures.
>>>>>
>>>>>
>>>>> But you've seen the example of buggy behaviour I posted last Friday,
>>>>> right? If it's not phys_page_find, it's either cpu_physical_memory_rw
>>>>> (which is also pretty generic) or the way the SS-20 machine registers
>>>>> devices. Could it be that all the pages must be registered in a
>>>>> particular order?
>>>>
>>>> How about the unassigned access handler, could it be the culprit?
>>>
>>> Doesn't look like it: it gets a physical address as a parameter. How
>>> would it know the address is wrong?
>>
>> It wouldn't, but IIRC Paul claimed earlier that the unassigned memory
>> handling in QEMU could have problems.
>
> But I thought Paul also fixed the problems? There was a patch from him.
>
>>>>> I think it's a pretty rare use case where you have a memory fault (not
>>>>> a translation fault) on an unknown address. You may get such a fault
>>>>> during device probing, but in that case you know what address you are
>>>>> probing, so you don't care about the sync fault address register.
>>>>>
>>>>> Besides, do all architectures have a sync fault address register?
>>>>
>>>> No; I think system-level checks like that and IOMMU-like controls on
>>>> most architectures are very poor compared to Sparc32. Server and
>>>> mainframe systems may be a bit better.
>>>
>>> And do we have any mainframe emulated well enough to have a user base
>>> and hence bug reports?
>>
>> The only IOMMU implemented so far is the Sparc32 one. I don't know about
>> the S390x architecture; that should definitely be mainframe class. The
>> AMD IOMMU may be in QEMU one day.
>>
>> About bugs: IIRC the NetBSD 3.x crash could be related to the IOMMU.
>
> What indicates that? It happens where the disk sizes are normally
> reported, so it could be a scsi/dma/irq/fpu issue as well.
IIRC the DVMA address was 0xfc004000, but the mapped entries were for
0xfc000000 to 0xfc003fff, i.e. the device was doing DVMA one page past
the end of its mapping. That would point at the IOMMU/DVMA code rather
than at SCSI or interrupts, but I haven't verified this.
>
>>>>> >> FWIW, the routine is called only once on the real hardware. It sort of
>>>>> >> speaks for your hypothesis about memory probing, although it may not
>>>>> >> necessarily be probing for memory...
>>>>> >>
>
>
> --
> Regards,
> Artyom Tarasenko
>
> solaris/sparc under qemu blog: http://tyom.blogspot.com/
>