[Qemu-devel] ARM Cortex-M issues

Discussion:

Bill Paul

2016-08-29 17:59:50 UTC

I recently started tinkering with ChibiOS as part of a small personal project
and wanted to test some of the demo configurations that it has with a machine
simulator before going all in on a reference board. I decided to try QEMU
2.6.0. I was mainly interested in an ARM Cortex-M machine, so I figured I
would try the Netduino 2 or Stellaris board models.

Unfortunately it's been a frustrating experience because there seem to be
several key places where QEMU's hardware emulation diverges from reality. The
ChibiOS examples often seem to depend on behavior that is valid for actual
hardware but which is either broken or just missing in QEMU. Some of these
issues are board-specific, but the last one seems a bit more general.

The Netduino 2 is built using an ST Micro STM32F205 chip. This is a Cortex-M3
with ST Micro-specific peripherals. ChibiOS has an example for the STM32F207
which I figured was close enough to try.

The first problem is that the stm32f2xx_usart.c does not fully emulate the
hardware. In particular it's missing support for the TX fifo empty state
(TXE). There is no definition for the TXEIE bit and the logic for handling the
TXE interrupt is broken. ChibiOS waits to see when the TXE bit is set before
writing a character out; it's basically waiting to make sure the transmitter
is ready before sending any new data. Since this logic is not implemented
correctly, ChibiOS stalls when trying to output anything to the serial port.
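The handshake described above can be sketched as a small stand-alone model. This is not the code in stm32f2xx_usart.c; the UsartModel struct and the usart_* helpers are hypothetical names made up for illustration. The bit positions (TXE and TC in SR, TXEIE in CR1) are taken from the STM32F2 reference manual. The sketch assumes the emulated transmitter is never busy, so TXE is raised again as soon as a byte has been accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define USART_SR_TC      (1u << 6)  /* transmission complete */
#define USART_SR_TXE     (1u << 7)  /* transmit data register empty */
#define USART_CR1_TXEIE  (1u << 7)  /* interrupt enable for TXE */

/* Hypothetical model state, not QEMU's actual device struct. */
typedef struct {
    uint32_t sr;        /* status register */
    uint32_t cr1;       /* control register 1 */
    uint8_t  last_tx;   /* last byte "sent", stands in for a chardev backend */
    bool     irq_level; /* level of the interrupt line to the NVIC */
} UsartModel;

/* The interrupt line is high while TXE is set and TXEIE is enabled. */
static void usart_update_irq(UsartModel *u)
{
    u->irq_level = (u->sr & USART_SR_TXE) && (u->cr1 & USART_CR1_TXEIE);
}

/* After reset the transmitter is idle, so TXE and TC both read as 1. */
static void usart_reset(UsartModel *u)
{
    assert(u != NULL);
    u->sr = USART_SR_TXE | USART_SR_TC;
    u->cr1 = 0;
    u->last_tx = 0;
    usart_update_irq(u);
}

/* Guest write to the data register: the byte leaves immediately in this
 * model, so TXE is set again before the guest can poll it. */
static void usart_write_dr(UsartModel *u, uint8_t ch)
{
    assert(u != NULL);
    u->sr &= ~(USART_SR_TXE | USART_SR_TC);
    u->last_tx = ch;
    u->sr |= USART_SR_TXE | USART_SR_TC;
    usart_update_irq(u);
}

/* Guest write to CR1: enabling TXEIE while TXE is set raises the IRQ. */
static void usart_write_cr1(UsartModel *u, uint32_t value)
{
    assert(u != NULL);
    u->cr1 = value;
    usart_update_irq(u);
}
```

With a model like this, a guest that polls TXE before each write never stalls, and a guest that enables TXEIE gets an interrupt as soon as the transmitter is ready.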

I managed to hack my way around that, but then I ran into the second problem:
the stm32f2xx_timer.c also doesn't fully emulate the hardware. It's missing
support for output compare mode, which ChibiOS also depends on. Unfortunately
I wasn't quite able to fix this myself because the ST Micro documentation
concerning the STM32F2xx timer implementation is a bit hard to follow, so it
wasn't clear to me just what the correct logic should be. (Implementing timers
in QEMU seems a bit complicated too.) I got it to limp along, but couldn't
quite get it exactly right. I actually got to the point where the ChibiOS
tests would run, but not reliably.

So I gave up on the STM32F2xx model and decided to try just the generic
Cortex-M mode. This example uses just the systick counter which is part of the
system control block (SCB) in the Cortex-M core and runs a couple of simple
threads that just wake up to count seconds and minutes.

When I tried to run this example with the Netduino 2 machine, QEMU crashed and
dumped core as soon as the example finished initializing the systick
registers.

After some research, I discovered the following:

https://bugs.launchpad.net/qemu/+bug/696094

Note that it mentions the system_clock_scale global variable in armv7m_nvic.c
not being initialized. However this bug specifically references the Stellaris
lm3s811evb board model. It seems this has been fixed for the Stellaris machine
simulation, but not for the Netduino 2.

So I switched to the Stellaris lm3s811evb machine model for my testing. Now
QEMU wouldn't dump core anymore, but ChibiOS would hang after only one timer
interrupt. I found it was stuck in _port_exit_from_isr.

Apparently what happens here is that ChibiOS/RT is using an interesting
preemption model where it uses a software-triggered non-maskable interrupt to
trigger a context switch. Prior to calling _port_exit_from_isr, ChibiOS locks
out interrupts (with a "cpsid i" instruction), which the comments indicate is
to ensure atomic operation. The _port_exit_from_isr then triggers a software
NMI by writing to the ICSR register in the SCB to set the NMIPENDSET bit, and
then it falls through to an infinite loop. The expectation of course is that
the NMI handler will run, but it doesn't. Instead ChibiOS just stalls forever
at the loop instruction.
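For reference, the register write involved looks roughly like the sketch below. It is not ChibiOS's actual code; pend_nmi and pend_pendsv are hypothetical helper names. The ICSR address and bit positions are the architectural ones for ARMv7-M. The helpers take the register pointer as a parameter only so that they can be exercised on a host machine.

```c
#include <assert.h>
#include <stdint.h>

#define SCB_ICSR_ADDR    0xE000ED04u  /* Interrupt Control and State Register */
#define ICSR_NMIPENDSET  (1u << 31)   /* write 1 to pend the NMI */
#define ICSR_PENDSVSET   (1u << 28)   /* write 1 to pend PendSV */

/* The set bits are write-one-to-set and writing 0 to them has no effect,
 * so a plain store of the single bit is enough. */
static void pend_nmi(volatile uint32_t *icsr)
{
    assert(icsr != 0);
    *icsr = ICSR_NMIPENDSET;
}

/* Same idea for PendSV, shown for comparison. */
static void pend_pendsv(volatile uint32_t *icsr)
{
    assert(icsr != 0);
    *icsr = ICSR_PENDSVSET;
}
```

On a real target the call would be pend_nmi((volatile uint32_t *)SCB_ICSR_ADDR), issued after "cpsid i" and followed by the loop that the NMI handler is expected to preempt.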

Apparently the simulation doesn't honor the notion that the non-maskable
interrupt should be non-maskable. The NMI interrupt is handled by arm_gic.c,
which calls qemu_set_irq(), however the target-arm/cpu.c code ignores it
because the PSTATE_I bit is not set in the CPU state. Yes, we have executed a
"cpsid i" instruction to mask interrupts, but no, that shouldn't prevent the
NMI from being delivered.
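The behavior I'd expect from the hardware can be sketched as a priority comparison. This is a simplified model, not QEMU code: CpuModel and the function names are hypothetical, and FAULTMASK and BASEPRI are deliberately left out. It relies on the architectural rule that "cpsid i" sets PRIMASK, which raises the execution priority to 0, while NMI has a fixed priority of -2 and HardFault a fixed priority of -1 (lower numbers are more urgent).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
    PRIO_NMI       = -2,   /* fixed by the architecture */
    PRIO_HARDFAULT = -1,   /* fixed by the architecture */
    PRIO_THREAD    = 256   /* placeholder: below every exception priority */
};

/* Hypothetical CPU state for this sketch only. */
typedef struct {
    bool primask;      /* set by "cpsid i", cleared by "cpsie i" */
    int  active_prio;  /* priority of the running handler, or PRIO_THREAD */
} CpuModel;

/* PRIMASK boosts the execution priority to 0 unless it is already
 * more urgent than that. */
static int execution_priority(const CpuModel *cpu)
{
    int prio = cpu->active_prio;
    if (cpu->primask && prio > 0) {
        prio = 0;
    }
    return prio;
}

/* A pending exception is taken only if it is strictly more urgent
 * (numerically lower) than the current execution priority. */
static bool can_take_exception(const CpuModel *cpu, int pending_prio)
{
    assert(cpu != NULL);
    return pending_prio < execution_priority(cpu);
}
```

Under this rule a pending NMI still preempts thread code that has PRIMASK set, while every configurable-priority interrupt (priority 0 and up) is held off, which is what the ChibiOS port relies on.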

This problem in particular has me baffled, and I'm forced to ask questions. It
seems like the ARM CPU simulation doesn't support the notion of NMI at all. (I
think it would be a separate input from IRQ and FIQ, no?) How is this supposed
to work? It seems like the Cortex-M hardware allows for software-generated
NMIs, so why isn't this supported? How is this supposed to work on other ARM
CPUs like Cortex-A? Is it really the case that nobody has tried this before?

ChibiOS/RT seems to support an alternate preemption scheme that uses the
PendSV interrupt instead of NMI, and if I compile it to use that mode, then
the example works with the Stellaris machine model. That feels like a hack
though: shouldn't a software NMI just work?

-Bill

--
=============================================================================
-Bill Paul (510) 749-2329 | Senior Member of Technical Staff,
***@windriver.com | Master of Unix-Fu - Wind River Systems
=============================================================================
"I put a dollar in a change machine. Nothing changed." - George Carlin
=============================================================================

Liviu Ionescu

2016-08-29 19:19:42 UTC

I recently started tinkering with ChibiOS as part of a small personal project ...

I did most of the development for the µOS++/CMSIS++ (http://micro-os-plus.github.io) on STM32F4DISCOVERY board, emulated by GNU ARM Eclipse QEMU, which implements even animated LEDs on a graphical image of the board.

FreeRTOS also works properly on the emulator, both the M0 and M3 ports.

As for Cortex-M implementation, there are many improvements in the GNU ARM Eclipse QEMU fork (http://gnuarmeclipse.github.io/qemu/), including an Eclipse debug plug-in to start it; it may be worth giving it a try for ChibiOS too.

Regards,

Liviu

Bill Paul

2016-08-29 20:30:51 UTC

Of all the gin joints in all the towns in all the world, Liviu Ionescu had to

Post by Liviu Ionescu

I recently started tinkering with ChibiOS as part of a small personal project ...

I did most of the development for the µOS++/CMSIS++
(http://micro-os-plus.github.io) on STM32F4DISCOVERY board, emulated by
GNU ARM Eclipse QEMU, which implements even animated LEDs on a graphical
image of the board.
FreeRTOS also works properly on the emulator, both the M0 and M3 ports.
As for Cortex-M implementation, there are many improvements in the GNU ARM
Eclipse QEMU fork (http://gnuarmeclipse.github.io/qemu/), including an
Eclipse debug plug-in to start it; it may be worth giving it a try for
ChibiOS too.

I think I've been down this road already with Xilinx ARM support. ("We have
our own fork of QEMU for the MPSoC parts!" "It seems to have diverged a lot
from the mainline. Also you've only been testing the ARM builds and those only
on Linux hosts, and now the code has bitrotted." "Yeah... but... we're going
to submit our changes to upstream Real Soon Now (tm).")

Note that I'm not suggesting the ARM Eclipse code suffers from bitrot. I'll
give it a try. I just wish all of this was in one place.

Also, from a cursory look at the code, it doesn't look like this fork handles
the NMI interrupt any better than the mainline.

The Cortex-M model has an explicit NMI vector in its vector table (#2). It's
possible to trigger this interrupt via software by writing a 1 to the
NMIPENDSET bit in the ICSR register in the System Control Block (which seems
to be a Cortex-M-specific thing).

Currently this vector is treated just like any other IRQ. The problem is that
means it is also subject to the case where CPSR_I is masked off in the CPU,
which for the NMI is wrong. (How can you mask that which is unmaskable?)

From looking at how things are structured, I think the only way to make it
work is to give the target-arm/cpu.c code a separate external NMI pin (e.g.
CPU_INTERRUPT_NMI) and make the arm_gic.c smart enough to trigger that pin
instead of the IRQ or FIQ pins when the NMI is triggered. The handling for
that pin could then be special-cased not to ignore the state of CPSR_I.

But that was just from a quick look last night while I was experimenting. I
don't know if maybe there's a better way. This is why I'm here asking
questions. :)

-Bill

Liviu Ionescu

2016-08-29 20:25:25 UTC

Post by Bill Paul
I just wish all of this was in one place.

me too.

my sources are public, and I support anyone who wants to take parts of them to improve the main source tree.

unfortunately I do not have the resources to do this. :-(

Post by Bill Paul
Currently this vector is treated just like any other IRQ. The problem is that
means it is also subject to the case where CPSR_I is masked off in the CPU,
which for the NMI is wrong. (How can you mask that which is unmaskable?)

I guess you are right, I did not test NMI.

but I added support for BASEPRI, so I had to make some improvements to interrupt processing. I guess a small patch can be applied to allow NMI to pass, even if interrupts are disabled.

regards,

Liviu

Fabien Chouteau

2016-08-30 08:23:29 UTC

Post by Liviu Ionescu

I recently started tinkering with ChibiOS as part of a small personal project ...

I did most of the development for the µOS++/CMSIS++ (http://micro-os-plus.github.io) on STM32F4DISCOVERY board, emulated by GNU ARM Eclipse QEMU, which implements even animated LEDs on a graphical image of the board.
FreeRTOS also works properly on the emulator, both the M0 and M3 ports.
As for Cortex-M implementation, there are many improvements in the GNU ARM Eclipse QEMU fork (http://gnuarmeclipse.github.io/qemu/), including an Eclipse debug plug-in to start it; it may be worth giving it a try for ChibiOS too.

There's also the fork from Pebble (the smartwatch): https://github.com/pebble/qemu
They seem to have pretty good Cortex-M support and even I2C, SPI, GPIO...

Peter Maydell

2016-08-29 19:51:04 UTC

Post by Bill Paul
Unfortunately it's been a frustrating experience because there seem to be
several key places where QEMU's hardware emulation diverges from reality. The
ChibiOS examples often seem to depend on behavior that is valid for actual
hardware but which is either broken or just missing in QEMU. Some of these
issues are board-specific, but the last one seems a bit more general.

Yes, our Cortex-M support is a bit undermaintained at the moment.
If you'd like to write patches to fix some of the bugs you're
encountering I'd be happy to review them, but I'm not aware of anybody
actively working on M profile right now. We could really use a
contributor who cares about it and has time to tackle improving it.
(A-profile ARM emulation is in much better shape.)

Post by Bill Paul
This problem in particular has me baffled, and I'm forced to ask questions. It
seems like the ARM CPU simulation doesn't support the notion of NMI at all. (I
think it would be a separate input from IRQ and FIQ, no?) How is this supposed
to work? It seems like the Cortex-M hardware allows for software-generated
NMIs, so why isn't this supported? How is this supposed to work on other ARM
CPUs like Cortex-A? Is it really the case that nobody has tried this before?

Our M profile interrupt code is really badly mismodelled. It was
written many years ago as a hack based on modifying the A profile
support and GIC code, but really M profile is different (the interrupt
and exception model is an integrated part of the CPU). A lot of
the bugs stem from this basic problem.

For A profile, IRQ and FIQ work fine -- there is no such thing as
NMI in A profile (sometimes OSes like Linux have an "NMI" concept
that is named from x86 that is mapped onto FIQ).

For M profile, NMI is just one of the many interrupt/exceptions;
we don't really get it right because we're mis-modelling this with
interrupts in an external interrupt controller and exceptions in
the CPU model.

There were some patches posted to the list last year (?) which had
a go at fixing this, but unfortunately they got stalled in code
review and the original submitter ran out of time/energy to finish
the job. Getting those sorted out and into master would be a good
first step.

Post by Bill Paul
ChibiOS/RT seems to support an alternate preemption scheme that uses the
PendSV interrupt instead of NMI, and if I compile it to use that mode, then
the example works with the Stellaris machine model. That feels like a hack
though: shouldn't a software NMI just work?

The reason for this kind of thing is that the original support was
done to support a specific RTOS, and so bugs which resulted in that
RTOS not working were found and fixed. Bugs which weren't exercised
by that RTOS remain lurking in the code, and if you try to use a
different RTOS guest then you can run into them. (This is less
obvious on the A profile cores because to a first approximation
nobody runs anything but Linux on them, but in the embedded world
there's still a fairly rich diversity of RTOSes which take different
approaches to how they prod the hardware.)

thanks
-- PMM

Peter Maydell

2016-08-29 20:26:31 UTC

Post by Peter Maydell
There were some patches posted to the list last year (?) which had
a go at fixing this, but unfortunately they got stalled in code
review and the original submitter ran out of time/energy to finish
the job. Getting those sorted out and into master would be a good
first step.

Digging through my mail archive this is the patchset I meant:

https://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg00504.html

thanks
-- PMM

Bill Paul

2016-08-30 00:12:56 UTC

Of all the gin joints in all the towns in all the world, Peter Maydell had to

Post by Peter Maydell

Post by Bill Paul
Unfortunately it's been a frustrating experience because there seem to be
several key places where QEMU's hardware emulation diverges from reality.
The ChibiOS examples often seem to depend on behavior that is valid for
actual hardware but which is either broken or just missing in QEMU. Some
of these issues are board-specific, but the last one seems a bit more
general.

I had a feeling you were going to say that. But I already fell for this trick
once when I started using FreeBSD, and then I ended up being a developer for
about 10 years. I'm older and wiser now. (Also I have a day job that consumes
most of my time.)

The best I might be able to do is patch the STM32 USART driver so that it
supports the TX FIFO empty interrupt. I'm really not sure how to fix the STM32
timer driver (like I said, the ST Micro documentation is really hard to
follow) and I'm not sure that any attempt to get the NMI to work would be any
less of a hack than what's there now.

[...]

Post by Peter Maydell
The reason for this kind of thing is that the original support was
done to support a specific RTOS, and so bugs which resulted in that
RTOS not working were found and fixed. Bugs which weren't exercised
by that RTOS remain lurking in the code, and if you try to use a
different RTOS guest then you can run into them. (This is less
obvious on the A profile cores because to a first approximation
nobody runs anything but Linux on them, but in the embedded world
there's still a fairly rich diversity of RTOSes which take different
approaches to how they prod the hardware.)

In other words it's half-baked. :(

-Bill

Peter Maydell

2016-08-30 07:38:21 UTC

Post by Bill Paul

In other words it's half-baked. :(

It's not like there's a publicly available test suite out there
that thoroughly exercises all the capabilities of the CPU and
devices for these SoCs; in the absence of that the only thing
you can do is test it on the RTOSes you have to hand.

thanks
-- PMM

Liviu Ionescu

2016-08-30 08:50:08 UTC

... the only thing
you can do is test it on the RTOSes you have to hand.

which, in the end, is a highly effective way. it does not identify bugs related to features not used by the RTOS, but otherwise it is even more effective at catching bugs than using real hardware, especially when the RTOS is under construction.

some will probably argue against this statement, but in the last 6 months I used QEMU **extensively** to develop µOS++, finally running 8h+ endurance tests.

not only is QEMU incredibly robust, but due to the inherent high jitter of the timers (SysTick in my case), combined with the very high speed of the emulation, it increases the chance for interrupts to occur at "unexpected" moments, and so helps identify badly placed critical sections inside the RTOS (the common source of concern in young RTOSes).

it is true that I spent quite a lot of time on GNU ARM Eclipse QEMU, but it was worth every penny; I could not imagine developing µOS++ without the scripts running multiple semihosted tests in a loop, for hours and hours. you simply cannot do this effectively on real hardware.

so, Bill, if NMI is the only missing feature for your use case, I suggest fixing it and running your RTOS tests on QEMU.

regards,

Liviu