[En-Nut-Discussion] STM32F756 strange effect
Holger Mai
2016-04-08 08:09:29 UTC
Hello all,
for a few days I have had NutOS 5.2.4 (from a repository snapshot about 3
weeks old) running with a webserver app on an Olimex STM32-E407 board whose
MCU was replaced by an STM32F756. All of our own application parts work fine,
but the OS shows a strange effect: only every second network request is
answered. For example, if you ping the device, only every second ping is
replied to; the others time out.
Does anybody have an idea how to fix or debug this?

The same OS version works fine with the original STM32F407 on another board
of the same type.
The IDE is STM32 System Workbench, the toolchain is ARM GCC Embedded 5.0.2
(with Cortex-M7 support).
The linker script is modified to use RAM from 0x20010000, and the same value
is set in the configurator ("Memory Start" entry in the Kernel section),
because the first 64 KB of RAM are not accessible by DMA (data RAM, core
access only).

Another thing is that the default setting for Kernel/Main Thread Stack size is
too low: with the F756 it fails (it produces various usage faults and hard
faults during the init phase). However, if I set it to 2048 or 3072 bytes, the
OS runs.


Best regards

Holger Mai

***@gemac-chemnitz.de



GEMAC - Gesellschaft für Mikroelektronik-
anwendung Chemnitz mbH
Zwickauer Straße 227
D-09116 Chemnitz
Tel. +49 371 3377 - 0
Fax +49 371 3377 272
UST-ID: DE140851265
HRB 6443 Chemnitz/Stadt
Geschäftsführer: Dirk Hübner / Karsten Grönwoldt
http://www.gemac-chemnitz.de
Uwe Bonnes
2016-04-08 09:09:22 UTC
Holger> Hello all, for a few days I have had NutOS 5.2.4 (from a
Holger> repository snapshot about 3 weeks old) running with a webserver
Holger> app on an Olimex STM32-E407 board whose MCU was replaced by an
Holger> STM32F756. All of our own application parts work fine, but the
Holger> OS shows a strange effect: only every second network request is
Holger> answered. For example, if you ping the device, only every second
Holger> ping is replied to; the others time out. Does anybody have an
Holger> idea how to fix or debug this?

I have a Nucleo-F746 and ran app/httpd on it some weeks ago. Can this ping
test be done with app/httpd?

Otherwise, the M7 has more complex caching. I activated caching some weeks
ago, and maybe this exposes some problems, e.g. DMA moving data to RAM while
the CPU still works with old data in the cache. Please try with caching
disabled (Architecture -> CM3 -> STM32 Family -> STM32 Clock and system
settings -> Instruction prefetch/Unified flash acceleration buffer).

Probably the DMA code needs to set some barriers at the right moments. Help
is welcome.
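
To illustrate the stale-cache concern: with the data cache enabled on a
Cortex-M7, a driver usually cleans the cache before a DMA transmit and
invalidates it after a DMA receive, and both operations act on whole 32-byte
cache lines. The following is only a host-side sketch of the address
arithmetic, not Nut/OS code; cache_line_span(), is_line_aligned() and span_t
are hypothetical names. On the target, the resulting range would be handed to
the CMSIS functions SCB_CleanDCache_by_Addr() or
SCB_InvalidateDCache_by_Addr() from core_cm7.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u  /* Cortex-M7 data cache line size in bytes */

/* Hypothetical helper type: a cache-line-aligned address range. */
typedef struct {
    uintptr_t start;   /* rounded down to a cache line boundary */
    size_t    length;  /* multiple of DCACHE_LINE_SIZE */
} span_t;

/* Expand [addr, addr + len) to whole cache lines. */
static span_t cache_line_span(uintptr_t addr, size_t len)
{
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
    span_t s;
    uintptr_t end;

    assert(len > 0);
    s.start = addr & mask;
    end = (addr + len + DCACHE_LINE_SIZE - 1u) & mask;
    s.length = (size_t)(end - s.start);
    return s;
}

/*
 * A DMA buffer that starts and ends on line boundaries shares no cache
 * line with unrelated data, so invalidating it cannot discard other
 * variables' pending writes.
 */
static int is_line_aligned(uintptr_t addr, size_t len)
{
    return (addr % DCACHE_LINE_SIZE == 0u) && (len % DCACHE_LINE_SIZE == 0u);
}
```

A 60-byte buffer at 0x20010004, for example, touches the two lines starting
at 0x20010000, so 64 bytes would have to be maintained.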

Holger> The same OS version works fine with the original STM32F407 on
Holger> another board of the same type. The IDE is STM32 System
Holger> Workbench, the toolchain is ARM GCC Embedded 5.0.2 (with
Holger> Cortex-M7 support).

I use the latest GCC found at https://launchpad.net/gcc-arm-embedded
This seems to be the reference for GCC on ARM.

Holger> The linker script is modified to use RAM from 0x20010000, and
Holger> the same value is set in the configurator ("Memory Start" entry
Holger> in the Kernel section), because the first 64 KB of RAM are not
Holger> accessible by DMA (data RAM, core access only).

SRAM setup for the F7 is complex. The present setup worked for me with
Nucleo-F7 and Discovery F7 on app/uart and app/httpd for a coarse test :-)

Can you propose and explain a better setup?

Holger> Another thing is that the default setting for Kernel/Main Thread
Holger> Stack size is too low: with the F756 it fails (it produces
Holger> various usage faults and hard faults during the init phase).
Holger> However, if I set it to 2048 or 3072 bytes, the OS runs.

Again, contributions are welcome. Hopefully this stack setup is a
configuration setting...

Cheers
--
Uwe Bonnes ***@elektron.ikp.physik.tu-darmstadt.de

Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt
--------- Tel. 06151 1623569 ------- Fax. 06151 1623305 ---------
_______________________________________________
http://lists.egnite.de/mailman/listinfo/en-nut-discussion
Uwe Bonnes
2016-04-12 15:08:18 UTC
...
Holger> The linker script is modified to use RAM from 0x20010000, and
Holger> the same value is set in the configurator ("Memory Start" entry
Holger> in the Kernel section), because the first 64 KB of RAM are not
Holger> accessible by DMA (data RAM, core access only).

That seems wrong. Heap, stack and data should be in DTCM by default to allow
fast and possibly 64-bit access. DTCM is also accessible for DMA via the AHBS
bus. By default, however, DMA access to DTCM has priority over CPU access to
DTCM and may slow down the CPU in some places. But the priority can be changed
when needed.

I have carefully read DM00169764 "STM32F7 Series system architecture
overview" and have come to the following conclusions:

1. Enable ICACHE/DCACHE only just before main to lower the risk of
inconsistencies, see the tips in the document above.
2. As long as no external data memory is involved, I see no reason to
enable DCACHE at all. Access to DTCM/SRAM1 and SRAM2 is without wait
states. If no objections arise, I will disable DCACHE in the default
setup and make it a configurable option.
3. If there is a need to enable DCACHE,
http://www.nuttx.org/doku.php?id=wiki:howtos:port-drivers_stm32f7
should be considered.
4. RAMFUNC should be loaded to ITCM and not RAM, to use the additional
internal buses and minimize bus contention. This needs an additional
copy function in the system setup and some variables in the linker script.
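
The copy function mentioned in point 4 could look roughly like the loop that
startup code uses for .data. This is a minimal sketch under assumptions, not
the actual Nut/OS implementation: copy_section() and the linker symbol names
in the comment are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Copy an initialized section word by word from its load address (flash)
 * to its run address (e.g. ITCM RAM), the way startup code copies .data.
 * Returns the number of 32-bit words copied.
 */
static unsigned copy_section(uint32_t *dst, const uint32_t *dst_end,
                             const uint32_t *src)
{
    unsigned words = 0;

    assert(dst <= dst_end);
    while (dst < dst_end) {
        *dst++ = *src++;
        words++;
    }
    return words;
}

/*
 * On the target the call could look like this, assuming the linker script
 * exports these (hypothetical) symbols:
 *
 *   extern uint32_t _ramfunc_load[], _ramfunc_start[], _ramfunc_end[];
 *   copy_section(_ramfunc_start, _ramfunc_end, _ramfunc_load);
 *
 * It has to run before the first function placed in ITCM is called.
 */
```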

Any comments?
Uwe Bonnes
2016-04-12 16:46:48 UTC
Holger> The linker script is modified to use RAM from 0x20010000, and
Holger> the same value is set in the configurator ("Memory Start" entry
Holger> in the Kernel section), because the first 64 KB of RAM are not
Holger> accessible by DMA (data RAM, core access only).

Digging deeper into the configuration, this is where your problem starts.
SRAM1 and SRAM2 on the F7 are connected to the AXIM bus, and the AXIM bus goes
through the I/D cache. DTCM does not go through the DCACHE, but can do DMA,
contrary to what
http://www.nuttx.org/doku.php?id=wiki:howtos:port-drivers_stm32f7
says.

So DMA with memory in DTCM does _not_ need to care about cache coherency.

When you moved the RAM start address away from DTCM, you opened a can of
worms of cache coherency problems. We could solve this either by disabling
DCACHE, at the cost of much slower "const data" access, or by keeping the RAM
layout.

Bye
Uwe Bonnes
2016-04-12 17:45:13 UTC
...
Holger> Another thing is that the default setting for Kernel/Main Thread
Holger> Stack size is too low: with the F756 it fails (it produces
Holger> various usage faults and hard faults during the init phase).
Holger> However, if I set it to 2048 or 3072 bytes, the OS runs.

Did you have large variables on the stack? That often causes such problems.

Bye
Holger Mai
2016-04-14 09:54:29 UTC
Hi Uwe,
thanks for your hints, they are helpful.
Indeed, the problem was caused by SRAM1. If I let the Nut RAM start in DTCM,
everything works fine. Sniffing the network packets of some pings revealed
that packet data is corrupted on the network interface if the NIF buffers are
located in SRAM (> 0x20010000). But this looks like a problem of the ETH DMA
only, regardless of whether DCache is enabled or disabled; the "normal" DMAs
are not affected by this (SD card and other drivers with standard DMAs work
well without any errors).
DCache is always disabled on startup (by default; it must be enabled
explicitly). There must be other reasons for the effect.
As a first workaround to fix this effect:
in the driver stm32_emac.h, make sure that the TX and RX buffers and the
buffer descriptors are located entirely in the DTCM area. If they are placed
in normal SRAM, the ETH corrupts packet data, regardless of whether DCache is
enabled or disabled!
Be aware: if you place them manually in DTCM with the section attribute and
also put the BufIdx variables into DTCM, then the initialisation of these two
fails; you must set them to 0 in the EmacInit function or create your own
.data-equivalent section for DTCM.
I cannot find out whether this is caused by the specialized ETH DMA or by
other mechanisms. It is possible that similar effects show up with other
peripherals that have their own DMAs (USB OTG HS, LCD-TFT, DMA2D).
The next thing I will do is send a support request to STM.
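
A check that a buffer or descriptor lies entirely inside DTCM could be
sketched as below. It assumes the memory map described in this thread (64 KiB
of DTCM at 0x20000000, SRAM1 starting at 0x20010000); in_dtcm() and
require_in_dtcm() are hypothetical helpers, not part of the Nut/OS driver. On
the target, a buffer address would be passed as (uint32_t)(uintptr_t)buf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DTCM_BASE 0x20000000u    /* start of DTCM RAM (assumed memory map) */
#define DTCM_SIZE (64u * 1024u)  /* 64 KiB, SRAM1 follows at 0x20010000 */

/* Return 1 if [addr, addr + len) lies entirely inside DTCM, else 0. */
static int in_dtcm(uint32_t addr, size_t len)
{
    if (len == 0 || len > DTCM_SIZE)
        return 0;
    if (addr < DTCM_BASE)
        return 0;
    /* Compare offsets so that addr + len cannot overflow. */
    return (addr - DTCM_BASE) <= (DTCM_SIZE - len);
}

/* Debug-build guard, e.g. for use early in a driver's init function. */
static void require_in_dtcm(uint32_t addr, size_t len)
{
    assert(in_dtcm(addr, len));
}
```

A buffer that starts in DTCM but ends in SRAM1 is rejected as well, which
matches the requirement that the buffers be located entirely in DTCM.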

P.S. The cache functions are in core_cm7.h; check the cache status by reading
the SCB->CCR register (startup value is 0x40200 = DCache disabled).
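
For reference, the reported value can be decoded with the CCR bit positions
that core_cm7.h defines for the Cortex-M7. The sketch below runs on the host
and takes the register value as a parameter; the macro and function names are
local to this example, and on the target one would pass SCB->CCR.

```c
#include <assert.h>
#include <stdint.h>

/* Bit masks of SCB->CCR on Cortex-M7 (same positions as in core_cm7.h). */
#define CCR_STKALIGN (1u << 9)   /* SCB_CCR_STKALIGN_Msk: 8-byte stack alignment */
#define CCR_DC       (1u << 16)  /* SCB_CCR_DC_Msk: data cache enable */
#define CCR_IC       (1u << 17)  /* SCB_CCR_IC_Msk: instruction cache enable */
#define CCR_BP       (1u << 18)  /* SCB_CCR_BP_Msk: branch prediction */

/* The startup value quoted above has only BP and STKALIGN set. */
static_assert((CCR_BP | CCR_STKALIGN) == 0x40200u, "reported startup value");

static int dcache_enabled(uint32_t ccr) { return (ccr & CCR_DC) != 0u; }
static int icache_enabled(uint32_t ccr) { return (ccr & CCR_IC) != 0u; }
```

With 0x40200 both functions return 0, i.e. neither the data cache nor the
instruction cache is enabled at that point.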

Best regards

Holger Mai

Uwe Bonnes
2016-04-14 10:36:48 UTC
Holger> Hi Uwe, thanks for your hints, they are helpful. Indeed, the
Holger> problem was caused by SRAM1. If I let the Nut RAM start in DTCM,
Holger> everything works fine. Sniffing the network packets of some
Holger> pings revealed that packet data is corrupted on the network
Holger> interface if the NIF buffers are located in SRAM (>
Holger> 0x20010000).

That means buffers are allocated in SRAM1.

Holger> But this looks like a problem of the ETH DMA only, regardless
Holger> of whether DCache is enabled or disabled; the "normal" DMAs are
Holger> not affected by this (SD card and other drivers with standard
Holger> DMAs work well without any errors).

Did these devices really allocate buffers in SRAM1/2?

Holger> DCache is always disabled on startup (by default; it must be
Holger> enabled explicitly).

The D/I caches have been enabled by default since rev. 6359. Since rev. 6427
the I/D caches are configurable. Since rev. 6426 they are enabled, if at all,
much later, following a hint in an ST app note.

Holger> There must be other reasons for the effect. As a first
Holger> workaround to fix this effect: in the driver stm32_emac.h, make
Holger> sure that the TX and RX buffers and the buffer descriptors are
Holger> located entirely in the DTCM area. If they are placed in normal
Holger> SRAM, the ETH corrupts packet data, regardless of whether
Holger> DCache is enabled or disabled!

Interesting finding!

Holger> Be aware: if you place them manually in DTCM with the section
Holger> attribute and also put the BufIdx variables into DTCM, then the
Holger> initialisation of these two fails; you must set them to 0 in the
Holger> EmacInit function or create your own .data-equivalent section
Holger> for DTCM. I cannot find out whether this is caused by the
Holger> specialized ETH DMA or by other mechanisms. It is possible that
Holger> similar effects show up with other peripherals that have their
Holger> own DMAs (USB OTG HS, LCD-TFT, DMA2D). The next thing I will do
Holger> is send a support request to STM.

Holger> P.S. The cache functions are in core_cm7.h; check the cache
Holger> status by reading the SCB->CCR register
Holger> (startup value is 0x40200 = DCache disabled).

Cache handling needs to be fixed. ChibiOS has an interesting write-up about
that:
http://chibios.org/dokuwiki/doku.php?id=chibios:articles:cortexm7_dma_guide

However, I will probably not find time to implement the needed changes
anytime soon. And testing these features is another matter.

So as a workaround I am thinking about restricting RAM on the F7 to only the
64 KiB of DTCM RAM, by setting RAM0_LENGTH to 64 KiB in
conf/arch/arch.nut. If static allocation exceeds the limit, the user would be
warned by the linker. Allocation on the heap, however, may fail at runtime.

NutOS would already set up SRAM1 and SRAM2 sections in an STM32F7 linker
script. If users need more memory, they would need to partition the RAM
themselves with the section attribute. If RAM from SRAM1/2 is used for DMA,
users would need to take care of cache coherency themselves.

Rev. 6425 now places RAMFUNC in ITCM RAM, freeing some DTCM RAM for other use.

Bye