Discussion:
ARM GCC 4.4 Alignment Problems
Harald Kipp
2010-04-01 10:27:08 UTC
Hi all,

It looks like the optimizer changed in GCC 4.4.2, specifically access to
packed structure elements.

At several places we may have to change

struct __attribute__ ((packed))...

into

__attribute__ ((packed, __may_alias__))

In Nut/OS packed structures often refer to elements within other packed
structures. As far as I understood, the __may_alias__ attribute tells
the compiler that it should _not_ assume that the structure itself is
placed at an aligned address.
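For reference, GCC's documentation describes `may_alias` as exempting accesses through the type from type-based alias analysis (the same treatment `char` gets), while the "may sit at any address" assumption comes from `packed`, which gives the type an alignment of 1. The following is a minimal sketch, assuming GCC or Clang, of a header type that combines both so it can be overlaid on a raw byte buffer at any address. `ETHERHDR_PA` and `ether_type_of()` are hypothetical names, not Nut/OS API.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHER_ADDR_LEN 6 /* assumed: 6-byte MAC addresses */

/* packed: no padding and alignment 1, so GCC must generate member
 * accesses that are safe at any address.
 * __may_alias__: accesses through this type may alias any other object
 * (e.g. the uint8_t receive buffer), so strict-aliasing based
 * optimizations cannot reorder or drop them. */
typedef struct __attribute__((packed, __may_alias__)) ether_header_pa {
    uint8_t  ether_dhost[ETHER_ADDR_LEN];
    uint8_t  ether_shost[ETHER_ADDR_LEN];
    uint16_t ether_type;                      /* network byte order */
} ETHERHDR_PA;

/* Return the EtherType of a raw frame in host byte order. */
static uint16_t ether_type_of(const uint8_t *frame)
{
    const ETHERHDR_PA *eh = (const ETHERHDR_PA *)frame;
    uint16_t raw = eh->ether_type;            /* alignment-safe load */
    const uint8_t *b = (const uint8_t *)&raw; /* bytes in memory order */
    return (uint16_t)((b[0] << 8) | b[1]);
}
```

Converting through the bytes of `raw` keeps the result independent of the host byte order; in real code a library `ntohs()` would normally do that step.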


Harald
Ole Reinhardt
2010-04-01 16:09:28 UTC
Hi Harald,
Post by Harald Kipp
It looks like the optimizer changed in GCC 4.4.2, specifically access to
packed structure elements.
Just an additional idea... What about an alignment trap handler? Admittedly I
don't have any code, but I'm quite sure I saw some code on the web...
Will search again for it.

Bye,

Ole
--
Thermotemp GmbH, Embedded-IT

Embedded Hard-/ Software and Open Source Development,
Integration and Consulting

http://www.embedded-it.de

Siegen office - Steinstraße 67 - D-57072 Siegen -
tel +49 (0)271 5513597, +49 (0)271-73681 - fax +49 (0)271 736 97

Headquarters - Hademarscher Weg 7 - 13503 Berlin
Tel +49 (0)30 4315205 - Fax +49 (0)30 43665002
Managing directors: Jörg Friedrichs, Ole Reinhardt
Commercial register Berlin Charlottenburg HRB 45978, VAT ID DE 156329280
Harald Kipp
2010-04-01 18:06:25 UTC
Hi Ole,
Post by Ole Reinhardt
Just an additional idea... What about an alignment trap handler? Admittedly I
don't have any code, but I'm quite sure I saw some code on the web...
Will search again for it.
What should this handler do in addition to what is already implemented
in nut/arch/arm/debug?

Harald
Ole Reinhardt
2010-04-02 22:21:14 UTC
Hi!
Post by Harald Kipp
What should this handler do in addition to what is already implemented
in nut/arch/arm/debug?
It would not just call the debug code and produce a stack trace, but would
"correct" the unaligned access. Let me use Bernd's words:

"You either load an aligned 32 bit value into a register or load two
partial data into two registers, then shift and aggregate them."

This fixup needs to be done for every memory access instruction.
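The "load two aligned words, then shift and aggregate" step can be sketched in plain C. This is not a trap handler: a real handler, like the Linux one, also has to decode the faulting instruction and patch the saved registers. It only shows the arithmetic, assuming a little-endian CPU and that both aligned words lie inside the same readable buffer. `load32_unaligned_le()` is a hypothetical name.

```c
#include <stdint.h>

/* Emulate a 32-bit load from a possibly misaligned address using only
 * aligned 32-bit loads. Assumes a little-endian CPU and that the aligned
 * words around p are readable. */
static uint32_t load32_unaligned_le(const uint8_t *p)
{
    uintptr_t addr = (uintptr_t)p;
    unsigned shift = (unsigned)(addr & 3u) * 8u;   /* misalignment in bits */
    const uint32_t *w = (const uint32_t *)(addr & ~(uintptr_t)3u);

    if (shift == 0)
        return w[0];                               /* aligned: 1 load */

    uint32_t lo = w[0];                            /* holds the low bytes */
    uint32_t hi = w[1];                            /* holds the high bytes */
    return (lo >> shift) | (hi << (32u - shift));  /* shift and aggregate */
}
```

The misaligned path needs two loads, two shifts and an OR, plus a scratch register, compared to a single load in the aligned case.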

The program would not crash and would only run a little slower. Of course,
each unaligned access should still be reported together with the current PC
value, so that the developer can easily find and correct it.

A rather complex implementation can be found in the Linux kernel at

arch/arm/mm/alignment.c

Bye,

Ole
Ole Reinhardt
2010-04-02 23:54:08 UTC
Hi!
Post by Ole Reinhardt
It would not just call the debug code and produce a stack trace but will
Sorry, forget (at least partly) what I've said... The alignment trap
feature is only available on ARM CPUs that provide the CP15 register with
MMU support.

So the Ethernut5 board with its ARM9 would benefit from this handler, but
boards with an ARM7 CPU would not.

Bye,

Ole
Bernd Walter
2010-04-01 18:06:56 UTC
Post by Ole Reinhardt
Hi Harald,
Post by Harald Kipp
It looks like the optimizer changed in GCC 4.4.2, specifically access to
packed structure elements.
Just an additional idea... What about an alignment trap handler? Admittedly I
don't have any code, but I'm quite sure I saw some code on the web...
Will search again for it.
I'm not much of a fan of misaligned data.
In almost every case it is avoidable without much trouble.
One of the most annoying points with network data is the 14 byte long
Ethernet header, which practically guarantees misaligned IP headers, but
usually this is avoided by placing RX buffers at a 2 byte offset, or by
copying the data before parsing if the hardware requires 32 bit aligned DMA RX
buffers - the AT91SAM7X should be happy with a 2 byte offset, the AT91RM9200
is not and requires copying.
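The arithmetic behind the 2 byte offset trick: if the frame starts 2 bytes into a 4 byte aligned buffer, the IP header after the 14 byte Ethernet header lands at offset 16, which is 4 byte aligned. A sketch with hypothetical names and sizes; it does not show how a particular Ethernet controller is told to use the offset.

```c
#include <stdint.h>

#define ETH_HDR_LEN   14   /* 6 + 6 + 2 bytes */
#define RX_FRAME_OFS   2   /* hypothetical: 2 byte offset into the RX buffer */
#define RX_BUF_SIZE 1536   /* hypothetical buffer size */

/* Declared as uint32_t so the buffer itself is 4 byte aligned. */
static uint32_t rx_buf[(RX_FRAME_OFS + RX_BUF_SIZE + 3) / 4];

/* Where the received frame would be stored. */
static uint8_t *rx_frame(void)
{
    return (uint8_t *)rx_buf + RX_FRAME_OFS;
}

/* The IP header follows the Ethernet header: 2 + 14 = 16, a multiple
 * of 4, so its 32 bit fields are naturally aligned. */
static uint8_t *rx_ip_header(void)
{
    return rx_frame() + ETH_HDR_LEN;
}
```

Without the offset, the IP header would start at offset 14, i.e. 2 bytes past a 4 byte boundary.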

Packed on ARM has a different background - older ARM CPUs required bytes
and 16 bit words to be at the same alignment, because they had to
mask bytes out of 32 bit memory operations, and therefore structs
containing 3 bytes are 4 bytes long, so that in an array of those structs
the same members always start at the same 32 bit offset.
Usually this isn't a problem and it is also OK with the C standard, but
parsing network data with structs can be a problem.
Since we don't need ABI compatibility with older ARM systems, it
shouldn't be a problem to use -mstructure-size-boundary=8 (bits)
and not use packed.
--
B.Walter <bernd at bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD computers, and more.
Harald Kipp
2010-04-02 15:04:04 UTC
Hi Bernd,

It's always a pleasure to see you jumping in when things become a bit
complicated.
Post by Bernd Walter
I'm not much of a fan of misaligned data.
Who is? ;-)
Post by Bernd Walter
In almost every case it is avoidable without much trouble.
One of the most annoying points with network data is the 14 byte long
Ethernet header, which practically guarantees misaligned IP headers, but
usually this is avoided by placing RX buffers at a 2 byte offset, or by
copying the data before parsing if the hardware requires 32 bit aligned DMA RX
buffers - the AT91SAM7X should be happy with a 2 byte offset, the AT91RM9200
is not and requires copying.
Using a 2 byte offset is indeed an option worth evaluating.
Copying, however, is something that really consumes CPU power.
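For comparison, the copy-before-parse approach can be as small as one `memcpy()` into a naturally aligned struct, after which every field access is a plain aligned load. A sketch; `struct ip_hdr_min` and `copy_ip_header()` are hypothetical names, it assumes an IPv4 header without options, and multi-byte fields are still in network byte order after the copy.

```c
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN 14

/* Naturally aligned copy of a 20 byte IPv4 header (no options).
 * Multi-byte fields remain in network byte order. */
struct ip_hdr_min {
    uint8_t  ver_ihl;
    uint8_t  tos;
    uint16_t tot_len;
    uint16_t id;
    uint16_t frag_off;
    uint8_t  ttl;
    uint8_t  proto;
    uint16_t check;
    uint32_t saddr;
    uint32_t daddr;
};

/* Copy the IP header out of a frame at any alignment; memcpy() handles
 * misaligned source addresses correctly. */
static void copy_ip_header(struct ip_hdr_min *dst, const uint8_t *frame)
{
    memcpy(dst, frame + ETH_HDR_LEN, sizeof *dst);
}
```

The cost is one 20 byte copy per packet, instead of extra instructions on every later field access.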
Post by Bernd Walter
Packed on ARM has a different background - older ARM CPUs required bytes
and 16 bit words to be at the same alignment, because they had to
mask bytes out of 32 bit memory operations, and therefore structs
containing 3 bytes are 4 bytes long, so that in an array of those structs
the same members always start at the same 32 bit offset.
Not a real problem, but of course additional instructions are required
compared to 32-bit aligned elements.
Post by Bernd Walter
Usually this isn't a problem and it is also OK with the C standard, but
parsing network data with structs can be a problem.
Since we don't need ABI compatibility with older ARM systems, it
shouldn't be a problem to use -mstructure-size-boundary=8 (bits)
and not use packed.
This option will reduce the total size of a structure, but it will not
pack its members.

typedef struct __attribute__((packed)) ether_header {
    uint8_t ether_dhost[ETHER_ADDR_LEN];
    uint8_t ether_shost[ETHER_ADDR_LEN];
    uint16_t ether_type;
} ETHERHDR;

struct __attribute__((packed)) frame {
    ETHERHDR hdr;
    uint32_t data[8];
};

ETHERHDR is 14 bytes and struct frame is 8 * 4 + 14 = 46 bytes.

When removing packed and instead compiling with
-mstructure-size-boundary=8, then ETHERHDR is still 14 bytes only, but
struct frame will grow by 2 bytes, because data[] will become aligned.
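The numbers above can be checked with `offsetof()` and `sizeof`. A sketch, assuming GCC or Clang on an ABI whose structure-size boundary is already 8 bits (as on x86, or on ARM with -mstructure-size-boundary=8); the `_P`/`_N` and `_p`/`_n` type names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHER_ADDR_LEN 6

/* Packed variants, as in the message above. */
typedef struct __attribute__((packed)) ether_header_p {
    uint8_t  ether_dhost[ETHER_ADDR_LEN];
    uint8_t  ether_shost[ETHER_ADDR_LEN];
    uint16_t ether_type;
} ETHERHDR_P;

struct __attribute__((packed)) frame_p {
    ETHERHDR_P hdr;
    uint32_t   data[8];   /* starts at offset 14: misaligned */
};

/* Natural layout: same members, no packed attribute. */
typedef struct ether_header_n {
    uint8_t  ether_dhost[ETHER_ADDR_LEN];
    uint8_t  ether_shost[ETHER_ADDR_LEN];
    uint16_t ether_type;
} ETHERHDR_N;

struct frame_n {
    ETHERHDR_N hdr;
    uint32_t   data[8];   /* padded to offset 16: aligned */
};
```

Both headers are 14 bytes; only the frame differs, 46 bytes packed versus 48 bytes with 2 bytes of padding in front of `data`.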

The problem is not the size of structures, but the alignment of their
members.

Please correct me if I'm wrong, I'm just evaluating this stuff.

Harald
Bernd Walter
2010-04-02 19:01:42 UTC
Post by Harald Kipp
Hi Bernd,
It's always a pleasure to see you jumping in when things become a bit
complicated.
Post by Bernd Walter
I'm not much of a fan of misaligned data.
Who is? ;-)
That's a good question.
Post by Harald Kipp
Post by Bernd Walter
In almost every case it is avoidable without much trouble.
One of the most annoying points with network data is the 14 byte long
Ethernet header, which practically guarantees misaligned IP headers, but
usually this is avoided by placing RX buffers at a 2 byte offset, or by
copying the data before parsing if the hardware requires 32 bit aligned DMA RX
buffers - the AT91SAM7X should be happy with a 2 byte offset, the AT91RM9200
is not and requires copying.
Using a 2 byte offset is indeed an option worth evaluating.
Copying, however, is something that really consumes CPU power.
So does code for accessing potentially misaligned data.
You either load an aligned 32 bit value into a register, or load two
partial words into two registers, then shift and aggregate them.
That is 1 instruction and 1 memory access,
or
5 instructions, 2 memory accesses and one scratch register.
Writes even require read-modify-write cycles.
Some architectures can access misaligned data directly, but they have
to do the additional memory cycles as well and fix things up in hardware.
On systems with a data cache, copying is cheap compared to code bloat,
because it accesses data within the same cache line and uses burst access
to memory.
Copying is often the cheaper option.
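The read-modify-write cost of a misaligned store can be sketched the same way: without hardware support, a 32 bit store to an address that is not a multiple of 4 touches two aligned words, and the neighbouring bytes in both words must be preserved. A minimal illustration, assuming a little-endian CPU and that both aligned words belong to the same buffer; `store32_unaligned_le()` is a hypothetical name.

```c
#include <stdint.h>

/* Emulate a 32-bit store to a possibly misaligned address using only
 * aligned 32-bit accesses. Little-endian CPU assumed. */
static void store32_unaligned_le(uint8_t *p, uint32_t v)
{
    uintptr_t addr = (uintptr_t)p;
    unsigned shift = (unsigned)(addr & 3u) * 8u;   /* misalignment in bits */
    uint32_t *w = (uint32_t *)(addr & ~(uintptr_t)3u);

    if (shift == 0) {
        w[0] = v;                                  /* aligned: 1 store */
        return;
    }

    uint32_t lo_mask = 0xFFFFFFFFu << shift;         /* bytes of w[0] to replace */
    uint32_t hi_mask = 0xFFFFFFFFu >> (32u - shift); /* bytes of w[1] to replace */

    /* Two reads, two masked merges, two writes. */
    w[0] = (w[0] & ~lo_mask) | (v << shift);
    w[1] = (w[1] & ~hi_mask) | (v >> (32u - shift));
}
```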
Post by Harald Kipp
Post by Bernd Walter
Packed on ARM has a different background - older ARM CPUs required bytes
and 16 bit words to be at the same alignment, because they had to
mask bytes out of 32 bit memory operations, and therefore structs
containing 3 bytes are 4 bytes long, so that in an array of those structs
the same members always start at the same 32 bit offset.
Not a real problem, but of course additional instructions are required
compared to 32-bit aligned elements.
Post by Bernd Walter
Usually this isn't a problem and it is also OK with the C standard, but
parsing network data with structs can be a problem.
Since we don't need ABI compatibility with older ARM systems, it
shouldn't be a problem to use -mstructure-size-boundary=8 (bits)
and not use packed.
This option will reduce the total size of a structure, but it will not
pack its members.
typedef struct __attribute__((packed)) ether_header {
    uint8_t ether_dhost[ETHER_ADDR_LEN];
    uint8_t ether_shost[ETHER_ADDR_LEN];
    uint16_t ether_type;
} ETHERHDR;
struct __attribute__((packed)) frame {
    ETHERHDR hdr;
    uint32_t data[8];
};
ETHERHDR is 14 bytes and struct frame is 8 * 4 + 14 = 46 bytes.
Yes - because it is packed.
This is on a FreeBSD ARM system:
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    struct {
        uint8_t ether1[6];
        uint8_t ether2[6];
        uint16_t ethertype;
    } testvar;

    /* sizeof yields size_t (%zu); byte offsets are ptrdiff_t (%td). */
    printf("sizeof(testvar): %zu\n", sizeof(testvar));
    printf("offset ether1: %td\n", (char *)&testvar.ether1 - (char *)&testvar);
    printf("offset ether2: %td\n", (char *)&testvar.ether2 - (char *)&testvar);
    printf("offset ethertype: %td\n", (char *)&testvar.ethertype - (char *)&testvar);
    return 0;
}

[63]chipmunk.cicely.de# gcc -o test test.c
2.000u 0.000s 0:04.29 81.5% 31417+5528k 0+0io 0pf+0w
[64]chipmunk.cicely.de# ./test
sizeof(testvar): 16
offset ether1: 0
offset ether2: 6
offset ethertype: 12

ether1 is 6 bytes.
ether2 is 6 bytes as well, and since it contains bytes it has an alignment
of 1 and starts directly after ether1 at an offset of 6.
ethertype starts at an offset of 12 (6 + 6) because it has an alignment
of 2, which fits.
The complete size, however, isn't 14, because the whole size is padded up
to a multiple of 4 (n*4), so that in an array of such structs every element
starts 4 byte aligned.
This is specific to ARM and is caused by old processors, which
couldn't natively address bytes and 16 bit words - they masked them, and the
masking code needed to know the concrete offsets.
Early Alpha systems had the same restriction but offered special mask
instructions to avoid this problem, so the padding is unique to ARM.

Modern ARM CPUs don't have this restriction, so it is possible to avoid it:
[65]chipmunk.cicely.de# gcc -mstructure-size-boundary=8 -o test test.c
2.000u 0.000s 0:04.21 82.6% 30521+5412k 0+0io 0pf+0w
[66]chipmunk.cicely.de# ./test
sizeof(testvar): 14
offset ether1: 0
offset ether2: 6
offset ethertype: 12
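The padding rules described above can be reproduced with a small calculator: each member is placed at the next multiple of its alignment, and the total size is rounded up to the largest member alignment or to the structure-size boundary, whichever is bigger. This is a simplified model written for this thread (no bit-fields, no packed), not GCC's actual layout code; `struct member` and `struct_layout()` are hypothetical.

```c
#include <stddef.h>

/* One struct member: size and natural alignment in bytes. */
struct member {
    size_t size;
    size_t align;
};

static size_t round_up(size_t n, size_t a)
{
    return (n + a - 1) / a * a;
}

/* Lay out n members in order, store each offset in offsets[] (if not
 * NULL) and return sizeof. boundary_bits models the structure-size
 * boundary: 32 reproduces the default build above, 8 models
 * -mstructure-size-boundary=8. */
static size_t struct_layout(const struct member *m, size_t n,
                            unsigned boundary_bits, size_t *offsets)
{
    size_t off = 0;
    size_t struct_align = boundary_bits / 8;

    for (size_t i = 0; i < n; i++) {
        off = round_up(off, m[i].align);   /* padding before the member */
        if (offsets != NULL)
            offsets[i] = off;
        off += m[i].size;
        if (m[i].align > struct_align)
            struct_align = m[i].align;
    }
    return round_up(off, struct_align);    /* tail padding */
}
```

For example, the members {6, 1}, {6, 1}, {2, 2} give 16 with a boundary of 32 and 14 with a boundary of 8, matching the two runs above.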

About your second struct.
Let's extend our test program:
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    struct tv {
        uint8_t ether1[6];
        uint8_t ether2[6];
        uint16_t ethertype;
    };
    struct tv testvar;

    /* sizeof yields size_t (%zu); byte offsets are ptrdiff_t (%td). */
    printf("sizeof(testvar): %zu\n", sizeof(testvar));
    printf("offset ether1: %td\n", (char *)&testvar.ether1 - (char *)&testvar);
    printf("offset ether2: %td\n", (char *)&testvar.ether2 - (char *)&testvar);
    printf("offset ethertype: %td\n", (char *)&testvar.ethertype - (char *)&testvar);

    /* Ethernet header followed by 32 bit payload words. */
    struct {
        struct tv hdr;
        uint32_t data[8];
    } testvar2;

    printf("sizeof(testvar2): %zu\n", sizeof(testvar2));
    printf("offset hdr: %td\n", (char *)&testvar2.hdr - (char *)&testvar2);
    printf("offset data: %td\n", (char *)&testvar2.data - (char *)&testvar2);

    return 0;
}

[82]chipmunk.cicely.de# gcc -o test test.c
2.000u 0.000s 0:04.37 80.0% 31021+5474k 0+0io 0pf+0w
[83]chipmunk.cicely.de# ./test
sizeof(testvar): 16
offset ether1: 0
offset ether2: 6
offset ethertype: 12
sizeof(testvar2): 48
offset hdr: 0
offset data: 16

[84]chipmunk.cicely.de# gcc -mstructure-size-boundary=8 -o test test.c
2.000u 0.000s 0:04.25 82.5% 30833+5453k 0+0io 0pf+0w
[85]chipmunk.cicely.de# ./test
sizeof(testvar): 14
offset ether1: 0
offset ether2: 6
offset ethertype: 12
sizeof(testvar2): 48
offset hdr: 0
offset data: 16

If sizeof(ether_header) is 16 instead of 14, it occupies 16 bytes
within your struct.
Then you add an array of 32 bit values.
With 16 bytes there is no problem, because data starts
at the natural alignment for 32 bit values.
With 14 bytes, the 32 bit values require 2 bytes of padding for alignment.
The size is the same, although it is reached in a different way.
The second case will also happen on other architectures with alignment
requirements.

So what are you doing with packed?
You tell the compiler that the structure has no padding at all.
Both kinds of padding are dropped and data is misaligned.
Every access to data needs special code overhead to deal with it.
Code size increases and speed drops, all for 2 bytes of memory savings.
It is a different matter if you need to parse data handed over by
other systems, but in that case you also need to deal with byte order.
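When the data does come from other systems, one way to sidestep both alignment and byte order is to assemble multi-byte fields from individual bytes. A sketch; `get_be16()` and `get_be32()` are hypothetical helper names, and each call costs a few shifts and ORs, which is the kind of per-access overhead discussed above.

```c
#include <stdint.h>

/* Read big-endian (network order) values from any address. Only byte
 * loads are used, so neither alignment nor host byte order matters. */
static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```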

What happens with padding if we reorder hdr and data:
[90]chipmunk.cicely.de# ./test
sizeof(testvar): 14
offset ether1: 0
offset ether2: 6
offset ethertype: 12
sizeof(testvar2): 48
offset hdr: 32
offset data: 0

Same size, but this time not because of ARM alignment requirements,
but because uint32_t requires sizeof to be a multiple of 4 (4*n), so
-mstructure-size-boundary=8 won't help.

In other cases it might help.
E.g.:
struct xxx {
    uint16_t foo1; // requires 2 byte alignment and a sizeof of 2*n
    uint8_t foo2;  // no special requirement
    uint16_t foo3; // requires 1 byte of padding in front for 2 byte alignment
    uint8_t foo4;  // no special requirement
                   // 1 byte of padding for the 2*n sizeof requirement of the uint16_t's
}; // sizeof = 8
and
struct xxx {
    uint16_t foo1; // requires 2 byte alignment and a sizeof of 2*n
    uint16_t foo3; // requires 2 byte alignment and a sizeof of 2*n
    uint8_t foo2;  // no special requirement
    uint8_t foo4;  // no special requirement
}; // sizeof = 6 with -mstructure-size-boundary=8 or 8 without
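If a structure's size matters, for example because it is sent over the wire, a compile-time check catches a change of the structure-size boundary or of the member order. A sketch using C11 `_Static_assert`, assuming a compiler that supports it; the tags `xxx_a` and `xxx_b` are hypothetical, and the expected values assume a structure-size boundary of 8 bits.

```c
#include <stddef.h>
#include <stdint.h>

struct xxx_a {            /* member order of the first example */
    uint16_t foo1;
    uint8_t  foo2;
    uint16_t foo3;        /* 1 byte of padding before it */
    uint8_t  foo4;        /* 1 byte of tail padding after it */
};

struct xxx_b {            /* same members, 16 bit ones first */
    uint16_t foo1;
    uint16_t foo3;
    uint8_t  foo2;
    uint8_t  foo4;
};

/* Fail the build if the layout differs from what the code relies on. */
_Static_assert(sizeof(struct xxx_a) == 8, "xxx_a: unexpected padding");
_Static_assert(offsetof(struct xxx_a, foo3) == 4, "xxx_a: foo3 moved");
_Static_assert(sizeof(struct xxx_b) == 6, "xxx_b: unexpected padding");
```

Built with a 32 bit structure-size boundary, the last assertion fails, which is exactly the kind of silent layout change it is meant to expose.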
Post by Harald Kipp
When removing packed and instead compiling with
-mstructure-size-boundary=8, then ETHERHDR is still 14 bytes only, but
struct frame will grow by 2 bytes, because data[] will become aligned.
Yes - but why don't you want it to be aligned?
It is a 32 bit variable after all, not 4 chars.
Post by Harald Kipp
The problem is not the size of structures, but the alignment of their
members.
Yes it is, but then again, why don't you want them to be aligned?
See my first statement: I'm not a fan of misaligned data.
Post by Harald Kipp
Please correct me if I'm wrong, I'm just evaluating this stuff.
You are right, but there is a reason for the defaults.