Post by Harald Kipp
Hi Bernd,
It's always a pleasure to see you jumping in when things become a bit
complicated.
Post by Bernd Walter
I'm not much of a fan of misaligned data.
Who is? ;-)
That's a good question.
Post by Harald Kipp
Post by Bernd Walter
In almost every case it is avoidable without much trouble.
One of the most annoying points with network data is the 14 byte long
Ethernet header, which usually guarantees misaligned IP headers, but
usually this is avoided by setting RX buffers at a 2 byte offset or
copying the data before parsing if the hardware requires 32bit aligned
DMA RX buffers - AT91SAM7X should be happy with a 2 byte offset,
AT91RM9200 is not and requires copying.
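A minimal sketch of the 2 byte offset idea, assuming the driver is free to choose the address the controller writes the frame to. The names RX_PAD, rx_store, rx_frame_start, ip_header_of and is_aligned4 are made up for illustration and do not come from any real driver.

```c
#include <stdint.h>

#define ETH_HDR_LEN 14u  /* 6 + 6 + 2 bytes of Ethernet header */
#define RX_PAD       2u  /* hypothetical offset: 2 + 14 = 16, a multiple of 4 */

/* Declared as uint32_t so the backing store itself is 4 byte aligned. */
static uint32_t rx_store[(RX_PAD + 1514u + 3u) / 4u];

/* Address the controller would be told to write the frame to. */
static uint8_t *rx_frame_start(void)
{
    return (uint8_t *)rx_store + RX_PAD;
}

/* The IP header follows directly after the Ethernet header. */
static uint8_t *ip_header_of(uint8_t *frame)
{
    return frame + ETH_HDR_LEN;
}

/* Returns 1 if p sits on a 4 byte boundary, 0 otherwise. */
static int is_aligned4(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}
```

With this layout the frame itself starts misaligned, but everything behind the 14 byte header lands on a 4 byte boundary. Whether the DMA engine accepts such a start address depends on the hardware.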
Using a 2 byte offset is indeed an option worth evaluating.
Copying, however, is something that really consumes CPU power.
So does code for accessing potentially misaligned data.
You either load an aligned 32 bit value into a register or load two
partial values into two registers, then shift and combine them.
1 instruction, 1 memory access
or
5 instructions, 2 memory accesses and one scratchpad register.
Writing even requires read-modify-write cycles.
Some architectures can access misaligned data directly, but they have
to do the additional memory cycles as well and fix things up in hardware.
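The "load two parts, shift and combine" case can be sketched in C as below. This is only an illustration of the idea, not what any particular compiler emits; the function name load32_from_words is hypothetical, and the byte offset is interpreted the way a little-endian CPU lays out words in memory.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fetch a 32 bit value starting at byte_off inside an array of aligned
 * 32 bit words. The caller must make sure words[byte_off / 4 + 1] exists
 * whenever byte_off is not a multiple of 4.
 */
static uint32_t load32_from_words(const uint32_t *words, size_t byte_off)
{
    size_t idx = byte_off / 4;
    unsigned shift = (unsigned)(byte_off % 4) * 8;

    if (shift == 0)
        return words[idx];                 /* aligned: a single load */

    /* misaligned: two loads, two shifts and an OR */
    return (words[idx] >> shift) | (words[idx + 1] << (32 - shift));
}
```

The aligned path is one memory access; the other path touches two words and needs an extra register for the second half.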
On systems with a data cache, copying is cheap compared to code bloat,
because it accesses data within the same cache line and uses burst access
to memory.
Copying is often the cheaper option.
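As a sketch of the copying approach: the bytes are moved once into a naturally aligned struct, and every later member access is a plain aligned load. struct demo_hdr and copy_hdr are hypothetical names used only for this example, and byte order is left as stored.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 8 byte header with naturally aligned members. */
struct demo_hdr {
    uint32_t seq;
    uint16_t len;
    uint16_t flags;
};

/* Copy a header from any address, aligned or not, into an aligned struct. */
static struct demo_hdr copy_hdr(const uint8_t *raw)
{
    struct demo_hdr h;

    memcpy(&h, raw, sizeof h);  /* one copy, then only aligned accesses */
    return h;
}
```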
Post by Harald Kipp
Post by Bernd Walter
Packed on ARM has different reasons - older ARM CPUs required bytes
and 16bit words to be on the same alignment because they had to
mask bytes out from 32bit memory operations and therefore structs
containing 3 bytes are 4 bytes long, so that an array of those structs
always starts the same members at the same 32bit offset.
Not a real problem, but of course additional instructions are required
compared to 32-bit aligned elements.
Post by Bernd Walter
Usually this isn't a problem and it is also OK with the C standards, but
parsing network data with structs can be a problem.
Since we don't need ABI compatibility with older ARM systems it
shouldn't be a problem to use -mstructure-size-boundary=8 (bits)
and not use packed.
This option will reduce the total size of a structure, but it will not
pack its members.
typedef struct __attribute__((packed)) ether_header {
    uint8_t ether_dhost[ETHER_ADDR_LEN];
    uint8_t ether_shost[ETHER_ADDR_LEN];
    uint16_t ether_type;
} ETHERHDR;
struct __attribute__((packed)) frame {
    ETHERHDR hdr;
    uint32_t data[8];
};
ETHERHDR is 14 bytes and struct frame is 8 * 4 + 14 = 46 bytes.
Yes - because it is packed.
This is on a FreeBSD arm system:
#include <inttypes.h>
#include <stdio.h>
int main(void)
{
    struct {
        uint8_t ether1[6];
        uint8_t ether2[6];
        uint16_t ethertype;
    } testvar;
    /* sizeof yields a size_t, so it is printed with %zu */
    printf("sizeof(testvar): %zu\n", sizeof(testvar));
    /* offsets are byte distances from the start of the struct */
    printf("offset ether1: %i\n", (int)((char *)&testvar.ether1 - (char *)&testvar));
    printf("offset ether2: %i\n", (int)((char *)&testvar.ether2 - (char *)&testvar));
    printf("offset ethertype: %i\n", (int)((char *)&testvar.ethertype - (char *)&testvar));
    return 0;
}
[63]chipmunk.cicely.de# gcc -o test test.c
2.000u 0.000s 0:04.29 81.5% 31417+5528k 0+0io 0pf+0w
[64]chipmunk.cicely.de# ./test
sizeof(testvar): 16
offset ether1: 0
offset ether2: 6
offset ethertype: 12
ether1 is 6 bytes.
ether2 is 6 bytes and since it contains bytes it has an alignment of 1
and starts directly after ether1 at an offset of 6.
ether2 is the same size as ether1.
ethertype starts at an offset of 12 (6 + 6) because it has an alignment
of 2, which fits.
The complete size, however, isn't 14, because the whole size is padded up
to n*4, so that every element of an array of such structs starts
4 byte aligned.
This is special to ARM and is the case because of old processors, which
couldn't natively address bytes and 16bit words - they masked them and the
masking code needed to know the concrete offsets.
Early Alpha systems had the same restriction but offered special mask
instructions to avoid this problem, so the padding is unique to ARM.
Modern ARM CPUs don't have this restriction, so it is possible to avoid it:
[65]chipmunk.cicely.de# gcc -mstructure-size-boundary=8 -o test test.c
2.000u 0.000s 0:04.21 82.6% 30521+5412k 0+0io 0pf+0w
[66]chipmunk.cicely.de# ./test
sizeof(testvar): 14
offset ether1: 0
offset ether2: 6
offset ethertype: 12
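The reason sizeof matters here is that it is also the distance between consecutive array elements. A small sketch, assuming nothing beyond standard C (tv_stride is a made-up helper name):

```c
#include <stddef.h>
#include <stdint.h>

struct tv {
    uint8_t  ether1[6];
    uint8_t  ether2[6];
    uint16_t ethertype;
};

/* Distance in bytes between two consecutive elements of an array. */
static size_t tv_stride(void)
{
    static struct tv arr[2];

    return (size_t)((char *)&arr[1] - (char *)&arr[0]);
}
```

tv_stride() always equals sizeof(struct tv), whichever value the compiler options produce (16 and 14 in the two runs above), while offsetof(struct tv, ethertype) is 12 in both cases.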
About your second struct.
Let's extend our test program:
#include <inttypes.h>
#include <stdio.h>
int main(void)
{
    struct tv {
        uint8_t ether1[6];
        uint8_t ether2[6];
        uint16_t ethertype;
    };
    struct tv testvar;
    /* sizeof yields a size_t, so it is printed with %zu */
    printf("sizeof(testvar): %zu\n", sizeof(testvar));
    /* offsets are byte distances from the start of the struct */
    printf("offset ether1: %i\n", (int)((char *)&testvar.ether1 - (char *)&testvar));
    printf("offset ether2: %i\n", (int)((char *)&testvar.ether2 - (char *)&testvar));
    printf("offset ethertype: %i\n", (int)((char *)&testvar.ethertype - (char *)&testvar));
    struct {
        struct tv hdr;
        uint32_t data[8];
    } testvar2;
    printf("sizeof(testvar2): %zu\n", sizeof(testvar2));
    printf("offset hdr: %i\n", (int)((char *)&testvar2.hdr - (char *)&testvar2));
    printf("offset data: %i\n", (int)((char *)&testvar2.data - (char *)&testvar2));
    return 0;
}
[82]chipmunk.cicely.de# gcc -o test test.c
2.000u 0.000s 0:04.37 80.0% 31021+5474k 0+0io 0pf+0w
[83]chipmunk.cicely.de# ./test
sizeof(testvar): 16
offset ether1: 0
offset ether2: 6
offset ethertype: 12
sizeof(testvar2): 48
offset hdr: 0
offset data: 16
[84]chipmunk.cicely.de# gcc -mstructure-size-boundary=8 -o test test.c
2.000u 0.000s 0:04.25 82.5% 30833+5453k 0+0io 0pf+0w
[85]chipmunk.cicely.de# ./test
sizeof(testvar): 14
offset ether1: 0
offset ether2: 6
offset ethertype: 12
sizeof(testvar2): 48
offset hdr: 0
offset data: 16
If sizeof(ether_header) is 16 instead of 14, it occupies 16 bytes
within your struct.
Then you add an array of 32bit values.
With 16 bytes there is no problem, because data starts
at the naturally aligned position for 32bit values.
With 14 bytes the 32bit values require 2 bytes of padding for alignment.
The size is the same, although the way to get there is different.
The second case will also happen on other architectures with alignment
requirements.
So what are you doing with packed?
You declare that the structure has no padding at all.
Both kinds of padding are dropped and data is misaligned.
Every access to data needs extra code to deal with that.
Code size increases and speed drops, all for a 2 byte memory saving.
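The effect on the layout can be checked with offsetof. This is a sketch using the GCC/Clang packed attribute from the code quoted earlier; the struct and function names here are hypothetical stand-ins for ETHERHDR and struct frame.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler inserts padding where alignment needs it. */
struct hdr_plain {
    uint8_t  dhost[6];
    uint8_t  shost[6];
    uint16_t type;
};
struct frame_plain {
    struct hdr_plain hdr;
    uint32_t data[8];
};

/* Packed layout: no padding anywhere (GCC/Clang extension). */
struct __attribute__((packed)) hdr_packed {
    uint8_t  dhost[6];
    uint8_t  shost[6];
    uint16_t type;
};
struct __attribute__((packed)) frame_packed {
    struct hdr_packed hdr;
    uint32_t data[8];
};

/* Byte offset of data[] in each variant. */
static size_t data_offset_plain(void)  { return offsetof(struct frame_plain, data); }
static size_t data_offset_packed(void) { return offsetof(struct frame_packed, data); }
```

In the packed variant data[] starts at offset 14, which is not a multiple of 4, so each of its eight elements is misaligned whenever the struct itself is 4 byte aligned.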
It is a different point if you need to parse data handed over by
other systems, but in this case you also need to deal with byte order.
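For data that arrives from another system, one option is to skip struct overlays for the wire format entirely and read fields byte by byte, which handles alignment and byte order in one step. A minimal sketch; read_be16, frame_ethertype and ETHERTYPE_OFFSET are illustrative names, not an existing API.

```c
#include <stdint.h>

#define ETHERTYPE_OFFSET 12u  /* after two 6 byte MAC addresses */

/* Read a big-endian (network order) 16 bit value from any address. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Extract the EtherType from a raw frame in host byte order. */
static uint16_t frame_ethertype(const uint8_t *frame)
{
    return read_be16(frame + ETHERTYPE_OFFSET);
}
```

Only byte accesses are used, so the result is the same wherever the frame starts in memory and whatever the host byte order is.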
What happens with padding if we reorder hdr and data:
[90]chipmunk.cicely.de# ./test
sizeof(testvar): 14
offset ether1: 0
offset ether2: 6
offset ethertype: 12
sizeof(testvar2): 48
offset hdr: 32
offset data: 0
Same size, but this time not because of the ARM structure size rule,
but because the alignment requirement of uint32_t forces sizeof to be
4*n, so -mstructure-size-boundary=8 won't help.
In other cases it might help.
E.g.:
struct xxx {
    uint16_t foo1; // requires 2 byte alignment and 2*n sizeof
    uint8_t foo2;  // no special requirement
    uint16_t foo3; // requires 1 byte of padding in front for 2 byte alignment
    uint8_t foo4;  // no special requirement
    // 1 byte of padding for the 2*n sizeof requirement of the uint16_t members
}; // sizeof = 8
and
struct xxx {
    uint16_t foo1; // requires 2 byte alignment and 2*n sizeof
    uint16_t foo3; // requires 2 byte alignment and 2*n sizeof
    uint8_t foo2;  // no special requirement
    uint8_t foo4;  // no special requirement
}; // sizeof = 6 with -mstructure-size-boundary=8 or 8 without
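The two layouts can be compared by counting the bytes that are not occupied by any member. A sketch with hypothetical names (struct interleaved, struct sorted and the two helper functions); the members add up to 6 bytes in both cases.

```c
#include <stddef.h>
#include <stdint.h>

struct interleaved {
    uint16_t foo1;
    uint8_t  foo2;
    uint16_t foo3;
    uint8_t  foo4;
};

struct sorted {
    uint16_t foo1;
    uint16_t foo3;
    uint8_t  foo2;
    uint8_t  foo4;
};

/* Padding = total size minus the 6 bytes the members themselves need. */
static size_t padding_interleaved(void)
{
    return sizeof(struct interleaved) - 6;
}

static size_t padding_sorted(void)
{
    return sizeof(struct sorted) - 6;
}
```

Sorting members by decreasing alignment removes the padding between them; what remains at the end, if anything, comes from the rule that rounds up the total size.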
Post by Harald Kipp
When removing packed and instead compiling with
-mstructure-size-boundary=8, then ETHERHDR is still 14 bytes only, but
struct frame will grow by 2 bytes, because data[] will become aligned.
Yes - but why don't you want it to be aligned?
It is a 32bit variable after all and not four chars.
Post by Harald Kipp
The problem is not the size of structures, but the alignment of their
members.
Yes it is, but then again, why don't you want them to be aligned?
See my first statement: I'm not a fan of misaligned data.
Post by Harald Kipp
Please correct me if I'm wrong, I'm just evaluating this stuff.
You are right, but there is a reason for the defaults.
--
B.Walter <bernd at bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM based FreeBSD computers and much more.