Something just occurred to me about the gcc implementation of std::vector in C++. Internally, an instance of vector needs to store (1) a pointer to the start of the memory, (2) the current vector size, and (3) the current buffer size. However, (2) and (3) can be represented either as integers relative to the start pointer or as absolute pointers past the end of the active and buffer spaces. Specifically, the two representations are
template
struct relative {
T* start;
size_t size, buffer_size;
};
and
template
struct absolute {
T* start;
T* end; // = start + size
T* buffer_end; // = start + buffer_size
};
The relative integer representation seemed more natural, so it’s always struck me as odd that gnu stdc++’s vector class used the absolute representation.
What just occurred to me is that the absolute representation is better from a strict aliasing standpoint, since arrays of pointers are much less common than arrays of integers. This potentially gives the compiler more knowledge with which to optimize. For example, consider the following implementation of copy:
void copy(vector<size_t>& dest, const vector<size_t>& src) {
for (size_t i = 0; i < src.size(); i++)
dest[i] = src[i];
}
If the compiler is allowed to assume strict aliasing and vector<size_t>
uses
the absolute representation, it knows that the write to dest can’t change the
value of src.size()
, which allows the loop bound to be computed at the start of
the loop. However, if the relative representation is used, nothing tells the
compiler that dest.start != &src.size
. If a write to dest[i]
might change the
value of size, the compiler has to repeat the memory lookup during every
iteration of the loop, which prevents all interesting loop optimizations.
P.S. Someone pointed out that I forgot to define “strict aliasing”. Strict aliasing means that it is invalid in C++ to access the same memory through pointers of different types, and in particular that the compiler is allowed to make this assumption when optimizing. See the Wikipedia page for more details.