FAQs in section [26]:
- [26.1] Can sizeof(char) be 2 on some machines? For example, what about double-byte characters?
- [26.2] What are the units of sizeof?
- [26.3] Whoa, but what about machines or compilers that support multibyte characters. Are you saying that a "character" and a char might be different?!?
- [26.4] But, but, but what about machines where a char has more than 8 bits? Surely you're not saying a C++ byte might have more than 8 bits, are you?!?
- [26.5] Okay, I could imagine a machine with 9-bit bytes. But surely not 16-bit bytes or 32-bit bytes, right?
- [26.6] I'm sooooo confused. Would you please go over the rules about bytes, chars, and characters one more time?
- [26.7] What is a "POD type"?
- [26.8] When initializing non-static data members of built-in / intrinsic / primitive types, should I use the "initialization list" or assignment?
- [26.9] When initializing static data members of built-in / intrinsic / primitive types, should I worry about the "static initialization order fiasco"?
- [26.10] Can I define an operator overload that works with built-in / intrinsic / primitive types?
- [26.11] When I delete an array of some built-in / intrinsic / primitive type, why can't I just say delete a instead of delete[] a?
- [26.12] How can I tell if an integer is a power of two without looping?
- [26.13] What should be returned from a function?
[26.1] Can sizeof(char) be 2 on some machines? For example, what about double-byte characters?
No, sizeof(char) is always 1. Always. It is never 2. Never, never, never.
Even if you think of a "character" as a multi-byte thingy, char is not.
sizeof(char) is always exactly 1. No exceptions, ever.
Look, I know this is going to hurt your head, so please, please just
read the next few FAQs in sequence and hopefully the pain will go away by
sometime next week.
[26.2] What are the units of sizeof?
Bytes.
For example, if sizeof(Fred) is 8, the distance between two Fred objects
in an array of Freds will be exactly 8 bytes.
As another example, this means sizeof(char) is one
byte. That's right: one byte. One, one, one, exactly one byte,
always one byte. Never two bytes. No exceptions.
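Here is a minimal sketch of both claims. The layout of Fred is an assumption made up for illustration; the actual value of sizeof(Fred) depends on your implementation, but the distance between adjacent array elements will match it.

```cpp
#include <cstddef>

// Fred is a hypothetical class; its members are chosen only for illustration.
struct Fred {
    int    a;
    double b;
};

// sizeof(char) is 1 by definition, on every implementation.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

// Returns the distance, in bytes, between two adjacent elements of an
// array of Freds, measured by walking memory with char pointers.
std::size_t bytesBetweenAdjacentFreds()
{
    Fred freds[2] = {};
    const char* first  = reinterpret_cast<const char*>(&freds[0]);
    const char* second = reinterpret_cast<const char*>(&freds[1]);
    return static_cast<std::size_t>(second - first);
}
```

Calling bytesBetweenAdjacentFreds() should give the same number as sizeof(Fred), whatever that number happens to be on your machine.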
[26.3] Whoa, but what about machines or compilers that support multibyte characters. Are you saying that a "character" and a char might be different?!?
Yes, that's right: the thing commonly referred to as a "character" might be
different from the thing C++ calls a char.
I'm really sorry if that hurts, but believe me, it's better to get all the
pain over with at once. Take a deep breath and repeat after me: "character
and char might be different." There, doesn't that feel better? No? Well,
keep reading; it gets worse.
[26.4] But, but, but what about machines where a char has more than 8 bits? Surely you're not saying a C++ byte might have more than 8 bits, are you?!?
Yep, that's right: a C++ byte might have more than 8 bits.
The C++ language guarantees a byte must always have at least 8 bits.
But there are implementations of C++ that have more than 8 bits per byte.
[26.5] Okay, I could imagine a machine with 9-bit bytes. But surely not 16-bit bytes or 32-bit bytes, right?
Wrong.
I have heard of one implementation of C++ that has 64-bit "bytes." You read
that right: a byte on that implementation has 64 bits. 64 bits per byte. 64.
As in 8 times 8.
And yes, you're right, combining with the above would
mean that a char on that implementation would have 64 bits.
[26.6] I'm sooooo confused. Would you please go over the rules about bytes, chars, and characters one more time?
Here are the rules:
- The C++ language gives the programmer the impression that memory is
laid out as a sequence of something C++ calls "bytes."
- Each of these things that the C++ language calls a byte has at
least 8 bits, but might have more than 8 bits.
- The C++ language guarantees that a char* (a pointer to char) can
address individual bytes.
- The C++ language guarantees there are no bits between two
bytes. This means every bit in memory is part of a byte. If you grind your
way through memory via a char*, you will be able to see every
bit.
- The C++ language guarantees there are no bits that are part of two
distinct bytes. This means a change to one byte will never cause a change
to a different byte.
- The C++ language gives you a way to find out how many bits are in a
byte in your particular implementation: include the header <climits>,
then the actual number of bits per byte will be given by the CHAR_BIT
macro.
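The last rule can be sketched in a few lines. The helper name bitsIn is an assumption introduced here for illustration; CHAR_BIT and <climits> are the standard facilities named above.

```cpp
#include <climits>
#include <cstddef>

// The language guarantees at least 8 bits per byte; it may be more.
static_assert(CHAR_BIT >= 8, "a C++ byte has at least 8 bits");

// Number of bits occupied by an object of type T on this implementation.
// (bitsIn is a hypothetical helper, not a standard function.)
template <typename T>
constexpr std::size_t bitsIn()
{
    return sizeof(T) * CHAR_BIT;
}
```

On a typical desktop machine bitsIn<char>() is 8, but portable code should ask CHAR_BIT rather than assume that.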
Let's work an example to illustrate these rules. The PDP-10 has 36-bit words
with no hardware facility to address anything within one of those words. That
means a pointer can point only at things on a 36-bit boundary: it is not
possible for a pointer to point 8 bits to the right of where some other
pointer points.
One way to abide by all the above rules is for a PDP-10 C++ compiler to define
a "byte" as 36 bits. Another valid approach would be to define a "byte" as 9
bits, and simulate a char* by two words of memory: the first could point to
the 36-bit word, the second could be a bit-offset within that word. In that
case, the C++ compiler would need to add extra instructions when compiling
code using char* pointers. For example, the code generated for *p =
'x' might read the word into a register, then use bit-masks and bit-shifts
to change the appropriate 9-bit byte within that word. An int* could
still be implemented as a single hardware pointer, since C++ allows
sizeof(char*) != sizeof(int*).
Using the same logic, it would also be possible to define a PDP-10 C++ "byte"
as 12 bits or 18 bits. However, the above technique wouldn't allow us to
define a PDP-10 C++ "byte" as 8 bits, since 4*8 is 32, meaning that after every
fourth byte we would skip 4 bits. A more complicated approach could be used to
reclaim those 4 bits, e.g., by packing nine bytes (of 8 bits each) into two
adjacent 36-bit words. The important point here is that memcpy() has to be
able to see every bit of memory: there can't be any bits between two adjacent
bytes.
Note: one of the popular non-C/C++ approaches on the PDP-10 was to pack 5
bytes (of 7 bits each) into each 36-bit word. However, this won't work in C or
C++ since 5*7 = 35, meaning using char*s to walk through memory would "skip"
a bit every fifth byte (and also because C++ requires bytes to have at least 8
bits).
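To make the 9-bit approach concrete, here is a rough simulation of the mask-and-shift work a compiler might generate for a store or load through such a char*. Everything here is an assumption for illustration: the names Word36, loadByte and storeByte are invented, the 36-bit word is modeled in the low bits of an unsigned long long, and byte 0 is arbitrarily taken to be the leftmost 9 bits of the word.

```cpp
// Models one 36-bit PDP-10 word in the low 36 bits of a wider integer.
typedef unsigned long long Word36;

const unsigned kBitsPerByte  = 9;   // the simulated C++ "byte"
const unsigned kBytesPerWord = 4;   // 4 * 9 == 36, so no bits are skipped

// Bit position of the given byte within the word (byte 0 is leftmost).
inline unsigned shiftFor(unsigned index)
{
    return kBitsPerByte * (kBytesPerWord - 1 - index);
}

// Simulates reading *p, where p is (word, index).
inline unsigned loadByte(Word36 word, unsigned index)
{
    return static_cast<unsigned>((word >> shiftFor(index)) & 0x1FFu);
}

// Simulates *p = value: changes one 9-bit byte, leaves the other three alone.
inline Word36 storeByte(Word36 word, unsigned index, unsigned value)
{
    const Word36 mask = static_cast<Word36>(0x1FFu) << shiftFor(index);
    const Word36 bits = (static_cast<Word36>(value) << shiftFor(index)) & mask;
    return (word & ~mask) | bits;
}
```

Note how storeByte touches only the bits under the mask: that is the "a change to one byte will never cause a change to a different byte" rule in action.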
[26.7] What is a "POD type"?
A type that consists of nothing but Plain Old
Data.
A POD type is a C++ type that has an equivalent in C, and that uses the same
rules as C uses for initialization, copying, layout, and addressing.
As an example, the C declaration struct Fred x; does not initialize the
members of the Fred variable x. To make this same behavior happen in C++,
Fred would need to not have any constructors. Similarly, to make the
C++ version of copying the same as the C version, the C++ Fred must not have
overloaded the assignment operator. To make sure the other rules match, the
C++ version must not have virtual functions, base classes, non-static members
that are private or protected, or a destructor. It can, however, have
static data members, static member functions, and non-static non-virtual
member functions.
The actual definition of a POD type is recursive and gets a little gnarly.
Here's a slightly simplified definition of POD: a POD type's
non-static data members must be public and can be of any of these types:
bool, any numeric type including the various char variants, any
enumeration type, any data-pointer type (that is, any type convertible to
void*), any pointer-to-function type, or any POD type, including arrays of
any of these. Note: data-pointers and pointers-to-function are okay, but
pointers-to-member are not. Also note that
references are not allowed. In addition, a POD type can't have constructors,
virtual functions, base classes, or an overloaded assignment operator.
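As a rough illustration, the sketch below contrasts a class that meets the simplified rules with one that doesn't. The classes and the helper copyLikeC are made up for this example. The type traits come from the C++11 header <type_traits>; they are related checks, not an exact encoding of the simplified definition above, so treat them as a sanity check rather than as the definition of POD.

```cpp
#include <cstring>
#include <type_traits>

// Meets the simplified rules: public data, no constructors, no virtuals.
// Static and non-virtual member functions are allowed.
struct Fred {
    int  i;
    char c;
    static int version() { return 1; }
    int twice() const { return 2 * i; }
};

// Breaks the rules: it has a virtual function and a destructor.
struct NotPod {
    virtual ~NotPod() { }
    int i;
};

static_assert(std::is_trivially_copyable<Fred>::value &&
              std::is_standard_layout<Fred>::value,
              "Fred can be copied and laid out the way C would do it");
static_assert(!std::is_trivially_copyable<NotPod>::value,
              "NotPod cannot safely be copied with memcpy()");

// Copies a Fred the way C code would: as raw bytes.
inline Fred copyLikeC(const Fred& src)
{
    Fred dst;
    std::memcpy(&dst, &src, sizeof(Fred));
    return dst;
}
```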
[26.8] When initializing non-static data members of built-in / intrinsic / primitive types, should I use the "initialization list" or assignment?
For symmetry, it is usually best to initialize all non-static data members in
the constructor's "initialization list," even those that are of a built-in /
intrinsic / primitive type. The FAQ shows you why and
how.
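A minimal sketch of what that looks like; the class and its members are hypothetical:

```cpp
class Fred {
public:
    Fred(int count, double ratio);
    int    count() const { return count_; }
    double ratio() const { return ratio_; }
private:
    int    count_;
    double ratio_;
};

// Both built-in members are set in the initialization list, listed in the
// same order as they are declared in the class, rather than assigned in
// the constructor's body.
Fred::Fred(int count, double ratio)
    : count_(count)
    , ratio_(ratio)
{
}
```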
[26.9] When initializing static data members of built-in / intrinsic / primitive types, should I worry about the "static initialization order fiasco"?
Yes, if you initialize your built-in / intrinsic / primitive variable by an
expression that the compiler doesn't evaluate solely at compile-time. The FAQ
provides several solutions for
this (subtle!) problem.
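The sketch below shows the distinction, plus one commonly used workaround (wrapping the value in a function so it is initialized the first time it is used). The names kCompileTime, computeAtRuntime and runtimeValue are assumptions invented for this example, and this is only one of several possible approaches.

```cpp
// Hypothetical function whose result is not a compile-time constant.
int computeAtRuntime()
{
    return 6 * 7;
}

class Fred {
public:
    // Initialized by a constant expression: no initialization-order worry.
    static const int kCompileTime = 42;

    // Instead of a static data member initialized by computeAtRuntime(),
    // expose the value through a function. The local static is initialized
    // the first time the function is called, however early that is.
    static int& runtimeValue();
};

// Out-of-class definition of the constant (needed if it is ever odr-used).
const int Fred::kCompileTime;

int& Fred::runtimeValue()
{
    static int value = computeAtRuntime();
    return value;
}
```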
[26.10] Can I define an operator overload that works with built-in / intrinsic / primitive types?
No, the C++ language requires that your operator overloads take at least one
operand of a "class type" or enumeration type. The C++ language will not let you define an
operator all of whose operands / parameters are of primitive types.
For example, you can't define an
operator== that takes two char*s and uses string comparison. That's
good news because if s1 and s2 are of type char*, the
expression s1 == s2 already has a well defined meaning: it compares
the two pointers, not the two strings pointed to by those pointers.
You shouldn't use pointers anyway. Use
std::string instead of char*.
If C++ let you redefine the meaning of operators on built-in types, you
wouldn't ever know what 1 + 1 is: it would depend on which headers got
included and whether one of those headers redefined addition to mean, for
example, subtraction.
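Here is a small sketch of what is and isn't allowed. The Color enumeration and the helper names are made up for illustration.

```cpp
#include <string>

enum Color { red, green, blue };

// Allowed: the operand is of enumeration type.
// Advances to the next color, wrapping from blue back to red.
inline Color& operator++(Color& c)
{
    c = (c == blue) ? red : static_cast<Color>(c + 1);
    return c;
}

// Not allowed: every operand is a built-in type, so this won't compile.
// bool operator==(const char* a, const char* b);

// The built-in meaning of == on char*: compares addresses, not text.
inline bool samePointer(const char* s1, const char* s2)
{
    return s1 == s2;
}

// std::string's == compares the characters.
inline bool sameText(const std::string& s1, const std::string& s2)
{
    return s1 == s2;
}
```

Given two separate arrays that both hold "hello", samePointer() says false and sameText() says true, which is usually what you wanted in the first place.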
[26.11] When I delete an array of some built-in / intrinsic / primitive type, why can't I just say delete a instead of delete[] a?
Because you can't.
Look, please don't write me an email asking me why C++ is what it is.
It just is. If you really want a rationale, buy Bjarne Stroustrup's excellent
book, "Design and Evolution of C++" (Addison-Wesley publishers). But if your
real goal is to write some code, don't waste too much time figuring out
why C++ has these rules, and instead just abide by its rules.
So here's the rule: if a points to an array of thingies that was
allocated via new T[n], then you must,
must, must delete it via delete[] a. Even if the
elements in the array are built-in types. Even if they're of type char or
int or void*. Even if you don't understand why.
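A minimal sketch of the rule; the function sumOfSquares is a made-up example, and in real code a std::vector<int> would avoid the manual new[] and delete[] altogether.

```cpp
#include <cstddef>

// Sums 0*0 + 1*1 + ... + (n-1)*(n-1) using a freestore-allocated array.
int sumOfSquares(std::size_t n)
{
    int* a = new int[n];           // allocated via new T[n] ...
    for (std::size_t i = 0; i < n; ++i)
        a[i] = static_cast<int>(i * i);

    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i];

    delete[] a;                    // ... so it must be released via delete[]
    return sum;
}
```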
[26.12] How can I tell if an integer is a power of two without looping?
    inline bool isPowerOf2(int i)
    {
        return i > 0 && (i & (i - 1)) == 0;
    }
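Why this works: a positive power of two has exactly one bit set, and subtracting 1 turns that bit off while turning on every bit below it, so the two values have no bits in common. For any other positive number, i & (i - 1) merely clears the lowest set bit and leaves at least one bit standing. The sketch below repeats the function and checks it against a straightforward loop; the names isPowerOf2Slow and agreeUpTo are assumptions added for this comparison.

```cpp
inline bool isPowerOf2(int i)
{
    return i > 0 && (i & (i - 1)) == 0;
}

// Loop-based reference: keep halving while the number is even.
inline bool isPowerOf2Slow(int i)
{
    if (i <= 0)
        return false;
    while (i % 2 == 0)
        i /= 2;
    return i == 1;
}

// Returns true if both versions agree for every value in [-limit, limit].
inline bool agreeUpTo(int limit)
{
    for (int i = -limit; i <= limit; ++i)
        if (isPowerOf2(i) != isPowerOf2Slow(i))
            return false;
    return true;
}
```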
[26.13] What should be returned from a function?
In practice, there are a lot of cases. Here are a few of them in random
order:
- void: if you don't need a return value, don't return one.
- local by value: it's the simplest, and with a little care NRVO
maximizes performance.
- local by pointer or reference: NOT! Please don't do this. The local is
destroyed when the function returns, so the caller would be left with a
dangling pointer or reference.
- data member by value: excellent choice if the function is a non-static
member function, and if the data member can be copied relatively quickly,
e.g., int. If the data member is something that is slow to copy, this
has a performance penalty if you call this member function in the
inner loop of a CPU-bound application.
- data member by pointer: okay, but make sure you don't want to return it
by reference, and make sure you use Foo const* or const Foo*
if you don't want the caller to modify the data member. Since callers might
store the pointer rather than copy the data member, you should warn callers in
the member function's "contract" that they must not use the returned pointer
after the this-object dies.
- data member by reference-to-nonconst: okay, but this allows the caller
to make changes to your object's data member without your class "seeing" the
change. If you have a "set" method that changes this data member, use either
a reference-to-const or by-value instead. Another thing: since callers might
store the reference rather than copy the data member, you should warn callers
in the member function's "contract" that they must not use the returned
reference after the this-object dies.
- data member by reference-to-const: okay, but it does allow your users
to see the data type of your member variables. That means if you
ever need to change the type of your member variables, the change might break
the code that uses your class, and preventing that kind of breakage is one of
the main points of encapsulation. You can ameliorate that risk by exposing a public
typedef for the type of that member variable (and therefore the type
of the reference-to-const return value), and by warning your users that they
should use the typedef rather than the raw, underlying type. Another
reality is that if the caller captures this reference, as opposed to copying
the object, then the underlying referent might change "under the caller's
nose," even though the type is
reference-to-const. Because a lot of programmers are surprised by that,
it's smart to warn callers in the member function's "contract." You should
also warn callers to discard the returned reference once the
this-object has died.
- shared_ptr to a member that was allocated via new: this has
tradeoffs that are very similar to those of returning a member by pointer or
by reference; see those bullets for the tradeoffs. The advantage is that
callers can legitimately hold onto and use the returned pointer after the
this-object dies.
- local auto_ptr or shared_ptr to freestore-allocated copy
of the datum: this is useful for polymorphic objects, since it lets you have
the effect of return-by-value yet without the "slicing" problem. The
performance needs to be evaluated on a case-by-case basis.
- others: this list is by way of example and not by way of exclusion. In
other words, this is just a starting point, not an ending point.
Murphy's Law basically guarantees that your particular needs will fall under
the last bullet, rather than any of the earlier bullets.
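A few of the bullets above can be sketched in code. All the names here (Fred, NameType, makeGreeting, Shape, Triangle, makeShape) are invented for illustration, and the sketch uses std::shared_ptr from the C++11 header <memory>; it is not meant as the one right design.

```cpp
#include <memory>
#include <string>

class Fred {
public:
    typedef std::string NameType;   // public typedef for the member's type

    Fred(const NameType& name, int age) : name_(name), age_(age) { }

    // Data member by value: fine for something cheap to copy, such as int.
    int age() const { return age_; }

    // Data member by reference-to-const: avoids a copy, but the caller
    // must not use the reference after this object dies.
    const NameType& name() const { return name_; }

    void setName(const NameType& name) { name_ = name; }

private:
    NameType name_;
    int      age_;
};

// Local by value: the simplest choice.
inline std::string makeGreeting(const Fred& f)
{
    std::string result = "Hello, " + f.name();
    return result;
}

// Smart pointer to a freestore-allocated object: gives the effect of
// return-by-value for polymorphic objects without slicing.
struct Shape {
    virtual ~Shape() { }
    virtual int sides() const = 0;
};

struct Triangle : Shape {
    int sides() const { return 3; }
};

inline std::shared_ptr<Shape> makeShape()
{
    return std::shared_ptr<Shape>(new Triangle);
}
```

If a caller holds onto the result of name() and someone later calls setName(), the caller sees the new value through the old reference. That is the "under the caller's nose" effect described above.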
Revised Jun 26, 2011