Boost.RawMemory

"... once again, [programmers] get to use the legendary names peek and poke..."

"... templates and policy-based class design
meet Assembler, compiler intrinsics and calling convention specifiers."

"Enough about the Dark Ages of Programming..."

For feedback, questions, answers, support or anything else you would like to communicate to the clumsy author of this library, you are kindly asked to write an email message (or several email messages) to adder_2003@y.....com ! Thank you so much !... (-:

Download locations

Initial version (2011-09-05): http://adder.iworks.ro/Boost/RawMemory/Boost_RawMemory_00.zip
Update-01 (2011-09-12): http://adder.iworks.ro/Boost/RawMemory/Boost_RawMemory_01.zip

Table of contents

Table of contents (expanded version)

Overview

The Boost RawMemory library supports safe, efficient and portable (all in one) transfer of multi-byte integer values to/from "untyped memory" (also known as "arrays of bytes") (also known as "raw memory").

It handles the little endian and big endian representations and various memory transfer modes (e.g. to support aligned and unaligned data with good performance).

Therefore, it can be useful to any application dealing with binary data formats (e.g.: bitmap images, network packets, serialized data structures).

Usage of the library is easy and convenient(*). If you are comfy with templates, that is.

(*) As easy and as convenient as possible, without introducing performance problems or encouraging programming styles that lead to non-portable applications.

A secondary goal consists of helping programmers learn about some subtle aspects of binary data representation that might affect their applications and some ways to deal with those.

Design philosophy

The words of William Kahan (discussing the initial debates over the IEEE 754 Standard for floating-point computing) happen to fit very well with what we had in mind for our library since the very beginning:

"It looked pretty complicated. On the other hand, we had a rationale for everything. What distinguished our proposal from the others was the reasoning behind every decision.

I had worked out the details initially for Intel. My reasoning was based on the requirements of a mass market. A lot of code involving a little floating-point will be written by many people who have never attended my (nor anyone else's) numerical analysis classes. We had to enhance the likelihood that their programs would get correct results.

At the same time we had to ensure that people who really are expert in floating-point could write portable software and prove that it worked, since so many of us would have to rely upon it.

There were a lot of almost conflicting requirements on the way to a balanced design. [...]"

An Interview with the Old Man of Floating Point
Reminiscences elicited from William Kahan by Charles Severance
1998-02-20
http://www.eecs.berkeley.edu/~wkahan/ieee754status/754story.html

Quick start

Let us imagine that we have read binary data from a file into a vector of characters (bytes):

std::vector <char> vbBuffer (4096);

{
    std::ifstream f ("Data.bin", std::ios_base::binary);
    if (f.bad () || f.fail ())
        return false;

    f.read (&vbBuffer [0], vbBuffer.size ());
    if (f.bad () || f.fail ())
        return false;
    if (f.gcount () != vbBuffer.size ())
        return false;
}

We wish to read the double-word at offset 5 (which just happens to contain the amount of cash available for our character in our favourite RPG game, as our hac... reverse-engineering friend has informed us), add 72 to it and store it back.

According to the specification we have, that double-word value is stored in big endian format. We develop on an IA-32 (x86) machine (which uses the little endian convention), but we wish our application to also run on big endian machines. We also wish to use a single syntax that generates fast code on every target platform (because we are also going to use this code in our real-time network packet analyzer, for our boss at work, who does not enjoy RPG games).

The IA-32 (x86 architecture) allows unaligned access to memory for reading and writing integers (unless "Alignment Check" is activated in EFLAGS), but other machines might either crash or run the code extremely slowly (e.g. Alpha) when attempting to perform such operations. We expect our application to behave correctly and perform any necessary work-around behind the scenes.

We are familiar with endianness and alignment issues, but we would rather not worry much about them and just write tight and elegant source code that generates correct, efficient and portable object code. Here we go:

using namespace boost;

uint32_t x = raw_memory <uint32_t, big_endian>::peek (&vbBuffer [5]);
x += 72;
raw_memory <uint32_t, big_endian>::poke (&vbBuffer [5], x);

Supported data types, endian conventions and transfer modes

Data types

The sizes of fundamental integer types (i.e. char, wchar_t, long, etc.) in C++ are notorious for varying from one platform to another. This makes them quite unfit for the purpose of dealing with binary data formats.

Fortunately, for some time now, the so called fixed-size integer types have been available. These typedef's (e.g. uint8_t / int8_t, uint16_t / int16_t, uint32_t / int32_t, uint64_t / int64_t) are available in either a Standard Library header called <stdint.h> / <cstdint> or in <boost/cstdint.hpp> (in the namespace boost, in the latter case).

The types supported by the library are the fixed-size integer types of 8, 16, 32 and 64 bits:
  • boost::uint8_t, boost::uint16_t, boost::uint32_t, boost::uint64_t;
  • boost::int8_t, boost::int16_t, boost::int32_t, boost::int64_t;
  • other integer types defined in terms of the above (e.g. BYTE, WORD, DWORD).

Endian conventions

The library supports the following endian conventions (which specify how multi-byte values are stored in adjacent one-byte memory locations):
  • little_endian and big_endian;
  • natural_endian and reverse_endian (resolved to one of the above according to the native format of the target machine).

A short and elegant essay (crash-course) on Endian Order (with examples of the endianness of various file formats) can be found here: http://www.cs.umass.edu/~verts/cs32/endian.html.

As an example, let us examine how a 32-bit unsigned integer value (10203040h) is stored in memory on architectures using these conventions.

We will assume that the address is 01C50000h. Each memory cell can hold 8 bits (1 byte). In order to store our 32-bit value, we are going to need 4 memory cells, with addresses from 01C50000h to 01C50003h.

Little endian convention:

    Address of cell:                  01C50000h  01C50001h  01C50002h  01C50003h
    Byte value in cell:                     40h        30h        20h        10h
    32-bit (4-byte) resulting value:  10203040h

Big endian convention:

    Address of cell:                  01C50000h  01C50001h  01C50002h  01C50003h
    Byte value in cell:                     10h        20h        30h        40h
    32-bit (4-byte) resulting value:  10203040h
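
The layout above can be reproduced with the library itself (using the little_endian and big_endian tags described just below). The following is a minimal sketch, not part of the library's example programs; it assumes only the peek/poke interface documented here:

#include <cstdio>

#include <boost/cstdint.hpp>
#include <boost/raw_memory.hpp>

int main ()
{
    using namespace boost;

    unsigned char ab [4];

    raw_memory <uint32_t, little_endian>::poke (ab, 0x10203040);
    std::printf ("little endian: %02X %02X %02X %02X\n",
                 unsigned (ab [0]), unsigned (ab [1]), unsigned (ab [2]), unsigned (ab [3]));
    // Prints: little endian: 40 30 20 10

    raw_memory <uint32_t, big_endian>::poke (ab, 0x10203040);
    std::printf ("big endian:    %02X %02X %02X %02X\n",
                 unsigned (ab [0]), unsigned (ab [1]), unsigned (ab [2]), unsigned (ab [3]));
    // Prints: big endian:    10 20 30 40

    return 0;
}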

The little_endian and big_endian library types directly support these endian formats on any machine.

The natural_endian type denotes the endian format native to the machine, and the reverse_endian type denotes the opposite format.

Other conventions for storing multi-byte integer values in multiple (consecutive) single-byte memory cells can be imagined and might even exist in the real world. The library can easily be extended to support them with the same elegant syntax (elegant if you like templates, of course ^^).

Transfer modes

The following low-level transfer modes are currently supported (to deal with non-native endianness of data, unaligned operations, etc.):

byte_sized_memory_xfer: transfers the value one byte at a time, so it works regardless of the alignment of the address (at the cost of several smaller memory accesses).

casting_memory_xfer: casts the address to a pointer to the appropriate integer type and dereferences it; any necessary byte swapping (for non-native endian formats) is performed internally by the library(*).

(*) In Update-01, we have rephrased this sentence in order to emphasize the fact that any necessary byte swapping is already performed internally by the library.

branching_memory_xfer: checks the actual alignment of the address at run time and chooses between the above strategies accordingly.

Most of the time, the users of the library are not interested in the low-level details of how the operations are implemented. They just have an address (a pointer) to "raw memory" and wish to read ("peek") data from it or write ("poke") data to it.

If performance is critical and they know that the address is properly aligned, they might care to provide this hint to the library, which might then perform the operations faster (or fail, if the programmers were wrong; now this is revenge !).

If performance is not critical or they don't know whether the address is properly aligned or not, the library should ensure that the operation succeeds and still provide quite a decent speed of execution.

Thus, the programmers will (hopefully) use the following high-level memory transfer modes, which the authors of the library map to the low-level memory transfer modes according to per-platform benchmarks that they have conducted:

checked_memory_xfer: always performs the operation safely, whether or not the address is properly aligned for the value being transferred.

unchecked_memory_xfer: assumes that the address is properly aligned for the value being transferred, which may allow a faster transfer on some platforms.

Note (especially for the worried x86/x64 programmers): For some platforms, checked_memory_xfer and unchecked_memory_xfer might actually map to the same low-level transfer mode.

The library can easily be extended to support transfer modes tweaked for various scenarios. For example, on certain platforms, benchmark tests might prove that 32-bit-aligned-but-64-bit-unaligned 64-bit integers should best be processed in 32-bit chunks.
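
For illustration, here is a sketch of how the third template argument might be spelled when we do want to give the alignment hint (pb is assumed to point at "raw memory" that we know to be suitably aligned for a 32-bit value):

using namespace boost;

// Default (checked) transfer: always safe, whatever the alignment of pb:
uint32_t a = raw_memory <uint32_t, big_endian>::peek (pb);

// Unchecked transfer: we promise that pb is suitably aligned:
uint32_t b = raw_memory <uint32_t, big_endian, unchecked_memory_xfer>::peek (pb);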

Supported CPU architectures, operating systems and compilers

CPU architectures

The following table indicates CPU architectures that the library has been tested on. It also denotes whether performance has already been fine tuned (i.e. benchmarks have been performed and optimizations have been included in the source code).

It is my hope that, with the help of fellow programmers who enjoy RPG games, I will be able to extend support and fine-tuning.

One more thing: These green check-marks look very nice on a report, but there may still be room for improvement.

CPU architecture:                         Tested:   Fine-tuned:
IA-32 (x86)                               OK        OK
AMD64 / IA-32 EM64T (x64)                 OK        OK
IA-64 (IPF) (Itanium Processor Family)    Pending   Pending
PowerPC                                   OK        Pending
SPARC                                     Pending   Pending
MIPS                                      Pending   Pending
ARM                                       Pending   Pending
Cell BE                                   Pending   Pending

Operating systems

The following table indicates, for each listed operating system, whether the library has been tested on applications running on top of that operating system.

I hope that we will fill this table too with beautiful, green checkmarks !

Operating system:   Tested:
Windows             OK
Linux               OK
Mac OS              OK
Symbian             Pending
iOS                 Pending
Android             Pending

Compilers

Compiler support is depicted in the following table.

The library source code should be easy to process for any modern toolchain, but we wish to actually test it (and modify it accordingly) to ensure that it works.

Compiler / version:                                                     Tested:
Borland/Inprise/Borland/CodeGear/Embarcadero
  5.5 (1998 - 2000) (C++Builder 5 and "the free command line tools")    OK
  5.6 (2002) (C++Builder 6)                                             OK
  5.8 (2006) (Developer Studio 2006 / Turbo C++ Explorer)               OK
  5.9 (2007) (RAD Studio 2007)                                          OK
Digital Mars C++
  Any version over the past 5 years                                     OK
Microsoft Visual C++
  6.0 (1998)                                                            Pending
  7.1 (2003)                                                            OK
  8.0 (2005)                                                            OK
  9.0 (2008)                                                            OK
  10.0 (2010)                                                           OK
gcc
  3.x                                                                   OK
  4.x                                                                   OK

Rationale

Historically, programmers have been using the cast-the-pointer-and-dereference-it "technique" in order to read and write multi-byte integer values from and to arrays of bytes (e.g. file buffers and network packets).

Many university computer science classes (at least here, in Bucharest, Romania, Eastern Europe) have been overlooking the portability and performance issues involved, at best by saying "We will leave this aspect as a homework exercise..." and at worst by actually giving example programs and otherwise recommending the above-mentioned "technique".

Most students did not take the time to delve into endianness and alignment issues regarding binary data structures across different architectures and thus never learned to handle them correctly.

Confused students got jobs and ended up as project managers. Too busy now to unlearn a wrong way of doing things and learn the right way, they trash any attempt made by new employees (and any suggestion made by interviewed candidates) to port the code, as if targeting a single architecture (with 32-bit int's, 32-bit long's and 32-bit pointers) were normal and natural, and doing otherwise were so difficult that it is not worth the time, the effort or the budget of the department.

Except when the company boss comes to their office and tells them that she has heard that these new 64-bit capable processors might perform better than the old ones if fed 64-bit tailored code and that she would love to also run the client application on her iPhone.

Less proud than when discussing matters with the programmers in the teams they manage (or with the dorky-looking, brilliant and enthusiastic candidates that they would rather not introduce to the company boss), they call a meeting and move on to porting...

... but what should really be just a rebuild of last night's source code snapshot with a different toolset and/or maybe different switches followed by a run of the automated tests becomes a 6-month nightma... activity, that is, with lots of runtime errors (sometimes difficult to reproduce) and no possibility to somehow "grep" to the exact non-portable lines.

Enough about the Dark Ages of Programming...

Enter boost::raw_memory (hopefully to become std::raw_memory and make me famous).

Now the teachers of C++ computer programming can just draw on the blackboard the diagrams describing the endian conventions (the ones that I had to draw in HTML, for the win), draw one more diagram showing an unaligned integer value, talk about the drawings for 5 minutes and then recommend that the students use the highly-available Boost (hopefully to become standard) facility.

Now the busy programmers with deadlines at the workplace, grumpy wives (or husbands) at home and ticking payment schedules at the bank can just read through this doc, smile a bit and get started writing portable, efficient code in 15 minutes.

And they can also enjoy the blissful feeling of understanding the subtle issues of binary representations and memory access and knowing what they write.

(The fear of spending too much time on this "small" aspect has vanished. Not that they would not love to delve deep into the matter -- but their boss would not love them to.)

The lines of code and functions dealing with binary data ("raw memory") can easily be found in the source tree by grep-ping for "raw_memory", "peek" and "poke".

While they are working hard to add new features to the (now portable) application, Boost experts (much more skillful than the noob writing these words) continuously update the library with performance tweaks for the latest CPU's and compilers. Simply fetching the latest version of Boost every once in a while assures the programmers that the resulting code will benefit from benchmarks performed on a wide variety of platforms.

Last, but not least, the programmers have a feeling of coming home as once again, they get to use the legendary names "peek" and "poke", that they are familiar with from the days of BASIC.

And one day, even our beloved Microsoft updates the examples in some sections of MSDN by eliminating 0.02% of the casts and producing more portable code.

Installation

You need not wait for the Boost review to be completed and the library to be included in Boost with red carpet, flowers and trumpets.

You can download the .zip archive right now, extract its contents in your local Boost folder and start using it right away.

The file organization follows the Boost conventions: the headers live under boost/ (the main header being <boost/raw_memory.hpp>, with supporting headers such as <boost/raw_memory/config.hpp> under boost/raw_memory/), while the documentation, the examples (e.g. libs/raw_memory/example/example-01 and example-02) and the separately compiled source (raw_memory.cpp) live under libs/raw_memory/.

User's manual

Users of the library need only learn about a single class template: raw_memory.

That single class template has only two member functions (which are public and static): peek and poke.

Additionally, users might want to learn about the byte_swapper class template, which provides the handy reverse member function (which is public and static).

Class template raw_memory

template
<
    typename T,
    class Endian = natural_endian,
    template <class> class MemoryXfer = BOOST_DEFAULT_MEMORY_XFER
>
class raw_memory
{
public:
    typedef T value_type;
    typedef Endian endian_type;
    typedef MemoryXfer <Endian> memory_xfer_type;

    static T peek (const void *pvAddress);
    static void poke (void *pvAddress, T tValue);
};

Template argument(s):

Argument Description
T This is the integer type that we are going to work with.

Possible types to use for this argument:
  • boost::uint8_t, boost::uint16_t, boost::uint32_t, boost::uint64_t;
  • boost::int8_t, boost::int16_t, boost::int32_t, boost::int64_t;
  • BYTE, WORD, DWORD, etc.;
  • the fundamental integer types (not recommended);
  • other integer types based on the above.
Endian A tag type specifying the format of the binary representation of the value in memory.

Possible types to use for this argument:
  • little_endian or big_endian;
  • natural_endian or reverse_endian.
MemoryXfer A class template specifying the way values are read from and written to memory, in order to support:
  • data that may not use the native representation of the target machine;
  • data that might not be aligned to the boundaries required by the target machine.
Possible (template) types to use for this argument:
  • checked_memory_xfer (the default value for BOOST_DEFAULT_MEMORY_XFER);
  • unchecked_memory_xfer.

Member function(s):

static T peek (const void *pvAddress);

This function reads the integer (of type T) that is stored (in the format specified by Endian) in the memory locations beginning at pvAddress and returns it.

static void poke (void *pvAddress, T tValue);

This function writes the integer tValue (of type T) in the memory locations beginning at pvAddress (in the format specified by Endian).
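
A minimal round-trip sketch (the buffer and the offset are invented for illustration):

unsigned char abHeader [8] = { 0 };

boost::raw_memory <boost::uint16_t, boost::little_endian>::poke (&abHeader [2], 0x1234);
// abHeader [2] == 0x34, abHeader [3] == 0x12

boost::uint16_t w = boost::raw_memory <boost::uint16_t, boost::little_endian>::peek (&abHeader [2]);
// w == 0x1234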

Class template byte_swapper

template <typename T>
class byte_swapper
{
public:
    static T reverse (T tValue);
};

Design note regarding usability:

A class template has been preferred to a function template and to overloaded functions in order to force the user to specify the integer type she wishes to work with and thus avoid unwanted behaviour caused by integral promotions and other conversions (e.g. when the argument is the result of the evaluation of an expression).

Template argument(s):

Argument Description
T This is the integer type that we are going to work with.

It is always one of these fixed-size unsigned integer types:
  • boost::uint8_t, boost::uint16_t, boost::uint32_t, boost::uint64_t.

Member function(s):

static T reverse (T tValue);

This function reverses the order of the bytes of the supplied argument (swapping the most significant bytes with the least significant ones) and returns the result.

It is mostly useful if the value was (somehow) obtained by a read operation from "raw memory" using the "wrong" endian format.

It is also useful as a building block for casting_memory_xfer. When dealing with the reverse_endian format, the casting_memory_xfer policy internally calls byte_swapper <T>::reverse after having read the value from or before writing it to "raw memory".
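
For example (a sketch; the value is the one used in the endian diagrams above):

boost::uint32_t x = 0x10203040;
boost::uint32_t y = boost::byte_swapper <boost::uint32_t>::reverse (x);
// y == 0x40302010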

Examples of usage

A quick overview of the syntax and usage is provided in the "Quick start" section above. The corresponding C++ program can be viewed here. It is included in the distribution archive (in the "libs/raw_memory/example/example-01" subfolder).

An example of what I consider to be the correct way to deal with binary data formats is available here (and in the "libs/raw_memory/example/example-02" subfolder in the distribution archive).

The comments in the code and the output obtained when running the resulting program are meant to help clarify what is wrong with the widely-spread method of dealing with such binary data formats (by casting-to-struct) and how the problems can be avoided(*).

(*) This example contains notes that apply for any programs dealing with binary data formats, whether they use our library or not.

The Guidelines for dealing with binary data formats section below also explains these issues in more detail.

Guidelines for usage

Most of the time, we need not worry about the third template argument (MemoryXfer).

The things that we are interested in are:
  • the type of the integer value(*);
  • the endian format used for its binary representation;
  • the address of the memory location that holds it.

(*) We are specifically interested in the length and signedness. Is the integer 8-bit, 16-bit, 32-bit or 64-bit ? Is the integer unsigned or signed ?

We should prefer using typenames that describe these (platform-independent) characteristics.

A very good choice is using the uint8_t, uint16_t, uint32_t, uint64_t types and their signed counterparts (int8_t, int16_t, int32_t, int64_t).

(These types are available in the boost namespace via inclusion of the <boost/cstdint.hpp> header or in the std or the std::tr1 namespace via inclusion of the <cstdint> header or in the global namespace via inclusion of the <stdint.h> header.)

Using the BYTE, WORD, DWORD types from the Windows API and/or even more descriptive ones such as UINT32, UINT64, INT32, INT64 types ("The new data types") is a good choice in its spirit, except for portability to non-Windows world. It is good because the typenames describe exactly the number of bits and whether the types are signed or unsigned (thus conforming to the idea of processing binary data formats) and the portability issue can easily be fixed with a few typedef's.
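
For instance, the portability fix might be nothing more than the following sketch (the guard macro and the exact set of typedef's are, of course, project-specific assumptions):

#include <boost/cstdint.hpp>

#if !defined (_WIN32)   // Outside the Windows API, provide the same names ourselves.
typedef boost::uint8_t  BYTE;
typedef boost::uint16_t WORD;
typedef boost::uint32_t DWORD;
#endif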

Using fundamental types such as "int" or "long" is not such a good idea. The signedness is known, but the number of bits in the representation (the range of values) is only loosely specified by the Standard. Using the fundamental types (in the sections of code dealing with binary data formats) defeats the purpose of portability.

For example, the "unsigned long" data type is 32-bit on most Windows compilers, but it is either 32-bit or 64-bits on most Linux compilers.

We express the things in the checklist above as arguments (template arguments for the raw_memory class template and function arguments for the peek and poke static member functions).

We should also prefer using unsigned types as much as possible. Their behaviour is much more predictable than that of the signed counterparts. Unsigned types obey the rules of arithmetic modulo 2^n, while for the signed counterparts the C++ Standard leaves overflow undefined and out-of-range conversions implementation-defined(2).

(2) Most or all platforms in use today use the 2's complement representation of signed integers. On most platforms, the conversion of an unsigned integer value to a signed integer value of the same bit-length will leave the binary representation unchanged and only choose to reinterpret it as "2's complement signed value".
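
As a small illustration of the modulo-2^n behaviour (here n = 16):

boost::uint16_t u = 0xFFFF;
u = static_cast <boost::uint16_t> (u + 1);   // well defined: wraps around to 0x0000

By contrast, what happens when a signed computation overflows is not something that we can portably rely on.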

The default memory transfer type is configured in the (internal) header file boost/raw_memory/config.hpp. This configuration is done only once for each platform that the library is tested on, by the maintainers of the library.

The default memory transfer type allows safe reading and writing of any supported values from and to memory.

When we are dealing with an array of values (i.e. contiguous values) and performance is critical, we might want to check whether the data is aligned (before entering the loop) and provide two branches of code: one that uses unchecked_memory_xfer (taken when the data turned out to be properly aligned) and one that uses checked_memory_xfer (taken otherwise), as sketched below.
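
Here is what such a loop might look like. This is a sketch only; in particular, the alignment test is our own and is not a facility provided by the library:

#include <cstddef>

#include <boost/cstdint.hpp>
#include <boost/raw_memory.hpp>

// Adds xDelta to cValues consecutive 32-bit big endian values starting at pb:
void add_delta (unsigned char *pb, std::size_t cValues, boost::uint32_t xDelta)
{
    using namespace boost;

    const bool bAligned =
        (reinterpret_cast <uintptr_t> (pb) % sizeof (uint32_t)) == 0;

    if (bAligned)
        for (std::size_t i = 0; i < cValues; ++i, pb += sizeof (uint32_t))
            raw_memory <uint32_t, big_endian, unchecked_memory_xfer>::poke
                (pb, raw_memory <uint32_t, big_endian, unchecked_memory_xfer>::peek (pb) + xDelta);
    else
        for (std::size_t i = 0; i < cValues; ++i, pb += sizeof (uint32_t))
            raw_memory <uint32_t, big_endian, checked_memory_xfer>::poke
                (pb, raw_memory <uint32_t, big_endian, checked_memory_xfer>::peek (pb) + xDelta);
}

(Typedef's, as suggested in the next section, would make the two branches considerably shorter.)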

When performance is critical, we should conduct benchmarks and use them as a basis for our optimizations.

Guidelines for dealing with binary data formats

When specifying a binary data format (e.g. in the documentation, on the blackboard, on the whiteboard, on a piece of paper, in our code) we should make a clear separation between the data type representation in the programming language (e.g. a struct) and the actual binary format.

The data type (e.g. the struct) used in the programming language might have an entirely different format on another machine, on another compiler, on the next version of the same compiler or even on the same compiler (when using different switches).

So the last thing that we want to do is cast a pointer-to-an-array-of-bytes to a pointer-to-a-struct-type. Even if someone tells us "We have always done it like this and it has been working all this time !"(*). Even if code examples doing just this are abundant in programming courses at computer science universities and in some manuals.

(*) What they really mean is: "It has been working so far on our machines and with our compilers, but only if we use these compiler settings and we add these [unportable] #pragma's..."

If we try to point out this silent aspect of their message, their next reply will probably sound similar to Illidan's "You'll regret approaching me !".

This technique only works if the stars are aligned.

An example of inexact specification is: "The format of the file header of Win32 executable files is specified by the IMAGE_FILE_HEADER structure in <WinNT.h>:"

typedef struct _IMAGE_FILE_HEADER
{
    WORD  Machine;
    WORD  NumberOfSections;
    DWORD TimeDateStamp;
    DWORD PointerToSymbolTable;
    DWORD NumberOfSymbols;
    WORD  SizeOfOptionalHeader;
    WORD  Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

An example of correct specification is available in the official documentation (pecoff_v8.doc or pecoff_v8.docx) published by Microsoft at http://msdn.microsoft.com/en-us/windows/hardware/gg463119. (Section 3.3 documents the file header.)

Another example of correct specification is the FAT on-disk format published at http://msdn.microsoft.com/en-us/windows/hardware/gg463080.

We can notice that these specifications describe each field in each structure by denoting its offset, its byte-length and whether it is a signed or an unsigned integer.

Also, the endianness of the data representation is clearly documented.

We should not worry that the code that we have to write to portably deal with binary data formats takes a little more typing than the one produced by the afore-mentioned "technique". It is the same as with the battle between the old, C-style cast operator and the C++ cast operators. We are highlighting the fact that we are doing something special here by using a (somehow) verbose syntax.

(Typedef's for the used instantiation(s) of raw_memory can help reduce that verbosity to a minimum.)
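
For instance, in the spirit of example-02, reading the bfSize field of a BMP file header (a little endian double-word stored at offset 2) might look like this sketch:

typedef boost::raw_memory <boost::uint32_t, boost::little_endian> le32;
typedef boost::raw_memory <boost::uint16_t, boost::little_endian> le16;

// vbBuffer holds the raw bytes of the file, as in the "Quick start" section:
boost::uint32_t cbFileSize = le32::peek (&vbBuffer [2]);
boost::uint16_t wReserved1 = le16::peek (&vbBuffer [6]);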

Please see example-02 for an example involving the handling of the header of the BMP graphics file format.

Extender's manual

Users of the library need not study this section of the documentation. However, it may be instructive and fun ! :geek:

Guidelines for creating a custom MemoryXfer class template

A custom MemoryXfer class template might be needed to provide better performance on a specific hardware/software architecture.

The MemoryXfer class template is used as a policy class by raw_memory. Thus, the signatures of functions need not exactly match the example given below.

A class template that is a model of the MemoryXfer concept has a single mandatory template argument (Endian) that is one of the type tags little_endian and big_endian (or, equivalently, but not necessarily respectively, natural_endian and reverse_endian).

Other template arguments of the class template may be used, as long as default values are provided for them.

Thus, here is what the declaration of the class template might look like:

template <class Endian>
class MyMemoryXfer
{
protected:
    ~MyMemoryXfer () {}

public:
    // ... Please see below !
};

(It is good practice to mark the destructor of a policy class as protected, in order to avoid the possibility of an implicit conversion of a pointer-to-a-mixin-class to a pointer-to-a-base-policy-class followed by object destruction via the resulting pointer, which leads to undefined behaviour as the destructor is not virtual. A clearer explanation is given by Herb Sutter in his September 2001 CUJ article "Virtuality".)

Template argument(s):

Argument Description
Endian A tag type specifying the format of the binary representation of the value in memory.

Possible values to use:
  • little_endian or big_endian;
  • natural_endian or reverse_endian.

Note on member function(s):

Before describing the peek and poke member functions, we must make a very important note.

The type of the pointer argument passed by raw_memory to its MemoryXfer policy class is const T * or T * (with T being one of the fixed-size unsigned integer types, i.e. uint8_t, uint16_t, uint32_t or uint64_t).

This is misleading.

The raw_memory functions are always given untyped pointers, i.e. pointers the type of which is either const void * or void *.

The raw_memory functions always perform a shameless reinterpret_cast to const T * or T * of the pointers they are given, in order for the "correct" MemoryXfer member function overload or member function specialization to be called.

But the MemoryXfer member functions should not assume that they can blindly dereference the given pointer, either for reading or for writing. Rather, they should treat their pointer arguments as pointers to const void or to void which just happen to have been reinterpret_cast'ed(*).

(*) Why have we employed such a confusing convention ? As a (rather low-level) optimization, of course !

Testing and analysis of the generated machine code showed that some compilers did not optimize away the passing of a type tag argument (whether it was a null pointer carrying type information or an empty object constructed using the Loki::Type2Type mechanism). Some of the compilers even introduced pessimizations that are too horrid to describe here.

Benchmarks have proven that our newly-employed convention really helped them get on par with the "behaving" compilers.

Another note on member function(s):

The only types that need to be supported by a model of the MemoryXfer concept are the fixed-size unsigned integer types: uint8_t, uint16_t, uint32_t and uint64_t.

The raw_memory functions only call MemoryXfer functions for these types. Any conversions (e.g. from/to signed versions, i.e. int8_t, int16_t, int32_t, int64_t) are performed outside MemoryXfer functions (within raw_memory functions).

This choice has simplified the code of the provided MemoryXfer models and has also proven to be a significant optimization in some cases(*).

(*) It turned out that some compilers did not inline calls to member template functions. Thus, instead of using member templates, we just provided non-template overloaded peek and poke functions for the (only) 4 fixed-size unsigned integer types. All the compilers we used were able to inline calls to those.

Member function(s):

// Either something like this:
template <typename T> static T peek (const T *p);

// or something like this:
static uint8_t  peek (const uint8_t  *p);
static uint16_t peek (const uint16_t *p);
static uint32_t peek (const uint32_t *p);
static uint64_t peek (const uint64_t *p);

// or other variations that can be called with similar syntax.

The peek functions provide functionality to read the integer value (of type T, one of uint8_t, uint16_t, uint32_t, uint64_t) from memory cells pointed to by the p argument.

// Either something like this:
template <typename T> static void poke (T *p, T t);

// or something like this:
static void poke (uint8_t  *p, uint8_t  t);
static void poke (uint16_t *p, uint16_t t);
static void poke (uint32_t *p, uint32_t t);
static void poke (uint64_t *p, uint64_t t);

The poke functions provide functionality to write the integer value t (of type T) to the memory cells pointed to by the p argument.
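
To make the conventions above more concrete, here is a minimal sketch of a model of the MemoryXfer concept that always transfers values one byte at a time (in the spirit of byte_sized_memory_xfer). Only the big_endian specialization and the uint16_t overloads are shown; the other specializations and the 8-, 32- and 64-bit overloads follow the same pattern. It is written against the conventions described in this section, not copied from the library sources:

#include <boost/cstdint.hpp>
#include <boost/raw_memory.hpp>   // for the endian tag types

template <class Endian>
class my_byte_sized_xfer;         // only specializations are provided

template <>
class my_byte_sized_xfer <boost::big_endian>
{
protected:
    ~my_byte_sized_xfer () {}     // policy class: protected, non-virtual destructor

public:
    static boost::uint16_t peek (const boost::uint16_t *p)
    {
        // Remember: p is really a reinterpret_cast'ed "raw memory" address,
        // so we only ever touch it one byte at a time.
        const unsigned char *pb = reinterpret_cast <const unsigned char *> (p);
        return static_cast <boost::uint16_t> ((pb [0] << 8) | pb [1]);
    }

    static void poke (boost::uint16_t *p, boost::uint16_t t)
    {
        unsigned char *pb = reinterpret_cast <unsigned char *> (p);
        pb [0] = static_cast <unsigned char> (t >> 8);
        pb [1] = static_cast <unsigned char> (t);
    }

    // ... uint8_t, uint32_t and uint64_t overloads omitted.
};

It could then be selected as the third template argument, e.g. raw_memory <uint16_t, big_endian, my_byte_sized_xfer>::peek (pv).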

Benchmarks and platform/toolset-specific notes

Overview

We have used various platform/toolset-specific optimizations (and optimization attempts) in an effort to squeeze more performance for low-level tasks such as reading from and writing to "raw memory".

In the source code of this library, templates and policy-based class design meet Assembler, compiler intrinsics and calling convention specifiers.

We have benchmarked the results of the "tricks" that we employed against the results generated by regular C++ compiled code, in order to determine whether the tricks should be enabled (and, more precisely, for which types / endian formats / alignment constraints they should be enabled).

Our simple test looks like this:

// Test the performance of byte_swapper:
{
    ...
    const T xDiff = ...;
    T x = ...;
    T xResult = 0;

    for (unsigned iSample = 0; iSample < nSamples; ++iSample, x += xDiff)
    {
        const T y = boost::byte_swapper <T>::reverse (x);
        const T z = boost::byte_swapper <T>::reverse (y);
        if (z != x)
            ...   // Report a failure.

        xResult += y;
    }
    ...
}

// Test the performance of raw_memory <...>::peek and poke:
{
    for (...)   // "A few" repetitions.
    {
        const T t = *ptExpected + 25;
        raw_memory <T, Endian, MemoryXfer>::poke (pb, t);
        if (raw_memory <T, Endian, MemoryXfer>::peek (pb) != t)
            ...   // Report a failure.
    }
    ...
}

As you can see, the first part of the test attempts to measure the efficiency of the byte_swapper <...>::reverse function. This directly influences the efficiency of peek's and poke's using casting_memory_xfer <reverse_endian> and thus the corresponding barcharts are grouped together in the upper half of the sub-table for the corresponding integer type.

The second part of the test attempts to measure both read (peek) and write (poke) performance (at the same time). Generally, these results have guided our choices regarding low-level-MemoryXfer-policy-to-high-level-MemoryXfer-policy mapping. The timings are grouped together in the lower half of the per-type sub-tables.

Specific applications might require other benchmarks. Users can easily select a different MemoryXfer policy that best suits the results of their own benchmarks.

Quick guide to the barcharts below

The red barcharts correspond to the machine code generated from pure C++ source code, with none of our "tricks" employed -- the "normal" version.

The green barcharts correspond to the machine code generated from hand-written Assembler and/or compiler intrinsics -- the "tricky" version.

In cases where the "tricky" version proved to perform worse than the "normal" version, we have not employed our "tricks" for those combinations of T, Endian, MemoryXfer and/or alignment hint(*).

(*) For reasons that are completely different for each toolset (as noted below), the uint16_t versions of our "tricks" turned out not to be significant optimizations (or optimizations at all) for some of the compilers that we used.

The tricks have been disabled in such cases, but we have retained the results of the previously performed benchmarks in order to show them here (and attempt to explain them using our limited knowledge).

(The updated benchmarks do not retain these results.)

Test system

I have performed the benchmarks using my most loyal computer, an Acer TravelMate 4401 with AMD Turion 64 ML-28 (mobile version of the Socket 754 single-core Athlon 64) CPU running at 1600 MHz (a rather modest configuration by some "modern" standards). The operating system is Windows XP Professional x64 Edition. And the name of the computer is "SyndiMobile" (previously: "PREVXAMD64"), just so you know.

Notes for x86/x64 platforms

Unaligned access to data is supported(*) and, unless additional operations need to be performed (e.g. when using the big_endian format), casting_memory_xfer provides very good performance.

(*) Unless the Alignment Check flag (bit 18 of EFLAGS/RFLAGS) has been (Machiavellically) set.

16-bit byte swapping of a value in a general-purpose register can be implemented easily(*) by just exchanging the value of the less significant byte with the value of the more significant byte, e.g.: xchg al, ah.

(*) Another method that has been tested was wrap-around-rolling the bits of the register 8 positions to the left (e.g.: rol ax, 8) or to the right (e.g.: ror ax, 8).

However, for various reasons that are discussed below, we have instead relied on the machine code generated by compilers from the portable C++ source code: return (x >> 8) | (x << 8);.

32-bit values can be byte-swapped efficiently using the bswap instruction that was introduced in the 486 generation of IA-32 CPU's.

64-bit values can be treated as pairs of 32-bit values. The individual bytes in each 32-bit value are shuffled using bswap and the 32-bit values themselves are swapped with each other (e.g.: xchg eax, edx).

The x64 machine code for 64-bit byte swapping is very elegant. A single 64-bit bswap instruction replaces some 70 bytes of machine code that are happily generated by the compiler from the portable C++ source.
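
For reference, the portable C++ fall-backs that these "tricks" compete against look roughly like the following sketch (not the exact library source):

#include <boost/cstdint.hpp>

inline boost::uint32_t reverse32 (boost::uint32_t x)
{
    return  (x >> 24)
         | ((x >>  8) & 0x0000FF00u)
         | ((x <<  8) & 0x00FF0000u)
         |  (x << 24);
}

inline boost::uint64_t reverse64 (boost::uint64_t x)
{
    // Treat the value as a pair of 32-bit halves: reverse each half, then swap the halves.
    return (static_cast <boost::uint64_t> (reverse32 (static_cast <boost::uint32_t> (x))) << 32)
         |  reverse32 (static_cast <boost::uint32_t> (x >> 32));
}

On x86, the compiler expands each of these into a longer sequence of shifts, and's and or's, whereas the "tricky" versions boil down to one or two bswap instructions.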

Various calling conventions exist in the x86/x64 realm. When inlining of the function call was not possible (as described below), we tried to at least use a more efficient calling convention, e.g. to pass the arguments via registers (instead of the in-memory stack) or to clean the stack in the called function (thus saving a little space for each call).

Calling conventions (not just for x86/x64, but also for Alpha, MIPS, PowerPC and the mighty IA-64) are discussed, among other places, in the following articles:

(*) If clicking on the link does not scroll the text somewhere inside the "Comments" section, please search for the text "Is is true that" in the page.

Results and notes for Borland/Inprise/Borland/CodeGear/Embarcadero C++Builder

The Borland C++ compiler was one of the first to support template template arguments(*).

(*) Although in Boost.Config the BOOST_NO_TEMPLATE_TEMPLATES macro ends up being defined for versions earlier than C++Builder 2006 Update 2, even the "free command-line tools" based on C++Builder 5.5 (released in 2000) support this feature well enough for our library.

Although the documentation mentions the availability of compiler intrinsics that could have been useful for byte swapping (e.g. for 16-bit wrap-around rotation), we were unable to use them.

Inline Assembler is not allowed in inline functions. This has two effects: the Assembler versions of the functions have to live in a separately compiled source file (raw_memory.cpp)(**), and, since the calls to them cannot be inlined, we at least pass the arguments via registers by using the __fastcall calling convention(*).

(*) Borland's __fastcall convention uses the EAX, EDX and ECX registers to pass the first 3 integer arguments. This is different from (and arguably more efficient than) Microsoft's __fastcall convention, which uses ECX and EDX. Borland compilers can use Microsoft's __fastcall convention, but they call it __msfastcall.

Neither the __fastcall convention, nor the __msfastcall one pass 64-bit integer arguments in registers. However, we still get the benefit of the stack being cleaned up by the called function.

(**) The BOOST_RAW_MEMORY_SEPARATE_SOURCE macro (documented in comments in <boost/raw_memory/config.hpp>) can be forced to a value of 0 in order to obtain a header-only library (thus giving up the Borland-specific optimizations). Doing so is not recommended.

As can be seen in the table, our "tricks" have done more harm than good when working with 16-bit values. Thus, they have only been employed for the 32-bit and 64-bit cases, where they provided quite a visible performance boost.

The back-end of the compiler does not seem to have suffered major modifications "lately". While we have tested with all the versions released from 2000 up to 2007, the results below are the ones obtained with the "free command line tools" (C++Builder 5.5, released in 2000). The later versions that we tested generated very similar timings.

The raw_memory.cpp source file needs to be compiled with support for 486 or later instruction set. The provided Jamfile does indeed pass the /6 command line argument to the bcc32.exe compiler (thus actually enabling support for the P6, a.k.a. Pentium Pro generation), but if you are not using Boost Build System v2, please make sure that you pass the /4, /5 or /6 switch to the compiler from your build environment.

The results of benchmarks obtained with the initial version of the library are available here (as well as in the "doc" subfolder of the initial archive) for reference.

In Update-01, we employed an additional trick: a work-around for the fact that 64-bit integers are not passed via registers in the __fastcall and __msfastcall conventions.

Also, to work around the compiler's unstoppable desire to spill 64-bit integers to memory only to reload them back (e.g. when entering the body of an inlined function), we have come up with two new tricks that improve speed:

The results are depicted in the following table (with tricks disabled for 16-bit integers):

Integer type and test (alignment):                      Normal (red):  Tricky (green):

uint16_t
  byte_swapper                                           0.447          0.443
  casting_memory_xfer <reverse_endian> (unaligned)       0.440          0.436
  casting_memory_xfer <reverse_endian> (aligned)         0.341          0.343
  unchecked_memory_xfer <natural_endian> (unaligned)     0.211          0.211
  unchecked_memory_xfer <natural_endian> (aligned)       0.195          0.211
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.210          0.219
  unchecked_memory_xfer <reverse_endian> (aligned)       0.260          0.260

uint32_t
  byte_swapper                                           1.711          1.180
  casting_memory_xfer <reverse_endian> (unaligned)       0.563          0.449
  casting_memory_xfer <reverse_endian> (aligned)         0.502          0.369
  unchecked_memory_xfer <natural_endian> (unaligned)     0.275          0.274
  unchecked_memory_xfer <natural_endian> (aligned)       0.193          0.209
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.356          0.354
  unchecked_memory_xfer <reverse_endian> (aligned)       0.377          0.338

uint64_t
  byte_swapper                                           4.963          1.717
  casting_memory_xfer <reverse_endian> (unaligned)       1.268          0.587
  casting_memory_xfer <reverse_endian> (aligned)         1.185          0.424
  unchecked_memory_xfer <natural_endian> (unaligned)     0.391          0.397
  unchecked_memory_xfer <natural_endian> (aligned)       0.253          0.261
  unchecked_memory_xfer <reverse_endian> (unaligned)     1.181          0.587
  unchecked_memory_xfer <reverse_endian> (aligned)       1.137          0.424

Results and notes for Digital Mars C++

Digital Mars C++ is the successor of the very first "native" C++ compiler, Zortech C++ (later known as Symantec C++).

A function that:
  • is declared inline (or is defined inside a class definition)
and
  • contains inline Assembler statements
is never actually expanded inline (but instead is called as a non-inline function).

However, such a function does retain the source code organization properties of functions marked as inline (e.g. duplicate bodies are eliminated from the object files and thus no separate source code file is needed).

No function calling convention that passes arguments via registers is really supported. The _fastcall and __fastcall keywords are (silently) ignored. The fastcall and __msfastcall keywords generate compile errors.

The next best thing was using the __stdcall convention, which at least has the called function clean up the stack (instead of the calling function).

I was not able to find any compiler intrinsics to use for this library. The <cstdlib> / <stdlib.h> header declares _rotl, _rotr, _lrotl and _lrotr, but these are library functions (not compiler intrinsics) and only apply to 32-bit values (unsigned int, unsigned long) anyway(*).

(*) A 16-bit left or right rotation intrinsic would have been useful to implement byte swapping of 16-bit integer values.

So while we do not need a separate source code file, we only have 2 options when it comes to speed: let the compiler inline the machine code it generates from the portable C++ source(*), or provide hand-written Assembler code in functions that are never expanded inline (and thus pay the cost of a function call).

(*) The compiler did not inline calls to template member functions. Thus, we switched from using such lovely beasts to using overloaded member functions, with great improvements in the performance of the resulting machine code (and no significant drawbacks, since only 4 fixed-size unsigned integer types must be supported by the lower levels of this library).

We have performed benchmarks to determine which option is faster.

It turns out that for 16-bit values, the cost of a function call outweighs the performance benefit of replacing C++ code with Assembler code, but for 32-bit and 64-bit values we gain a real benefit, despite the overhead of a function call.

We used our test computer and version 8.52 of the compiler (released in May 2010).

For the results obtained with the initial version, please click here (or consult the "doc" subfolder in the initial archive).

In Update-01, we benchmarked the additional tricks considered for Borland/CodeGear C++Builder and concluded that it was best to keep the current code for the reverse function while also providing an Assembler version for poke (but not also for peek).

Here are the new results (with tricks disabled for the 16-bit integer operations and for the 32-bit reverse function):

Integer type and test (alignment):                      Normal (red):  Tricky (green):

uint16_t
  byte_swapper                                           0.694          0.633
  casting_memory_xfer <reverse_endian> (unaligned)       0.211          0.207
  casting_memory_xfer <reverse_endian> (aligned)         0.211          0.207
  unchecked_memory_xfer <natural_endian> (unaligned)     0.082          0.097
  unchecked_memory_xfer <natural_endian> (aligned)       0.081          0.098
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.122          0.124
  unchecked_memory_xfer <reverse_endian> (aligned)       0.211          0.207

uint32_t
  byte_swapper                                           1.010          1.011
  casting_memory_xfer <reverse_endian> (unaligned)       0.324          0.273
  casting_memory_xfer <reverse_endian> (aligned)         0.321          0.275
  unchecked_memory_xfer <natural_endian> (unaligned)     0.081          0.081
  unchecked_memory_xfer <natural_endian> (aligned)       0.081          0.081
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.324          0.273
  unchecked_memory_xfer <reverse_endian> (aligned)       0.321          0.275

uint64_t
  byte_swapper                                           2.847          1.641
  casting_memory_xfer <reverse_endian> (unaligned)       0.724          0.489
  casting_memory_xfer <reverse_endian> (aligned)         0.630          0.303
  unchecked_memory_xfer <natural_endian> (unaligned)     0.346          0.326
  unchecked_memory_xfer <natural_endian> (aligned)       0.113          0.112
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.724          0.489
  unchecked_memory_xfer <reverse_endian> (aligned)       0.630          0.303

Results and notes for GCC

Starting with version 4.3, compiler intrinsics are available for byte swapping of 32-bit and 64-bit values. They are called __builtin_bswap32 and __builtin_bswap64.

They work extremely well for the x86/x64 architecture.

For other architectures, should the code run slower than the "unoptimized" version (which is always available by defining the BOOST_RAW_MEMORY_TRICKS macro as 0, as explained in the comments of <boost/raw_memory/config.hpp>), one might find the following article rather interesting: http://hardwarebug.org/2010/01/14/beware-the-builtins/.
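
Under the assumption that the intrinsics are present (GCC 4.3 or later), a reverse function in the style of byte_swapper might be sketched like this, with the portable shift-based code as the fall-back:

#include <boost/cstdint.hpp>

inline boost::uint32_t reverse32 (boost::uint32_t x)
{
#if defined (__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
    return __builtin_bswap32 (x);   // typically a single bswap instruction on x86/x64
#else
    return  (x >> 24)
         | ((x >>  8) & 0x0000FF00u)
         | ((x <<  8) & 0x00FF0000u)
         |  (x << 24);
#endif
}

(In the library itself the choice is controlled through the BOOST_RAW_MEMORY_TRICKS machinery mentioned above, not through an ad-hoc #if like this one.)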

We used the following version for our tests: 4.3.4 20090804 (release) 1. Initial results are available here and in the initial archive (the "doc" subfolder).

For Update-01, we attempted to improve performance in the absence of compiler intrinsics (e.g. to support older versions of the compiler or misbehaving intrinsics).

GCC makes it possible to use inline Assembler in inlined function bodies and thus we were able(*) to equal the results of the 32-bit integer and 64-bit integer intrinsics and to surpass the results of the 16-bit integer code (for which intrinsics are not available):

(*) Currently, only for the x86 platform, but we hope that similar results can be achieved for other platforms as well. (-:

Integer type and test (alignment):                      Normal (red):  Tricky (green):

uint16_t
  byte_swapper                                           0.484          0.375
  casting_memory_xfer <reverse_endian> (unaligned)       0.125          0.109
  casting_memory_xfer <reverse_endian> (aligned)         0.125          0.109
  unchecked_memory_xfer <natural_endian> (unaligned)     0.046          0.047
  unchecked_memory_xfer <natural_endian> (aligned)       0.047          0.047
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.110          0.094
  unchecked_memory_xfer <reverse_endian> (aligned)       0.125          0.109

uint32_t
  byte_swapper                                           0.938          0.187
  casting_memory_xfer <reverse_endian> (unaligned)       0.281          0.078
  casting_memory_xfer <reverse_endian> (aligned)         0.265          0.062
  unchecked_memory_xfer <natural_endian> (unaligned)     0.047          0.047
  unchecked_memory_xfer <natural_endian> (aligned)       0.047          0.047
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.281          0.078
  unchecked_memory_xfer <reverse_endian> (aligned)       0.265          0.062

uint64_t
  byte_swapper                                           1.954          0.531
  casting_memory_xfer <reverse_endian> (unaligned)       0.593          0.125
  casting_memory_xfer <reverse_endian> (aligned)         0.610          0.110
  unchecked_memory_xfer <natural_endian> (unaligned)     0.344          0.359
  unchecked_memory_xfer <natural_endian> (aligned)       0.094          0.093
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.593          0.125
  unchecked_memory_xfer <reverse_endian> (aligned)       0.610          0.110

Results and notes for Microsoft Visual C++

Compiler intrinsics that perform the exact byte swapping that we need are available for 16-bit, 32-bit and 64-bit values: _byteswap_ushort, _byteswap_ulong and _byteswap_uint64.

Yuuuppppiiiiiii ! They provide excellent performance by combining highly optimized generated machine code with other back-end optimizations enabled by function inlining...

... except when they don't !

The 16-bit version, _byteswap_ushort, seems to prevent other compiler optimizations. The optimizer is clever enough to use byte-register swapping (or direct assignment of the low-byte value to the high-byte register and vice versa) even in the absence of the intrinsic, anyway. Thus, we are not using it(*) any more.

(*) "It" = the intrinsic, not the optimizer.

The Visual C++ .NET 2003 back-end manifested a serious bug: when building the release version (e.g. with /O2), the compiler seemed to get confused regarding which registers were used to store the result of the _byteswap_uint64 intrinsic. Our work-around is based on a suggestion from Mr. Gary Chang's answer to Mr. Ivan S. Warren: http://us.generation-nt.com/answer/possible-vc7-1-c-plus-plus-optimizer-error-intrinsic-byteswap-uint64-when-compiler-enters-register-contention-0-1-help-10167812.html.

Unlike the solution suggested there, we used another intrinsic (_ReadWriteBarrier) that helped the compiler generate optimal and correct machine code.

However, while working on Update-01 (e.g. while studying some interesting pieces of code presented by Tymofey and Phil Endecott), we noticed that sometimes the bug manifests itself in a different way: an "internal compiler error" is reported when the source code is processed. Currently, the work-around has been to separately compile the reverse function so that calls to it are not inlined anymore.

Unfortunately, as expected, this results in a loss of performance. We tried to mitigate this by employing a similar trick as for Borland/CodeGear C++Builder in order to pass the 64-bit integer value via registers (despite the fact that Microsoft's __fastcall convention does not naturally support this, any more than Borland's).

Additional efforts to mitigate the slow-down by hacking the peek and poke functions of casting_memory_xfer <reverse_endian> only resulted in additional slow-down. (-:

The inline version of the reverse function has been kept and is available(*) for users who need the extra performance with this particular version of the compiler and who do not encounter the "internal compiler error".

(*) It can be selected by forcing the BOOST_RAW_MEMORY_SEPARATE_SOURCE macro to 0 before including <boost/raw_memory.hpp>. The conditional guard might look like this: #if defined (_MSC_VER) && _MSC_VER < 1400.

The bug was not present in later versions of the compiler.

Currently, we have 3 result sets for this family of compilers: Visual C++ 7.1 (2003), Visual C++ 8.0 (2005) and Visual C++ 8.0 (2005) for x64.

(Later versions will also be included. It is just too hot in the summer for me to stress my beloved laptop with benchmarks.)

As expected, x64 code really shined at handling 64-bit values. This may sound childish, but it is a pleasure to actually see the improvements made possible by the AMD64 architecture.

Microsoft Visual C++ 7.1 (2003)

The initial results have been kept here and in the "doc" subfolder of the initial archive. Here are the updated results (with inlined reverse code for uint64_t):

Integer type and test (alignment):                      Normal (red):  Tricky (green):

uint16_t
  byte_swapper                                           0.253          0.252
  casting_memory_xfer <reverse_endian> (unaligned)       0.090          0.090
  casting_memory_xfer <reverse_endian> (aligned)         0.082          0.082
  unchecked_memory_xfer <natural_endian> (unaligned)     0.049          0.050
  unchecked_memory_xfer <natural_endian> (aligned)       0.049          0.050
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.090          0.090
  unchecked_memory_xfer <reverse_endian> (aligned)       0.082          0.082

uint32_t
  byte_swapper                                           0.952          0.252
  casting_memory_xfer <reverse_endian> (unaligned)       0.292          0.060
  casting_memory_xfer <reverse_endian> (aligned)         0.289          0.059
  unchecked_memory_xfer <natural_endian> (unaligned)     0.049          0.049
  unchecked_memory_xfer <natural_endian> (aligned)       0.048          0.049
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.209          0.060
  unchecked_memory_xfer <reverse_endian> (aligned)       0.209          0.059

uint64_t
  byte_swapper                                           3.492          0.510
  casting_memory_xfer <reverse_endian> (unaligned)       0.662          0.150
  casting_memory_xfer <reverse_endian> (aligned)         0.660          0.127
  unchecked_memory_xfer <natural_endian> (unaligned)     0.048          0.048
  unchecked_memory_xfer <natural_endian> (aligned)       0.048          0.048
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.662          0.150
  unchecked_memory_xfer <reverse_endian> (aligned)       0.630          0.127

When the code of the 64-bit reverse function is not inlined (to work-around the possible "internal compiler error"), the results change as follows:

Integer type and test (alignment):                      Normal (red):  Tricky (green):

uint64_t
  byte_swapper                                           3.688          1.023
  casting_memory_xfer <reverse_endian> (unaligned)       0.661          0.229
  casting_memory_xfer <reverse_endian> (aligned)         0.662          0.222
  unchecked_memory_xfer <natural_endian> (unaligned)     0.049          0.048
  unchecked_memory_xfer <natural_endian> (aligned)       0.048          0.049
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.661          0.229
  unchecked_memory_xfer <reverse_endian> (aligned)       0.633          0.222

Microsoft Visual C++ 8.0 (2005)

The initial results have been kept here and in the "doc" subfolder of the initial archive. Here are the updated results:

Integer type and test (alignment):                      Normal (red):  Tricky (green):

uint16_t
  byte_swapper                                           0.315          0.315
  casting_memory_xfer <reverse_endian> (unaligned)       0.099          0.098
  casting_memory_xfer <reverse_endian> (aligned)         0.098          0.098
  unchecked_memory_xfer <natural_endian> (unaligned)     0.049          0.051
  unchecked_memory_xfer <natural_endian> (aligned)       0.049          0.050
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.099          0.098
  unchecked_memory_xfer <reverse_endian> (aligned)       0.098          0.098

uint32_t
  byte_swapper                                           0.695          0.252
  casting_memory_xfer <reverse_endian> (unaligned)       0.178          0.066
  casting_memory_xfer <reverse_endian> (aligned)         0.178          0.065
  unchecked_memory_xfer <natural_endian> (unaligned)     0.033          0.034
  unchecked_memory_xfer <natural_endian> (aligned)       0.033          0.034
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.178          0.066
  unchecked_memory_xfer <reverse_endian> (aligned)       0.178          0.065

uint64_t
  byte_swapper                                           1.584          0.443
  casting_memory_xfer <reverse_endian> (unaligned)       0.398          0.329
  casting_memory_xfer <reverse_endian> (aligned)         0.395          0.143
  unchecked_memory_xfer <natural_endian> (unaligned)     0.048          0.048
  unchecked_memory_xfer <natural_endian> (aligned)       0.049          0.048
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.398          0.329
  unchecked_memory_xfer <reverse_endian> (aligned)       0.395          0.143

Microsoft Visual C++ 8.0 (2005) for x64

The initial results have been kept here and in the "doc" subfolder of the initial archive. Here are the updated results:

Integer type and test (alignment):                      Normal (red):  Tricky (green):

uint16_t
  byte_swapper                                           0.253          0.253
  casting_memory_xfer <reverse_endian> (unaligned)       0.074          0.074
  casting_memory_xfer <reverse_endian> (aligned)         0.065          0.066
  unchecked_memory_xfer <natural_endian> (unaligned)     0.049          0.049
  unchecked_memory_xfer <natural_endian> (aligned)       0.050          0.050
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.074          0.074
  unchecked_memory_xfer <reverse_endian> (aligned)       0.065          0.066

uint32_t
  byte_swapper                                           0.697          0.252
  casting_memory_xfer <reverse_endian> (unaligned)       0.161          0.059
  casting_memory_xfer <reverse_endian> (aligned)         0.163          0.061
  unchecked_memory_xfer <natural_endian> (unaligned)     0.034          0.033
  unchecked_memory_xfer <natural_endian> (aligned)       0.033          0.037
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.161          0.059
  unchecked_memory_xfer <reverse_endian> (aligned)       0.161          0.061

uint64_t
  byte_swapper                                           1.287          0.252
  casting_memory_xfer <reverse_endian> (unaligned)       0.315          0.065
  casting_memory_xfer <reverse_endian> (aligned)         0.316          0.064
  unchecked_memory_xfer <natural_endian> (unaligned)     0.049          0.049
  unchecked_memory_xfer <natural_endian> (aligned)       0.048          0.050
  unchecked_memory_xfer <reverse_endian> (unaligned)     0.315          0.065
  unchecked_memory_xfer <reverse_endian> (aligned)       0.316          0.064

Acknowledgements

The following pieces of documentation really helped me, so I will mention them here as my silent (but loyal :x) companions:

Initial version (2011-09-05)

The initial public version of the library was released on 2011-09-05 to the Boost mailing list. It can still be downloaded from the following location:

http://adder.iworks.ro/Boost/RawMemory/Boost_RawMemory_00.zip

(The initial version of this documentation can be found inside the "doc" subfolder in the archive.)

Update-01 (2011-09-12)

One can search for Update-01 in this document in order to locate the text that has been updated. We hope that this helps save some time for people who have already read the initial version.

Download location:

http://adder.iworks.ro/Boost/RawMemory/Boost_RawMemory_01.zip

Compiler support and further speed-ups

Borland/Inprise/Borland/CodeGear/Embarcadero C++Builder:

Digital Mars C++:

GCC:

Microsoft Visual C++:

Examples

Example-02 was added to illustrate the difference between an incorrect (but wide-spread) "technique" of dealing with binary data formats and the approach that we consider correct and that we recommend. It is meant to complement the Guidelines for dealing with binary data formats.

Benchmarking

The number of repetitions performed in our benchmarks has been greatly reduced. Thanks to using the "high performance counter" when timing the results, the loss of precision should be minimal.

Since our benchmarks have been running much faster, it has been much easier to test the countless modifications to our performance tuning tricks. I believe this has resulted in faster machine code being generated for the user's application (and also a more pleasant time for the developers of the library, namely myself and I).

(It is very easy to change back the number of repetitions as per the initial version of the library.)

Another side effect is that the scale of the barcharts in the Benchmarks section has changed.

Documentation updates
