A zoo of strings in your C++ code?

Sometimes evolution comes up with strange solutions

CryEngine2 used its own CString class for working with strings, plus a bit of the standard Windows string library. As far as I remember, the latest CryEngine still uses the same CString: inside it changed radically, but as a nod to history the class name was kept, while its functionality was greatly expanded. I'm not 100% sure whether CString was used only in the editor or in the game runtime too — you can check the sources yourself, they're still available on GitHub. This is one approach to strings, fairly common in the gamedev world: when we write everything we need ourselves, without looking around… though "glancing" at existing implementations and dragging the best bits into the project is the more fitting word.

There's another approach too… I worked on a team building a project that was supposed to ship on consoles, and at some point an "effective" team lead joined who was great at pretty presentations and pushed through using std::string from the SDK. All the very experienced programmers, the seniors and management nodded importantly at the meeting and agreed to convert everything to std::string… they turned out not to be quite as experienced as it seemed. In the end we replaced most of the CString with std::string. I wouldn't say it affected compile time much — a minute or so on a project that builds in twenty minutes makes no difference — but it also turned our fairly readable codebase into a tangled nightmare. Maybe it was better for portability, but neither our project nor the CryEngine2 Editor was ever ported to Linux or any other platform.

Ten years have passed, and I see exactly the same situation on my current project — a new team lead decided to convert the local MySuperPupeString to std::string. Already sensing the consequences in my gut, I'm stocking up on popcorn and taking the next month off after the decision is made. But that's not what's interesting — what's interesting is which strings can actually live in your C++ code.

char*

Ritchie fiddling with strings

A wild wolf with a minimal set of instincts: a pointer and the terminator '\0', which are more than enough for it to survive in the wild Linux steppes, the Android forests, and generally anywhere there's even a semblance of a processor. In the beginning there was BCPL, where there were no types at all: a primordial world where all data were the same beasts, just chunks of memory with no passports or social status. In 1969 Ken Thompson created the language B for the first version of UNIX on the PDP-7. It was also typeless, "C without types", as it would later be remembered. And in C, Ritchie laid down a brilliant idea: a variable's declaration looks the same as its use in an expression. If you write char *str, then in code you use *str to get a character. At the time this wasn't an abstraction (that's what they'd call it later) but a direct reflection of how the hardware, and especially memory, works. So char* isn't a "string" at all, just a pointer to a byte of data in memory. In that sense the wolf doesn't think of a flock of sheep as an array: it just sees the first sheep and knows there are surely more grazing further on. To figure out where the food… I mean the string, ends, you only need to agree on a signal. BCPL used the length in the first byte, and Pascal later picked up that idea. But C chose the concept of '\0', a special null byte at the end. It was simpler for the hardware: no need to store an extra length, just walk through memory until you hit a zero. Just like in the wild: the wolf doesn't carry a sheep counter. Same with strcpy() and strlen(): they run through memory until they bump into '\0'.

The quirks of keeping one: low readability (ptr++, *str, strcpy(a, b)), but in exchange the instruction set is minimal and efficient, like a hunting instinct. The syntactic sugar of string literals "hello" and square brackets str[i] didn't appear right away either, but you could always get by with the harsh construct *(str + i). It bites your leg off as fast as possible: just go out of the buffer's bounds, forget the terminator, or put it "in the wrong place", and you get undefined behavior, which on a good day shows up as a segfault. This wolf has a taste for perversions, but it's on its magic that whole mountains of preprocessor directives, stringification macros like #define STR(x) #x, and even sizeable chunks of code generators are built.
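A minimal sketch of the wolf's two instincts in code. The name my_strlen is made up for illustration (the real function is strlen from <cstring>); the loop simply walks memory until it meets '\0', and the STR macro is the stringification trick mentioned above.

```cpp
#include <cstddef>

// Hypothetical re-implementation of strlen, for illustration only.
// It has no idea how big the buffer is: it trusts the terminator.
std::size_t my_strlen(const char* str) {
    std::size_t n = 0;
    while (*(str + n) != '\0') {  // *(str + n) is exactly what str[n] means
        ++n;
    }
    return n;
}

// Stringification: the preprocessor turns the argument's tokens into a literal.
#define STR(x) #x
```

With these, my_strlen("hello") walks five characters before it hits the zero, and STR(wolf) expands to the literal "wolf". Hand my_strlen a buffer without a terminator and it will keep walking into someone else's memory.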

char[N]

The turtle is fast

When Dennis Ritchie created C, he didn't have the luxury of complex abstractions — and luxury didn't exist back then at all. The PDP-11 computers on which the first versions of UNIX were built had a pitiful few kilobytes of memory, and malloc didn't appear right away. Programmers worked with what they had: automatic variables on the stack and static data in the data segment were considered a normal diet. A character array char buf[N] was the natural way to store a string. You declared the array and it immediately took up residence on the stack: no allocations, no frees, no drama. It was that very turtle — a simple and ancient creature that appeared millions of years ago and isn't particularly eager to evolve. Why evolve, if everything works as is.

In the UNIX culture, where programs were small utilities that do one thing well, fixed buffers were everywhere. Reading lines from stdin, parsing configuration, processing temporary data — everywhere stood char buffer[BUFFER_SIZE] with magic numbers like 80, 256, 1024. These numbers were chosen empirically: large enough to usually suffice, but not so large the stack would blow up.

The main magic of fixed arrays is that they live on the stack. Declaring char buf[N] reserves the bytes right there and guarantees the turtle is always home. It never travels to the distant lands of the heap, where loud, unpredictable creatures live and memory is freed whenever it pleases. The turtle just sits in its corner of the stack, does its job, and disappears when the function returns. But even turtles have limits — the stack is finite (a typical size is one to eight megabytes per thread), and the turtle can live only in a small box. If you need too big a box, you'll have to go out into the wild steppe of dynamic allocations, where other, far more nervous and voracious beasts live.

This story reminds us that in programming you don't always need fancy tools. Sometimes a simple array on the stack solves the task better than a smart dynamic string with dozens of glossy methods. The turtle will never outrun the cheetah, but it'll definitely crawl to the finish — predictably and reliably. Therein lies its wisdom and value: simplicity that has worked for fifty years and, in its own opinion, will work for fifty more. Modern programmers sometimes look at such solutions with a faint smile, believing the world has long moved on and it's time to use advanced containers, smart strings, and auto types. But sooner or later everyone hits a situation where you need something small and simple, like a chunk of memory that lives exactly as long as the function does and leaves no loose ends. And then up pops that very char buf[N], which looks like a hello from the seventies and yet still fits the task perfectly.

char literal

Who's to say this isn't a fish?

The very first form of string invented in C was the literal (a string constant, a string literal, or just a literal) — simply a sequence of characters in double quotes. You wrote "hello" in code, the compiler neatly placed those bytes in the executable, and the program got a pointer to them. It was a fish that took up residence in a glass aquarium at the birth of the language: beautiful, swimming before your eyes, but strictly off-limits to touch with your hands.

Literals are constants by nature: they're compiled into the executable and, on typical platforms, placed in a read-only section such as .rodata, which the OS marks read-only after the program loads. An attempt to write something there is undefined behavior and in practice usually ends in an immediate segfault: the fish instantly reminds you that the glass is bulletproof and you shouldn't break the aquarium.

One of the early compiler optimizations was string literal pooling. If the same literal appeared several times in the code, the compiler could create a single copy in .rodata and point all the pointers at it. You wrote "error" in ten places — and the binary might end up with just one instance. All the fish labeled "error" were actually the same fish, viewed by different parts of the program from different angles. The standard didn't forbid this, but didn't guarantee it either, and counting on two "hello"s pointing to the same address was a pastime for especially trusting romantics.
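A small sketch of why trusting romantics get burned. Both helper names are invented for this example: one compares what the fish look like, the other asks whether it is literally the same fish.

```cpp
#include <cstring>

// Compares contents: well defined for any two null-terminated strings.
bool same_text(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}

// Compares addresses. For two separate literals with equal text the result
// is unspecified: true if the compiler pooled them, false if it did not.
bool same_address(const char* a, const char* b) {
    return a == b;
}
```

same_text("error", "error") is always true, while same_address("error", "error") may come out either way depending on the compiler and its flags, so only the first is safe to build logic on.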

Modern compilers do far more interesting things with literals. Short strings may not reach .rodata at all: the compiler turns them into a set of instructions, and the string "ab" becomes a number loaded into a register in one move. Medium-sized strings can be assembled right on the stack with a series of mov instructions. Only truly long strings live in .rodata as full-fledged aquarium dwellers and are copied via memcpy.

Literals live forever from the program's point of view. They're created when the process is loaded into memory and disappear only when the process ends. You can't free the memory a literal occupies, can't move it, can't change it — and the address stays fixed for the whole life of the process. The fish in the aquarium was born at launch and will swim there to the very end. The fish takes care of itself — there's no need to feed it, clean it, move it, or breed it.

In 2025, five decades after C appeared, literals remain a fundamental part of the language. They didn't evolve like the other string types but stayed as straightforward as they were. The fish in the aquarium doesn't change over generations, but does its job perfectly. It swims behind the glass, pleases the eye, demands nothing, and lives forever.

std::string (C++03/11)

A slouching dog, ordinary, one piece

When C++ appeared in the eighties and std::string followed in the nineties, it looked like an attempt to domesticate the wild wolf. They added RAII, methods, automatic memory management, and got something like a dog. Reliable, convenient, with kind eyes, but voracious and not so fast in the hands of an experienced hunter. And in a novice's hands it easily turned into an Australian rabbit that breeds nonstop and litters all of memory to the horizon.

Bjarne Stroustrup, creating C++, took C as a base and bolted classes onto it, but strings turned out awkward. They didn't fit the language's concept, and char* stayed as wild as it was. It bit programmers' legs off at full speed, staged buffer overflows, forgotten null bytes, and the other joys of life inherent to grandpa C. In the language itself strings weren't formalized at all: the C++ philosophy assumed the language just provides tools and abstractions are built by third-party libraries. And that was a charming mistake, because the basic entities really should be defined by the language itself, otherwise a zoo begins.

And the zoo began. Every company wrote its own breed of String and pushed it wherever it could. Microsoft had CString, Borland AnsiString, Qt QString, and each insisted that its dog sat better than the others. And of course compatibility between these "creatures" was about like between a poodle and an Afghan hound: possible in principle, but why? There were so many problems that Stroustrup, in his retrospective on the history of C++, publicly admitted the lack of a standard string type was a problem, but did nothing about it. Then the community, in the persons of Alexander Stepanov and Meng Lee, brought STL, a collection of containers and algorithms, and the committee's basic_string was reworked in its image as the base for the standard string. It was a revolution: the string finally became a container like vector, only with special semantics, rather than just a pile of data inside.

But the crossbreeding of domestic and wild breeds was already unstoppable, and every framework felt obliged to outdo std::string in at least something, even if the results were dubious. Meanwhile real C folks still look at std::string as a spoiled decorative lapdog with no place in the harsh wild, where only the strongest survive — or maybe the simplest?

std::string's readability is excellent — until you decide to open the implementation in the standard library. Even simple string concatenation turns into a call to an overloaded plus operator, and if you fancy shooting yourself in the foot a little, there are plenty of classic problems with iterator invalidation, sudden reallocation when data changes, or the cheerful and sometimes unpredictable behavior of SSO — the little tail that sometimes wags the whole dog.
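To make the reallocation trap a bit more tangible, here is a hedged sketch. The function append_moves_buffer is a made-up probe, not a standard facility: it records where the characters live, appends, and reports whether the buffer moved (which is exactly when old pointers, references, and iterators into the string become invalid).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical probe: returns true if appending `extra` characters made the
// string move its characters to a different buffer.
bool append_moves_buffer(std::string& s, std::size_t extra) {
    // Store the address as an integer so the old pointer is never used again.
    const auto before = reinterpret_cast<std::uintptr_t>(s.data());
    s.append(extra, 'x');
    return reinterpret_cast<std::uintptr_t>(s.data()) != before;
}
```

After s.reserve(64) on an empty string, appending up to 64 characters is guaranteed not to reallocate, so the probe returns false. Append past capacity() and the dog quietly moves house, taking your saved pointers' validity with it.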

std::string_view

A thin dog, three pieces

The joy of domesticating and taming wolves was quickly spoiled by one small annoyance: the dogs started getting fat from overeating. By the mid-2000s std::string had become the standard, but the dog had developed a serious problem — it loved eating too much, or rather copying. Every time a string was passed to a function via const std::string&, there was a risk that inside the function someone clever would create a new string from a literal for comparison and inevitably trigger a temporary allocation. The dogs overate yet again, and performance suffered.

bool compare(const std::string& s1, const std::string& s2) {
    return s1 == s2;
}

std::string str = "hello";
compare(str, "world"); // Creates a temporary std::string; allocates unless the text fits the SSO buffer

Before a standard string_view appeared, the community was busy trying to breed a thin-dog breed. Thus appeared boost::string_ref, an early proposal for non-owning references to strings. As usual, various breeders immediately started crossbreeding their critters and putting them on show: for example, absl::string_view in the Abseil library. Qt got its own QStringRef plus separate CoW strings. They all solved the same problem: how not to copy when you only need read-only access. These were the first wild cats that no one really tamed, and each library raised its own breed. In C++17 the cat was finally let in officially as std::string_view. It mostly repeated the Boost implementation. The cat doesn't own memory at all; it only looks at it from the height of its fluffy independence.

The cat's advantages are obvious — it doesn't drag food home but eats where it found it. And it can eat any fish you point a finger at.

std::string str = "this is my input string";
std::string_view sv(&str[8], 2); // "my", no copy ('m' sits at index 8)

void print(std::string_view sv) {
    std::cout << sv;
}

print("literal");           // OK
print(std::string("str"));  // OK
print(some_char_ptr);       // OK

The string_view cat came into the C++ house not to drive out the dog. It simply shows that sometimes ownership isn't needed and it's enough to just look. "Just looking" is now the whole philosophy of C++17: less copying, more smart view types, and more performance through zero-cost abstractions and little hacks. The syntax is quite readable, and modern C++ textbooks happily pet this cat too. After all, it's harder to shoot yourself in the foot with it — it doesn't own memory, does no allocations, and generally behaves itself. But cats have their quirks: specific problems with dangling references appeared if the source string passed into the other world. People now write high-performance code on string_view, cleverly arranging the lifetime of full-fledged strings. But the view didn't start extending the data's life because of this — it's still the same cat, independent and able to leave at any moment, leaving you alone amid someone's memory.
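A minimal sketch of both sides of the cat. The function first_word is an invented example, not a library call: it returns a piece of its argument without copying, which is cheap and also means the result is only as alive as the string it was cut from.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: returns the first space-delimited word of `text`.
// No characters are copied; the result points into the caller's data and
// is valid only while that data stays alive and unmodified.
std::string_view first_word(std::string_view text) {
    const auto space = text.find(' ');
    return text.substr(0, space);  // npos here simply means "to the end"
}
```

Given std::string owner = "hello world";, the call first_word(owner) is fine for as long as owner lives. Storing the result of first_word(std::string("temp value")) is the classic dangling case: the temporary dies at the end of the statement and the cat is left staring at freed memory.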

std::pmr::string

Someone eats too much

By the early 2010s std::string was a good dog for most tasks, but it had one small tragedy — it couldn't choose where to eat at all. Attempts to breed a thin-dog line had already produced cats, as told above, but a different beast was needed — a type that could work equally deftly with different allocators without throwing a tantrum. The trouble is that in std::string the allocator was a compile-time parameter. Pick a different allocator for an ordinary string, and you got a new data type that looked at the old code like a distant relative and didn't recognize it. It looked like a dog trained to eat from only one bowl its whole life. And to switch it to another bowl, say a custom pool allocator in a game, you had to start a whole new breed. For embedded systems and games such a zoo-farm is somehow tolerable, but for general use this exotica no longer fit, because the number of string types grew faster than the rabbit population in spring.

In 2017 the C++ committee looked at this kennel and decided it was time to bring order. They took the polymorphic-allocator design that had been tried out in the Library Fundamentals TS and introduced Polymorphic Memory Resources into the standard, giving us polymorphism for allocators via virtual functions. They had to create a new type, std::pmr::memory_resource, as an abstract base class, but it was already a serious step forward compared with the old mess of incompatible types.

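// Simplified view of std::pmr::memory_resource from <memory_resource>
class memory_resource {
public:
    virtual ~memory_resource();
    // Public non-virtual interface; it forwards to the private virtuals below
    void* allocate(size_t bytes, size_t alignment = alignof(max_align_t));
    void deallocate(void* ptr, size_t bytes, size_t alignment = alignof(max_align_t));
    bool is_equal(const memory_resource& other) const noexcept;
private:
    // The three functions a custom "bowl" has to implement
    virtual void* do_allocate(size_t, size_t) = 0;
    virtual void do_deallocate(void*, size_t, size_t) = 0;
    virtual bool do_is_equal(const memory_resource&) const noexcept = 0;
};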

namespace pmr {
    using string = std::basic_string<char, std::char_traits<char>, 
                                     polymorphic_allocator<char>>;
}

Now the pedigreed dog knows its lineage and can eat from any memory_resource bowl it's given. True, it turned out a touch slower than the ordinary one and demands suspiciously many ceremonies before feeding — which dented its popularity a bit. By 2025 PMR is used mainly in specialized areas like games, embedded, and HFT, while for ordinary applications good old std::string is still good enough. pmr::string's readability is about the same, it just comes with a small zoo of derived allocators and templates, but now the bowls can be swapped on the fly and the dog doesn't object, as long as it survives the swap.
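For a feel of what swapping the bowl looks like, here is a sketch using only standard C++17 pieces (std::pmr::monotonic_buffer_resource, std::pmr::null_memory_resource, std::pmr::string). The function name built_inside and the sample text are assumptions for the example; it feeds the string from a caller-supplied buffer and checks that the characters really ended up there.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory_resource>
#include <string>

// Hypothetical check: builds a pmr::string on top of `buf` and reports
// whether its characters live inside that buffer.
bool built_inside(std::byte* buf, std::size_t size) {
    // null_memory_resource as upstream: running out of room throws
    // std::bad_alloc instead of silently falling back to the heap.
    std::pmr::monotonic_buffer_resource bowl(buf, size,
                                             std::pmr::null_memory_resource());
    std::pmr::string s("a string long enough to outgrow any small-string buffer",
                       &bowl);
    const auto begin = reinterpret_cast<std::uintptr_t>(buf);
    const auto where = reinterpret_cast<std::uintptr_t>(s.data());
    return where >= begin && where < begin + size;
}
```

Called with a 256-byte std::byte array, this returns true without touching the global heap. Note the ceremony: the resource has to outlive every string that eats from it, which is why the string is declared after the resource.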

QString (Qt)

The parrot is a clever, talkative bird

As the software industry developed, strange and pronounced problems of using different software ecosystems started to appear, each of which promoted only its own, personal, one-true string implementation. To solve yet another problem of yet another "thirteenth standard", in 1991 the Norwegian Trolltech dared a feat and began developing the cross-platform framework Qt. The task was almost heroic — to make the same code work on Windows, Unix X11, and Mac. And each system had its own opinion about character encoding. Windows used UTF-16 via the wide-char API. Mac OS also preferred UTF-16, but through its own native Unicode APIs. Unix and Linux demonstrated full creative chaos — from ASCII and Latin1 to exotic locales remembered only at midnight. std::string didn't exist yet, and even C++98 looked like a mirage, so the appearance of yet another "fourteenth standard" universal string was a matter of time and patience.

When Qt 1.0 was being made, its creators Haavard Nord and Eirik Chambe-Eng decided to pick UTF-16 as QString's internal representation. Technically UTF-16 seemed the best compromise: most characters took just 2 bytes, including Chinese and Japanese ideographs, and only rare emoji and characters of historical scripts needed 4 bytes. So the parrot learned the most fashionable language of the time. On top of that it was taught an incredible number of words: toLower, split, startsWith, contains, replace, and many more. So the parrot turned out talkative and with an answer for every occasion; even escaping helpers like toHtmlEscaped() let you safely send it to the browser.

In 1998 C++98 came out with its std::string, and then began the era of agonizing conversions back and forth — many had already tamed the QString parrot and built whole forests of birdhouses for it. Attempts to make friends with std::string led to creating a temporary QByteArray via toUtf8, which looked like the work of a translator translating a translation of another translation. The parrot proudly kept speaking its UTF-16 while the rest of the world rapidly moved to UTF-8, and this became real pain for millions of developers. They wrote in Qt but were forced to deal with std::string, boost, and a heap of other libs as dependencies, turning the whole process into a local branch of hell.

In 2025 QString lives in the gilded cage of its ecosystem, even though half of desktop software for Linux is written in Qt. This cage includes KDE, VLC, OBS Studio, Telegram Desktop, and many others. But it has to be noted that the parrot turned out beautiful, shiny, and superbly trained — only letting it loose means catastrophe. It's just unclear who faces doom: the parrot or the world around it. Many developers consider it a legacy of 1995 dragged into 2025 by inertia: Qt chose cross-platformness at any cost, and the cost turned out to be an exotic string that stubbornly won't be friends with the rest of the C++ world.

The code's readability is excellent, any C++ programmer easily recognizes it even if they've never seen Qt (though there are almost none of those left). Constructs like str.toLower or str.split can even be mistaken for Python from afar. All the magic, as usual in Qt, is hidden deep in PImpl and inside their meta-system, which rumor says can do everything. But every time I write another Qt application, I again run into external std::string-like libraries and recall the conversion hell like a bad dream.

NSString

A unique species in a private zoo

NSString was developed in the early 1990s, when the general idea of Unicode was only being born, but work on a universal character encoding had begun in the mid-eighties with the participation of major tech companies, including Apple and NeXT. Unicode 1.0, released in October 1991, was a sixteen-bit encoding, and NSString was based on this concept with the unichar type as the main unit — which became a problem later: the monkey was taught a language that was still evolving, and when the language changed, it was left with an archaic dialect. Although NSString is conceptually based on UTF-16, the internal implementation actually depends on the string's content.

NSString was conceived as an immutable object from the very start; at the time the decision was inspired by functional programming and Smalltalk practices. Such an immutable string is safe to pass between threads, to use as a key in dictionaries, and to cache. If a mutable string was needed, NSMutableString was used: another breed of monkey from the same zoo, an even wilder one that could be trained and have its behavior changed. That created an interesting dichotomy: two strings with different behavior but a common ancestor.

One of the most magical features of the iMonkey was toll-free bridging with CFStringRef. NSString could be cast to CFStringRef and back without any conversions, because it was the same object in memory. It was as if the monkey could turn into a dog from a parallel C universe without a single effort. But it was cool only until questions of memory ownership came up: the object's existence in two universes was very precarious, and you had to know exactly who owned the object and at what moment ownership of the data was transferred.

When Apple acquired NeXT, the entire NeXTSTEP stack, including NSString, became the basis for Mac OS X and later iOS. The exotic monkey from NeXT's closed zoo moved into Apple's even more closed zoo, and right up to 2014 NSString was ubiquitous. Every app on Mac and iPhone used it for UI, for file work, for network requests, for storing data, and the monkey population grew and overran the whole zoo, displacing other animals. But it was impossible to release it beyond the Apple ecosystem: Objective-C syntax, dependence on Cocoa frameworks, reference counting, and other constraints — all of it worked only inside Apple's walled garden. NSString was a product of its time, an elegant solution for an era when Smalltalk's dynamism seemed the future of development and UTF-16 was a reasonable choice that would be enough for everyone.

std::wstring

A mistake of evolution

In the early 1990s the software world had to think not only about English speakers but also about those who didn't want to learn English at all, preferring to see their native glyphs, ideographs, and little dots over letters. ASCII with its pitiful 128 characters was hopelessly small — neither Chinese, nor Japanese, nor Arabic, nor even the Latin alphabet in full glory fit. Unicode appeared as a great solution, and in its first version in 1991 it was a sixteen-bit encoding: all the world's characters were supposed to fit into 65536 positions. As the wise people of the time said, 640 KB ought to be enough for everyone.

When C++ was standardized in 1998, the committee included wchar_t for wide characters in the standard and std::wstring as a string of them. The idea was brilliant in its naivety: let each character be wider than char, and Unicode would fit without problems. At this moment evolution decided to experiment and created the platypus, a creature that looked like a good idea on paper but in practice turned out to be a biological paradox. The problem was that the standard didn't fix the size of wchar_t; it only promised the type was big enough to hold any character of the largest character set the implementation supported. In practice this meant its own interpretation on every platform.

Windows went its own way and chose 16 bits, because its API was based on UTF-16, but these were its own, unique 16 bits. Linux and Unix decided on 32 bits to be more universal, while macOS started moving away from wchar_t toward its own implementation altogether. The result was a platypus that laid 16-bit eggs in Australia, bore live 32-bit young in Europe, and left 24-bit clutches in certain zoos. The same code compiled on different platforms and worked completely differently. sizeof(wchar_t) returned 2 on Windows, 4 on Linux, and something on Mac, turning any attempts at binary compatibility into a utopian dream.
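A tiny sketch that lets you weigh the platypus's eggs on your own platform. The template payload_bytes is an invented helper: it reports how many bytes the character data of a string occupies, so the same five letters can be compared across string types.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: bytes occupied by the character data of any
// std::basic_string (terminator and object overhead not included).
template <class String>
std::size_t payload_bytes(const String& s) {
    return s.size() * sizeof(typename String::value_type);
}
```

payload_bytes(std::string("hello")) is 5 everywhere, while payload_bytes(std::wstring(L"hello")) is 5 * sizeof(wchar_t): typically 10 with MSVC on Windows and 20 with GCC or Clang on Linux. The char16_t and char32_t string types that arrived in C++11 at least state their intended width in the name.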

Attempts to convert between std::string and std::wstring on Windows turned into a multi-hour journey through documentation, studying std::wstring_convert and the magic of codecvt facets for conversions between encodings. The code looked fairly simple, but you could only use it within very narrow bounds, like a trained platypus that can swim, but only in the right puddle.

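    // Needs <codecvt>, <locale>, <string> and <iostream>.
    // UTF-8 string. Compiles as written up to C++17; since C++20 a u8""
    // literal is char8_t-based and no longer initializes std::string.
    std::string utf8 = u8"Привет, мир! 🌍";

    // Convert UTF-8 -> UTF-16 (std::wstring).
    // The result is real UTF-16 only where wchar_t is 16 bits wide (Windows).
    // std::wstring_convert and std::codecvt_utf8_utf16 are deprecated since C++17.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    std::wstring wide = converter.from_bytes(utf8);

    std::wcout << L"UTF-16: " << wide << std::endl;

    // Convert UTF-16 -> UTF-8
    std::string utf8_again = converter.to_bytes(wide);
    std::cout << "UTF-8: " << utf8_again << std::endl;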

The main reason std::wstring exists was simple — the Windows API took this niche before anyone else and relatively easily dictated the terms of software development. All WinAPI functions for working with files, windows, the registry, and the network accepted WCHAR*, i.e. UTF-16 strings, and before std::filesystem appeared in C++17 the only way to open a file with a non-Latin name on Windows was to convert the path to std::wstring and use non-standard compiler extensions. The second reason was legacy: tons of old code were written in the era of saber-toothed wolves, and wchar_t seemed the future. Understandably, companies didn't want to rewrite millions of lines of code just for refactoring's sake, so std::wstring was dragged from one version of the standard to the next as a relic of the past.

The platypus would have gone extinct millions of years ago if not for the isolated ecosystem of Windstralia, where it had no competitors. Everyone wants to forget it and replace it with modern solutions, but the Windows API won't let it be fully abandoned, and legacy wchar_t code keeps living in corporate codebases like the fossil remains of the Mesozoic era of programming. The platypus is strange, baffling, nobody can explain why nature created it, but it keeps existing in its isolated ecosystem as a reminder of the industry's early years.

FrameString

Lives briefly, but brightly

Game engines have always faced the fundamental problem of performance and limited resources. A game running at 60 frames per second had a budget of just 16.6 milliseconds per frame, and in that microscopic span it had to process physics, AI, animations, rendering, and thousands of other tasks. If you look at malloc inside a standard std::string, it turns out to take hundreds of CPU cycles: you have to find a suitable block in the heap, possibly call the system allocator — and when you have hundreds and thousands of places working with strings, this becomes a problem.

Debug messages with player positions, UI labels with health and ammo counters, particle-effect names, formatted event logs, accessing object properties by name — all of these are strings that live exactly one frame. At the start of the update loop they're created, used, and by the end of render they're no longer needed; even more often they're used within a single block of code or function. But each of them calls malloc/free, turning memory management into a performance bottleneck.

To get rid of this overhead, developers make a compromise and trade memory for speed. The industry turned to an old idea from systems programming — linear allocators. The concept is simple: allocate one big block of memory at startup and hand out pieces from it by simply advancing a pointer. Allocation becomes two instructions — a check that there's enough room and an increment of the offset. No searches through free lists, no operating-system calls, and minimal overhead.

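class ArenaAllocator {
    // 10 MB; the arena object itself belongs in static or heap storage,
    // it is far too big for a typical stack
    char buffer[10 * 1024 * 1024];
    size_t offset = 0;

public:
    // Hands out raw bytes for character data; other types would also need alignment
    char* alloc(size_t n) {
        if (n > sizeof(buffer) - offset) {
            return nullptr; // Not enough room left in this frame
        }
        char* ptr = buffer + offset;
        offset += n;
        return ptr; // Instant: no free lists, no system calls
    }

    void reset() { offset = 0; } // Reset in O(1)
};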

Later many engines wrapped such allocators in convenient high-level types; for example, a String class that accepted a framemem_ptr was one of them. From the outside it looked like an ordinary string with append, format, operator+ methods, but under the hood all allocations went inside the frame.

String debugMsg(framemem_ptr);
debugMsg = "Player position: ";
debugMsg += to_string(x);
debugMsg += ", ";
debugMsg += to_string(y);
// At the end of the frame all this memory is freed automatically

But the beauty of ephemerality carries deadly danger: a mayfly lives only one day, and if you try to keep it for tomorrow, you'll be left holding a corpse. The main mistake with frame-memory strings was trying to keep a pointer into the next frame. The compiler can't prevent this, and static analysis won't see it either unless you write the right rules. Despite the dangers, such strings became standard in high-performance engines because speed is everything: their allocation takes a few CPU cycles, whereas standard malloc took several hundred — the choice is clearly not in favor of std::string and ordinary strings.

String(framemem_ptr) is a mayfly because its life is fleeting and beautiful: it's born at the start of the frame, lives a few milliseconds, and dies when the frame ends. Its existence is ephemeral, but there's elegance in that ephemerality, since you don't have to think about destructors, don't have to call free, don't have to fear memory leaks. The danger is that you can't catch the mayfly and keep it for tomorrow.
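To show the mayfly's whole life cycle in one place, here is a hedged sketch. FrameArena and its label method are invented for this example and are not taken from any particular engine: text is formatted straight into a bump-allocated block and handed out as a std::string_view that is meaningful only until the next reset().

```cpp
#include <cstddef>
#include <cstdio>
#include <string_view>
#include <vector>

// Hypothetical frame arena for short-lived text.
class FrameArena {
public:
    explicit FrameArena(std::size_t capacity) : buffer_(capacity) {}

    // Formats "name: value" into the arena. The returned view is valid only
    // until reset(); an empty view means the arena had no room left.
    std::string_view label(const char* name, int value) {
        const std::size_t room = buffer_.size() - offset_;
        char* out = buffer_.data() + offset_;
        const int n = std::snprintf(out, room, "%s: %d", name, value);
        if (n < 0 || static_cast<std::size_t>(n) >= room) {
            return {};
        }
        offset_ += static_cast<std::size_t>(n) + 1;  // keep the terminator too
        return std::string_view(out, static_cast<std::size_t>(n));
    }

    // End of frame: O(1). Every view handed out so far now points at bytes
    // that the next frame is free to overwrite.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::vector<char> buffer_;
    std::size_t offset_ = 0;
};
```

During a frame you would call arena.label("HP", 100), draw the text, and call arena.reset() once rendering is done. Keeping the returned view in a member variable across frames is precisely the corpse-holding mistake described above, and nothing in the type system will stop you.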

FString

FString will fetch everything you need — and what you don't need too

In the mid-1990s Tim Sweeney, founder of Epic Games, was captivated by the revolution Quake brought about; Epic took those ideas as a basis but in the end made not just a game but a full-fledged engine for making games. In 1998 Unreal came out — the first shooter on the first-generation Unreal Engine. It was the gold standard for its time: dynamic lighting, an advanced level editor, the UnrealScript scripting system. But technically Unreal Engine was optimized for one task — the first-person shooter. The whole architecture, all the networking, all the rendering were sharpened for the genre's corridors and arenas. And strings in the engine were utilitarian tools for that purpose — level names, weapon names, debug messages, network packets. Thus was born the ancestor of FString — a simple custom string that didn't depend on the C++ standard library. As usual, game engines control everything themselves (from allocators to string methods); only that way can you guarantee predictable behavior on all platforms. It was a golden retriever puppy taken not from the standard-library kennel but raised from scratch.

With each generation of Unreal Engine the string system grew more complex. By Unreal Engine 4 there was not one but three breeds of string: FString for mutable text, FName for efficient comparisons via a global name table, and FText for localized UI text. Everyone likes FString, a big, friendly dog that does whatever you ask: concatenation, formatting, splitting by a delimiter, ParseIntoArray, Find, Contains, StartsWith, EndsWith, and a heap more. But the retriever is heavy: its characters are usually 16 bits wide, and the class is tied into the engine's allocators, containers, and reflection system. That is the price of universality, and on first meeting the retriever may accidentally knock you off your feet with its enthusiasm. It fetches not only what you asked for but everything else too: the ball, the stick, the newspaper, its own bowl, the leash. FString drags the whole Unreal Engine infrastructure behind it, and that's both its strength and its weakness. If you're inside the engine, it's an advantage and everything works together seamlessly; if you try to integrate with third-party libraries, it becomes a problem: you can't just take FString and use it in another project, because it won't survive without its family.

The golden retriever has lived in the big house of Epic Games for thirty years now. It knows every corner, every family member, every command. New puppies may be more modern and efficient, but the retriever has something they don't — decades of experience in AAA games, millions of lines of proven code, an army of developers who know all its quirks. The retriever that helped create Gears of War, Fortnite, and other hits has definitely earned its place by the fireplace, even if it weighs too much.

StringAtom

The turtle is not only fast but also eternal

In 1958 John McCarthy created LISP, and one of the language's fundamental concepts was working with symbols as atomic identifiers that existed as unique entities. The LISP 1.5 manual of 1962 described the intern function, which either returned an existing symbol with the given name or created a new one if none existed yet. That was the moment string interning was born as a concept.

Symbols in LISP were immutable and unique by nature. If you write (foo) twice in code, both times you get a reference to the same symbol in memory. Comparing symbols in this case becomes very fast — we just compare pointers, because identical symbols were physically the same object. This was the Galapagos tortoise of programming: created slowly via a lookup in the symbol table, but then living forever and unique — it was impossible to create a duplicate.

McCarthy introduced the fundamental idea of immutable identifier objects: in programs, identifiers are compared thousands of times but created rarely. The names of variables, functions, classes — all of this is known at the time the code is written and barely changes at runtime, so why compare them character by character every time?

In parallel with LISP's development, compiler creators independently arrived at the same idea: the compiler must track all identifiers in the program, and the symbol table became a fundamental data structure described in the classic Dragon Book by Aho, Sethi, and Ullman.

By the 1990s the idea of string interning had migrated into high-level languages too. Java, for instance, made it part of the language: all string literals are interned automatically, and the programmer can explicitly call String.intern() for runtime strings. Scheme, Smalltalk, and Julia continued the LISP tradition with symbols as a type.

As game engines grew more complex, the problem of string identifiers arrived there too: different systems needed fast lookup of components by name. Shaders looked up parameters by string names, and animations switched between states by name. The solution was StringAtom (or StringID), a wrapper over an interned string. When you create StringAtom("Health"), a lookup happens in a global hash table: if "Health" is already there, the existing ID or pointer is returned; if not, a new entry is created and the string is copied into eternal storage. After that, comparing two atoms is just comparing a pair of numbers.

StringAtom healthTag("Health");
StringAtom manaTag("Mana");

// Comparison is instant - just ID or pointer equality
if (component.tag == healthTag) {
    // ...
}

StringAtom's most dangerous feature is its lifetime: effectively eternal. Once created, an atom is never deleted; the global table of interned strings only grows. This is a deliberate decision: tracking when an atom is no longer used would be tens of times more expensive than just keeping it in memory forever. It works great as long as a reasonable number of unique atoms is created: hundreds of gameplay tags, thousands of shader parameters, a few thousand other unique names. All of it fits into kilobytes of memory and lives until the end of the program. The Galapagos-tortoise population is stable: they live long, breed slowly, and the ecosystem copes.

But if a programmer mistakenly creates a StringAtom from user input or unique names in a loop, the population quickly swells to gigabytes, and everyone dies: even though the tortoises are immortal, if their population grows uncontrollably the island overflows and the ecosystem collapses.
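The mechanics described above fit in a few dozen lines. This is a minimal sketch rather than any particular engine's implementation: the table is a plain std::unordered_set, the atom stores a pointer to the interned string, and tableSize and atomsLeakedByLoop are hypothetical helpers added only to make the growth visible. The sketch assumes single-threaded use; a real table would need locking.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical interned-string handle; not any particular engine's API.
class StringAtom {
public:
    explicit StringAtom(const std::string& text) : ptr_(intern(text)) {}

    // Equal text always maps to the same stored string, so comparing
    // two atoms is a single pointer comparison.
    bool operator==(const StringAtom& other) const { return ptr_ == other.ptr_; }
    bool operator!=(const StringAtom& other) const { return ptr_ != other.ptr_; }

    const std::string& str() const { return *ptr_; }

    // Number of unique strings stored so far; it never decreases.
    static std::size_t tableSize() { return table().size(); }

private:
    static std::unordered_set<std::string>& table() {
        static std::unordered_set<std::string> t;  // lives until program exit
        return t;
    }

    static const std::string* intern(const std::string& text) {
        // insert() yields the existing element when the text is already present.
        // unordered_set never relocates its elements, so the address stays valid.
        return &*table().insert(text).first;
    }

    const std::string* ptr_;
};

// The failure mode from the text: atoms built from unique runtime strings
// grow the table, and nothing ever shrinks it. Returns how many new
// entries the loop added.
std::size_t atomsLeakedByLoop(int n) {
    const std::size_t before = StringAtom::tableSize();
    for (int i = 0; i < n; ++i) {
        StringAtom leaked("Enemy_" + std::to_string(i));
        (void)leaked;  // the atom goes out of scope, the table entry stays
    }
    return StringAtom::tableSize() - before;
}
```

Two atoms made from "Health" compare equal by pointer, while a loop over a thousand generated names leaves a thousand permanent entries behind.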

xstring/cow

This is a cow

In the mid-80s memory was an expensive resource: a computer with a few megabytes was well equipped. When memory is that expensive, it's logical to look for ways to economize on everyday operations, and one idea seemed obvious: store each unique string once and hand out pointers to it. The copy-on-write concept came from operating systems, where processes share memory pages until the moment of modification: the OS doesn't copy the parent process's entire address space at once, but only marks pages as shared and copies them on demand when someone tries to write. This saved an enormous amount of memory, so why not apply the same idea to strings?

Thus was born the concept of xstring, or COW string — strings where several variables point to the same data, like rabbits living in a shared burrow. One big buffer with string data and many owners sharing its use. A reference count tracks how many rabbits live in the burrow, and when we make a copy of a string we just increment the count instead of allocating and copying the whole buffer. Rabbits are thrifty — why should each dig its own tunnel if they can live together?

In the 2000s multicore processors exposed the problems of COW strings. Thread-safe reference counting requires atomic operations: every copy of a string is an atomic increment, every destruction an atomic decrement. An atomic operation is noticeably more expensive than an ordinary increment, and under contention between cores the gap can grow to orders of magnitude. It gets especially bad in multithreaded code, where strings are often passed between threads: here the atomic reference count becomes a hot spot, a problem rather than a solution. The rabbits get confused when too many of them run around the burrow at once: they collide in the narrow tunnels and get in each other's way.

Readability is ordinary: the same methods as std::string. The trouble comes from unexpected usage patterns, when an innocent-looking modification of a shared string triggers an expensive copy of all the data. The rabbit is peaceful as long as there's enough room and quiet in the burrow, but the moment you disturb it, it jumps aside and digs itself new lodging.
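Here is a minimal sketch of the shared burrow, assuming std::shared_ptr as the reference counter. CowString, append, and sharesBufferWith are names invented for the illustration, and the use_count() check is only a reliable "am I the sole owner" test in single-threaded code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal copy-on-write string built on std::shared_ptr.
class CowString {
public:
    explicit CowString(std::string text)
        : data_(std::make_shared<std::string>(std::move(text))) {}

    // Copying uses the compiler-generated copy constructor:
    // it only bumps the reference count, no characters are copied.

    const std::string& str() const { return *data_; }

    // Any mutation must first make sure this object owns its buffer alone.
    void append(const std::string& suffix) {
        detach();
        data_->append(suffix);
    }

    bool sharesBufferWith(const CowString& other) const { return data_ == other.data_; }

private:
    void detach() {
        // If someone else still lives in the burrow, dig a private one:
        // this is the hidden full copy that surprises people.
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::string>(*data_);
    }

    std::shared_ptr<std::string> data_;
};
```

A copy shares the buffer until the first append, at which point the writer pays for a full duplicate. The shared_ptr control block also uses atomic counters in multithreaded programs, which is the very cost the paragraph about multicore processors describes.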

StringID

At the same time, in the early 2000s, game engines hit another fundamental problem: identifying unique data. The explosive growth in the number of resources (textures, models, sounds, animations) meant that storage systems keyed by ordinary strings couldn't cope. On top of that, the growth of scripting and modding added its own fly to the ointment, pouring in duplicated strings (two mods could contain identical strings but couldn't share them) for logic, names, and systems. String comparisons were everywhere, and everywhere they were a performance bottleneck.

StringAtom partly solved this problem through interning, but it required a table lookup and memory to store all unique strings forever, while std::string compared slowly, character by character. Something radically different was needed: a way to turn a string into a number at compile time, with the ability to do the same at runtime too, leaving only the number. In 2007 an article appeared that formalized an approach still used in most AAA studios. The idea was brilliantly simple: hash the string with a fast function such as CRC32 or FNV-1a to get a uint32_t and use that number as the ID, unique in practice as long as collisions are checked for. The string "PlayerHealth" turned into a number like 0x8F4A23B1, and the compiler didn't need to include the original string in the executable at all. The cheetah started running, and running very fast.

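// FNV1a is assumed to be a constexpr 32-bit FNV-1a hash defined elsewhere:
//   constexpr uint32_t FNV1a(const char* str, size_t len);
constexpr uint32_t operator""_sid(const char* str, size_t len) {
    return FNV1a(str, len);  // hash exactly len characters of the literal
}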

constexpr uint32_t damageEvent = "DamageEvent"_sid;

Now strings are numbers, but the programmer still sees text in the code and understands what's going on, while the compiler generates only a number. It is a good balance: the code stays clear to humans and optimal for the machine. One of StringID's killer features is the ability to use string identifiers in switch statements. Normally C++ doesn't allow switch on strings, because the condition must have an integral or enumeration type and every case label must be a constant expression. String hashes computed at compile time satisfy both requirements, so such code works. As a bonus, two labels that hash to the same value are rejected by the compiler as duplicate cases.

switch(messageType) {
    case "PlayerDied"_sid:
        handlePlayerDeath();
        break;
    case "EnemySpawned"_sid:
        handleEnemySpawn();
        break;
}
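To tie the two snippets above together, here is a self-contained sketch that compiles as C++14 or later. It assumes 32-bit FNV-1a with the standard offset basis and prime as the hash; fnv1a, makeStringId, Handled, and dispatch are hypothetical names for this illustration, and the enum stands in for the handler calls so the result can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// 32-bit FNV-1a. A constexpr function with a loop needs C++14.
constexpr std::uint32_t fnv1a(const char* str, std::size_t len) {
    std::uint32_t hash = 2166136261u;               // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        hash ^= static_cast<unsigned char>(str[i]);
        hash *= 16777619u;                          // FNV prime, wraps modulo 2^32
    }
    return hash;
}

// Compile-time path: "PlayerDied"_sid becomes a plain integer constant.
constexpr std::uint32_t operator""_sid(const char* str, std::size_t len) {
    return fnv1a(str, len);
}

// Runtime path: the same function hashes names read from files or scripts.
inline std::uint32_t makeStringId(const std::string& name) {
    return fnv1a(name.data(), name.size());
}

// Stand-in for the handler calls, so the outcome is observable.
enum class Handled { PlayerDeath, EnemySpawn, Unknown };

Handled dispatch(std::uint32_t messageType) {
    switch (messageType) {
        case "PlayerDied"_sid:   return Handled::PlayerDeath;
        case "EnemySpawned"_sid: return Handled::EnemySpawn;
        default:                 return Handled::Unknown;
    }
}
```

Because the compile-time and runtime paths share one hash function, an ID computed from a string loaded at runtime lands in the same case as the literal in the source.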

StringID found its niche wherever fast string identifiers known at compile time are needed: event IDs, resource names, shader components and variables. Reflection systems use StringID instead of full strings too. Wherever a string is an identifier rather than data, the cheetah is more effective than any other animal in the savanna, but running fast is all the cheetah can do.

StringID takes its place in the string ecosystem as a specialized form for identifiers — not for text processing, not for user input or data storage. Only for fast work with known names that are compared often and must be as simple as possible.

std::string().append()

We won't get away from std::string as the standard. It's built into everything around us, and ignoring it completely is simply impossible — most of the C++ ecosystem expects std::string one way or another. In any large codebase it will still seep into interfaces, and fighting it is pointless. Yes, it's not perfect, but its strength is that it provides compatibility, predictable behavior, and minimal cognitive load for everyone who reads and writes the code.

But fixating only on std::string, as if it were the only possible representation of a string, is harmful too. Inside an engine, on hot call stacks sensitive to allocations, you can and should use other solutions: small-string optimizations, arena allocators, string_view, your own string buffers, or even fixed arrays. If you know exactly where alternatives are needed, you can get a huge win in performance and code manageability — without breaking integration with the standard components.
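As one small illustration of such an alternative, here is a sketch of a hot-path helper that takes std::string_view (C++17) and never allocates. countTokens is a hypothetical function made up for this example; callers can pass a std::string, a literal, or a slice of a larger buffer without any copying.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Counts non-empty tokens separated by delim without allocating:
// the view only moves its start pointer and shrinks its length.
constexpr std::size_t countTokens(std::string_view text, char delim) {
    std::size_t count = 0;
    while (!text.empty()) {
        const std::size_t end = text.find(delim);
        const std::string_view token = text.substr(0, end);  // end == npos means "to the end"
        if (!token.empty()) ++count;
        if (end == std::string_view::npos) break;
        text.remove_prefix(end + 1);                          // skip the token and the delimiter
    }
    return count;
}
```

The interface still accepts std::string at the boundary, so the standard type stays in public signatures while the inner loop works on views.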

Special thanks to my friend Sasha Vasiliev for the illustrations and artwork.

If you're curious to hear about the implementation of xstring in game engines, come to the webinar, where there will be equally fascinating talks from Andrey Karpov and Denis Yaroshevsky.

And drop by my course "Programming without the boredom": let's try to bring a little magic back to C++ development. The promo code, as usual, is HABR50.
