String
High-level UTF-8 string handling utilities.
Overview
Typical use cases:
Handling UTF-8 encoded strings and characters
Header
<RaeptorCogs/IO/String.hpp>
Metadata
- Author: Estorc
- Version: v1.0
- Copyright: Copyright (c) 2025 Estorc MIT License.
Classes
| Class | Description |
|---|---|
| U8Char | UTF-8 Character class. |
| U8CharIterator | UTF-8 Character Iterator class. Allows iteration over UTF-8 characters in a string. |
| U8String | UTF-8 String class. |
-
class U8Char¶
UTF-8 Character class.
Represents a single UTF-8 encoded character and provides utilities for handling it.
RaeptorCogs::U8Char ch = RaeptorCogs::U8Char("é");
uint32_t codepoint = ch.codepoint(); // U+00E9
Public Functions
-
inline U8Char(std::string_view v)¶
Construct a U8Char from a string view.
- Parameters:
v – A string view representing the UTF-8 character.
-
inline U8Char(const std::string &s)¶
Construct a U8Char from a standard string.
- Parameters:
s – A standard string representing the UTF-8 character.
-
inline U8Char(const char *s, size_t len)¶
Construct a U8Char from a C-style string and length.
- Parameters:
s – A pointer to the C-style string.
len – The length of the UTF-8 character in bytes.
-
inline U8Char(const char *s)¶
Construct a U8Char from a C-style string.
- Parameters:
s – A pointer to the C-style string.
-
inline operator std::string_view() const noexcept¶
Conversion operator to std::string_view. Allows implicit conversion to string view.
-
inline bool operator==(std::string_view other) const noexcept¶
-
inline size_t size() const noexcept¶
Get the size of the UTF-8 character in bytes.
- Returns:
The size of the UTF-8 character.
-
inline uint32_t codepoint() const noexcept¶
Get the Unicode codepoint of the UTF-8 character.
- Returns:
The Unicode codepoint as a uint32_t.
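The relationship between the stored bytes and the value returned by codepoint() can be illustrated with a standalone decoder. This is a minimal sketch under stated assumptions, not the RaeptorCogs implementation: `decode_codepoint` is a hypothetical helper, and it assumes the view is non-empty and holds one well-formed UTF-8 sequence (no validation is performed).

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical helper (not part of RaeptorCogs): decode the first UTF-8
// sequence in `v` into a Unicode code point.
// Assumption: `v` is non-empty and well-formed; invalid input is not detected.
inline std::uint32_t decode_codepoint(std::string_view v) {
    const auto b0 = static_cast<unsigned char>(v[0]);
    if (b0 < 0x80) return b0;  // 0xxxxxxx: ASCII, one byte

    int extra = 0;             // number of continuation bytes that follow
    std::uint32_t cp = 0;      // payload bits collected so far
    if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; }  // 110xxxxx
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }  // 1110xxxx
    else                          { extra = 3; cp = b0 & 0x07; }  // 11110xxx

    // Each continuation byte (10xxxxxx) contributes six more bits.
    for (int i = 1; i <= extra; ++i) {
        cp = (cp << 6) | (static_cast<unsigned char>(v[i]) & 0x3F);
    }
    return cp;
}
```

For "é", stored as the two bytes 0xC3 0xA9, the sketch yields 0xE9, which matches the U+00E9 shown in the class example.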
-
inline std::string_view view() const noexcept¶
Get the underlying string view of the UTF-8 character.
- Returns:
The string view representing the UTF-8 character.
Private Members
-
std::string_view view_¶
Underlying string view representing the UTF-8 character.
Stores the bytes of the UTF-8 character.
Note
The size of the view can be between 1 and 4 bytes depending on the character.
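The 1 to 4 byte range follows from the UTF-8 encoding itself: the lead byte announces how long the sequence is. The sketch below shows that rule in isolation. `utf8_sequence_length` is a hypothetical name used for illustration, and nothing here is taken from the library's private functions.

```cpp
#include <cstddef>

// Hypothetical helper (not part of RaeptorCogs): number of bytes in a UTF-8
// sequence, derived from its lead byte. Returns 0 when the byte cannot start
// a sequence (a continuation byte or an invalid lead byte).
inline std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80)           return 1;  // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // 10xxxxxx or 11111xxx
}
```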
Private Static Functions
Friends
- friend class U8CharIterator
- friend class U8String
-
class U8CharIterator¶
UTF-8 Character Iterator class. Allows iteration over UTF-8 characters in a string.
RaeptorCogs::U8String u8str("Hello, 世界");
for (const auto& ch : u8str) {
    std::cout << "Character: " << std::string(ch.view())
              << ", Codepoint: U+" << std::hex << ch.codepoint() << std::dec << std::endl;
}
Public Types
-
using iterator_category = std::forward_iterator_tag¶
Iterator category trait. Marks U8CharIterator as a forward iterator.
-
using pointer = void¶
Pointer type for the iterator.
Public Functions
-
inline U8CharIterator(const std::string &s, size_t idx)¶
Constructor for U8CharIterator.
Note
Used internally by U8String.
- Parameters:
s – The string to iterate over.
idx – The starting index for the iterator.
-
inline U8CharIterator &operator++()¶
Pre-increment operator.
- Returns:
Reference to the incremented iterator.
-
inline U8CharIterator operator++(int)¶
Post-increment operator.
- Returns:
A copy of the iterator before incrementing.
-
inline U8CharIterator operator+(int n) const¶
Addition operator.
- Parameters:
n – Number of characters to advance.
- Returns:
A new iterator advanced by n characters.
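Advancing by characters rather than bytes means skipping a variable number of bytes per step. One way to do this, sketched below on a plain std::string, is to step past a lead byte and then past every continuation byte (bit pattern 10xxxxxx). `advance_chars` is a hypothetical helper, and working on byte offsets is an assumption made for this sketch; the page does not state how U8CharIterator tracks its position.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of RaeptorCogs): starting at byte offset
// `pos`, move forward `n` UTF-8 characters and return the new byte offset.
// Stops at the end of the string if fewer than `n` characters remain.
// Assumption: `pos` points at the start of a character.
inline std::size_t advance_chars(const std::string& s, std::size_t pos, int n) {
    while (n > 0 && pos < s.size()) {
        ++pos;  // step past the lead byte
        // Skip continuation bytes, which all match 10xxxxxx.
        while (pos < s.size() &&
               (static_cast<unsigned char>(s[pos]) & 0xC0) == 0x80) {
            ++pos;
        }
        --n;
    }
    return pos;
}
```

Because every step has to inspect the bytes it passes, advancing by n characters costs time proportional to the bytes skipped, not constant time.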
-
inline bool operator==(const U8CharIterator &o) const¶
Equality operator.
- Parameters:
o – The other iterator to compare with.
- Returns:
true if both iterators are equal, false otherwise.
-
inline bool operator!=(const U8CharIterator &o) const¶
Inequality operator.
- Parameters:
o – The other iterator to compare with.
- Returns:
true if both iterators are not equal, false otherwise.
-
class U8String¶
UTF-8 String class.
Represents a UTF-8 encoded string and provides utilities for handling it.
RaeptorCogs::U8String u8str("Hello, 世界");
size_t length = u8str.size(); // Number of UTF-8 characters
RaeptorCogs::U8Char ch = u8str[7]; // '世' character
Public Functions
-
inline U8String(const U8String &other)¶
Copy constructor for U8String.
- Parameters:
other – The U8String to copy from.
-
inline U8String(const std::string &other)¶
Constructor from standard string.
- Parameters:
other – The standard string to construct from.
-
inline U8String(std::string &&other)¶
Constructor that moves from a standard string.
- Parameters:
other – The standard string to move from.
-
inline U8String(const char *s)¶
Constructor from C-style string.
- Parameters:
s – A pointer to the C-style string.
-
inline void operator=(const U8String &other)¶
Assignment operator for U8String.
- Parameters:
other – The U8String to assign from.
-
inline bool operator==(const U8String &other) const¶
Equality operator.
- Parameters:
other – The U8String to compare with.
- Returns:
true if both U8Strings are equal, false otherwise.
-
inline const char *c_str() const¶
Get the contents as a C-style string.
- Returns:
Pointer to the C-style string.
-
inline U8Char at(size_t index) const¶
Get the UTF-8 character at the specified index.
- Parameters:
index – The index of the UTF-8 character to access.
- Returns:
The U8Char at the specified index.
-
inline U8Char operator[](size_t index) const¶
Subscript operator for U8String.
- Parameters:
index – The index of the UTF-8 character to access.
- Returns:
The U8Char at the specified index.
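Character indexing differs from byte indexing because characters have different widths. The sketch below shows one way to locate the character at a given index by walking the string from the start. `nth_utf8_char` is a hypothetical helper operating on std::string_view, and returning an empty view for an out-of-range index is a choice made for this sketch; the page does not document how U8String handles that case.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of RaeptorCogs): return a view of the
// `index`-th UTF-8 character of `s` (0-based), or an empty view if `index`
// is out of range. Assumes `s` is well-formed UTF-8.
inline std::string_view nth_utf8_char(std::string_view s, std::size_t index) {
    std::size_t pos = 0;
    while (pos < s.size()) {
        // Find the end of the character starting at `pos`.
        std::size_t end = pos + 1;
        while (end < s.size() &&
               (static_cast<unsigned char>(s[end]) & 0xC0) == 0x80) {
            ++end;
        }
        if (index == 0) return s.substr(pos, end - pos);
        --index;
        pos = end;
    }
    return {};
}
```

In "Hello, 世界", index 7 selects the three bytes of '世', which is consistent with the `u8str[7]` line in the class example.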
-
inline size_t size() const¶
Get the number of UTF-8 characters in the string.
Note
This counts the actual UTF-8 characters, not bytes.
- Returns:
The number of UTF-8 characters in the string.
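The difference between character count and byte count can be shown with a small standalone function. This is a sketch of one common counting technique (count every byte that is not a continuation byte), not necessarily how U8String::size() is implemented; `count_utf8_chars` is a hypothetical name, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of RaeptorCogs): count UTF-8 characters by
// counting the bytes that start a sequence, i.e. every byte that does not
// match the continuation pattern 10xxxxxx.
inline std::size_t count_utf8_chars(std::string_view s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For "Hello, 世界" this gives 9 characters, while the underlying byte length is 13 because each of the two CJK characters occupies three bytes.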
-
inline auto begin() const¶
Begin iterator for U8String.
- Returns:
An iterator to the beginning of the UTF-8 characters.
-
inline auto end() const¶
End iterator for U8String.
- Returns:
An iterator to the end of the UTF-8 characters.
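The begin()/end() pair is what allows a range-based for loop over characters. The following standalone sketch shows the general shape of such an iterator pair. `Utf8Iter`, `Utf8Range`, and `split_chars` are hypothetical stand-ins written for illustration; they are not the RaeptorCogs classes, and storing a byte offset is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a UTF-8 character iterator (not U8CharIterator).
class Utf8Iter {
public:
    Utf8Iter(const std::string& s, std::size_t byte_pos) : s_(&s), pos_(byte_pos) {}

    // Dereference: a view of the bytes of the current character.
    std::string_view operator*() const {
        return std::string_view(*s_).substr(pos_, next(pos_) - pos_);
    }
    Utf8Iter& operator++() { pos_ = next(pos_); return *this; }
    bool operator==(const Utf8Iter& o) const { return s_ == o.s_ && pos_ == o.pos_; }
    bool operator!=(const Utf8Iter& o) const { return !(*this == o); }

private:
    // Byte offset of the character following the one at `p`.
    std::size_t next(std::size_t p) const {
        ++p;
        while (p < s_->size() &&
               (static_cast<unsigned char>((*s_)[p]) & 0xC0) == 0x80) {
            ++p;
        }
        return p;
    }
    const std::string* s_;
    std::size_t pos_;
};

// Hypothetical range wrapper exposing begin()/end() for range-based for.
struct Utf8Range {
    const std::string& s;
    Utf8Iter begin() const { return Utf8Iter(s, 0); }
    Utf8Iter end() const { return Utf8Iter(s, s.size()); }
};

// Collect each character of `s` as its own std::string.
inline std::vector<std::string> split_chars(const std::string& s) {
    std::vector<std::string> out;
    for (std::string_view ch : Utf8Range{s}) out.emplace_back(ch);
    return out;
}
```

The end iterator sits at the byte length of the string, so the loop terminates when the last character's final byte has been passed.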