String

High-level UTF-8 string handling utilities.

Overview

Typical use cases:

  • Handling UTF-8 encoded strings and characters

Header

<RaeptorCogs/IO/String.hpp>

Metadata

Author

Estorc

Version

v1.0

Copyright

Copyright (c) 2025 Estorc. MIT License.

Classes

  • RaeptorCogs::U8Char – UTF-8 character class.

  • RaeptorCogs::U8CharIterator – UTF-8 character iterator class. Allows iteration over the UTF-8 characters in a string.

  • RaeptorCogs::U8String – UTF-8 string class.

class U8Char

UTF-8 Character class.

Represents a single UTF-8 encoded character and provides utilities for handling it.

RaeptorCogs::U8Char ch = RaeptorCogs::U8Char("é");
uint32_t codepoint = ch.codepoint(); // U+00E9
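As a worked check of the U+00E9 value in the example above, assuming the source file is saved as UTF-8 (so "é" is stored as the two bytes 0xC3 0xA9), the standard two-byte UTF-8 decoding keeps the low 5 bits of the lead byte and the low 6 bits of the continuation byte:

```latex
\begin{aligned}
\texttt{0xC3} &= 110\,00011_2, \qquad \texttt{0xA9} = 10\,101001_2 \\
\text{cp} &= \big((\texttt{0xC3} \mathbin{\&} \texttt{0x1F}) \ll 6\big) \mid (\texttt{0xA9} \mathbin{\&} \texttt{0x3F}) \\
&= (3 \ll 6) \mid \texttt{0x29} = \texttt{0xC0} \mid \texttt{0x29} = \texttt{0xE9} = \text{U+00E9}
\end{aligned}
```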

Public Functions

inline U8Char(std::string_view v)

Construct a U8Char from a string view.

Parameters:

v – A string view representing the UTF-8 character.

inline U8Char(const std::string &s)

Construct a U8Char from a standard string.

Parameters:

s – A standard string representing the UTF-8 character.

inline U8Char(const char *s, size_t len)

Construct a U8Char from a C-style string and length.

Parameters:
  • s – A pointer to the C-style string.

  • len – The length of the UTF-8 character in bytes.

inline U8Char(const char *s)

Construct a U8Char from a C-style string.

Parameters:

s – A pointer to the C-style string.

inline U8Char()

Default constructor for U8Char. Creates an empty U8Char.

inline U8Char(nullptr_t)

Constructor for null U8Char. Creates an empty U8Char.

inline operator std::string_view() const noexcept

Conversion operator to std::string_view. Allows implicit conversion to string view.

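inline bool operator==(std::string_view other) const noexcept

Equality operator. Compares this UTF-8 character with other.

Parameters:

other – The string view to compare with.

Returns:

true if both are equal, false otherwise.

inline size_t size() const noexcept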

Get the size of the UTF-8 character in bytes.

Returns:

The size of the UTF-8 character.

inline uint32_t codepoint() const noexcept

Get the Unicode codepoint of the UTF-8 character.

Returns:

The Unicode codepoint as a uint32_t.
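The library's decoding code is not shown here. The following is a minimal sketch of standard UTF-8 decoding that a codepoint() accessor could perform; decode_codepoint is a hypothetical stand-alone function, not part of RaeptorCogs, and it assumes the view holds exactly one well-formed sequence of 1 to 4 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical decoder, not the RaeptorCogs implementation.
// Assumes `s` holds exactly one well-formed UTF-8 sequence (1 to 4 bytes).
// Returns 0 for an empty view (an assumption for this sketch).
uint32_t decode_codepoint(std::string_view s) {
    if (s.empty()) return 0;
    const auto b0 = static_cast<unsigned char>(s[0]);
    uint32_t cp;
    // Keep only the payload bits of the lead byte; the mask depends on the length.
    if (s.size() == 1)      cp = b0;
    else if (s.size() == 2) cp = b0 & 0x1F;
    else if (s.size() == 3) cp = b0 & 0x0F;
    else                    cp = b0 & 0x07;
    // Each continuation byte (10xxxxxx) contributes its low 6 bits.
    for (std::size_t i = 1; i < s.size(); ++i) {
        cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
    }
    return cp;
}
```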

inline std::string_view view() const noexcept

Get the underlying string view of the UTF-8 character.

Returns:

The string view representing the UTF-8 character.

Private Members

std::string_view view_

Underlying string view representing the UTF-8 character.

Stores the bytes of the UTF-8 character.

Note

For a non-empty character, the view spans 1 to 4 bytes, depending on the character. A default-constructed or null U8Char is empty.

Private Static Functions

static inline size_t utf8_len(unsigned char c)

Determine the length of a UTF-8 character based on its first byte.

Parameters:

c – The first byte of the UTF-8 character.

Returns:

The length of the UTF-8 character in bytes (1 to 4).
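utf8_len is private and its body is not documented. The sketch below applies the standard UTF-8 lead-byte rule (the number of leading 1 bits gives the sequence length). utf8_len_sketch is a hypothetical name, and treating continuation or invalid bytes as length 1 is an assumption; the real function may handle them differently.

```cpp
#include <cstddef>

// Hypothetical re-implementation of the UTF-8 lead-byte rule.
std::size_t utf8_len_sketch(unsigned char c) {
    if (c < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((c & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((c & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((c & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;                          // continuation/invalid byte (assumption)
}
```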

Friends

friend class U8CharIterator
friend class U8String

class U8CharIterator

UTF-8 Character Iterator class. Allows iteration over UTF-8 characters in a string.

#include <iostream>
#include <RaeptorCogs/IO/String.hpp>

int main() {
    RaeptorCogs::U8String u8str("Hello, 世界");
    for (const auto& ch : u8str) {
        std::cout << "Character: " << std::string(ch.view()) << ", Codepoint: U+" << std::hex << ch.codepoint() << std::dec << std::endl;
    }
}

Public Types

using iterator_category = std::forward_iterator_tag

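Iterator category: U8CharIterator is a forward iterator. Together with the aliases below, these are the standard iterator traits for U8CharIterator.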

using value_type = U8Char

The type of elements pointed to by the iterator.

using difference_type = std::ptrdiff_t

Difference type for iterator arithmetic.

using pointer = void

Pointer type for the iterator. It is void because dereferencing yields a U8Char by value.

using reference = U8Char

Reference type for the iterator. This is U8Char itself rather than U8Char&, so operator* returns the character object by value.
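To show how these traits fit together, here is a minimal, self-contained sketch of an iterator with the same shape (a pointer to the string plus a byte offset). Utf8Iter is hypothetical: it yields std::string_view slices instead of U8Char, assumes well-formed UTF-8, and is not the RaeptorCogs implementation.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical iterator mirroring the documented traits of U8CharIterator.
class Utf8Iter {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type        = std::string_view;
    using difference_type   = std::ptrdiff_t;
    using pointer           = void;
    using reference         = std::string_view;  // returned by value, like U8Char

    Utf8Iter(const std::string &s, std::size_t idx) : data_(&s), index_(idx) {}

    // Slice of the bytes forming the current character.
    reference operator*() const {
        return std::string_view(*data_).substr(index_, lead_len());
    }
    // Advance by one whole character, not one byte.
    Utf8Iter &operator++() { index_ += lead_len(); return *this; }
    Utf8Iter operator++(int) { Utf8Iter tmp = *this; ++*this; return tmp; }
    bool operator==(const Utf8Iter &o) const { return data_ == o.data_ && index_ == o.index_; }
    bool operator!=(const Utf8Iter &o) const { return !(*this == o); }

private:
    // Byte length of the sequence starting at index_ (standard lead-byte rule).
    std::size_t lead_len() const {
        const auto c = static_cast<unsigned char>((*data_)[index_]);
        if (c < 0x80) return 1;
        if ((c & 0xE0) == 0xC0) return 2;
        if ((c & 0xF0) == 0xE0) return 3;
        if ((c & 0xF8) == 0xF0) return 4;
        return 1;  // invalid lead byte (assumption)
    }
    const std::string *data_;  // string being iterated
    std::size_t index_;        // byte offset of the current character
};
```

Iterating from Utf8Iter(s, 0) to Utf8Iter(s, s.size()) visits each character once, which is the begin/end pairing a range-based for loop relies on.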

Public Functions

inline U8CharIterator(const std::string &s, size_t idx)

Constructor for U8CharIterator.

Note

Used internally by U8String.

Parameters:
  • s – The string to iterate over.

  • idx – The starting byte offset in s.

inline reference operator*() const

Dereference operator.

Returns:

The current UTF-8 character.

inline U8CharIterator &operator++()

Pre-increment operator.

Returns:

Reference to the incremented iterator.

inline U8CharIterator operator++(int)

Post-increment operator.

Returns:

A copy of the iterator before incrementing.

inline U8CharIterator operator+(int n) const

Addition operator.

Parameters:

n – Number of characters to advance.

Returns:

A new iterator advanced by n characters.
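Because UTF-8 characters have variable byte widths, a forward iterator cannot jump directly to the n-th following character; it has to step over each one. The sketch below shows that stepping on raw bytes. advance_chars is a hypothetical helper, and clamping at the end of the string is an assumption, not documented behavior of operator+.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical: byte offset reached after advancing `n` UTF-8 characters
// from byte offset `idx`. Runs in O(n) steps; assumes well-formed UTF-8.
std::size_t advance_chars(std::string_view s, std::size_t idx, int n) {
    for (int i = 0; i < n && idx < s.size(); ++i) {
        const auto c = static_cast<unsigned char>(s[idx]);
        if (c < 0x80) idx += 1;
        else if ((c & 0xE0) == 0xC0) idx += 2;
        else if ((c & 0xF0) == 0xE0) idx += 3;
        else if ((c & 0xF8) == 0xF0) idx += 4;
        else idx += 1;  // invalid lead byte (assumption)
    }
    // Clamp to the end of the string (assumption).
    return idx < s.size() ? idx : s.size();
}
```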

inline bool operator==(const U8CharIterator &o) const

Equality operator.

Parameters:

o – The other iterator to compare with.

Returns:

true if both iterators are equal, false otherwise.

inline bool operator!=(const U8CharIterator &o) const

Inequality operator.

Parameters:

o – The other iterator to compare with.

Returns:

true if both iterators are not equal, false otherwise.

Private Members

const std::string *data_

Pointer to the underlying string.

size_t index_

Current byte offset in the string.

class U8String

UTF-8 String class.

Represents a UTF-8 encoded string and provides utilities for handling it.

RaeptorCogs::U8String u8str("Hello, 世界");
size_t length = u8str.size(); // 9: number of UTF-8 characters, not bytes
RaeptorCogs::U8Char ch = u8str[7]; // '世' character (character index 7)

Public Functions

U8String() = default

Default constructor for U8String.

~U8String() = default

Destructor for U8String.

inline U8String(const U8String &other)

Copy constructor for U8String.

Parameters:

other – The U8String to copy from.

inline U8String(const std::string &other)

Constructor from standard string.

Parameters:

other – The standard string to construct from.

inline U8String(std::string &&other)

Constructor from an rvalue standard string. Allows the string's contents to be moved rather than copied.

Parameters:

other – The standard string to move from.

inline U8String(const char *s)

Constructor from C-style string.

Parameters:

s – A pointer to the C-style string.

inline U8String(U8Char *ptr)

Constructor from U8Char.

Parameters:

ptr – A pointer to the U8Char.

inline void operator=(const U8String &other)

Assignment operator for U8String.

Parameters:

other – The U8String to assign from.

inline bool operator==(const U8String &other) const

Equality operator.

Parameters:

other – The U8String to compare with.

Returns:

true if both U8Strings are equal, false otherwise.

inline const char *c_str() const

Get the string as a null-terminated C-style string.

Returns:

Pointer to the C-style string.

inline U8Char at(size_t index) const

Get the UTF-8 character at the specified character index.

Parameters:

index – The index of the UTF-8 character to access.

Returns:

The U8Char at the specified index.

inline U8Char operator[](size_t index) const

Subscript operator for U8String.

Parameters:

index – The index of the UTF-8 character to access.

Returns:

The U8Char at the specified index.
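Both at() and operator[] take a character index, not a byte offset, so locating a character requires a linear scan from the start of the string. The sketch below shows that scan on raw bytes. char_at is a hypothetical function, and returning an empty view for an out-of-range index is an assumption; the library's out-of-range behavior is not documented here.

```cpp
#include <cstddef>
#include <string_view>

// Byte length implied by a UTF-8 lead byte (invalid lead bytes treated as 1, an assumption).
static std::size_t lead_len(unsigned char c) {
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

// Hypothetical: bytes of the index-th character of `s`, found in O(index) steps.
std::string_view char_at(std::string_view s, std::size_t index) {
    std::size_t pos = 0;
    for (std::size_t i = 0; i < index && pos < s.size(); ++i) {
        pos += lead_len(static_cast<unsigned char>(s[pos]));
    }
    if (pos >= s.size()) return {};  // out of range (assumption)
    return s.substr(pos, lead_len(static_cast<unsigned char>(s[pos])));
}
```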

inline size_t size() const

Get the number of UTF-8 characters in the string.

Note

This counts the actual UTF-8 characters, not bytes.

Returns:

The number of UTF-8 characters in the string.
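The difference between character count and byte count can be illustrated without the library: in well-formed UTF-8, every character starts with exactly one byte that is not a continuation byte (10xxxxxx). count_chars is a hypothetical helper, not necessarily how U8String::size() is implemented.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical: count UTF-8 characters (not bytes) by counting every
// byte that is not a continuation byte. Assumes well-formed UTF-8.
std::size_t count_chars(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For "Hello, 世界", std::string::size() reports 13 bytes, while this count gives 9 characters.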

inline auto begin() const

Begin iterator for U8String.

Returns:

An iterator to the beginning of the UTF-8 characters.

inline auto end() const

End iterator for U8String.

Returns:

An iterator to the end of the UTF-8 characters.

inline auto begin()

Begin iterator for U8String.

Returns:

An iterator to the beginning of the UTF-8 characters.

inline auto end()

End iterator for U8String.

Returns:

An iterator to the end of the UTF-8 characters.

Private Members

std::string data

Underlying standard string storing the UTF-8 data.

Stores the complete UTF-8 encoded string.

Note

The string is managed internally and should not be modified directly.