Module basenb

Basenb encoding/decoding: pack binary data into Unicode PUA.

Basenb is a way of encoding arbitrary binary data into a compact string representation that can be embedded in Unicode text, as runs of Unicode private-use characters.
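To illustrate what "runs of Unicode private-use characters" means, here is a sketch that finds maximal runs of PUA codepoints in a string. It uses the three Private Use Areas defined by Unicode: U+E000..U+F8FF in the BMP, plane 15 and plane 16. The helper names are hypothetical. Which PUA codepoints basenb actually emits is decided by the module, not by this generic check.

```rust
/// Hypothetical helper: true if `c` lies in one of Unicode's Private Use Areas:
/// U+E000..=U+F8FF (BMP), U+F0000..=U+FFFFD (plane 15), U+100000..=U+10FFFD (plane 16).
fn is_private_use(c: char) -> bool {
    matches!(c as u32, 0xE000..=0xF8FF | 0xF0000..=0xFFFFD | 0x100000..=0x10FFFD)
}

/// Hypothetical helper: splits `text` into maximal runs of private-use
/// characters. Ordinary characters between runs are skipped.
fn private_use_runs(text: &str) -> Vec<String> {
    let mut runs = Vec::new();
    let mut current = String::new();
    for c in text.chars() {
        if is_private_use(c) {
            current.push(c);
        } else if !current.is_empty() {
            // A non-PUA character ends the current run.
            runs.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        runs.push(current);
    }
    runs
}
```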

It’s a modified version of Base16b that additionally encodes a “remainder” length in a trailing character; this seems to be needed to round-trip values reliably.
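The following is a minimal sketch of why a trailing remainder helps. It packs a bit sequence into base-2^n digits, with the most significant bit first. It is not the module's actual encoding. The bit order, the zero-padding of the last digit, and the names `pack_bits`/`unpack_bits` are all assumptions. The real code also maps digits to PUA codepoints and stores the remainder in a trailing character, which is omitted here.

```rust
/// Hypothetical: packs bits (each 0 or 1) into base-2^n digits, MSB first.
/// A short final chunk is left-aligned and zero-padded. The returned
/// remainder is how many bits of the last digit are real (n if it is full).
fn pack_bits(bits: &[u8], n: u32) -> (Vec<u32>, u32) {
    assert!((7..=17).contains(&n), "n must be in 7..=17");
    let mut digits = Vec::new();
    let mut remainder = n;
    for chunk in bits.chunks(n as usize) {
        let mut d = 0u32;
        for &b in chunk {
            d = (d << 1) | (b & 1) as u32;
        }
        // Pad a short final chunk with zero bits on the right.
        d <<= n - chunk.len() as u32;
        digits.push(d);
        remainder = chunk.len() as u32;
    }
    (digits, remainder)
}

/// Hypothetical inverse of `pack_bits`: expands the digits back to bits,
/// keeping only `remainder` bits (at most n) from the final digit.
fn unpack_bits(digits: &[u32], n: u32, remainder: u32) -> Vec<u8> {
    let mut bits = Vec::new();
    for (i, &d) in digits.iter().enumerate() {
        let take = if i + 1 == digits.len() { remainder } else { n };
        for k in 0..take {
            bits.push(((d >> (n - 1 - k)) & 1) as u8);
        }
    }
    bits
}
```

With n = 7, the inputs `[1, 1]` and `[1, 1, 0]` both pack to the single digit 96. Only the remainder (2 versus 3) tells a decoder which one was encoded.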

Actually using basenb in text requires some protocol for deciding when to switch between basenb-encoded runs and ordinary PUA characters. dcBasenb addresses this by encoding UUIDs and using them as in-band start and end markers. (Round-tripping a UTF-8 file that itself contains dcBasenb UUIDs, for instance in text that refers to them, would probably only be possible by encoding those UUIDs as UTF-8 encapsulated within the Dcs, and by being careful with the encode/decode settings.)
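In-band start/end markers let a reader pick an encoded payload out of surrounding text. The sketch below shows only that extraction step. The marker strings are placeholders for the encoded start and end UUIDs, whose actual values are not given in this summary, and `extract_between` is a hypothetical name.

```rust
/// Hypothetical: returns the text between the first occurrence of `start`
/// and the next occurrence of `end` after it. Returns None if either
/// marker is missing.
fn extract_between<'a>(text: &'a str, start: &str, end: &str) -> Option<&'a str> {
    // Byte offset just past the start marker.
    let after_start = text.find(start)? + start.len();
    let rest = &text[after_start..];
    // Byte offset of the end marker within `rest`.
    let end_pos = rest.find(end)?;
    Some(&rest[..end_pos])
}
```

This also shows the ambiguity described in the parenthetical above. If the payload could itself contain the end marker, extraction would stop too early, so the markers must never appear literally inside an encoded run.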

dcBasenb is a way of encoding arbitrary Dcs into runs of Unicode private-use characters; see dcbasenb.rs.

Constants

ARMORED_BASE17B_UTF8_END_UUID_BYTES
FIXME UNIMPLEMENTED
ARMORED_BASE17B_UTF8_START_UUID_BYTES
FIXME UNIMPLEMENTED
BYTE_ARRAY_FROM_BASENB_UTF8_INVALID_INPUT_EXCEPTION_BYTES
Sentinel UUID returned by the legacy JS implementation to indicate an invalid basenb UTF-8 decode input (only remainder char present or incomplete data). UUID: 3362daa3-1705-40ec-9a97-59d052fd4037
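A sentinel return value has to be checked by the caller. The sketch below writes the sentinel UUID 3362daa3-1705-40ec-9a97-59d052fd4037 out as 16 bytes in the order the UUID string is written. It then converts a legacy-style decode result into a `Result`. It assumes, without confirmation from this summary, that `BYTE_ARRAY_FROM_BASENB_UTF8_INVALID_INPUT_EXCEPTION_BYTES` uses that same byte layout. `INVALID_INPUT_SENTINEL` and `check_decode` are hypothetical names.

```rust
/// The sentinel UUID as 16 bytes, in written order.
/// Assumption: this matches the layout of the module's constant.
const INVALID_INPUT_SENTINEL: [u8; 16] = [
    0x33, 0x62, 0xda, 0xa3, 0x17, 0x05, 0x40, 0xec,
    0x9a, 0x97, 0x59, 0xd0, 0x52, 0xfd, 0x40, 0x37,
];

/// Hypothetical caller-side helper: turns a decode result in which the
/// sentinel signals invalid input into a Result.
fn check_decode(output: Vec<u8>) -> Result<Vec<u8>, &'static str> {
    if output == INVALID_INPUT_SENTINEL {
        Err("invalid basenb input")
    } else {
        Ok(output)
    }
}
```

An in-band sentinel has one inherent cost. A legitimate decode whose output happens to equal those 16 bytes cannot be distinguished from an error.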

Functions

byte_array_from_armored_base17b_utf8
Decode an armored Base17b UTF-8 run to a byte array. FIXME untested!
byte_array_from_basenb_17_utf8
Convenience wrapper (decode Basenb 17 UTF-8 representation).
byte_array_from_basenb_utf8
Decode a Basenb UTF-8 byte sequence into the original byte array.
byte_array_to_armored_base17b_utf8
Produce an “armored” Base17b UTF-8 run, encoding arbitrary binary data: armored = start_uuid || base17b_encode(bytes) || end_uuid
byte_array_to_basenb_17_utf8
Convenience wrapper (encode bytes to Basenb 17 UTF-8 representation).
byte_array_to_basenb_no_remainder_marker
byte_array_to_basenb_utf8
Encode a raw byte array into Basenb (UTF-8 sequence of pack32 codepoints).
int_bit_array_from_basenb_string
internalIntBitArrayFromBasenbString(byteArrayInput, intRemainder): in JS, this takes a Uint8Array of UTF-8 bytes and an integer remainder.
int_bit_array_to_basenb_no_remainder_marker
internalIntBitArrayToBasenbString(intBase, bytes): returns the UTF-8 bytes of the encoded string (mirroring the JS version, which returns a byte array).
is_basenb_base
Is the provided base valid for Basenb? (Original: 7 through 17 inclusive.)
is_basenb_char
True if the pack32 character represents a Basenb character codepoint.
is_basenb_distinct_remainder_char
True if the pack32 character is one of the distinct remainder markers (63481..=63497).
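`byte_array_to_armored_base17b_utf8` in the list above describes its output as `start_uuid || base17b_encode(bytes) || end_uuid`. The sketch below shows just that concatenation. The encoder is passed in as a closure so that the sketch does not depend on the module's real signatures. `start` and `end` are opaque byte strings here. This summary does not say whether the real markers are raw UUID bytes or their encoded form, and both constants are still marked unimplemented. `armor` is a hypothetical name.

```rust
/// Hypothetical: builds start || encode(bytes) || end.
/// `encode` stands in for the module's Base17b encoder.
fn armor<F>(start: &[u8], end: &[u8], bytes: &[u8], encode: F) -> Vec<u8>
where
    F: Fn(&[u8]) -> Vec<u8>,
{
    let body = encode(bytes);
    let mut out = Vec::with_capacity(start.len() + body.len() + end.len());
    out.extend_from_slice(start);
    out.extend_from_slice(&body);
    out.extend_from_slice(end);
    out
}
```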
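Several functions above, such as `byte_array_to_basenb_utf8`, move between UTF-8 bytes and "pack32" codepoints. Assuming "pack32" means one Unicode codepoint stored as a `u32` (an interpretation not confirmed by this summary), the conversion in both directions can be done with the standard library alone. Invalid UTF-8 and non-scalar values such as surrogates are rejected. The function names are hypothetical.

```rust
/// Hypothetical: UTF-8 bytes to codepoints; None on invalid UTF-8.
fn utf8_to_pack32(bytes: &[u8]) -> Option<Vec<u32>> {
    let s = std::str::from_utf8(bytes).ok()?;
    Some(s.chars().map(|c| c as u32).collect())
}

/// Hypothetical: codepoints to UTF-8 bytes. Returns None if any value is
/// not a Unicode scalar value (for example a surrogate such as 0xD800).
fn pack32_to_utf8(codepoints: &[u32]) -> Option<Vec<u8>> {
    let s: Option<String> = codepoints.iter().map(|&cp| char::from_u32(cp)).collect();
    s.map(String::into_bytes)
}
```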
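`is_basenb_base` in the list above accepts bases 7 through 17 inclusive. Assuming that a base of n means each data character carries n bits (which matches the "Base17b" naming but is not stated here), the number of data characters needed for a byte array is the ceiling of 8 × len / n. A trailing remainder character would come on top of that. Both function names below are hypothetical.

```rust
/// Hypothetical restatement of the documented rule: bases 7..=17 are valid.
fn is_valid_basenb_base(base: u32) -> bool {
    (7..=17).contains(&base)
}

/// Hypothetical: data characters needed for `byte_len` bytes, assuming
/// `base` bits per character. Excludes any trailing remainder character.
fn data_chars_needed(byte_len: usize, base: u32) -> Option<usize> {
    if !is_valid_basenb_base(base) {
        return None;
    }
    let bits = byte_len * 8;
    // Ceiling division: a partly filled final character still counts.
    Some((bits + base as usize - 1) / base as usize)
}
```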
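The distinct remainder markers occupy 63481..=63497, which is U+F7F9..=U+F809. That is 17 consecutive codepoints inside the BMP Private Use Area. A range check like the one below is enough to recognize them. The position helper returns an index from 0 to 16 within the block. How the module maps those positions to remainder values is not stated in this summary, so that helper and all names here are hypothetical.

```rust
const REMAINDER_FIRST: u32 = 63481; // U+F7F9
const REMAINDER_LAST: u32 = 63497; // U+F809

/// Hypothetical restatement of the documented range check.
fn is_remainder_marker(cp: u32) -> bool {
    (REMAINDER_FIRST..=REMAINDER_LAST).contains(&cp)
}

/// Hypothetical: position of a marker within the 17-codepoint block.
fn remainder_marker_index(cp: u32) -> Option<u32> {
    if is_remainder_marker(cp) {
        Some(cp - REMAINDER_FIRST)
    } else {
        None
    }
}
```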