Basenb encoding/decoding: pack binary data into Unicode PUA.
Basenb is a way of encoding arbitrary binary data into a compact string representation that can be embedded in Unicode text, as runs of Unicode private-use characters.
It’s a modified version of Base16b that additionally encodes a “remainder” length in a trailing character, which seems to be needed to reliably round-trip values.
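The need for a remainder can be illustrated with a little arithmetic. The sketch below assumes each character carries exactly `base` bits (17 for base 17); the helper names are hypothetical and are not part of this module. Under that assumption, a 3-byte and a 4-byte input both occupy two characters, so a decoder cannot recover the original length from the character count alone.

```rust
/// Number of `base`-bit characters needed to carry `byte_len` bytes.
/// Assumption: every character holds exactly `base` bits.
fn chars_needed(byte_len: usize, base: usize) -> usize {
    let bits = byte_len * 8;
    (bits + base - 1) / base // ceiling division
}

/// Number of unused (padding) bits in the final character.
fn padding_bits(byte_len: usize, base: usize) -> usize {
    chars_needed(byte_len, base) * base - byte_len * 8
}
```

With base 17, `chars_needed(3, 17)` and `chars_needed(4, 17)` are both 2, while the padding differs (10 bits versus 2 bits). Some extra length information, such as a trailing remainder character, is what lets the decoder tell the two cases apart.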
Using it in practice requires a protocol that signals where basenb runs begin and end among regular PUA characters. dcBasenb addresses that by encoding UUIDs and using them as in-band start/end markers. (Round-tripping a UTF-8 file that itself contains dcBasenb UUIDs, for instance because it refers to them, would probably only be possible by encoding the UUIDs as UTF-8 encapsulated within the Dcs and being careful with the encode/decode settings.)
dcBasenb is a way of encoding arbitrary Dcs into runs of Unicode private-use characters; see dcbasenb.rs.
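The in-band start/end marker idea can be sketched generically. The code below is not the dcBasenb implementation: `armor`, `find`, and `dearmor` are hypothetical helpers, and the markers are arbitrary byte strings supplied by the caller, standing in for the encoded UUIDs.

```rust
/// Wrap an already-encoded payload between a start and an end marker.
fn armor(start: &[u8], payload: &[u8], end: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(start.len() + payload.len() + end.len());
    out.extend_from_slice(start);
    out.extend_from_slice(payload);
    out.extend_from_slice(end);
    out
}

/// Index of the first occurrence of `needle` in `haystack`, if any.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0); // `windows(0)` would panic, so handle this case first
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Extract the payload of the first armored run in `text`.
/// Returns `None` when either marker is missing.
fn dearmor<'a>(text: &'a [u8], start: &[u8], end: &[u8]) -> Option<&'a [u8]> {
    let payload_start = find(text, start)? + start.len();
    let payload_end = payload_start + find(&text[payload_start..], end)?;
    Some(&text[payload_start..payload_end])
}
```

This scheme only works if the markers never occur inside the payload or the surrounding text, which is the round-tripping concern raised in the parenthetical above.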
Constants§
- ARMORED_BASE17B_UTF8_END_UUID_BYTES - FIXME UNIMPLEMENTED
- ARMORED_BASE17B_UTF8_START_UUID_BYTES - FIXME UNIMPLEMENTED
- BYTE_ARRAY_FROM_BASENB_UTF8_INVALID_INPUT_EXCEPTION_BYTES - Sentinel UUID returned by the legacy JS implementation to indicate an invalid basenb UTF-8 decode input (only remainder char present or incomplete data). UUID: 3362daa3-1705-40ec-9a97-59d052fd4037
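For readers who want to compare a decode result against the sentinel, here is a minimal sketch. It assumes the sentinel is stored as the 16 raw bytes of the UUID; the constant's actual type and byte layout are not shown on this page, and `uuid_to_bytes` and `is_sentinel` are hypothetical helpers.

```rust
/// Parse a hyphenated UUID string into its 16 raw bytes.
/// Returns `None` if the input is not 32 hex digits (ignoring hyphens).
fn uuid_to_bytes(uuid: &str) -> Option<[u8; 16]> {
    let hex: Vec<u8> = uuid.bytes().filter(|b| *b != b'-').collect();
    if hex.len() != 32 {
        return None;
    }
    let mut out = [0u8; 16];
    for (i, pair) in hex.chunks(2).enumerate() {
        let hi = (pair[0] as char).to_digit(16)?;
        let lo = (pair[1] as char).to_digit(16)?;
        out[i] = (hi * 16 + lo) as u8;
    }
    Some(out)
}

/// True if `decoded` is exactly the sentinel byte sequence.
/// Assumption: the sentinel is compared as 16 raw bytes.
fn is_sentinel(decoded: &[u8], sentinel: &[u8; 16]) -> bool {
    decoded == &sentinel[..]
}
```

A caller could build the sentinel once with `uuid_to_bytes("3362daa3-1705-40ec-9a97-59d052fd4037")` and pass it to `is_sentinel` after each decode. A `Result`-returning API would usually be more idiomatic in Rust than a sentinel value, but the sentinel is what the legacy JS implementation returns.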
Functions§
- byte_array_from_armored_base17b_utf8 - Decode an armored Base17b UTF-8 run to a byte array. FIXME untested!
- byte_array_from_basenb_17_utf8 - Convenience wrapper (decode Basenb 17 UTF-8 representation).
- byte_array_from_basenb_utf8 - Decode a Basenb UTF-8 byte sequence into the original byte array.
- byte_array_to_armored_base17b_utf8 - Produce an “armored” Base17b UTF-8 run, encoding arbitrary binary data: armored = start_uuid||base17b_encode(bytes)||end_uuid
- byte_array_to_basenb_17_utf8 - Convenience wrapper (encode bytes to Basenb 17 UTF-8 representation).
- byte_array_to_basenb_no_remainder_marker
- byte_array_to_basenb_utf8 - Encode a raw byte array into Basenb (UTF-8 sequence of pack32 codepoints).
- int_bit_array_from_basenb_string - internalIntBitArrayFromBasenbString(byteArrayInput, intRemainder). JS passes a Uint8Array of UTF-8 bytes and an int remainder.
- int_bit_array_to_basenb_no_remainder_marker - internalIntBitArrayToBasenbString(intBase, bytes). Returns UTF-8 bytes of the encoded string (mirroring JS returning a byte array).
- is_basenb_base - Is the provided base valid for Basenb? (Original: 7 through 17 inclusive.)
- is_basenb_char - True if the pack32 character represents a Basenb character codepoint.
- is_basenb_distinct_remainder_char - True if the pack32 character is one of the distinct remainder markers (63481..=63497).
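The two range checks described above can be sketched as follows. These are standalone illustrations and not the module's functions: the `_sketch` names, the constant names, and the `u32` parameter type are assumptions, and the base range follows the "original" 7 through 17 rule quoted above.

```rust
// Assumed bounds, taken from the descriptions above.
const MIN_BASE: u32 = 7;
const MAX_BASE: u32 = 17;
const REMAINDER_FIRST: u32 = 63481; // U+F7F9
const REMAINDER_LAST: u32 = 63497; // U+F809

/// True if `base` lies in the original Basenb range, 7 through 17 inclusive.
fn is_basenb_base_sketch(base: u32) -> bool {
    (MIN_BASE..=MAX_BASE).contains(&base)
}

/// True if `codepoint` is one of the distinct remainder markers.
fn is_distinct_remainder_char_sketch(codepoint: u32) -> bool {
    (REMAINDER_FIRST..=REMAINDER_LAST).contains(&codepoint)
}
```

The remainder range spans 17 code points, all inside the Basic Multilingual Plane private-use area (U+E000 to U+F8FF).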