ctoolbox/formats/
eite.rs

//! Implementation of EITE formats (Rust translation – high‑level export / import wrappers).
//!
//! This module exposes a small public convenience surface for converting between
//! “Dc array” document representations (`Vec<u32>`) and external serialized formats
//! (bytes) plus a transformation hook.  The heavy‑lifting implementations live in
//! sub‑modules.
//!
//! As EITE did not have all intended elements fully implemented, this Rust
//! version similarly has some limitations.
//!
//! Primary entry points provided here:
//! - [`export_document`]:      Dc array -> bytes in a named format.
//! - [`import_document`]:      Bytes in a named format -> Dc array.
//! - [`import_and_export`]:    Bytes in one format -> bytes in another (single convenience hop).
//! - [`transform_document`]:   Apply an in‑memory, pure Dc→Dc transformation.
//!
//! These are intentionally thin wrappers delegating to lower‑level functions so that
//! callers have a stable, ergonomic API while internal modules evolve.
//!
//! # Errors
//! All fallible operations return `anyhow::Result<_>`.  Under the hood some modules
//! may surface structured errors (e.g. an internal `EiteError` enum); those are
//! converted into `anyhow::Error` at this boundary.  Panics are avoided unless a
//! genuine invariant violation occurs (logic error / programmer bug).
//!
//! # Concurrency
//! The underlying state object (`EiteState`) is passed in mutably for import / export
//! because format conversion may accumulate warnings, cache dataset lookups, or
//! modify configuration state required for subsequent operations.  Pure transformations
//! currently take no state at all ([`transform_document`] has no `EiteState`
//! parameter); if a future transformation needs contextual settings, its signature
//! will have to be widened.
//!
//! # Testing
//! This file includes light “sanity” unit tests that verify:
//! - Wrapper identity / invariants (e.g. `import_and_export` equivalence with a
//!   separate `import_document` + `export_document` sequence).
//! - Transformation pass‑through for currently identity‑like transformations
//!   (`semanticToText`, `codeToText`).
//! - Basic error expectations (unknown transformation).
//!
//! For deeper behavioral validation (round‑tripping every supported format,
//! exhaustive edge cases, dataset driven conversions, escaping, etc.) see the
//! dedicated tests housed alongside the lower‑level modules.
//!
//! # Performance
//! These wrappers add effectively zero overhead beyond an extra function call.
//! When profiling, attribute time spent within these wrappers to the delegated
//! modules (`formats`, `encoding`, `transform`, etc.).
//!
//
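/*
Illustrative sketch (not part of this module) of the error boundary described
under "Errors" above: a lower layer returns a structured error, and the wrapper
converts it into an opaque error type with `?`. To stay dependency-free the
sketch uses `Box<dyn std::error::Error>` as a stand-in for `anyhow::Error`,
which offers a comparable `?` conversion and downcasting. The `EiteError`
variants, both function names, and the byte-to-`u32` mapping are assumptions
made for the example, not the real definitions.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical stand-in for an internal structured error enum.
#[derive(Debug, PartialEq)]
enum EiteError {
    UnknownFormat(String),
    MalformedContent { offset: usize },
}

impl fmt::Display for EiteError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EiteError::UnknownFormat(name) => write!(f, "unknown format: {}", name),
            EiteError::MalformedContent { offset } => {
                write!(f, "malformed content at byte {}", offset)
            }
        }
    }
}

impl Error for EiteError {}

// Lower layer (hypothetical): reports failures as structured errors.
// Toy mapping: each ASCII byte becomes one `u32`; real Dc IDs differ.
fn lower_level_import(format: &str, bytes: &[u8]) -> Result<Vec<u32>, EiteError> {
    if format != "ascii" {
        return Err(EiteError::UnknownFormat(format.to_string()));
    }
    match bytes.iter().position(|b| !b.is_ascii()) {
        Some(offset) => Err(EiteError::MalformedContent { offset }),
        None => Ok(bytes.iter().map(|&b| u32::from(b)).collect()),
    }
}

// Wrapper boundary: `?` converts the structured error into the opaque type.
fn wrapper_import(format: &str, bytes: &[u8]) -> Result<Vec<u32>, Box<dyn Error>> {
    let dcs = lower_level_import(format, bytes)?;
    Ok(dcs)
}

fn main() {
    assert_eq!(wrapper_import("ascii", b"Hi").unwrap(), vec![72, 105]);

    let err = wrapper_import("ascii", &[b'H', 0xFF]).unwrap_err();
    // The structured variant stays recoverable by downcasting.
    assert_eq!(
        err.downcast_ref::<EiteError>(),
        Some(&EiteError::MalformedContent { offset: 1 })
    );
    println!("{}", err);
}
```

Callers that only report errors can treat the result as opaque; callers that
need to branch on a specific failure can downcast, as `main` does here.
*/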
// --------------------------------------------------------------------------------
// Transpilation & Follow‑up Notes
// --------------------------------------------------------------------------------
// Source provenance:
// This file forms part of an incremental port of a legacy JavaScript/StageL
// implementation of EITE to Rust.
//
// Translation policies applied globally across the port (including code not shown here):
// - Type prefixes from the JS code (`intX`, `strY`, `boolZ`, `byteArr`, etc.) are removed;
//   Rust’s static typing guarantees the original invariants.
// - Runtime assertion helpers (`assertInt`, `assertStr`, `assertIntArray`, etc.) are
//   eliminated; semantic validation is retained only where needed (e.g. validating
//   bit arrays or settings string shape).
// - Trivial arithmetic / boolean wrappers (`Add`, `Sub`, `Mod`, `Eq`, `Gt`, `Lt`, `Not`,
//   `And`, `inc`, `dec`, etc.) are expressed directly with native operators.
// - Logging / failure helpers (`Debug`, `Warn`, `Die`) are mapped to Rust tracing /
//   warning accumulation or error returns (TBD: finalize whether former `Die()`
//   sites should panic or return `Result::Err`; currently leaning toward `Result`).
// - Environment detection (browser vs worker vs Node) is removed. Asset loading
//   is routed through a single abstraction (e.g. `crate::storage::get_asset(...)`).
// - Base16b bridging delegates directly to `crate::formats::base16b::Base16b`.
// - Column indices for Dc datasets (script / bidi class / type / name / combining
//   class / casing / complex traits / description) are preserved as symbolic
//   constants until a unified, possibly zero‑based indexing scheme is finalized.
//
// Remaining follow‑up actions:
// 1. Confirm dataset column indexing: decide whether to use 0‑based indices
//    internally, and adapt the loader if the original JS relied on 1‑based constants.
// 2. Expand test coverage: every legacy JS test case mirrored in Rust, including
//    edge cases for settings parsing, Basenb, UTF‑8 variant conversions, HTML
//    fragment generation, and transformation pipelines.
// 3. Consider refactoring legacy while‑loop “boolContinue” idioms into idiomatic
//    iterator chains once correctness is locked in (retain readable imperative
//    fallbacks in tricky hot paths).
// 4. Reassess any placeholder heuristics (script / bidi / type lookups) once the
//    definitive dataset schema is confirmed.
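/*
Illustrative sketch for follow‑up item 3 (not taken from the EITE sources): a
flag-controlled while loop in the style of the legacy "boolContinue" idiom,
next to an iterator-based equivalent. Both function names and the chosen task
(finding the first non-ASCII byte) are hypothetical, picked only to keep the
example small.

```rust
// Legacy-style loop: a mutable flag decides whether to keep iterating.
fn first_non_ascii_loop(bytes: &[u8]) -> Option<usize> {
    let mut index = 0;
    let mut found = None;
    let mut keep_going = true;
    while keep_going {
        if index >= bytes.len() {
            // Ran off the end without finding anything.
            keep_going = false;
        } else if bytes[index] > 0x7F {
            found = Some(index);
            keep_going = false;
        } else {
            index += 1;
        }
    }
    found
}

// Iterator form: `position` stops at the first element matching the predicate.
fn first_non_ascii_iter(bytes: &[u8]) -> Option<usize> {
    bytes.iter().position(|&b| b > 0x7F)
}

fn main() {
    let samples: [&[u8]; 3] = [b"", b"plain", &[0x41, 0xC3, 0xA9]];
    for sample in samples {
        // Both versions must agree before the loop form is retired.
        assert_eq!(first_non_ascii_loop(sample), first_non_ascii_iter(sample));
    }
    assert_eq!(first_non_ascii_iter(&[0x41, 0xC3, 0xA9]), Some(1));
}
```

Keeping both versions side by side in a test during the refactor is one way to
confirm equivalence before removing the imperative form.
*/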
//
// Architectural notes:
// - Conversion pipeline: `import_document` parses bytes -> intermediate Dc array;
//   `export_document` serializes Dc array -> bytes. `import_and_export` short‑circuits
//   by combining these with a single intermediate vector (no additional data copies
//   beyond what the lower layers perform).
// - Transformations: `transform_document` intentionally does not expose state mutation;
//   if future transformations require contextual settings or side‑effects, widen the
//   signature or introduce a transformation context object.
// - Performance critical sections (e.g. pack/unpack bit manipulation, UTF‑8
//   conversions) are implemented in lower modules; this wrapper layer stays
//   allocation‑minimal.
//
// Testing / Verification strategy:
// - Unit tests inside each functional sub‑module provide deterministic correctness.
// - Integration tests (future) will orchestrate full import->transform->export cycles
//   across all supported formats ensuring round‑trip invariants and preservation of
//   semantic properties (printability, newline normalization, variant fidelity).

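/*
Illustrative sketch (not part of this module) of the conversion pipeline from
the architectural notes above: import parses bytes into an intermediate array,
export serializes it, and a combined hop chains the two through one
intermediate vector. `ToyFormat`, `ToyState`, and the `toy_*` functions are
hypothetical stand-ins for the real `Format`, `EiteState`, and lower-level
functions, and the intermediate values are plain Unicode scalar values rather
than real Dc IDs.

```rust
// Hypothetical stand-in for the real format type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ToyFormat {
    Ascii,
    Utf8,
}

// Hypothetical stand-in for the mutable runtime state: it only collects warnings.
#[derive(Default)]
struct ToyState {
    warnings: Vec<String>,
}

// Import: bytes -> intermediate array (one `u32` per character).
fn toy_import(state: &mut ToyState, format: ToyFormat, bytes: &[u8]) -> Result<Vec<u32>, String> {
    if bytes.is_empty() {
        state.warnings.push("empty input".to_string());
    }
    match format {
        ToyFormat::Ascii => {
            if !bytes.is_ascii() {
                return Err("non-ASCII byte in ASCII input".to_string());
            }
            Ok(bytes.iter().map(|&b| u32::from(b)).collect())
        }
        ToyFormat::Utf8 => {
            let text = std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
            Ok(text.chars().map(u32::from).collect())
        }
    }
}

// Export: intermediate array -> bytes. Unrepresentable characters in ASCII
// output are replaced with '?' and recorded as warnings on the state.
fn toy_export(state: &mut ToyState, format: ToyFormat, dcs: &[u32]) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(dcs.len());
    for &dc in dcs {
        let ch = char::from_u32(dc).ok_or_else(|| format!("invalid scalar value {}", dc))?;
        match format {
            ToyFormat::Ascii if !ch.is_ascii() => {
                state.warnings.push(format!("replaced {:?} with '?'", ch));
                out.push(b'?');
            }
            _ => {
                let mut buf = [0u8; 4];
                out.extend_from_slice(ch.encode_utf8(&mut buf).as_bytes());
            }
        }
    }
    Ok(out)
}

// Combined hop: one intermediate vector, then straight into export.
fn toy_convert(
    state: &mut ToyState,
    in_format: ToyFormat,
    out_format: ToyFormat,
    bytes: &[u8],
) -> Result<Vec<u8>, String> {
    let dcs = toy_import(state, in_format, bytes)?;
    toy_export(state, out_format, &dcs)
}

fn main() {
    let input = b"Hello, EITE!";
    let mut state = ToyState::default();

    // Manual composition and the combined hop must agree.
    let dcs = toy_import(&mut state, ToyFormat::Ascii, input).unwrap();
    let manual = toy_export(&mut state, ToyFormat::Utf8, &dcs).unwrap();
    let direct =
        toy_convert(&mut ToyState::default(), ToyFormat::Ascii, ToyFormat::Utf8, input).unwrap();
    assert_eq!(manual, direct);
    // ASCII text is byte-identical in UTF-8.
    assert_eq!(direct, input.to_vec());
    assert!(state.warnings.is_empty());
}
```

The state is threaded through both stages mutably so that warnings from either
stage end up in one place, mirroring why the real wrappers take `&mut EiteState`.
*/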
// Sub‑modules (actual implementations live there).
use crate::formats::{
    FormatLog,
    eite::{
        eite_state::EiteState,
        formats::{
            Format, PrefilterSettings, convert_formats, dca_from_format,
            dca_to_format,
        },
        transform::{DocumentTransformation, apply_document_transformation},
    },
};

use anyhow::Result;

pub mod dc;
pub mod eite_state;
pub mod encoding;
pub mod exceptions;
pub mod formats;
pub mod kv;
pub mod runtime;
pub mod settings;
pub mod terminal;
pub mod transform;
pub mod util;

/// Export a document (Dc array) into a named external format.
///
/// Thin wrapper over `formats::dca_to_format`.
///
/// Arguments:
/// - `state`: Mutable Eite runtime state (collects warnings, caches, settings).
/// - `out_format`: Target format, given as a `Format` value.
/// - `dc_array`: The in‑memory Dc array representation (sequence of codepoint / symbol IDs).
/// - `prefilter_settings`: Prefilter settings, forwarded unchanged to `dca_to_format`.
///
/// Returns the serialized bytes together with a `FormatLog` on success.
///
/// Errors:
/// - Propagates any format‑specific encoding or validation errors.
/// - An unknown or unsupported `out_format` will typically surface as an `Err`.
pub fn export_document(
    state: &mut EiteState,
    out_format: &Format,
    dc_array: &[u32],
    prefilter_settings: &PrefilterSettings,
) -> Result<(Vec<u8>, FormatLog)> {
    // No preconditions beyond those enforced by the lower layer.
    dca_to_format(state, out_format, dc_array, prefilter_settings)
}

/// Import a document from a named external format into a Dc array.
///
/// Thin wrapper over `formats::dca_from_format`.
///
/// Arguments:
/// - `state`: Mutable Eite runtime state.
/// - `in_format`: Source format, given as a `Format` value.
/// - `content_bytes`: Raw serialized payload.
///
/// Returns a newly allocated Dc array (`Vec<u32>`) together with a `FormatLog`.
///
/// Errors:
/// - Unknown or unsupported `in_format`.
/// - Malformed content (syntax / structural violations per format rules).
pub fn import_document(
    state: &mut EiteState,
    in_format: &Format,
    content_bytes: &[u8],
) -> Result<(Vec<u32>, FormatLog)> {
    dca_from_format(state, in_format, content_bytes)
}

/// Convenience: import from one format and export to another in a single step.
///
/// Internally delegates to `formats::convert_formats` to allow any shared
/// optimizations (e.g. streaming pipelines in the future).
///
/// Arguments:
/// - `state`: Mutable runtime state.
/// - `in_format`: Source format, given as a `Format` value.
/// - `out_format`: Target format, given as a `Format` value.
/// - `content_bytes`: Raw serialized input in `in_format`.
/// - `prefilter_settings`: Prefilter settings, forwarded unchanged to `convert_formats`.
///
/// Returns the serialized bytes in `out_format` together with a `FormatLog`.
///
/// Errors:
/// - Any error from the import or export stage.
pub fn import_and_export(
    state: &mut EiteState,
    in_format: &Format,
    out_format: &Format,
    content_bytes: &[u8],
    prefilter_settings: &PrefilterSettings,
) -> Result<(Vec<u8>, FormatLog)> {
    convert_formats(
        state,
        in_format,
        out_format,
        content_bytes,
        prefilter_settings,
    )
}

/// Apply an in‑memory document transformation (Dc array -> Dc array).
///
/// Delegates to `transform::apply_document_transformation`.
///
/// Arguments:
/// - `dc_array`: Source Dc array.
/// - `transformation`: Transformation to apply (e.g.
///   `DocumentTransformation::semantic_to_text_default()`).
///
/// No `EiteState` parameter is taken: the known transformations do not require
/// state. If future transforms become stateful, the signature will need widening.
///
/// Returns the transformed Dc array (new vector).
///
/// Errors:
/// - Unknown or unsupported transformation.
/// - Transformation specific errors (if any future transform is fallible).
pub fn transform_document(
    dc_array: &[u32],
    transformation: &DocumentTransformation,
) -> Result<Vec<u32>> {
    apply_document_transformation(transformation, dc_array)
}

#[cfg(test)]
mod tests {
    use crate::utilities::{assert_vec_u8_eq, assert_vec_u32_eq};

    use super::*;

    // Helper: construct a state. Assumes EiteState implements Default.
    fn new_state() -> EiteState
    where
        EiteState: Default,
    {
        EiteState::default()
    }

    #[crate::ctb_test]
    fn test_transform_document() {
        // These transformations aren't yet implemented, so they should have no effect.
        let dc_array = vec![1, 2, 3];
        let sem = transform_document(
            &dc_array,
            &DocumentTransformation::semantic_to_text_default(),
        )
        .unwrap();
        assert_vec_u32_eq(&dc_array, &sem);
        let code = transform_document(
            &dc_array,
            &DocumentTransformation::code_to_text_default(),
        )
        .unwrap();
        assert_vec_u32_eq(&dc_array, &code);
    }

    // --- import / export wrappers: sanity & equivalence ---

    // NOTE: This does not assert specific Dc values (which are format dependent),
    // only that the convert_formats wrapper matches manual composition when both succeed.
    #[crate::ctb_test]
    fn test_import_and_export_work() {
        let input_bytes = b"Hello, EITE!";

        // Path 1: separate import then export (ascii -> Dc -> ascii).
        let (dc, import_log) =
            import_document(&mut EiteState::new(), &Format::ASCII, input_bytes)
                .expect("import failed");
        assert!(!import_log.has_warnings());
        let (exported, export_log) = export_document(
            &mut EiteState::new(),
            &Format::ASCII,
            &dc,
            &PrefilterSettings::default(),
        )
        .expect("export failed");
        assert!(!export_log.has_warnings());

        // Path 2: direct convenience hop (ascii -> utf8). ASCII text is
        // byte-identical in UTF-8, so the output is expected to match path 1
        // (assuming the default UTF-8 format adds no extra framing).
        let (via_wrapper, roundtrip_log) = import_and_export(
            &mut EiteState::new(),
            &Format::ASCII,
            &Format::utf8_default(),
            input_bytes,
            &PrefilterSettings::default(),
        )
        .expect("conv failed");
        assert!(!roundtrip_log.has_warnings());

        assert_vec_u8_eq(&exported, &via_wrapper);
    }
}
299}