ctoolbox/formats/eite.rs
//! Implementation of EITE formats (Rust translation – high‑level export / import wrappers).
//!
//! This module exposes a small public convenience surface for converting between
//! “Dc array” document representations (`Vec<u32>`) and external serialized formats
//! (bytes) plus a transformation hook. The heavy‑lifting implementations live in
//! sub‑modules.
//!
//! Because the original EITE did not fully implement all of its intended elements,
//! this Rust version has similar limitations.
//!
//! Primary entry points provided here:
//! - [`export_document`]: Dc array -> bytes in a named format.
//! - [`import_document`]: Bytes in a named format -> Dc array.
//! - [`import_and_export`]: Bytes in one format -> bytes in another (single convenience hop).
//! - [`transform_document`]: Apply an in‑memory, pure Dc→Dc transformation.
//!
//! These are intentionally thin wrappers delegating to lower‑level functions so that
//! callers have a stable, ergonomic API while internal modules evolve.
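//!
//! Conceptually, `import_and_export` is an import into a Dc array followed by an
//! export from that array. A minimal sketch of that composition, using hypothetical
//! toy items (`ToyFormat`, `toy_import`, `toy_export` and `toy_convert` are not part
//! of this crate) and Unicode scalar values as stand‑ins for Dc IDs purely for
//! illustration:
//!
//! ```rust
//! /// Hypothetical stand-in for `Format`, with only two toy variants.
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum ToyFormat {
//!     Ascii,
//!     Utf8,
//! }
//!
//! /// Toy import: bytes -> "Dc array" (here: Unicode scalar values).
//! fn toy_import(kind: ToyFormat, bytes: &[u8]) -> Result<Vec<u32>, String> {
//!     match kind {
//!         ToyFormat::Ascii => {
//!             if let Some(pos) = bytes.iter().position(|b| !b.is_ascii()) {
//!                 return Err(format!("non-ASCII byte at offset {pos}"));
//!             }
//!             Ok(bytes.iter().map(|&b| u32::from(b)).collect())
//!         }
//!         ToyFormat::Utf8 => {
//!             let text = std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
//!             Ok(text.chars().map(u32::from).collect())
//!         }
//!     }
//! }
//!
//! /// Toy export: "Dc array" -> bytes, rejecting values the target cannot encode.
//! fn toy_export(kind: ToyFormat, dc: &[u32]) -> Result<Vec<u8>, String> {
//!     let mut out = String::new();
//!     for &id in dc {
//!         let c = char::from_u32(id).ok_or_else(|| format!("invalid value {id}"))?;
//!         if kind == ToyFormat::Ascii && !c.is_ascii() {
//!             return Err(format!("{c:?} cannot be encoded as ASCII"));
//!         }
//!         out.push(c);
//!     }
//!     Ok(out.into_bytes())
//! }
//!
//! /// The convenience hop: import, then export through one intermediate vector.
//! fn toy_convert(from: ToyFormat, to: ToyFormat, bytes: &[u8]) -> Result<Vec<u8>, String> {
//!     let dc = toy_import(from, bytes)?;
//!     toy_export(to, &dc)
//! }
//! ```
//!
//! With these definitions, `toy_convert(ToyFormat::Ascii, ToyFormat::Utf8, b"Hi")`
//! yields the same bytes as calling `toy_import` and then `toy_export` by hand,
//! which mirrors the equivalence checked in this module's tests.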
//!
//! # Errors
//! All fallible operations return `anyhow::Result<_>`. Under the hood some modules
//! may surface structured errors (e.g. an internal `EiteError` enum); those are
//! converted into `anyhow::Error` at this boundary. Panics are avoided unless a
//! genuine invariant violation occurs (logic error / programmer bug).
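//!
//! The boundary conversion relies on `?` and a `From` impl. A standard‑library‑only
//! sketch of the same pattern: a hypothetical structured error (`ToyEiteError`, not
//! the real `EiteError`) is boxed into `Box<dyn std::error::Error>`, which here plays
//! the role that `anyhow::Error` plays in this module. `toy_parse` and
//! `toy_boundary` are likewise hypothetical.
//!
//! ```rust
//! use std::error::Error;
//! use std::fmt;
//!
//! /// Hypothetical structured error raised by a lower layer.
//! #[derive(Debug, PartialEq, Eq)]
//! enum ToyEiteError {
//!     UnknownFormat(String),
//!     Malformed { offset: usize },
//! }
//!
//! impl fmt::Display for ToyEiteError {
//!     fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
//!         match self {
//!             ToyEiteError::UnknownFormat(name) => write!(f, "unknown format: {name}"),
//!             ToyEiteError::Malformed { offset } => write!(f, "malformed input at offset {offset}"),
//!         }
//!     }
//! }
//!
//! impl Error for ToyEiteError {}
//!
//! /// Lower layer: returns the structured error type.
//! fn toy_parse(name: &str, payload: &[u8]) -> Result<usize, ToyEiteError> {
//!     if name != "ascii" {
//!         return Err(ToyEiteError::UnknownFormat(name.to_string()));
//!     }
//!     match payload.iter().position(|b| !b.is_ascii()) {
//!         Some(offset) => Err(ToyEiteError::Malformed { offset }),
//!         None => Ok(payload.len()),
//!     }
//! }
//!
//! /// Public boundary: `?` boxes the structured error via `From`.
//! fn toy_boundary(name: &str, payload: &[u8]) -> Result<usize, Box<dyn Error>> {
//!     let len = toy_parse(name, payload)?;
//!     Ok(len)
//! }
//! ```
//!
//! Callers can still recover the structured variant with
//! `err.downcast_ref::<ToyEiteError>()`; `anyhow::Error` offers a `downcast_ref`
//! method for the same purpose.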
//!
//! # Concurrency
//! The underlying state object (`EiteState`) is passed in mutably for import / export
//! because format conversion may accumulate warnings, cache dataset lookups, or
//! modify configuration state required for subsequent operations. Since `&mut`
//! access is exclusive, a single `EiteState` serves one conversion at a time;
//! concurrent conversions each need their own state (or external synchronization).
//! Pure transformations take no state at all; a state or context parameter could be
//! added later if transformation semantics ever become stateful.
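//!
//! The split between mutating conversions and stateless transformations can be
//! sketched as follows. `ToyState`, `toy_import` and `toy_uppercase` are
//! hypothetical simplifications, not the real `EiteState` or transformation API.
//!
//! ```rust
//! /// Hypothetical, heavily simplified stand-in for `EiteState`.
//! #[derive(Debug, Default)]
//! struct ToyState {
//!     warnings: Vec<String>,
//! }
//!
//! /// Conversion-like step: needs `&mut` because it records warnings.
//! fn toy_import(state: &mut ToyState, bytes: &[u8]) -> Vec<u32> {
//!     let mut dc = Vec::with_capacity(bytes.len());
//!     for (i, &b) in bytes.iter().enumerate() {
//!         if b.is_ascii() {
//!             dc.push(u32::from(b));
//!         } else {
//!             // Replace the byte and remember that we did so.
//!             state.warnings.push(format!("byte {i} is not ASCII; replaced with '?'"));
//!             dc.push(u32::from(b'?'));
//!         }
//!     }
//!     dc
//! }
//!
//! /// Transformation-like step: a pure function of its input, no state needed.
//! fn toy_uppercase(dc: &[u32]) -> Vec<u32> {
//!     dc.iter()
//!         .map(|&c| if (u32::from(b'a')..=u32::from(b'z')).contains(&c) { c - 32 } else { c })
//!         .collect()
//! }
//! ```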
//!
//! # Testing
//! This file includes light “sanity” unit tests that verify:
//! - Wrapper identity / invariants (e.g. `import_and_export` equivalence with a
//!   separate `import_document` + `export_document` sequence).
//! - Transformation pass‑through for currently identity‑like transformations
//!   (`semanticToText`, `codeToText`).
//! - Absence of warnings when importing and exporting plain ASCII input.
//!
//! For deeper behavioral validation (round‑tripping every supported format,
//! exhaustive edge cases, dataset driven conversions, escaping, etc.) see the
//! dedicated tests housed alongside the lower‑level modules.
//!
//! # Performance
//! These wrappers add effectively zero overhead beyond an extra function call.
//! When profiling, attribute time spent within these wrappers to the delegated
//! modules (`formats`, `encoding`, `transform`, etc.).
//!
//
// --------------------------------------------------------------------------------
// Transpilation & Follow‑up Notes
// --------------------------------------------------------------------------------
// Source provenance:
// This file forms part of an incremental port of a legacy JavaScript/StageL
// implementation of EITE to Rust.
//
// Translation policies applied globally across the port (including code not shown here):
// - Type prefixes from the JS code (`intX`, `strY`, `boolZ`, `byteArr`, etc.) are removed;
//   Rust’s static typing enforces the invariants those prefixes encoded.
// - Runtime assertion helpers (`assertInt`, `assertStr`, `assertIntArray`, etc.) are
//   eliminated; semantic validation is retained only where needed (e.g. validating
//   bit arrays or settings string shape).
// - Trivial arithmetic / boolean wrappers (`Add`, `Sub`, `Mod`, `Eq`, `Gt`, `Lt`, `Not`,
//   `And`, `inc`, `dec`, etc.) are expressed directly with native operators.
// - Logging / failure helpers (`Debug`, `Warn`, `Die`) are mapped to Rust tracing,
//   warning accumulation, or error returns (TBD: finalize whether former `Die()`
//   sites should panic or return `Result::Err`; currently leaning toward `Result`).
// - Environment detection (browser vs worker vs Node) is removed. Asset loading
//   is routed through a single abstraction (e.g. `crate::storage::get_asset(...)`).
// - Base16b bridging delegates directly to `crate::formats::base16b::Base16b`.
// - Column indices for Dc datasets (script / bidi class / type / name / combining
//   class / casing / complex traits / description) are preserved as symbolic
//   constants until a unified, possibly zero‑based indexing scheme is finalized.
//
// Remaining follow‑up actions:
// 1. Confirm dataset column indexing: decide whether to use 0‑based indices
//    internally, and adapt the loader if the original JS relied on 1‑based constants.
// 2. Expand test coverage: mirror every legacy JS test case in Rust, including
//    edge cases for settings parsing, Basenb, UTF‑8 variant conversions, HTML
//    fragment generation, and transformation pipelines.
// 3. Consider refactoring legacy while‑loop “boolContinue” idioms into idiomatic
//    iterator chains once correctness is locked in (retaining readable imperative
//    fallbacks in tricky hot paths).
// 4. Reassess any placeholder heuristics (script / bidi / type lookups) once the
//    definitive dataset schema is confirmed.
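//
// Follow‑up 3 in miniature: `first_newline_legacy` and `first_newline_iter` are
// hypothetical helpers (not functions in this port), and `newline` is a
// caller‑supplied placeholder Dc value. The sketch only shows that the flag‑driven
// loop and the iterator form return the same result.
//
// ```rust
// /// Legacy shape: a `while` loop steered by a "continue" flag, as in the JS code.
// fn first_newline_legacy(dc: &[u32], newline: u32) -> Option<usize> {
//     let mut index = 0;
//     let mut bool_continue = true;
//     let mut found = None;
//     while bool_continue {
//         if index >= dc.len() {
//             bool_continue = false;
//         } else if dc[index] == newline {
//             found = Some(index);
//             bool_continue = false;
//         } else {
//             index += 1;
//         }
//     }
//     found
// }
//
// /// Idiomatic equivalent: the iterator adapter does the index bookkeeping.
// fn first_newline_iter(dc: &[u32], newline: u32) -> Option<usize> {
//     dc.iter().position(|&c| c == newline)
// }
// ```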
//
// Architectural notes:
// - Conversion pipeline: `import_document` parses bytes -> intermediate Dc array;
//   `export_document` serializes Dc array -> bytes. `import_and_export` combines the
//   two through `convert_formats` with a single intermediate vector (no additional
//   data copies beyond what the lower layers perform).
// - Transformations: `transform_document` intentionally does not expose state mutation;
//   if future transformations require contextual settings or side‑effects, widen the
//   signature or introduce a transformation context object.
// - Performance critical sections (e.g. pack/unpack bit manipulation, UTF‑8
//   conversions) are implemented in lower modules; this wrapper layer stays
//   allocation‑minimal.
//
// Testing / Verification strategy:
// - Unit tests inside each functional sub‑module provide deterministic correctness checks.
// - Integration tests (future) will orchestrate full import->transform->export cycles
//   across all supported formats, ensuring round‑trip invariants and preservation of
//   semantic properties (printability, newline normalization, variant fidelity).

use crate::formats::{
    FormatLog,
    eite::{
        eite_state::EiteState,
        formats::{
            Format, PrefilterSettings, convert_formats, dca_from_format,
            dca_to_format,
        },
        transform::{DocumentTransformation, apply_document_transformation},
    },
};

use anyhow::Result;

// Sub‑modules (actual implementations live there).
pub mod dc;
pub mod eite_state;
pub mod encoding;
pub mod exceptions;
pub mod formats;
pub mod kv;
pub mod runtime;
pub mod settings;
pub mod terminal;
pub mod transform;
pub mod util;

/// Export a document (Dc array) into a named external format.
///
/// Thin wrapper over `formats::dca_to_format`.
///
/// Arguments:
/// - `state`: Mutable Eite runtime state (collects warnings, caches, settings).
/// - `out_format`: Target format (e.g. `Format::ASCII`).
/// - `dc_array`: The in‑memory Dc array representation (sequence of codepoint / symbol IDs).
/// - `prefilter_settings`: Prefilter configuration, forwarded unchanged to
///   `dca_to_format`.
///
/// Returns the serialized bytes together with the `FormatLog` recorded during export.
///
/// Errors:
/// - Propagates any format‑specific encoding or validation errors.
/// - An unsupported `out_format` will typically surface as an `Err`.
pub fn export_document(
    state: &mut EiteState,
    out_format: &Format,
    dc_array: &[u32],
    prefilter_settings: &PrefilterSettings,
) -> Result<(Vec<u8>, FormatLog)> {
    // No preconditions beyond those enforced by the lower layer.
    dca_to_format(state, out_format, dc_array, prefilter_settings)
}

/// Import a document from a named external format into a Dc array.
///
/// Thin wrapper over `formats::dca_from_format`.
///
/// Arguments:
/// - `state`: Mutable Eite runtime state.
/// - `in_format`: Source format (e.g. `Format::ASCII`).
/// - `content_bytes`: Raw serialized payload.
///
/// Returns a newly allocated Dc array (`Vec<u32>`) together with the `FormatLog`
/// recorded during import.
///
/// Errors:
/// - Unknown or unsupported `in_format`.
/// - Malformed content (syntax / structural violations per format rules).
pub fn import_document(
    state: &mut EiteState,
    in_format: &Format,
    content_bytes: &[u8],
) -> Result<(Vec<u32>, FormatLog)> {
    dca_from_format(state, in_format, content_bytes)
}

/// Convenience: import from one format and export to another in a single step.
///
/// Internally delegates to `formats::convert_formats`, which leaves room for shared
/// optimizations (e.g. streaming pipelines in the future).
///
/// Arguments:
/// - `state`: Mutable runtime state.
/// - `in_format`: Source format.
/// - `out_format`: Target format.
/// - `content_bytes`: Raw serialized input in `in_format`.
/// - `prefilter_settings`: Prefilter configuration, forwarded unchanged to
///   `convert_formats`.
///
/// Returns the serialized bytes in `out_format` together with the `FormatLog` for
/// the conversion.
///
/// Errors:
/// - Any import or export error surfaced by `convert_formats` (the same failure
///   classes as [`import_document`] and [`export_document`]).
pub fn import_and_export(
    state: &mut EiteState,
    in_format: &Format,
    out_format: &Format,
    content_bytes: &[u8],
    prefilter_settings: &PrefilterSettings,
) -> Result<(Vec<u8>, FormatLog)> {
    convert_formats(
        state,
        in_format,
        out_format,
        content_bytes,
        prefilter_settings,
    )
}

/// Apply an in‑memory document transformation (Dc array -> Dc array).
///
/// Delegates to `transform::apply_document_transformation`. No runtime state is
/// taken; if future transforms become stateful, a state or context parameter can
/// be added.
///
/// Arguments:
/// - `dc_array`: Source Dc array.
/// - `transformation`: Transformation to apply (e.g.
///   `DocumentTransformation::semantic_to_text_default()`).
///
/// Returns the transformed Dc array (new vector).
///
/// Errors:
/// - Unsupported transformation.
/// - Transformation specific errors (if any future transform is fallible).
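///
/// Example:
///
/// A minimal sketch of the dispatch shape, using a hypothetical `ToyTransformation`
/// enum and `toy_apply` function in place of the real `DocumentTransformation` /
/// `apply_document_transformation` (whose internals are not shown here).
/// Unimplemented transformations pass the input through unchanged, matching what
/// this module's tests expect; the `StripValue` variant is invented purely to show a
/// non‑identity case.
///
/// ```rust
/// /// Hypothetical stand-in for `DocumentTransformation`.
/// #[derive(Debug, Clone, Copy)]
/// enum ToyTransformation {
///     SemanticToText,
///     CodeToText,
///     /// Invented example: drop every occurrence of one value.
///     StripValue(u32),
/// }
///
/// /// Hypothetical dispatcher; returns `Result` only to mirror the fallible
/// /// signature, and always allocates a fresh vector.
/// fn toy_apply(transformation: ToyTransformation, dc: &[u32]) -> Result<Vec<u32>, String> {
///     match transformation {
///         // Not implemented yet: identity pass-through.
///         ToyTransformation::SemanticToText | ToyTransformation::CodeToText => Ok(dc.to_vec()),
///         ToyTransformation::StripValue(v) => Ok(dc.iter().copied().filter(|&c| c != v).collect()),
///     }
/// }
/// ```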
pub fn transform_document(
    dc_array: &[u32],
    transformation: &DocumentTransformation,
) -> Result<Vec<u32>> {
    apply_document_transformation(transformation, dc_array)
}

#[cfg(test)]
mod tests {
    use crate::utilities::{assert_vec_u8_eq, assert_vec_u32_eq};

    use super::*;

    #[crate::ctb_test]
    fn test_transform_document() {
        // These transformations are not implemented yet, so they should return
        // the input unchanged.
        let dc_array = vec![1, 2, 3];
        let sem = transform_document(
            &dc_array,
            &DocumentTransformation::semantic_to_text_default(),
        )
        .unwrap();
        assert_vec_u32_eq(&dc_array, &sem);
        let code = transform_document(
            &dc_array,
            &DocumentTransformation::code_to_text_default(),
        )
        .unwrap();
        assert_vec_u32_eq(&dc_array, &code);
    }

    // --- import / export wrappers: sanity & equivalence ---

    // NOTE: This does not assert specific Dc values (which are format dependent).
    // It checks that `import_and_export` agrees with a manual `import_document` +
    // `export_document` sequence. Path 1 exports ASCII and Path 2 exports UTF‑8;
    // for plain ASCII input both encodings are byte‑identical, so the outputs match.
    #[crate::ctb_test]
    fn test_import_and_export_work() {
        let input_bytes = b"Hello, EITE!";

        // Path 1: separate import then export (ascii -> Dc -> ascii).
        let (dc, import_log) =
            import_document(&mut EiteState::new(), &Format::ASCII, input_bytes)
                .expect("import failed");
        assert!(!import_log.has_warnings());
        let (exported, export_log) = export_document(
            &mut EiteState::new(),
            &Format::ASCII,
            &dc,
            &PrefilterSettings::default(),
        )
        .expect("export failed");
        assert!(!export_log.has_warnings());

        // Path 2: direct convenience hop (ascii -> utf8).
        let (via_wrapper, roundtrip_log) = import_and_export(
            &mut EiteState::new(),
            &Format::ASCII,
            &Format::utf8_default(),
            input_bytes,
            &PrefilterSettings::default(),
        )
        .expect("conversion failed");
        assert!(!roundtrip_log.has_warnings());

        assert_vec_u8_eq(&exported, &via_wrapper);
    }
}