LSEG: A Segment-Based Protocol for Data Interpretation

Alexey A. Nekludoff

ORCID: 0009-0002-7724-5762

DOI: 10.5281/zenodo.17786342

02 December 2025

Original language of the article: Russian


Abstract

The paper presents LSEG (Language Segment Encoding), a minimalistic and extensible segment-based protocol for interpreting data streams. Each segment begins with the byte 0x00, followed by a LANG_ID that selects the parser for the bytes that follow. The protocol does not constrain the internal structure of tables (alphabets) and allows arbitrary interpretation mechanisms: from simple single-byte tables to full-fledged Unicode decoders, binary formats, DSLs (JSON, XML, EDF), and AST representations.
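The segment layout described above can be illustrated with a minimal Python sketch. Only the 0x00 marker and the role of LANG_ID come from the abstract; the one-byte width of LANG_ID, the rule that a payload runs until the next 0x00 byte, and the LANG_ID values 0x01 and 0x02 are assumptions made purely for illustration and may differ from the full specification.

```python
from typing import Callable, Dict, Iterator, List, Tuple

SEGMENT_MARKER = 0x00  # from the abstract: every segment starts with this byte

# Hypothetical LANG_ID values and parsers, chosen only for illustration.
PARSERS: Dict[int, Callable[[bytes], str]] = {
    0x01: lambda payload: payload.decode("ascii"),
    0x02: lambda payload: payload.decode("utf-8"),
}


def iter_segments(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (lang_id, payload) pairs from a byte stream.

    Assumptions (not taken from the paper): LANG_ID is a single byte,
    and a payload runs until the next 0x00 byte or the end of the stream.
    """
    pos = 0
    while pos < len(stream):
        if stream[pos] != SEGMENT_MARKER:
            raise ValueError(f"expected segment marker at offset {pos}")
        if pos + 1 >= len(stream):
            raise ValueError("truncated segment: missing LANG_ID")
        lang_id = stream[pos + 1]
        end = stream.find(bytes([SEGMENT_MARKER]), pos + 2)
        if end == -1:
            end = len(stream)
        yield lang_id, stream[pos + 2:end]
        pos = end


def decode(stream: bytes) -> List[str]:
    """Dispatch each payload to the parser registered for its LANG_ID."""
    return [PARSERS[lang_id](payload) for lang_id, payload in iter_segments(stream)]


sample = b"\x00\x01hello\x00\x02" + "привет".encode("utf-8")
print(decode(sample))
```

The dispatch table keeps structure (segment framing) apart from interpretation (the parser chosen by LANG_ID), which is the separation the abstract emphasizes.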

LSEG provides:

high data compactness (savings up to 50% without compression),

improved compressibility (up to 70–80% with gzip/zstd),

stream self-synchronization,

clear separation of structure and interpretation mechanism.
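One way to picture the self-synchronization property is a reader that joins the stream at an arbitrary offset and skips forward to the next segment marker. The sketch below is hypothetical: it assumes that 0x00 never occurs inside a payload and that LANG_ID occupies one byte, neither of which is stated in the abstract.

```python
def resync(stream: bytes, offset: int) -> int:
    """Return the offset of the first 0x00 marker at or after `offset`, or -1.

    Assumption (not from the paper): 0x00 never appears inside a payload.
    """
    return stream.find(b"\x00", offset)


stream = b"\x00\x01hello\x00\x02world"
start = resync(stream, 3)      # offset 3 falls in the middle of "hello"
lang_id = stream[start + 1]    # byte after the marker (assumed one-byte LANG_ID)
payload = stream[start + 2:]   # last segment runs to the end of the stream
print(start, lang_id, payload)
```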

Files using this protocol should carry the .lseg extension, with the corresponding MIME type application/lseg.
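As a small illustration, an application could register this mapping locally with Python's standard mimetypes module. The file name sample.lseg is hypothetical, and the registration only affects the running process.

```python
import mimetypes

# Map the recommended extension to the recommended MIME type.
mimetypes.add_type("application/lseg", ".lseg")

# "sample.lseg" is a hypothetical file name used only for this lookup.
mime_type, encoding = mimetypes.guess_type("sample.lseg")
print(mime_type)
```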

The full version of the article is available at the following link: https://astraverge.org/ru/p/10038 (in Russian).