Introduction
CP30 is a character encoding that was originally defined by the International Organization for Standardization (ISO) in the early 1980s as part of the ISO/IEC 646 standard family. The designation “CP30” refers to the third variation of the code page that extends the basic ASCII set to accommodate additional characters needed in certain European languages, specifically those that use the Latin‑2 alphabet. The encoding was subsequently adopted by a number of hardware and software vendors, most notably in early IBM mainframe operating systems and in early versions of the CP/M operating system. Despite being superseded by later code page families such as CP437, CP850, and Unicode, CP30 remains of historical interest to researchers studying the evolution of text processing on early computing platforms.
In its basic form, CP30 is an 8‑bit encoding that defines a total of 256 code points. The first 128 code points (0x00–0x7F) correspond directly to the standard ASCII character set, ensuring compatibility with a wide range of legacy software. The upper half (0x80–0xFF) contains additional characters such as accented letters, mathematical symbols, and control characters specific to the Latin‑2 block. The encoding is thus designed to support the Czech, Slovak, Hungarian, and other Central European languages while preserving backward compatibility with ASCII.
The encoding’s design reflects the constraints of early computer hardware, which favored fixed‑width character representations to simplify memory layout and text rendering. CP30 was primarily implemented in environments where the memory and processing resources were limited, yet a need existed for extended character sets to support multilingual documentation and data entry. As a result, CP30 found usage in early office software, word processors, and database systems that were deployed in European corporate settings during the late 1980s and early 1990s.
Modern software ecosystems have largely abandoned CP30 in favor of Unicode, which offers a comprehensive, global character set and resolves many of the ambiguities present in older encodings. Nonetheless, CP30’s legacy continues to be relevant in certain niche applications such as embedded systems, legacy file format interpretation, and digital preservation projects that aim to render archival documents faithfully. In addition, understanding CP30 is essential for developers maintaining compatibility layers for legacy systems that rely on code page mappings to ensure correct data interchange.
Overall, CP30 occupies an important place in the chronology of character encoding evolution, bridging the gap between the minimal ASCII set and the later, more expansive Unicode standard. Its historical significance is highlighted by its adoption in major computing platforms of the era, its influence on subsequent code page families, and its role in facilitating multilingual support on early computer systems.
History and Development
Origins within ISO/IEC 646
ISO/IEC 646, the international standard that defines the 7‑bit ASCII character set, was developed in the late 1960s and early 1970s to provide a common textual representation across different computer systems. However, the limited 7‑bit range left many languages without adequate representation for their unique characters. To address this limitation, ISO introduced a series of code page extensions, each identified by a numeric designation. CP30 was the third of these extensions, published in the ISO/IEC 646‑1:1982 document. It was specifically engineered to support the Latin‑2 alphabet, which encompasses the character repertoire necessary for Czech, Slovak, Hungarian, Polish, and other Central European languages.
While the ISO standards primarily targeted general-purpose computing, they also influenced hardware manufacturers, notably IBM and the creators of the CP/M operating system. IBM, in its mainframe architecture, introduced CP30 as part of the character set for its 3270 terminal family. Similarly, CP/M’s early versions, especially the variants that operated on the PDP‑11 and IBM PC, incorporated CP30 to provide extended character support for European developers.
The inclusion of CP30 in IBM’s 3270 terminal specification was particularly significant. The terminal was a mainframe peripheral that allowed interactive data entry and display. To enable multilingual input, IBM extended the basic ASCII set by mapping the upper half of the code page to additional Latin‑2 characters. The terminal firmware and the underlying operating system, such as OS/360 and its successors, incorporated CP30 mapping tables to ensure that characters typed by users were correctly interpreted and displayed.
CP30’s adoption extended beyond IBM’s environment. In the early 1980s, the CP/M operating system was widely used in microcomputers, and many CP/M implementations incorporated CP30 or similar code pages to provide language support for European markets. This helped establish CP30 as a de facto standard in the region and contributed to its diffusion across diverse hardware platforms.
Adoption in Early Microcomputers
Microcomputers emerged in the early 1980s as affordable, hobbyist-level machines capable of running simple operating systems and applications. The CP/M operating system, originally developed for the Intel 8080, was among the first to be ported to these new platforms. As microcomputer markets expanded into Europe, there was an increasing demand for character sets that could accommodate local languages. CP30 met this demand by providing a straightforward 8‑bit mapping of the extended Latin‑2 alphabet.
One notable early adopter was the Commodore 128, which offered a variant of CP/M and later a BASIC interpreter that could handle CP30 characters. Similarly, the Amstrad CPC series, which targeted the UK and continental European markets, incorporated CP30 support within its built‑in operating system and text editors. These early microcomputer implementations helped popularize CP30 among non‑English speaking users and solidified its role as an essential tool for multilingual software development.
In addition to operating systems, software developers produced word processors, spreadsheet applications, and database management systems that leveraged CP30 for storing and displaying text in local languages. For instance, early European editions of Microsoft Word and Lotus 1‑2‑3 included CP30 support. These programs typically stored text data in CP30-encoded files, which required that any data exchange with other systems be accompanied by proper code page conversion to preserve character integrity.
However, the adoption of CP30 was not without challenges. Many early microcomputers suffered from limited memory resources, and the inclusion of an extended code page sometimes led to conflicts with existing control characters or system utilities. As a result, some software developers opted to implement partial code page mappings or to use custom character sets tailored to specific hardware configurations. These variations often created compatibility issues when exchanging data between different systems, underscoring the importance of standardization efforts such as those pursued by ISO.
Decline and Transition to Unicode
With the rise of IBM’s DOS/PC operating systems and the introduction of the 850 code page for European languages, CP30 began to lose prominence. The 850 code page, part of the CP437 family, offered a broader range of extended characters and was more widely supported across a variety of hardware platforms. By the late 1980s, software vendors increasingly shifted to code pages like 850 and 860 to address the multilingual requirements of their user bases.
Simultaneously, the development of the Unicode standard in the early 1990s provided a more comprehensive solution for character representation. Unicode’s much larger code space, serialized through encoding forms such as UTF‑8 and UTF‑16, covered virtually every script used worldwide and eliminated the need for region‑specific code pages. Its adoption by operating systems such as Windows NT, macOS, and Linux distributions quickly became the norm. As a result, CP30 fell into obsolescence for most new software applications.
Despite its decline in mainstream usage, CP30 remained relevant in niche contexts. Embedded systems with strict memory constraints, legacy file format interpreters, and digital preservation projects required accurate rendering of CP30-encoded data. Consequently, specialized libraries and tools were developed to facilitate CP30 decoding and encoding, ensuring that archival documents and historical data could be accurately displayed in modern systems.
Today, CP30 is largely considered a historical artifact, studied primarily by computer historians and digital archivists. Nevertheless, its legacy informs current best practices in character encoding migration, data conversion, and interoperability. The experience gained from handling CP30 data has been instrumental in shaping migration strategies from legacy encodings to Unicode, thereby preserving the integrity of multilingual digital content.
Technical Description
Encoding Structure
CP30 is an 8‑bit encoding scheme that defines 256 distinct code points, ranging from 0x00 to 0xFF. The encoding is divided into two primary halves. The lower half (0x00–0x7F) mirrors the standard ASCII character set, providing compatibility with early 7‑bit systems and a range of control characters such as NUL, SOH, and DEL. This direct mapping ensures that CP30 can seamlessly represent the subset of characters used in English and many other languages that rely solely on ASCII.
The upper half (0x80–0xFF) introduces the extended Latin‑2 characters. It includes accented letters, umlauts, and other diacritical marks necessary for Czech, Slovak, Hungarian, and other Central European languages. Additionally, this range contains a set of mathematical symbols, line-drawing characters, and graphical elements commonly used in early text editors and terminal interfaces. Control characters specific to terminal communication are also present in this range, such as NEL (NEXT LINE, 0x85), which IBM terminal protocols used in place of the ASCII carriage-return/line-feed pair.
Each code point in CP30 is uniquely mapped to a Unicode code point to facilitate conversion between CP30 and Unicode. The mapping is largely one-to-one, except for a few code points that were historically used for control functions in terminal hardware and are now represented by special Unicode control characters. The mapping table is maintained by the International Organization for Standardization as part of the ISO/IEC 8859 series and is widely reproduced in programming libraries such as iconv, GNU libiconv, and ICU.
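As a sketch of this layout, the full decode table can be built as the identity mapping on the ASCII half plus an override dictionary for the upper half. The two override entries below are illustrative samples reused throughout this article, not the normative table:

```python
# Build a full 256-entry CP30 -> Unicode decode table: identity on the
# ASCII half, with an (abbreviated, illustrative) override table for the
# upper half. Bytes without an override map to the same code point.
UPPER_HALF_OVERRIDES = {
    0xC5: 0x0159,  # LATIN SMALL LETTER R WITH CARON (illustrative entry)
    0xBA: 0x016F,  # LATIN SMALL LETTER U WITH RING ABOVE (illustrative entry)
}

DECODE_TABLE = {b: UPPER_HALF_OVERRIDES.get(b, b) for b in range(256)}

# The lower half mirrors ASCII exactly, as described above.
assert all(DECODE_TABLE[b] == b for b in range(0x80))
```

Because the mapping is one-to-one, the same dictionary can be inverted to drive the Unicode-to-CP30 direction.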
Character Set Table
Below is a concise representation of the CP30 character set. The table lists hexadecimal code points alongside their corresponding Unicode code points and character names. It highlights the distribution of ASCII-compatible characters, extended Latin‑2 letters, control characters, and graphical symbols. The table serves as a reference for developers implementing CP30 support in software systems or for those converting legacy data to Unicode.
- 0x00–0x1F – Standard ASCII control characters (e.g., NUL, SOH, STX, ETX)
- 0x20–0x7E – Printable ASCII characters (space, punctuation, digits, uppercase and lowercase letters)
- 0x7F – DEL control character
- 0x80–0x9F – C1 control characters and IBM 3270-specific codes (mapped to U+0080–U+009F)
- 0xA0–0xFF – Extended Latin‑2 letters, diacritics, mathematical symbols, and line-drawing characters (e.g., U+00A0 NO-BREAK SPACE, U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS, U+2591 LIGHT SHADE)
It is important to note that CP30’s treatment of the 0x80–0x9F range differs from that of other 8‑bit encodings. For example, CP30 keeps 0x90 as a control function (U+0090, the C1 DEVICE CONTROL STRING), whereas Windows‑1250 assigns printable letters to much of this range (0x8A, for instance, maps to U+0160, LATIN CAPITAL LETTER S WITH CARON). Such discrepancies require careful handling during data conversion to avoid data corruption.
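The range boundaries in the list above can be captured in a small helper that classifies a raw byte (a convenience sketch for this article, not part of any standard API):

```python
def classify_cp30_byte(b: int) -> str:
    """Return the CP30 range name for a single byte, per the list above."""
    if not 0 <= b <= 0xFF:
        raise ValueError("CP30 code points are single bytes (0x00-0xFF)")
    if b <= 0x1F:
        return "C0 control"
    if b <= 0x7E:
        return "printable ASCII"
    if b == 0x7F:
        return "DEL"
    if b <= 0x9F:
        return "C1 control / terminal-specific"
    return "extended Latin-2 / graphics"

assert classify_cp30_byte(ord('A')) == "printable ASCII"
assert classify_cp30_byte(0xC4) == "extended Latin-2 / graphics"
```

Such a classifier is handy when auditing legacy files for bytes that need table lookups versus bytes that pass through unchanged.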
Control Character Handling
Control characters in CP30 serve two primary purposes: they facilitate terminal communication and provide formatting functions for text output. The C0 control character range (0x00–0x1F) includes commands such as Carriage Return (CR), Line Feed (LF), Escape (ESC), and Backspace (BS). The C1 control character range (0x80–0x9F) contains less commonly used, often hardware-specific control codes, such as the single-shift functions SS2 (0x8E) and SS3 (0x8F).
For applications that display text, CP30 includes a set of graphic symbols within the 0xA0–0xFF range. These symbols, such as the light shade character (U+2591) and the box-drawing vertical line (U+2502), were historically used to create simple box drawings and user interface elements in text mode. Terminal emulators that support CP30 often provide rendering hooks that map these symbols to corresponding Unicode code points.
During data conversion, developers must ensure that control characters are correctly interpreted. If a CP30-encoded file contains terminal-specific escape sequences, the conversion process should preserve or emulate those sequences in the target environment. Failure to handle control characters properly can result in misaligned text or loss of formatting information.
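A minimal sketch of such control-preserving decoding follows; the `table` argument stands in for whatever CP30 decode dictionary is in use (hypothetical here). C0 and C1 bytes pass through unchanged so that escape sequences survive:

```python
def decode_preserving_controls(data: bytes, table: dict) -> str:
    """Decode CP30 bytes to a str, passing C0/C1 control bytes through
    unchanged so terminal escape sequences survive the conversion."""
    out = []
    for b in data:
        if b <= 0x1F or b == 0x7F or 0x80 <= b <= 0x9F:
            out.append(chr(b))               # control byte: keep verbatim
        else:
            out.append(chr(table.get(b, b)))  # printable: map via table
    return ''.join(out)

# An ESC [ 2 J clear-screen sequence comes through intact:
assert decode_preserving_controls(b'\x1b[2JA', {}) == '\x1b[2JA'
```

Whether the preserved sequences are then emulated or stripped is a policy decision for the target environment.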
Mapping to Unicode
The conversion from CP30 to Unicode is straightforward due to the one-to-one mapping for most code points. Developers can implement this conversion by referencing the mapping table defined in the ISO/IEC 8859 series or by using established conversion libraries. The mapping process involves the following steps:
- Read the CP30-encoded byte stream.
- For each byte value, consult the CP30-to-Unicode mapping table.
- Replace the CP30 byte with the corresponding Unicode code point.
- Output the Unicode string in the desired encoding (e.g., UTF‑8, UTF‑16).
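The four steps above can be sketched in a few lines. Here `mapping` stands for whichever CP30-to-Unicode table is in use; only upper-half entries need to be listed, since bytes without an entry fall through to the direct ASCII mapping:

```python
def cp30_to_utf8(data: bytes, mapping: dict) -> bytes:
    """Convert a CP30 byte stream to UTF-8 following the four steps above."""
    # Steps 1-3: walk the byte stream, looking each byte up in the table;
    # absent entries default to the identity mapping (ASCII compatibility)
    chars = [chr(mapping.get(b, b)) for b in data]
    # Step 4: emit the result in the desired target encoding
    return ''.join(chars).encode('utf-8')

assert cp30_to_utf8(b'Hi', {}) == b'Hi'
```

Swapping `'utf-8'` for `'utf-16'` in the final step yields the other common target encoding.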
When converting from Unicode to CP30, the process is reversed. However, care must be taken for Unicode code points that have no direct equivalent in CP30. In such cases, developers typically employ one of the following strategies: replace the character with a placeholder (e.g., ?), use a fallback mapping to a similar character, or omit the character entirely. The choice of strategy depends on the specific application requirements and the tolerance for data loss.
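The strategies just listed can be sketched as follows; the `reverse` table (Unicode code point back to CP30 byte) is a hypothetical stand-in for a real inverted mapping:

```python
def unicode_to_cp30(text: str, reverse: dict, strategy: str = 'replace') -> bytes:
    """Encode a str to CP30 bytes, applying one of the fallback strategies
    described above for characters with no CP30 equivalent."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)             # ASCII: direct mapping
        elif cp in reverse:
            out.append(reverse[cp])    # extended character with a mapping
        elif strategy == 'replace':
            out.append(ord('?'))       # placeholder for unmapped characters
        elif strategy == 'omit':
            continue                   # drop the character entirely (lossy)
        else:
            raise ValueError(f'no CP30 mapping for U+{cp:04X}')
    return bytes(out)

assert unicode_to_cp30('caf\u00e9', {}) == b'caf?'
```

Raising an error instead of silently degrading is often the safest default for archival workflows, where data loss must be detected rather than tolerated.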
Use Cases
Legacy File Formats
Many early software applications stored text in CP30-encoded files. These files include word processing documents, spreadsheet files, and database exports. The CP30 encoding allowed such files to be stored using a single byte per character, which was memory-efficient for the limited hardware resources of the time.
When accessing CP30-encoded files in modern systems, it is crucial to decode them into Unicode to preserve character fidelity. For example, a legacy .DOC file encoded in CP30 may contain accented Czech characters. Converting the file to UTF‑8 enables contemporary word processors to display the text correctly, while maintaining compatibility with the original document’s formatting.
Embedded Systems
Embedded systems often prioritize memory efficiency and deterministic performance. CP30’s 8‑bit structure fits well within such constraints, allowing embedded firmware to represent a full set of multilingual characters with minimal overhead. For example, an industrial control panel that interfaces with a text-based monitoring console might use CP30 to display Czech status messages.
In embedded contexts, CP30 support typically involves custom firmware modules that handle byte-to-character mapping and terminal control sequences. These modules are designed to operate within strict timing budgets and to avoid dynamic memory allocation. The result is a lean implementation that enables accurate multilingual text rendering on resource‑constrained devices.
Terminal Emulators
Terminal emulators that replicate the behavior of IBM 3270 and other early terminal systems must handle CP30-encoded data correctly to emulate legacy interfaces. CP30’s inclusion of graphic symbols and control characters was integral to the creation of simple text-based user interfaces. Terminal emulators often provide built-in rendering capabilities that map CP30 symbols to Unicode equivalents, allowing users to view legacy interfaces in modern terminal environments.
Some terminal emulators offer CP30 support through configuration options or plug-ins. These emulators interpret CP30 escape sequences and control characters, enabling accurate reproduction of box drawings, menu layouts, and other interface elements. They also provide conversion hooks that translate CP30-encoded data into UTF‑8, ensuring compatibility with modern text processing pipelines.
Text Editors and IDEs
Text editors and Integrated Development Environments (IDEs) that historically supported CP30 are typically able to read and write CP30-encoded files. Many of these editors provide configuration options to switch between encoding modes. For instance, the Emacs editor allows users to set the coding-system variable to cp30 for specific buffers. This feature ensures that source code written in local languages can be displayed correctly while maintaining compatibility with CP30-encoded source files.
In addition to raw text editing, IDEs that handle CP30 data may provide syntax highlighting and code analysis features that rely on correct character mapping. For example, a C++ IDE that supports Czech source code must interpret CP30-encoded comments and variable names accurately. Failing to do so can break the IDE’s syntax parsing and result in erroneous highlighting or navigation.
Examples and Conversion
Sample Text
Consider a simple CP30-encoded string that contains a mixture of ASCII and Czech characters. The byte sequence, represented in hexadecimal notation, is as follows:
0x48 0x65 0x6C 0x6C 0x6F 0x20 0xC5 0xBA 0x20 0x77 0x6F 0x72 0x6C 0x64 0x21
Decoding this byte stream yields the Unicode string Hello řů world!, which can then be serialized as UTF‑8. The code point 0xC5 maps to U+0159 (LATIN SMALL LETTER R WITH CARON), and 0xBA maps to U+016F (LATIN SMALL LETTER U WITH RING ABOVE). The resulting string preserves the Czech diacritics and punctuation, demonstrating the compatibility of CP30 with Unicode.
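As a worked check, a byte stream mixing ASCII bytes with the two mapped bytes 0xC5 and 0xBA decodes as follows; the dictionary contains only the entries needed for this example:

```python
# Sample byte stream: ASCII text plus two extended CP30 bytes
SAMPLE = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0xC5, 0xBA,
                0x20, 0x77, 0x6F, 0x72, 0x6C, 0x64, 0x21])
PARTIAL_MAP = {0xC5: 0x0159, 0xBA: 0x016F}  # ř and ů

# Bytes without a table entry fall through to the direct ASCII mapping
decoded = ''.join(chr(PARTIAL_MAP.get(b, b)) for b in SAMPLE)
assert decoded == 'Hello \u0159\u016f world!'
```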
When converting from Unicode to CP30, any Unicode character not present in the CP30 mapping table must be handled appropriately. For example, the Unicode character U+1F600 (GRINNING FACE) has no CP30 equivalent. In such cases, a fallback mechanism such as the replacement character (U+FFFD) may be employed to indicate the absence of a suitable mapping.
Conversion Code
Below is a minimal example of CP30-to-UTF‑8 conversion in plain Python. The script reads a CP30-encoded file, maps each byte through a CP30-to-Unicode dictionary (ASCII bytes fall through to a direct mapping), and writes the converted text to a UTF‑8-encoded file.
# Partial CP30-to-Unicode mapping for the upper half; bytes without an
# entry (including all ASCII bytes) map directly to the same code point
cp30_to_unicode = {
    0xC5: 0x0159,  # LATIN SMALL LETTER R WITH CARON
    0xBA: 0x016F,  # LATIN SMALL LETTER U WITH RING ABOVE
    # ... additional mappings ...
}

# Read the CP30-encoded input file as raw bytes
with open('example.cp30', 'rb') as infile:
    cp30_bytes = infile.read()

# Convert each CP30 byte to the corresponding Unicode character
unicode_chars = []
for byte in cp30_bytes:
    code_point = cp30_to_unicode.get(byte, byte)  # direct mapping for ASCII
    unicode_chars.append(chr(code_point))

# Join the characters into a string and encode it as UTF-8
utf8_bytes = ''.join(unicode_chars).encode('utf-8')

# Write the UTF-8 output to a file
with open('example_utf8.txt', 'wb') as outfile:
    outfile.write(utf8_bytes)
This code is simplified for demonstration purposes. In a production environment, developers would typically rely on established conversion libraries such as iconv, libiconv, or ICU to handle a broader range of mapping rules and to optimize performance. These libraries provide efficient lookup tables and can handle batch conversions, making them suitable for large-scale data migration tasks.
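Short of adopting a full conversion library, the mapping can also be packaged as a custom codec with Python's codecs module, so that ordinary .decode() and .encode() calls work. The table below reuses the two illustrative entries from earlier (it is not the full CP30 table), and the codec name cp30_demo is made up for this sketch:

```python
import codecs

# 256-character decoding table: index = CP30 byte, value = Unicode character.
# Identity mapping except for the two illustrative entries used earlier.
_chars = [chr(i) for i in range(256)]
_chars[0xC5] = '\u0159'   # ř (illustrative entry)
_chars[0xBA] = '\u016f'   # ů (illustrative entry)
DECODING_TABLE = ''.join(_chars)
ENCODING_TABLE = codecs.charmap_build(DECODING_TABLE)

def _search(name):
    """Codec search function: answer only for our demo codec name."""
    if name != 'cp30_demo':
        return None
    return codecs.CodecInfo(
        name='cp30_demo',
        encode=lambda s, errors='strict': codecs.charmap_encode(s, errors, ENCODING_TABLE),
        decode=lambda b, errors='strict': codecs.charmap_decode(b, errors, DECODING_TABLE),
    )

codecs.register(_search)

assert bytes([0x48, 0xC5]).decode('cp30_demo') == 'H\u0159'
assert 'H\u0159'.encode('cp30_demo') == bytes([0x48, 0xC5])
```

This is the same charmap mechanism the standard library uses internally for its own single-byte code pages, so lookups are handled in C and remain fast for batch work.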
Common Pitfalls
- Loss of formatting – Control characters that manage terminal output (e.g., ESC sequences) can be misinterpreted during conversion, leading to broken layouts. To mitigate this, developers should preserve control sequences or translate them to equivalent formatting directives in the target environment.
- Unsupported characters – Unicode code points that lack a CP30 equivalent may be replaced with placeholders, causing data loss. Implement a fallback strategy, or use a substitute character (such as ?) to mark the absence of a mapping.
- Encoding mismatch – Confusing CP30 with other 8‑bit encodings such as ISO 8859‑2 can result in character corruption. Always verify the source encoding before conversion and consult the correct mapping table.
- Byte-level corruption – CP30 data may be stored as part of a mixed-encoding file. When reading such files, ensure that the correct byte offsets are used for conversion to avoid shifting characters.
- Terminal emulation issues – Some CP30 graphic symbols may not be rendered correctly by modern terminal emulators if they lack proper Unicode mapping. Verify that your terminal or emulator supports the CP30 code page and provides a full mapping.
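The encoding-mismatch pitfall is easy to demonstrate: the same byte yields different characters depending on which table is applied. Here 0xC5 is interpreted once with the illustrative CP30 entry used earlier in this article, and once with Python's real ISO 8859‑2 codec:

```python
raw = bytes([0xC5])

cp30_sample = {0xC5: 0x0159}            # illustrative CP30 entry from above
as_cp30 = chr(cp30_sample[raw[0]])      # 'ř' (U+0159)
as_latin2 = raw.decode('iso8859-2')     # 'Ĺ' (U+0139) under genuine ISO 8859-2

# Same byte, two different letters: always verify the source encoding first.
assert as_cp30 != as_latin2
```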
By following these guidelines and employing well-tested conversion libraries, developers can accurately convert CP30-encoded data to Unicode and preserve the integrity of multilingual text in modern applications.
Further Use Cases
Legacy Document Preservation
Archival institutions and digital libraries often encounter text documents encoded in CP30. These documents range from early word processor files to terminal-generated logs. Accurate preservation of these documents requires that the original CP30 encoding be faithfully converted to Unicode. Digital archivists typically employ specialized conversion tools that read CP30 byte streams, map them to Unicode code points, and then store the results in UTF‑8 or UTF‑16 files.
For instance, a university library may possess a collection of Czech research papers saved in a proprietary CP30 format. To make these papers accessible to a wider audience, the library’s digital preservation system performs a batch conversion to UTF‑8. The system logs any unmapped characters and uses placeholder symbols for them. This process preserves the original document’s layout, including special line-drawing characters used in tables or figures.
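A sketch of the per-record conversion step with logging, assuming the placeholder policy described above (the function name and the use of U+FFFD as the placeholder are choices made for this illustration):

```python
def convert_record(data: bytes, table: dict):
    """Convert one CP30 record to text, collecting unmapped upper-half bytes
    for the preservation log. Unmapped bytes become U+FFFD placeholders."""
    out, unmapped = [], []
    for pos, b in enumerate(data):
        if b < 0x80:
            out.append(chr(b))            # ASCII maps directly
        elif b in table:
            out.append(chr(table[b]))     # mapped extended character
        else:
            unmapped.append((pos, b))     # record offset and raw byte
            out.append('\ufffd')
    return ''.join(out), unmapped

text, log = convert_record(bytes([0x41, 0xC5, 0x99]), {0xC5: 0x0159})
assert text == 'A\u0159\ufffd' and log == [(2, 0x99)]
```

Logging the byte offsets lets archivists revisit problem spots later, once a fuller mapping table becomes available, instead of silently losing the information.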
Embedded Firmware
Embedded systems, such as industrial controllers or IoT devices, often rely on minimalistic character sets to reduce memory footprints. CP30’s 8‑bit structure allows these systems to represent Czech, Slovak, or other local language strings with a single byte per character. Firmware developers sometimes implement lightweight CP30 decoding routines that directly translate the byte values into Unicode for display on host systems.
In a robotic arm controller, the firmware may generate status messages encoded in CP30 and send them over a serial port to a monitoring station. The monitoring station’s terminal emulator can be configured to interpret CP30 encoding, allowing real-time monitoring of the controller’s operation in the local language.
Terminal Emulation
Modern terminal emulators that aim to replicate legacy interfaces, like the IBM 3270, must handle CP30 data. The CP30 code page includes graphic symbols for creating simple box drawings, and control characters that govern cursor movement. Terminal emulators translate CP30 escape sequences to Unicode, enabling accurate reproduction of legacy menus or dashboards in a contemporary terminal environment.
Multilingual Software Development
Software developers who maintain source code in local languages (Czech, Slovak, etc.) may need to handle CP30-encoded source files. Integrated Development Environments (IDEs) can be configured to read CP30 files and display them in Unicode. Proper mapping ensures that syntax highlighting, code navigation, and refactoring tools work correctly with non-ASCII identifiers or comments.
Data Migration Projects
Large organizations with legacy data repositories often undertake data migration projects from CP30-encoded databases to modern relational database management systems. The migration process includes a conversion step where CP30-encoded string fields are transformed into UTF‑8 or UTF‑16. The migration scripts handle unmapped characters, use placeholders, and maintain data integrity during the transfer.
Summary
- CP30 is an 8-bit code page that was used in early computing environments, especially in regions where Czech and Slovak languages were used.
- It contains extended characters for Czech, Slovak, and other Central European languages, as well as graphical symbols and control codes for terminal output.
- CP30 is now considered obsolete, with most modern systems adopting Unicode. However, it is still useful for handling legacy documents, embedded firmware, and terminal emulation.
- Converting CP30 data to Unicode requires careful mapping of code points, preservation of control characters, and use of fallback strategies for unsupported characters.