Overview
The Encoding Repair API accepts Base64-encoded raw bytes as input and automatically repairs mojibake across UTF-8, Shift_JIS, EUC-JP, Latin-1, and other encodings.
To prevent information loss from copy & paste or editor conversions, the API deliberately adopts a “Base64-only” input design.
- Base64-only input (no information loss)
- Automatic encoding detection + safe filter
- Supports UTF-8 / Shift_JIS / EUC-JP / Latin-1 / more
- result + meta two-layer response (APIron Spec)
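To make the rationale for Base64-only input concrete, here is a minimal sketch using only the Python standard library (it does not call the API). It shows that decoding bytes with the wrong codec silently changes the text, while a Base64 round trip returns the bytes unchanged. The sample string is an assumption chosen for illustration.

```python
import base64

# Sample text chosen for illustration; any non-ASCII string behaves similarly.
raw = "テスト".encode("utf-8")

# Decoding with the wrong codec succeeds without error but yields mojibake.
mojibake = raw.decode("latin-1")

# Base64 carries the bytes as plain ASCII, so nothing gets reinterpreted.
b64 = base64.b64encode(raw).decode("ascii")
restored = base64.b64decode(b64)

print(mojibake != "テスト")  # the wrongly decoded text differs from the source
print(restored == raw)       # the Base64 round trip is lossless
```

Because the Base64 string is pure ASCII, it survives copy & paste, JSON serialization, and editor conversions that would otherwise alter non-ASCII text.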
Why this API exists
Encoding failures usually happen before downstream AI or ETL logic even starts. This API preserves raw bytes first, then repairs text deterministically so later steps can work on normalized UTF-8 instead of ambiguous mojibake.
Hosted API on RapidAPI
The Encoding Repair API is available on RapidAPI.
- One-click request testing in the browser
- API key management, billing, and usage tracking
- Free / BASIC / PRO / ULTRA plans
- Auto-generated snippets for cURL, Node.js, Python, etc.
🔗 RapidAPI Hub: https://rapidapi.com/APIronlab/api/encoding-repair-api
Endpoint
POST /encoding/v2/repair
Request body (JSON):
{
  "raw_bytes_base64": "<Base64-encoded bytes>",
  "mode": "auto",
  "target_encoding": "utf-8"
}
Example response:
{
  "result": {
    "fixed_text": "テスト",
    "target_encoding": "utf-8",
    "changed": true
  },
  "meta": {
    "version": "2.0.0",
    "mode_used": "auto",
    "detected_path": "latin1->utf-8",
    "confidence": 0.98,
    "status": "ok",
    "execution_ms": 5.42,
    "input_bytes_length": 9
  }
}
When confidence is low, the Safe Filter may skip the repair and return the original data with changed: false.
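One way a client might act on the Safe Filter behavior is sketched below. The helper name is_repair_trusted and the 0.9 threshold are hypothetical choices for illustration, not part of the API; only the field names come from the example response.

```python
def is_repair_trusted(response: dict, min_confidence: float = 0.9) -> bool:
    """Return True only when the API reports a successful, confident change.

    min_confidence is a client-side threshold (an assumption, not an API setting).
    """
    result = response.get("result", {})
    meta = response.get("meta", {})
    if meta.get("status") != "ok":
        return False
    if not result.get("changed", False):
        # The Safe Filter left the input as-is.
        return False
    return meta.get("confidence", 0.0) >= min_confidence


# Trimmed copy of the example response shown in this document.
example = {
    "result": {"fixed_text": "テスト", "target_encoding": "utf-8", "changed": True},
    "meta": {"status": "ok", "confidence": 0.98},
}
print(is_repair_trusted(example))
```

A caller could route untrusted responses to manual review instead of overwriting stored text.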
Response Schema
The API returns a stable two-layer structure that is easy to handle in production.
{
  "result": {
    "fixed_text": "string",
    "target_encoding": "string",
    "changed": false
  },
  "meta": {
    "version": "2.0.0",
    "mode_used": "auto | force",
    "detected_path": "utf-8->shift_jis",
    "confidence": 1.0,
    "status": "ok",
    "execution_ms": 12.41,
    "input_bytes_length": 120
  }
}
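Since the structure is stable, a client may want to map it onto a typed object. The following is a minimal sketch; the RepairResponse class and parse_response function are hypothetical names, and the field types are inferred from the schema above rather than taken from an official client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairResponse:
    # "result" layer
    fixed_text: str
    target_encoding: str
    changed: bool
    # "meta" layer
    version: str
    mode_used: str
    detected_path: str
    confidence: float
    status: str
    execution_ms: float
    input_bytes_length: int


def parse_response(body: dict) -> RepairResponse:
    """Flatten the two-layer body; raises KeyError if a field is missing."""
    result, meta = body["result"], body["meta"]
    return RepairResponse(
        fixed_text=result["fixed_text"],
        target_encoding=result["target_encoding"],
        changed=bool(result["changed"]),
        version=meta["version"],
        mode_used=meta["mode_used"],
        detected_path=meta["detected_path"],
        confidence=float(meta["confidence"]),
        status=meta["status"],
        execution_ms=float(meta["execution_ms"]),
        input_bytes_length=int(meta["input_bytes_length"]),
    )
```

Failing fast on a missing key makes schema drift visible at the boundary rather than deep inside downstream code.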
Supported Encodings
The Encoding Repair API supports major encodings commonly seen in Japanese environments.
- UTF-8 – Modern standard with logic tuned for Japanese text
- Shift_JIS (SJIS / CP932) – Widely used in Windows legacy systems
- EUC-JP – Common in Unix / legacy business apps
- ISO-2022-JP (JIS) – Often used in email clients
- UTF-16 / UTF-32 – Handles BOM detection safely
- ASCII – For partially mixed datasets
- Other rare encodings – Processed internally via heuristics
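For local testing, sample inputs in several of these encodings can be generated with Python's built-in codecs. This is a sketch only: the mapping from the names above to Python codec names (for example cp932 for Shift_JIS / CP932) is an assumption about how the API's encoding names correspond to Python's, and sample_payloads is a hypothetical helper.

```python
import base64

# Python codec names assumed to correspond to the encodings listed above.
CODECS = ["utf-8", "cp932", "euc_jp", "iso2022_jp", "utf-16"]


def sample_payloads(text: str) -> dict[str, str]:
    """Encode text with each codec and return Base64 strings keyed by codec name."""
    return {
        codec: base64.b64encode(text.encode(codec)).decode("ascii")
        for codec in CODECS
    }


payloads = sample_payloads("テスト")
for codec, b64 in payloads.items():
    # Print the codec, the raw byte length, and the Base64 string to send.
    print(codec, len(base64.b64decode(b64)), b64)
```

Each Base64 string can be placed in raw_bytes_base64 to check how the API handles that encoding.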
Use Cases
1. Repairing Japanese mojibake
- SJIS → UTF-8 system migrations
- Legacy CSV / TSV / log files
- Mixed Windows / Unix environments
2. Recovering text from raw bytes
Handles cases where only raw bytes are available, such as scraping output, mail archives, and log collectors.
3. LLM preprocessing (Pre-AI Input Hygiene)
Normalize encodings to UTF-8 before sending text to ChatGPT / Claude / Gemini / local LLMs.
4. Cleaning data before CSV / TSV / Excel imports
Unify encodings for Japanese business systems where mixed-encoding issues often occur.
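For file-based use cases such as legacy CSV or log files, the key step is reading the file in binary mode so that no decoding happens before the bytes reach the API. A minimal sketch follows; payload_from_file is a hypothetical helper name, and the payload fields mirror the request body documented in the Endpoint section.

```python
import base64
from pathlib import Path


def payload_from_file(path: str, mode: str = "auto",
                      target_encoding: str = "utf-8") -> dict:
    """Build a request payload from a file without decoding its contents."""
    raw = Path(path).read_bytes()  # binary read: bytes are left untouched
    return {
        "raw_bytes_base64": base64.b64encode(raw).decode("ascii"),
        "mode": mode,
        "target_encoding": target_encoding,
    }
```

Opening the file in text mode instead would apply the platform's default encoding and could corrupt the data before it is ever sent.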
Quick Start – Python Example
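import base64
import requests

# Encode the sample text to bytes, then Base64-encode the bytes for transport.
raw = "テスト".encode("utf-8")
b64 = base64.b64encode(raw).decode("ascii")

payload = {
    "raw_bytes_base64": b64,
    "mode": "auto",
    "target_encoding": "utf-8",
}

res = requests.post(
    "https://your-endpoint/encoding/v2/repair",  # RapidAPI / API Gateway etc.
    json=payload,
    timeout=10,  # avoid hanging indefinitely on network problems
)
res.raise_for_status()  # surface HTTP errors before parsing the body
print(res.json())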
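When calling through RapidAPI, requests typically carry the X-RapidAPI-Key and X-RapidAPI-Host headers. The sketch below only builds the request object with the standard library and does not send it. The build_repair_request helper is hypothetical, and the base URL, key, and host values are placeholders; the real values should be copied from the snippet generated on the RapidAPI Hub page.

```python
import base64
import json
import urllib.request


def build_repair_request(raw: bytes, base_url: str, api_key: str,
                         api_host: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /encoding/v2/repair."""
    body = json.dumps({
        "raw_bytes_base64": base64.b64encode(raw).decode("ascii"),
        "mode": "auto",
        "target_encoding": "utf-8",
    }).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/encoding/v2/repair",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-RapidAPI-Key": api_key,    # placeholder; use your own key
            "X-RapidAPI-Host": api_host,  # placeholder; copy from RapidAPI
        },
    )


# Placeholder values for illustration only.
req = build_repair_request(
    "テスト".encode("utf-8"),
    base_url="https://example.invalid",
    api_key="YOUR_RAPIDAPI_KEY",
    api_host="example.invalid",
)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen with a timeout would send it; keeping construction separate makes the payload and headers easy to inspect first.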