Encoding Repair API – APIron Lab

Overview

The Encoding Repair API accepts Base64-encoded raw bytes as input and automatically repairs mojibake across UTF-8, Shift_JIS, EUC-JP, Latin-1, and other encodings.

To prevent information loss from copy & paste or editor conversions, the API deliberately adopts a “Base64-only” input design.
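The reasoning behind the Base64-only design can be illustrated locally. The sketch below does not call the API; it only shows, with standard-library calls, that a wrong text decode discards bytes while a Base64 round trip does not.

```python
import base64

# Shift_JIS bytes for "テスト", as they might arrive from a legacy system.
raw = "テスト".encode("shift_jis")

# Lossy path: decoding with the wrong codec replaces invalid bytes with U+FFFD,
# so the original byte sequence can no longer be reconstructed.
lossy_text = raw.decode("utf-8", errors="replace")
recovered_from_text = lossy_text.encode("utf-8")

# Lossless path: Base64 carries the exact bytes through JSON.
b64 = base64.b64encode(raw).decode("ascii")
recovered_from_b64 = base64.b64decode(b64)

print(recovered_from_text == raw)  # False
print(recovered_from_b64 == raw)   # True
```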

Base64-only input (no information loss)
Automatic encoding detection + safe filter
Supports UTF-8 / Shift_JIS / EUC-JP / Latin-1 / more
result + meta two-layer response (APIron Spec)

Why this API exists

Encoding failures usually happen before downstream AI or ETL logic even starts. This API preserves raw bytes first, then repairs text deterministically so later steps can work on normalized UTF-8 instead of ambiguous mojibake.

Hosted API on RapidAPI

The Encoding Repair API is available on RapidAPI.

One-click request testing in the browser
API key management, billing, and usage tracking
Free / BASIC / PRO / ULTRA plans
Auto-generated snippets for cURL, Node.js, Python, etc.

🔗 RapidAPI Hub: https://rapidapi.com/APIronlab/api/encoding-repair-api

Endpoint

POST /encoding/v2/repair

Request body (JSON):

{
  "raw_bytes_base64": "<Base64-encoded bytes>",
  "mode": "auto",
  "target_encoding": "utf-8"
}
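A request body of this shape could be built with a small helper such as the one below. The function name `build_repair_request` is hypothetical, and restricting `mode` to "auto" and "force" is an assumption based on the `mode_used` values shown in the response schema.

```python
import base64
import json


def build_repair_request(raw: bytes, mode: str = "auto",
                         target_encoding: str = "utf-8") -> dict:
    """Build the JSON body for POST /encoding/v2/repair from raw bytes."""
    # Assumption: the request accepts the same modes that mode_used reports.
    if mode not in ("auto", "force"):
        raise ValueError(f"unsupported mode: {mode!r}")
    return {
        "raw_bytes_base64": base64.b64encode(raw).decode("ascii"),
        "mode": mode,
        "target_encoding": target_encoding,
    }


body = build_repair_request("テスト".encode("utf-8"))
print(json.dumps(body, indent=2))
```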

Example response:

{
  "result": {
    "fixed_text": "テスト",
    "target_encoding": "utf-8",
    "changed": true
  },
  "meta": {
    "version": "2.0.0",
    "mode_used": "auto",
    "detected_path": "latin1->utf-8",
    "confidence": 0.98,
    "status": "ok",
    "execution_ms": 5.42,
    "input_bytes_length": 9
  }
}

When confidence is low, the Safe Filter may return the original data unmodified with changed: false.
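On the client side, this behavior suggests checking `changed` and `confidence` before trusting the output. The following is a minimal sketch; the `triage` function, its labels, and the 0.9 threshold are illustrative assumptions, not part of the API.

```python
def triage(response: dict, min_confidence: float = 0.9) -> str:
    """Classify a repair response as 'error', 'unchanged', 'review' or 'repaired'."""
    result, meta = response["result"], response["meta"]
    if meta["status"] != "ok":
        return "error"
    if not result["changed"]:
        # The Safe Filter kept the original data; nothing was rewritten.
        return "unchanged"
    if meta["confidence"] < min_confidence:
        # Hypothetical client-side policy: route weak repairs to manual review.
        return "review"
    return "repaired"


# The example response from this page.
sample = {
    "result": {"fixed_text": "テスト", "target_encoding": "utf-8", "changed": True},
    "meta": {
        "version": "2.0.0", "mode_used": "auto", "detected_path": "latin1->utf-8",
        "confidence": 0.98, "status": "ok", "execution_ms": 5.42,
        "input_bytes_length": 9,
    },
}
print(triage(sample))  # repaired
```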

Response Schema

The API returns a stable two-layer structure that is easy to handle in production.

{
  "result": {
    "fixed_text": "string",
    "target_encoding": "string",
    "changed": false
  },
  "meta": {
    "version": "2.0.0",
    "mode_used": "auto | force",
    "detected_path": "utf-8->shift_jis",
    "confidence": 1.0,
    "status": "ok",
    "execution_ms": 12.41,
    "input_bytes_length": 120
  }
}
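Because the structure is fixed, it maps naturally onto typed objects. The sketch below mirrors the field names of the schema; the class names and `parse_repair_response` are hypothetical, and the parser is deliberately strict, so it raises `TypeError` if the API ever adds or drops a field.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairResult:
    fixed_text: str
    target_encoding: str
    changed: bool


@dataclass(frozen=True)
class RepairMeta:
    version: str
    mode_used: str
    detected_path: str
    confidence: float
    status: str
    execution_ms: float
    input_bytes_length: int


def parse_repair_response(body: str):
    """Split a JSON response body into its result and meta layers."""
    data = json.loads(body)
    return RepairResult(**data["result"]), RepairMeta(**data["meta"])


example_body = """
{
  "result": {"fixed_text": "テスト", "target_encoding": "utf-8", "changed": true},
  "meta": {"version": "2.0.0", "mode_used": "auto",
           "detected_path": "latin1->utf-8", "confidence": 0.98,
           "status": "ok", "execution_ms": 5.42, "input_bytes_length": 9}
}
"""
result, meta = parse_repair_response(example_body)
print(result.fixed_text, meta.detected_path, meta.confidence)
```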

Supported Encodings

The Encoding Repair API supports major encodings commonly seen in Japanese environments.

UTF-8 – Modern standard with logic tuned for Japanese text
Shift_JIS (SJIS / CP932) – Widely used in Windows legacy systems
EUC-JP – Common in Unix / legacy business apps
ISO-2022-JP (JIS) – Often used in email clients
UTF-16 / UTF-32 – Handles BOM detection safely
ASCII – For partially mixed datasets
Other rare encodings – Processed internally via heuristics
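For the UTF-16 / UTF-32 entry, one point worth illustrating is that BOM checks must be ordered carefully, since the UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM. The sketch below is a generic standard-library approach and makes no claim about how the API implements detection; `sniff_bom` is a hypothetical helper.

```python
import codecs

# Longer BOMs first: BOM_UTF32_LE (FF FE 00 00) starts with BOM_UTF16_LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(raw: bytes):
    """Return (codec_name, bom_length) if raw starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name, len(bom)
    return None


data = codecs.BOM_UTF16_LE + "テスト".encode("utf-16-le")
codec, skip = sniff_bom(data)
print(codec, data[skip:].decode(codec))
```

Input without a BOM returns `None`, which is where heuristic detection would have to take over.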

Use Cases

1. Repairing Japanese mojibake

SJIS → UTF-8 system migrations
Legacy CSV / TSV / log files
Mixed Windows / Unix environments
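The failure mode behind these scenarios can be reproduced locally. In the sketch below, a CSV fragment written in CP932 (the file contents are made up for illustration) is rejected by a strict UTF-8 reader, and only decodes correctly once the source encoding is known.

```python
# A legacy export written on Windows in CP932 (Shift_JIS); contents are invented.
legacy_bytes = "商品,価格\nテスト,100\n".encode("cp932")

# A UTF-8-only consumer cannot read these bytes at all.
try:
    legacy_bytes.decode("utf-8")
    readable_as_utf8 = True
except UnicodeDecodeError:
    readable_as_utf8 = False

# With the correct source encoding, the text converts cleanly to UTF-8.
text = legacy_bytes.decode("cp932")
utf8_bytes = text.encode("utf-8")

print(readable_as_utf8)  # False
print(text.splitlines())
```

In a real migration the source encoding is often unknown or mixed per file, which is the detection step this API is meant to take over.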

2. Recovering text from raw bytes

Handles cases where only raw bytes are available, such as scraping output, mail archives, and log collectors.
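When the bytes live in a file, the key point is to read it in binary mode so that no implicit decoding happens before the data is Base64-encoded. A minimal sketch, with `file_to_base64` as a hypothetical helper and a temporary file standing in for a real archive:

```python
import base64
import tempfile
from pathlib import Path


def file_to_base64(path) -> str:
    """Read a file as raw bytes (never as text) and return its Base64 form."""
    raw = Path(path).read_bytes()
    return base64.b64encode(raw).decode("ascii")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "legacy.log"
    sample.write_bytes("テスト".encode("euc_jp"))  # stand-in for a real log file
    encoded = file_to_base64(sample)

# The exact EUC-JP bytes survive the round trip.
print(base64.b64decode(encoded) == "テスト".encode("euc_jp"))  # True
```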

3. LLM preprocessing (Pre-AI Input Hygiene)

Normalize encodings to UTF-8 before sending text to ChatGPT / Claude / Gemini / local LLMs.

4. Cleaning data before CSV / TSV / Excel imports

Unify encodings for Japanese business systems where mixed-encoding issues often occur.

Quick Start – Python Example

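import base64

import requests

# Encode the raw bytes as Base64 so they reach the API unmodified.
raw = "テスト".encode("utf-8")
b64 = base64.b64encode(raw).decode("ascii")

payload = {
    "raw_bytes_base64": b64,
    "mode": "auto",
    "target_encoding": "utf-8",
}

# Replace the URL with your actual endpoint (RapidAPI / API Gateway etc.)
# and add whatever authentication headers that host requires.
res = requests.post(
    "https://your-endpoint/encoding/v2/repair",
    json=payload,
    timeout=10,
)
res.raise_for_status()  # fail on HTTP errors instead of parsing an error body
print(res.json())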

Encoding Repair API (v2.0)