Encoding Repair API (v2.0)

Base64-only encoding repair for mojibake across UTF-8 / Shift_JIS / EUC-JP / Latin-1.

A utility API that safely repairs mojibake in Japanese and multilingual text, working from the raw bytes sent as Base64.

Pre-AI Input Hygiene / Encoding Utility

Overview

Encoding Repair API accepts Base64-encoded raw bytes as input and automatically repairs mojibake across UTF-8, Shift_JIS, EUC-JP, Latin-1, and more.

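To prevent information loss from copy & paste or editor conversions, the API deliberately uses a “Base64-only” input design: clients send the original bytes untouched, rather than text that may already have been re-decoded along the way.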
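A short local illustration of why raw bytes matter, assuming a Shift_JIS source: once the bytes are decoded as text with the wrong codec, the original can no longer be recovered, whereas Base64 round-trips them exactly. The helper name `to_raw_bytes_base64` is hypothetical.

```python
import base64


def to_raw_bytes_base64(raw: bytes) -> str:
    """Encode raw bytes as an ASCII Base64 string suitable for a JSON body."""
    return base64.b64encode(raw).decode("ascii")


# "テスト" as it would be stored by a Shift_JIS system.
sjis_bytes = "テスト".encode("shift_jis")

# Path 1: treat the bytes as UTF-8 text (e.g. a paste into a UTF-8 editor).
# Invalid sequences become U+FFFD, so the original bytes are gone for good.
as_text = sjis_bytes.decode("utf-8", errors="replace")
lost = as_text.encode("utf-8") != sjis_bytes

# Path 2: Base64 keeps every byte, so the server sees exactly what was stored.
b64 = to_raw_bytes_base64(sjis_bytes)
preserved = base64.b64decode(b64) == sjis_bytes

print(b64, lost, preserved)  # g2WDWINn True True
```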

Why this API exists

Encoding failures usually happen before downstream AI or ETL logic even starts. This API preserves raw bytes first, then repairs text deterministically so later steps can work on normalized UTF-8 instead of ambiguous mojibake.

Hosted API on RapidAPI

The Encoding Repair API is available on RapidAPI.

🔗 RapidAPI Hub: https://rapidapi.com/APIronlab/api/encoding-repair-api

Endpoint

POST /encoding/v2/repair

Request body (JSON):

{
  "raw_bytes_base64": "<Base64-encoded bytes>",
  "mode": "auto",
  "target_encoding": "utf-8"
}
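The request body can be assembled from raw bytes with a small helper. This is a sketch: `build_repair_request` is a hypothetical name, and restricting `mode` to `auto` and `force` is an assumption taken from the `mode_used` values in the response schema.

```python
import base64
import json

# Assumption: inferred from the "mode_used" field of the response schema.
ALLOWED_MODES = {"auto", "force"}


def build_repair_request(raw: bytes, mode: str = "auto",
                         target_encoding: str = "utf-8") -> dict:
    """Build the JSON body for POST /encoding/v2/repair from raw bytes."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    return {
        # Base64 output is pure ASCII, so it is safe to embed in JSON.
        "raw_bytes_base64": base64.b64encode(raw).decode("ascii"),
        "mode": mode,
        "target_encoding": target_encoding,
    }


body = build_repair_request("テスト".encode("euc_jp"))
print(json.dumps(body))
```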

Example response:

{
  "result": {
    "fixed_text": "テスト",
    "target_encoding": "utf-8",
    "changed": true
  },
  "meta": {
    "version": "2.0.0",
    "mode_used": "auto",
    "detected_path": "latin1->utf-8",
    "confidence": 0.98,
    "status": "ok",
    "execution_ms": 5.42,
    "input_bytes_length": 9
  }
}

If detection confidence is low, the Safe Filter may skip the repair: the response then reports "changed": false and returns the original data.
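On the client side, the Safe Filter behavior suggests reading both `changed` and `confidence` before trusting the output. A minimal sketch; `interpret`, `RepairOutcome`, and the 0.9 threshold are illustrative assumptions, not values documented by the API.

```python
from typing import NamedTuple


class RepairOutcome(NamedTuple):
    text: str           # result.fixed_text (original data if nothing changed)
    repaired: bool      # result.changed
    needs_review: bool  # confidence below the caller's threshold


def interpret(response: dict, min_confidence: float = 0.9) -> RepairOutcome:
    result, meta = response["result"], response["meta"]
    return RepairOutcome(
        text=result["fixed_text"],
        repaired=bool(result["changed"]),
        # Low confidence is where the Safe Filter may have declined to repair;
        # routing such records to review is a client-side policy choice.
        needs_review=float(meta.get("confidence", 0.0)) < min_confidence,
    )
```

With the example response above (`changed: true`, `confidence: 0.98`), this yields the repaired text with no review flag.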

Response Schema

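Every response uses the same two-layer structure, which keeps production handling simple: result carries the repaired text, and meta carries diagnostics about how the repair was made (mode, detected conversion path, confidence, timing, and input size).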

{
  "result": {
    "fixed_text": "string",
    "target_encoding": "string",
    "changed": false
  },
  "meta": {
    "version": "2.0.0",
    "mode_used": "auto | force",
    "detected_path": "utf-8->shift_jis",
    "confidence": 1.0,
    "status": "ok",
    "execution_ms": 12.41,
    "input_bytes_length": 120
  }
}
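Because the two layers are fixed, they map cleanly onto typed objects. A sketch using standard-library dataclasses; the class names are hypothetical, and rejecting unknown keys is a design choice (it surfaces schema changes early) rather than anything the API requires.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairResult:
    fixed_text: str
    target_encoding: str
    changed: bool


@dataclass(frozen=True)
class RepairMeta:
    version: str
    mode_used: str
    detected_path: str
    confidence: float
    status: str
    execution_ms: float
    input_bytes_length: int


@dataclass(frozen=True)
class RepairResponse:
    result: RepairResult
    meta: RepairMeta

    @classmethod
    def from_json(cls, text: str) -> "RepairResponse":
        data = json.loads(text)
        # Unknown or missing keys raise TypeError, surfacing schema drift early.
        return cls(
            result=RepairResult(**data["result"]),
            meta=RepairMeta(**data["meta"]),
        )
```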

Supported Encodings

The Encoding Repair API supports the major encodings commonly seen in Japanese environments, including UTF-8, Shift_JIS, EUC-JP, and Latin-1.
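To see why these encodings get confused, the snippet below encodes the same text in each Japanese encoding, then reproduces and reverses what the `latin1->utf-8` path in the example response appears to describe: UTF-8 bytes misread as Latin-1. This is a local illustration with Python's built-in codecs, not the API's detection logic.

```python
text = "テスト"

# Same characters, different bytes per encoding (Python codec names).
# Latin-1 is absent here because it cannot represent Japanese at all.
for codec in ("utf-8", "shift_jis", "euc_jp"):
    print(f"{codec:10} {text.encode(codec).hex(' ')}")

# Latin-1 maps every byte to a character, so misreading never raises an
# error; it silently produces mojibake instead.
mojibake = text.encode("utf-8").decode("latin-1")

# Reversing the mistake: recover the bytes via Latin-1, then decode as UTF-8.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repr(mojibake), repaired)
```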

Use Cases

1. Repairing Japanese mojibake

2. Recovering text from raw bytes

Useful when only the raw bytes are available, for example in web scraping, mail archives, or log collectors.

3. LLM preprocessing (Pre-AI Input Hygiene)

Normalize encodings to UTF-8 before sending text to ChatGPT / Claude / Gemini / local LLMs.

4. Cleaning data before CSV / TSV / Excel imports

Unify encodings for Japanese business systems where mixed-encoding issues often occur.

Quick Start – Python Example

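import base64

import requests

# Raw bytes to send (here: the UTF-8 bytes of "テスト").
raw = "テスト".encode("utf-8")
# Base64-encode the bytes and convert to an ASCII str for the JSON body.
b64 = base64.b64encode(raw).decode("ascii")

payload = {
    "raw_bytes_base64": b64,
    "mode": "auto",
    "target_encoding": "utf-8",
}

# Placeholder: when calling through RapidAPI, add your
# X-RapidAPI-Key and X-RapidAPI-Host headers here.
headers = {}

res = requests.post(
    "https://your-endpoint/encoding/v2/repair",  # RapidAPI / API Gateway etc.
    json=payload,
    headers=headers,
    timeout=10,
)
res.raise_for_status()  # fail loudly on HTTP errors
print(res.json())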
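For environments without `requests`, the same call can be made with the standard library. A sketch under assumptions: the endpoint URL is the same placeholder as in the Quick Start, the header names follow RapidAPI's usual `X-RapidAPI-Key` / `X-RapidAPI-Host` convention, and `repair_bytes` and its `send` parameter are hypothetical, added so the HTTP layer can be swapped out in tests.

```python
import base64
import json
import urllib.request
from typing import Callable, Optional

ENDPOINT = "https://your-endpoint/encoding/v2/repair"  # placeholder


def _post_json(url: str, body: dict, headers: dict) -> dict:
    """POST a JSON body with urllib and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def repair_bytes(raw: bytes, api_key: str, host: str,
                 send: Optional[Callable[[str, dict, dict], dict]] = None) -> str:
    """Send raw bytes for repair and return result.fixed_text."""
    body = {
        "raw_bytes_base64": base64.b64encode(raw).decode("ascii"),
        "mode": "auto",
        "target_encoding": "utf-8",
    }
    headers = {"X-RapidAPI-Key": api_key, "X-RapidAPI-Host": host}
    response = (send or _post_json)(ENDPOINT, body, headers)
    return response["result"]["fixed_text"]
```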

Links