
SF-0224 · Scenario · Medium

How to import data containing special characters using Salesforce Data Loader?

✓ Verified by Vikas Singhal · Last reviewed 5/17/2026 · Updated for Spring '26

If accented letters (é, ñ, ü), CJK characters, emoji, or smart quotes turn into garbage like Ã© or ? after a load, you have an encoding problem. Salesforce stores strings as UTF-8, but Data Loader has to be told the source file is UTF-8 too; otherwise it reads the bytes as Windows-1252 (the legacy default) and ships the wrong code points.

The fix in three steps

  1. Save the CSV as UTF-8. In Excel: Save As → CSV UTF-8 (Comma delimited) (.csv). In Google Sheets: download as CSV (it’s always UTF-8). In Notepad: Save As → Encoding: UTF-8. In VS Code: bottom-right encoding indicator → reopen / save with encoding UTF-8.
  2. Open Data Loader → Settings → Settings. Set:
    • Read all CSVs with UTF-8 encoding: checked
    • Write all CSVs with UTF-8 encoding: checked (so success/error files come back correctly too)
  3. Re-run the job. The special characters now flow through intact.
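
When a file cannot be re-saved from the application that produced it, step 1 can also be done with a script. The following is a minimal sketch, assuming the source really is Windows-1252; the function name reencode_to_utf8 and the file names in the comment are hypothetical, not part of Data Loader.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="cp1252"):
    """Rewrite the file at src as UTF-8 at dst and return the decoded text.

    Assumption: the file was saved in source_encoding. If it was not,
    decoding either raises UnicodeDecodeError or silently yields the wrong
    characters, so spot-check the output before loading it.
    """
    raw = Path(src).read_bytes()          # bytes, so line endings are untouched
    text = raw.decode(source_encoding)    # legacy bytes -> Unicode text
    Path(dst).write_bytes(text.encode("utf-8"))
    return text


# Hypothetical usage:
# reencode_to_utf8("accounts_cp1252.csv", "accounts_utf8.csv")
```

Working on bytes rather than text-mode file handles keeps the original line endings, which avoids introducing unrelated differences into the CSV.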

Why this matters

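Excel on Windows often saves CSVs in Windows-1252 (CP1252) by default. That encoding handles Western European characters reasonably but cannot represent anything else, such as CJK, emoji, or most non-Latin scripts. Data Loader has to decode the file before it sends the values to the API, and Salesforce stores the result as UTF-8. If the encoding Data Loader assumes does not match the encoding the file was actually saved in, every non-ASCII character is decoded to the wrong code points, which gives you mojibake.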

Always insist on UTF-8 in, UTF-8 out.
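
The mismatch is easy to reproduce outside Salesforce. This short sketch uses only Python's built-in codecs to show what happens to the same name under each wrong assumption; the variable names are illustrative.

```python
original = "José"

# Case 1: file saved as UTF-8, reader assumes Windows-1252.
utf8_bytes = original.encode("utf-8")            # b'Jos\xc3\xa9'
misread_as_cp1252 = utf8_bytes.decode("cp1252")  # 'JosÃ©'

# Case 2: file saved as Windows-1252, reader assumes UTF-8.
cp1252_bytes = original.encode("cp1252")         # b'Jos\xe9'
misread_as_utf8 = cp1252_bytes.decode("utf-8", errors="replace")  # 'Jos\ufffd'

print(misread_as_cp1252)
print(misread_as_utf8)
```

In the first case each multi-byte character splits into two or more wrong characters; in the second the invalid byte becomes the Unicode replacement character (U+FFFD).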

Quick diagnostic

If you load José and Salesforce shows José, you wrote UTF-8 bytes but Data Loader read them as Windows-1252 (or vice versa). Tick the UTF-8 read setting and reload.

If you see literal question marks (?) the data was downgraded somewhere — probably saved from Excel as plain CSV without the UTF-8 variant. Re-save the source.
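
The two symptoms differ in one important way: mojibake can usually be reversed, while question marks cannot. A rough sketch of both checks follows, assuming the legacy encoding involved is Windows-1252; looks_like_utf8 and undo_mojibake are hypothetical helper names.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def undo_mojibake(text: str) -> str:
    """Reverse text that was UTF-8 but got decoded as Windows-1252.

    Raises UnicodeEncodeError or UnicodeDecodeError when the text is not
    that kind of mojibake, so treat a failure as "leave the value alone".
    """
    return text.encode("cp1252").decode("utf-8")


print(looks_like_utf8("José".encode("utf-8")))    # True
print(looks_like_utf8("José".encode("cp1252")))   # False
print(undo_mojibake("JosÃ©"))                     # José

# A lossy downgrade: characters outside Windows-1252 become '?'
# and the original text can no longer be recovered from the bytes.
print("東京".encode("cp1252", errors="replace"))   # b'??'
```

Pure-ASCII files pass the UTF-8 check regardless of how they were saved, so a True result only proves the file is safe to read as UTF-8, not that it contains special characters.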

Special character checklist

  • Accented Latin (é, ñ, ü, Ç) — UTF-8 toggle handles it.
  • CJK (Chinese, Japanese, Korean) — UTF-8 is mandatory; ensure your CSV editor isn’t silently transliterating.
  • Right-to-left scripts (Arabic, Hebrew) — UTF-8 handles the code points; rendering depends on the field’s display, not the data.
  • Emoji and supplementary plane characters (😀, 𝓗) — Salesforce supports these as long as the field is configured for Unicode (most are). Use UTF-8.
  • Smart quotes / em-dashes (“ ”, ‘ ’, —) — Word and some web tools substitute these for ASCII punctuation automatically. Either accept them (UTF-8 carries them fine) or strip them in pre-processing.
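
If you choose to strip typographic punctuation in pre-processing, one possible approach is a translation table applied to each parsed field. This is a sketch under the assumption that plain ASCII replacements are acceptable for your data; the mapping, asciify_punctuation, and clean_csv_text are illustrative names and choices, not a Salesforce requirement.

```python
import csv
import io

# Typographic characters mapped to ASCII stand-ins (an illustrative choice).
SMART_PUNCTUATION = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # ellipsis
})


def asciify_punctuation(value: str) -> str:
    """Replace smart quotes, dashes, and ellipses in one field value."""
    return value.translate(SMART_PUNCTUATION)


def clean_csv_text(csv_text: str) -> str:
    """Apply asciify_punctuation to every cell, re-quoting fields as needed."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([asciify_punctuation(cell) for cell in row])
    return out.getvalue()
```

Cleaning parsed cells rather than the raw file text matters because a curly double quote turned into a straight one would otherwise collide with CSV quoting; the csv writer escapes it correctly.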

Tips for production loads

  • Byte Order Mark (BOM) at the start of a UTF-8 file is harmless to Data Loader but breaks some downstream tools — strip it if your validators complain.
  • Excel re-opens UTF-8 CSVs in CP1252 unless you explicitly Import from Text. Use a CSV-aware viewer (VS Code, Sublime, Notepad++) to verify before loading.
  • Long Unicode strings still count against text-field byte limits, not just character limits — multi-byte characters use more storage.
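
Two of these checks are simple to script before a load. The sketch below strips a UTF-8 BOM from raw file bytes and compares character count with UTF-8 byte count; the helper names are hypothetical, and how Salesforce measures a given field's limit is not something this code can tell you.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def utf8_length(value: str):
    """Return (code points, UTF-8 bytes) for a field value."""
    return len(value), len(value.encode("utf-8"))


print(strip_utf8_bom(b"\xef\xbb\xbfName,City"))  # b'Name,City'
print(utf8_length("José"))                       # (4, 5)
print(utf8_length("東京"))                        # (2, 6)
print(utf8_length("😀"))                         # (1, 4)
```

Comparing the two numbers for the longest values in a column gives an early warning when multi-byte text is close to a field's limit.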

Verified against: Data Loader Guide — Configuring, Metadata API Developer Guide. Last reviewed 2026-05-17 for Spring '26 release.