PHP - Detect encoding and make everything UTF-8
I'm reading lots of texts from various RSS feeds and inserting them into my database.
Of course, there are several different character encodings used in the feeds, e.g. UTF-8 and ISO-8859-1.
Unfortunately, there are sometimes problems with the encodings of the texts. Example:
1) The "ß" in "Fußball" should look like this in my database: "Ÿ". If it is "Ÿ", it is displayed correctly.
2) Sometimes, the "ß" in "Fußball" looks like this in my database: "ß". Then it is displayed wrongly, of course.
3) In other cases, the "ß" is saved as "ß", without any change. Then it is also displayed wrongly.
What can I do to avoid cases 2 and 3?
How can I make everything the same encoding, preferably UTF-8? When must I use utf8_encode(), when must I use utf8_decode() (it's clear what the effect is, but when must I use the functions?), and when must I do nothing with the input?
Can you help me and tell me how to make everything the same encoding? Perhaps with the function mb_detect_encoding()? Can I write a function for this? So my problems are:
1) How do I find out what encoding the text uses?
2) How do I convert it to UTF-8, whatever the old encoding is?
Edit: Would a function like this work?
    function correct_encoding($text) {
        $current_encoding = mb_detect_encoding($text, 'auto');
        $text = iconv($current_encoding, 'utf-8', $text);
        return $text;
    }
I've tested it, but it doesn't work. What's wrong with it?
If you apply utf8_encode() to a string that is already UTF-8, it will return garbled UTF-8 output.
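To make the double-encoding effect concrete, here is a minimal sketch. It uses mb_convert_encoding() with ISO-8859-1 as the source encoding in place of utf8_encode() (both perform the same Latin-1 to UTF-8 conversion, and utf8_encode() is deprecated as of PHP 8.2), and it assumes the mbstring extension is loaded. The helper name encode_once_sketch() is hypothetical, and treating every non-UTF-8 input as ISO-8859-1 is an assumption, not something a feed guarantees.

```php
<?php
// "ß" in UTF-8 is the two bytes 0xC3 0x9F.
$utf8 = "Fu\xC3\x9Fball";

// Same conversion utf8_encode() performs: every input byte is read as an
// ISO-8859-1 character and re-encoded, so the two bytes of "ß" become four.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // 4675c39f62616c6c
echo bin2hex($double), "\n"; // 4675c383c29f62616c6c

// Hypothetical guard: convert only when the input is not already valid UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function encode_once_sketch(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already UTF-8 (or plain ASCII), leave untouched
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex(encode_once_sketch("Fu\xDFball")), "\n"; // 4675c39f62616c6c
echo bin2hex(encode_once_sketch($utf8)), "\n";        // 4675c39f62616c6c
```

A guard like this only helps when each string is in a single encoding. It also cannot recognise a Latin-1 string whose bytes happen to form valid UTF-8 sequences.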
I made a function that addresses all these issues. It's called Encoding::toUTF8().
You don't need to know what the encoding of your strings is. It can be Latin1 (ISO 8859-1), Windows-1252 or UTF-8, or the string can have a mix of them. Encoding::toUTF8() will convert everything to UTF-8.
I did it because a service was giving me a feed of data that was all messed up, mixing UTF-8 and Latin1 in the same string.
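To illustrate what handling a mixed string can involve, here is a rough sketch. It is not the library's code. It leaves byte sequences that are shaped like UTF-8 multi-byte characters alone and re-encodes every other high byte as ISO-8859-1. The function name mixed_to_utf8_sketch() is hypothetical, the ISO-8859-1 fallback is an assumption, and the pattern is deliberately loose (it does not reject overlong forms or surrogates). It assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: keeps UTF-8-shaped sequences, converts stray high bytes.
function mixed_to_utf8_sketch(string $text): string
{
    // No /u modifier, so PCRE matches raw bytes rather than code points.
    $pattern = '/[\xC2-\xDF][\x80-\xBF]'    // 2-byte UTF-8 sequence
             . '|[\xE0-\xEF][\x80-\xBF]{2}' // 3-byte UTF-8 sequence
             . '|[\xF0-\xF4][\x80-\xBF]{3}' // 4-byte UTF-8 sequence
             . '|([\x80-\xFF])/';           // any other high byte (group 1)

    $result = preg_replace_callback($pattern, function (array $m): string {
        if (isset($m[1]) && $m[1] !== '') {
            // Assumption: a stray high byte is an ISO-8859-1 character.
            return mb_convert_encoding($m[1], 'UTF-8', 'ISO-8859-1');
        }
        return $m[0]; // already looks like UTF-8, keep as-is
    }, $text);

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

// "café" is UTF-8 here, while the "ß" in "Fußball" is a single Latin-1 byte.
echo bin2hex(mixed_to_utf8_sketch("caf\xC3\xA9 Fu\xDFball")), "\n";
```

The usual ambiguity remains: Latin-1 text whose bytes happen to look like a UTF-8 sequence is left unchanged.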
Usage:
    require_once('Encoding.php');
    use \ForceUTF8\Encoding; // It's namespaced now.

    $utf8_string = Encoding::toUTF8($utf8_or_latin1_or_mixed_string);
    $latin1_string = Encoding::toLatin1($utf8_or_latin1_or_mixed_string);
Download:
https://github.com/neitanod/forceutf8
Update:
I've included another function, Encoding::fixUTF8(), which will fix every UTF-8 string that looks garbled.
Usage:
    require_once('Encoding.php');
    use \ForceUTF8\Encoding; // It's namespaced now.

    $utf8_string = Encoding::fixUTF8($garbled_utf8_string);
Examples:
    echo Encoding::fixUTF8("fédération camerounaise de football");
    echo Encoding::fixUTF8("fédération camerounaise de football");
    echo Encoding::fixUTF8("fÃÂédÃÂération camerounaise de football");
    echo Encoding::fixUTF8("fédération camerounaise de football");
will output:
    fédération camerounaise de football
    fédération camerounaise de football
    fédération camerounaise de football
    fédération camerounaise de football
Update: I've transformed the function (forceUTF8) into a family of static functions on a class called Encoding. The new function is Encoding::toUTF8().