PHP - Detect encoding and make everything UTF-8


I'm reading lots of texts from various RSS feeds and inserting them into my database.

Of course, there are several different character encodings used in the feeds, e.g. UTF-8 and ISO 8859-1.

Unfortunately, there are sometimes problems with the encodings of the texts. Example:

1) The "ß" in "Fußball" should look like this in my database: "Ÿ". If it is a "Ÿ", it is displayed correctly.

2) Sometimes, the "ß" in "Fußball" looks like this in my database: "ÃŸ". Then it is displayed wrongly, of course.

3) In other cases, the "ß" is saved as a "ß", so without any change. Then it is also displayed wrongly.

What can I do to avoid cases 2 and 3?

How can I make everything the same encoding, preferably UTF-8? When must I use utf8_encode(), when must I use utf8_decode() (it's clear what the effect is, but when must I use the functions?) and when must I do nothing with the input?

Can you help me and tell me how to make everything the same encoding? Perhaps with the function mb_detect_encoding()? Can I write a function for this? So my problems are:

1) How do I find out what encoding the text uses?

2) How do I convert it to UTF-8, whatever the old encoding is?

Edit: Would a function like this work?

    function correct_encoding($text) {
        $current_encoding = mb_detect_encoding($text, 'auto');
        $text = iconv($current_encoding, 'utf-8', $text);
        return $text;
    }

I've tested it, but it doesn't work. What's wrong with it?

If you apply utf8_encode() to a string that is already UTF-8, it will return garbled UTF-8 output.
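To make the double-encoding effect concrete, here is a minimal sketch. It uses mb_convert_encoding() from ISO-8859-1 to UTF-8, which does the same byte-by-byte conversion as utf8_encode(). The helper name to_utf8_once() and its ISO-8859-1 fallback are assumptions for illustration only; they are not part of any library mentioned here.

```php
<?php
// "Fußball" as UTF-8 bytes: the "ß" is the two bytes C3 9F.
$utf8 = "Fu\xC3\x9Fball";

// Treat every byte as an ISO-8859-1 character and re-encode it as UTF-8.
// This is what happens when an already-UTF-8 string is "encoded" again.
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($doubleEncoded), "\n"; // 4675c383c29f62616c6c (ß became 4 bytes)

// Hypothetical helper: convert only when the input is NOT already valid UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function to_utf8_once(string $text, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it alone
    }
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}
```

With this guard, to_utf8_once("Fu\xDFball") (Latin1 input) and to_utf8_once("Fu\xC3\x9Fball") (UTF-8 input) both return the same UTF-8 bytes. It does not help with strings that mix encodings or that were already double-encoded before they reached you.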

I made a function that addresses all these issues. It's called Encoding::toUTF8().

You don't need to know what the encoding of your strings is. It can be Latin1 (ISO 8859-1), Windows-1252 or UTF-8, or the string can have a mix of them. Encoding::toUTF8() will convert everything to UTF-8.

I did it because a service was giving me a feed of data that was all messed up, mixing UTF-8 and Latin1 in the same string.

Usage:

    require_once('Encoding.php');
    use \ForceUTF8\Encoding;  // It's namespaced now.

    $utf8_string = Encoding::toUTF8($utf8_or_latin1_or_mixed_string);

    $latin1_string = Encoding::toLatin1($utf8_or_latin1_or_mixed_string);

Download:

https://github.com/neitanod/forceutf8

Update:

I've included another function, Encoding::fixUTF8(), which will fix every UTF-8 string that looks garbled.

Usage:

    require_once('Encoding.php');
    use \ForceUTF8\Encoding;  // It's namespaced now.

    $utf8_string = Encoding::fixUTF8($garbled_utf8_string);

Examples:

    echo Encoding::fixUTF8("fédération camerounaise de football");
    echo Encoding::fixUTF8("fédération camerounaise de football");
    echo Encoding::fixUTF8("fÃÂédÃÂération camerounaise de football");
    echo Encoding::fixUTF8("fédération camerounaise de football");

Will output:

    fédération camerounaise de football
    fédération camerounaise de football
    fédération camerounaise de football
    fédération camerounaise de football
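For readers curious how repeatedly encoded text can be repaired at all, the following is a rough sketch of one possible approach. It is not the code the library uses. It assumes the damage came from UTF-8 bytes being re-encoded as if they were ISO-8859-1, and it undoes one layer at a time for as long as the result is still valid UTF-8. The function name undo_double_encoding() and the pass limit are hypothetical.

```php
<?php
// Hypothetical helper, not part of ForceUTF8.
// Assumption: each layer of damage was an ISO-8859-1 -> UTF-8 re-encoding.
function undo_double_encoding(string $text, int $maxPasses = 5): string
{
    for ($i = 0; $i < $maxPasses; $i++) {
        // Nothing to repair if the text is pure ASCII or not valid UTF-8.
        if (!preg_match('/[\x80-\xFF]/', $text) || !mb_check_encoding($text, 'UTF-8')) {
            break;
        }
        // Peel off one layer: map each UTF-8 character back to a single byte.
        $candidate = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
        // Reject the step if it lost information (characters outside Latin1).
        if (mb_convert_encoding($candidate, 'UTF-8', 'ISO-8859-1') !== $text) {
            break;
        }
        // Only keep the result if it is itself valid UTF-8; otherwise the
        // text was correctly encoded already and must be left unchanged.
        if (!mb_check_encoding($candidate, 'UTF-8')) {
            break;
        }
        $text = $candidate;
    }
    return $text;
}

// "Fédération" encoded twice: each "é" became the bytes C3 83 C2 A9.
echo undo_double_encoding("F\xC3\x83\xC2\xA9d\xC3\x83\xC2\xA9ration"), "\n";
```

A heuristic like this can misfire on text that legitimately contains sequences such as "Ã©", and it does not handle Windows-1252 specific characters, so treat it as an illustration of the idea rather than a replacement for a tested library.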

Update: I've transformed the function (forceUTF8) into a family of static functions on a class called Encoding. The new function is Encoding::toUTF8().

