java - What's the difference between a string in the source code and a string read from a file?


There is a file named "dd.txt" on disk, and its content is \u5730\u7406

Now, when I run this program:

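    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class Main {
        public static void main(String[] args) throws IOException {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            // Read the raw bytes of the file into memory.
            try (FileInputStream fis = new FileInputStream("d:\\dd.txt")) {
                byte[] buffer = new byte[4096];
                int n;
                while ((n = fis.read(buffer)) != -1) {
                    baos.write(buffer, 0, n); // write only the bytes actually read
                }
            }
            String s1 = "\u5730\u7406";         // escapes translated by the compiler
            String s2 = baos.toString("UTF-8"); // file bytes decoded as UTF-8
            System.out.println("s1:" + s1 + "\n" + "s2:" + s2);
        }
    }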

I get two different results:

    s1:地理
    s2:\u5730\u7406

Can anyone tell me why? And how can I read the file so that I get the same result as s1, in Chinese?

When you write \u5730 in Java code, the compiler interprets it as a single Unicode character (a Unicode escape). When you write the same thing to a file, it's just 6 regular characters, because nothing is interpreting it. Is there a reason why you're not writing 地理 directly to the file?
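The difference can be seen without any file at all. The following is a minimal sketch, assuming a hypothetical class EscapeVsText whose two methods stand in for "text typed in source" and "text as it sits in dd.txt". Doubling the backslash stops the compiler from treating the sequence as an escape, which reproduces what the file contains.

```java
public class EscapeVsText {
    // The compiler turns each escape into one character,
    // so this literal holds two characters.
    static String fromSource() {
        return "\u5730\u7406";
    }

    // A doubled backslash is an ordinary backslash character, so this literal
    // holds the same twelve characters that are stored in the file.
    static String fromFile() {
        return "\\u5730\\u7406";
    }

    public static void main(String[] args) {
        System.out.println(fromSource().length()); // 2
        System.out.println(fromFile().length());   // 12
    }
}
```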

If you do wish to read a file containing these Unicode escapes, you'll need to parse the values yourself: throw away the \u and convert the four hex digits that follow into a character. If you control the creation of the file, it's a lot easier to write proper Unicode text in a suitable encoding (e.g. UTF-8) in the first place. Under normal circumstances you should never come across files containing these escaped Unicode literals.
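One way to do that parsing is sketched below. It is not the only approach; the class name UnicodeEscapeDecoder and the method decode are hypothetical names chosen for illustration. It assumes every escape is a backslash, a lowercase u, and exactly four hex digits, and it leaves all other text untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeEscapeDecoder {
    // Matches a backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces each escape sequence in the text with the character it names.
    static String decode(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());                   // copy text before the escape
            out.append((char) Integer.parseInt(m.group(1), 16)); // hex digits -> one char
            last = m.end();
        }
        out.append(text, last, text.length());                   // copy whatever is left
        return out.toString();
    }

    public static void main(String[] args) {
        // Same twelve characters as the content of dd.txt.
        String raw = "\\u5730\\u7406";
        System.out.println(decode(raw));
    }
}
```

In the program from the question, passing s2 through decode should give a string equal to s1. Whether the console then shows the Chinese characters correctly still depends on the console's own encoding.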

