java - What's the difference between a string in the source code and a string read from a file?
There is a file named "dd.txt" on disk; its content is \u5730\u7406
Now, when I run this program:
    public static void main(String[] args) throws IOException {
        FileInputStream fis = new FileInputStream("d:\\dd.txt");
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buffer = new byte[fis.available()];
        while ((fis.read(buffer)) != -1) {
            baos.write(buffer);
        }
        String s1 = "\u5730\u7406";
        String s2 = baos.toString("utf-8");
        System.out.println("s1:" + s1 + "\n" + "s2:" + s2);
    }
I get two different results:
s1:地理
s2:\u5730\u7406
Can you tell me why? And how can I read the file so that I get the same result as s1, in Chinese?
When you write \u5730 in Java code, it's interpreted by the compiler as a single Unicode character (a Unicode escape). When you write the same thing to a file, it's just 6 regular characters (because there's nothing interpreting it). Is there a reason why you're not writing 地理 directly to the file?
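To make the difference concrete, here is a minimal sketch that compares string lengths. The class and method names (`EscapeDemo`, `compiledLength`, `rawLength`) are made up for illustration. The doubled backslash stands in for the text as it sits in the file, where the backslash is ordinary data.

```java
public class EscapeDemo {
    // Length of the literal after the compiler has translated the two escapes.
    static int compiledLength() {
        return "\u5730\u7406".length();
    }

    // Length of the same text when the backslashes are ordinary characters,
    // which is what the bytes in dd.txt decode to.
    static int rawLength() {
        return "\\u5730\\u7406".length();
    }

    public static void main(String[] args) {
        System.out.println(compiledLength()); // 2
        System.out.println(rawLength());      // 12
    }
}
```

The first string holds two characters because the translation happened at compile time; the second holds twelve because nothing ever translated it.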
If you wish to read a file containing Unicode escapes, you'll need to parse the values yourself: throw away the \u and convert the four hex digits that follow into a character. It's a lot easier to write proper Unicode with a suitable encoding (e.g. UTF-8) to the file in the first place if you control the creation of the file, and under normal circumstances you should never come across files containing these escaped sequences.
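If the file really cannot be changed, the manual parsing described above could look roughly like this. It is only a sketch under stated assumptions: `UnicodeUnescape`, `unescape` and `isHex` are hypothetical names, every escape is assumed to be exactly a backslash, a single `u` and four hex digits, and anything that doesn't match that shape is copied through unchanged.

```java
public class UnicodeUnescape {
    // Replaces each backslash-u-XXXX sequence (four hex digits) with the
    // UTF-16 code unit it names; all other text is copied as-is.
    static String unescape(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int i = 0;
        while (i < raw.length()) {
            if (raw.startsWith("\\u", i)
                    && i + 6 <= raw.length()
                    && isHex(raw, i + 2, i + 6)) {
                int codeUnit = Integer.parseInt(raw.substring(i + 2, i + 6), 16);
                out.append((char) codeUnit);
                i += 6; // skip the backslash, the 'u' and four digits
            } else {
                out.append(raw.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // True if every character in s[from, to) is a hexadecimal digit.
    static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stands in for the 12 characters read from dd.txt.
        String fromFile = "\\u5730\\u7406";
        System.out.println(unescape(fromFile));
    }
}
```

In the question's program this would be applied to s2 after decoding the bytes, e.g. `unescape(baos.toString("utf-8"))`. Whether the result displays as Chinese still depends on the console's encoding and font.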