Bencoding: bencoded string length in Java
I am a bit confused about bencoding.
According to the specification, a string has to be bencoded in the following format:
length:string
For example, the string spam becomes 4:spam.
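A minimal Java sketch of the format above, assuming text is converted to UTF-8 before encoding. The class and method names (BencodeStringEncoder, encode) are hypothetical. The sketch writes the payload's byte count as ASCII digits, then a colon, then the raw bytes, and returns a byte[] rather than a String:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class BencodeStringEncoder {

    /** Encodes raw bytes as a bencoded string: <byte count>:<bytes>. */
    static byte[] encode(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The length prefix is the number of BYTES, written as ASCII decimal digits.
        byte[] prefix = (raw.length + ":").getBytes(StandardCharsets.US_ASCII);
        out.write(prefix, 0, prefix.length);
        out.write(raw, 0, raw.length);
        return out.toByteArray();
    }

    /** Convenience overload for text; assumes the text should be stored as UTF-8. */
    static byte[] encode(String text) {
        return encode(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] spam = encode("spam");
        System.out.println(new String(spam, StandardCharsets.US_ASCII)); // 4:spam
    }
}
```

The byte[] overload is the general case; the String overload only decides how text becomes bytes before the length is measured.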
My question: is the 4 the number of characters in the string, or the number of UTF-8 bytes?
For instance, if I am going to bencode the string gâteau, what number should I specify as its length?
I think I have to specify 7, so the final form should be 7:gâteau.
That is because the character â takes 2 bytes in UTF-8, while each of the other characters takes 1 byte.
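The difference the question turns on can be checked directly in Java. This sketch (class and method names hypothetical) assumes the â is the precomposed character U+00E2; a decomposed form (a plus a combining circumflex) would give different counts:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical class, for illustration only.
final class LengthComparison {

    /** The value a bencode length prefix must hold for UTF-8 text: its byte count. */
    static int bencodeLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "gâteau" with a precomposed â (U+00E2), escaped so the source file encoding does not matter.
        String s = "g\u00e2teau";

        System.out.println(s.length());                      // 6 UTF-16 code units (Java chars)
        System.out.println(s.codePointCount(0, s.length())); // 6 Unicode code points
        System.out.println(bencodeLength(s));                // 7 UTF-8 bytes, so the prefix is "7:"
    }
}
```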
I have also heard that it is not recommended to store bencoded data in a Java String instance.
In other words, when I bencode a block of data, I should keep it as a byte array and should not convert it to a Java String, to avoid encoding issues.
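The concern about Java String can be illustrated with a small sketch (class name hypothetical). Bencoded data may hold arbitrary binary values, such as the SHA-1 piece hashes in a .torrent file, and bytes that are not valid UTF-8 do not survive a decode/encode round trip, because `new String(byte[], Charset)` replaces malformed input with U+FFFD:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical class, for illustration only.
final class RoundTripCheck {

    /** Decodes bytes as UTF-8 into a String, then encodes them back to bytes. */
    static byte[] roundTrip(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF never appears in well-formed UTF-8, so decoding turns it into U+FFFD.
        byte[] binary = {(byte) 0x12, (byte) 0xFF, (byte) 0x34};

        byte[] back = roundTrip(binary);
        System.out.println(Arrays.equals(binary, back)); // false
        System.out.println(back.length);                 // 5: 0xFF became EF BF BD
    }
}
```

Keeping the data as byte[] and decoding only the fields known to contain text avoids this kind of silent corruption.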
Are my assumptions correct?
According to the specification, a bencoded string is a sequence of bytes, and you have to specify the number of bytes in that sequence as its length.
The specification also says: "All character string values are UTF-8 encoded".
So in the case of "gâteau" you should specify 7 as the length, because the character â takes 2 bytes.
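On the reading side the same rule applies: the prefix says how many bytes to take. A minimal decoder sketch, assuming the input is already a byte[]; the class and method names are hypothetical, and it performs only basic validation:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical class, for illustration only.
final class BencodeStringDecoder {

    /** Reads one bencoded string starting at offset and returns its raw bytes. */
    static byte[] decode(byte[] data, int offset) {
        int pos = offset;
        int length = 0;
        // Parse the ASCII decimal prefix up to ':'; it is a BYTE count, not a character count.
        while (pos < data.length && data[pos] != ':') {
            byte b = data[pos];
            if (b < '0' || b > '9') {
                throw new IllegalArgumentException("Non-digit in length prefix at " + pos);
            }
            // multiplyExact/addExact throw ArithmeticException on int overflow.
            length = Math.addExact(Math.multiplyExact(length, 10), b - '0');
            pos++;
        }
        if (pos == offset || pos >= data.length) {
            throw new IllegalArgumentException("Missing length prefix or ':'");
        }
        int start = pos + 1;
        if (length > data.length - start) {
            throw new IllegalArgumentException("String runs past end of input");
        }
        return Arrays.copyOfRange(data, start, start + length);
    }

    public static void main(String[] args) {
        byte[] encoded = "7:g\u00e2teau".getBytes(StandardCharsets.UTF_8);
        byte[] raw = decode(encoded, 0);
        System.out.println(raw.length); // 7
        // Convert to a String only once the field is known to hold UTF-8 text.
        System.out.println(new String(raw, StandardCharsets.UTF_8).equals("g\u00e2teau")); // true
    }
}
```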