The difference between Node.js StringDecoder and Buffer.toString

2024-11-01 16:29:19
Recommended answers (1)
Answer 1:

stringDecoder.end([buffer])
Added in: v0.9.3
buffer <Buffer> A Buffer containing the bytes to decode.
Returns any remaining input stored in the internal buffer as a string. Bytes representing incomplete UTF-8 and UTF-16 characters will be replaced with substitution characters appropriate for the character encoding.
If the buffer argument is provided, one final call to stringDecoder.write() is performed before returning the remaining input.
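In other words: the buffer argument is a Buffer holding the bytes to decode.
end() returns whatever input is still held in the internal buffer, as a string. Any leftover bytes that form an incomplete UTF-8 or UTF-16 character are replaced with the substitution character for that encoding.
If buffer is supplied, stringDecoder.write() is called once with it first, and then the remaining input is returned.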
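To make the flushing behaviour of end() concrete, here is a minimal sketch. It assumes a UTF-8 decoder and feeds it only the lead byte of '西' (0xe8); the variable names partial and flushed are illustrative only.

```javascript
const { StringDecoder } = require('string_decoder');

const decoder = new StringDecoder('utf8');

// 0xe8 is only the first of the three bytes of '西', so write() cannot emit anything yet.
const partial = decoder.write(Buffer.from([0xe8]));

// end() without an argument flushes the internal buffer; the incomplete
// sequence comes back as the replacement character U+FFFD.
const flushed = decoder.end();

console.log(JSON.stringify(partial));
console.log(flushed === '\ufffd');
```

After end() the internal buffer is empty, so the same decoder instance can be used for new input.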
stringDecoder.write(buffer)
Added in: v0.1.99
buffer <Buffer> A Buffer containing the bytes to decode.
Returns a decoded string, ensuring that any incomplete multibyte characters at the end of the Buffer are omitted from the returned string and stored in an internal buffer for the next call to stringDecoder.write() or stringDecoder.end().
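In other words: the buffer argument is a Buffer holding the bytes to decode.
write() returns the decoded string. Any incomplete multibyte character at the end of the Buffer is left out of the returned string and kept in the internal buffer until the next call to stringDecoder.write() or stringDecoder.end().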
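The hold-back behaviour of write() can be sketched as follows. The split position (after byte 4, in the middle of the second character) is an arbitrary choice for illustration, and the names first and second are hypothetical.

```javascript
const { StringDecoder } = require('string_decoder');

// '西山居' is 9 bytes in UTF-8, 3 bytes per character.
const bytes = Buffer.from('西山居', 'utf8');
const decoder = new StringDecoder('utf8');

// The first chunk ends one byte into '山': '西' is returned, the stray byte is held back.
const first = decoder.write(bytes.subarray(0, 4));

// The second chunk completes '山' and carries all of '居'.
const second = decoder.write(bytes.subarray(4));

console.log(first);  // prints 西
console.log(second); // prints 山居
```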
const buf1 = Buffer.from('西山居');
// output: <Buffer e8 a5 bf e5 b1 b1 e5 b1 85>

const buf2 = Buffer.from([0, 0, 0xe8, 0xa5, 0xbf, 0xe5, 0xb1, 0xb1, 0xe5, 0xb1, 0x85]);

const buf3 = Buffer.from([0xe8, 0xa5, 0xbf, 0xe5, 0xb1, 0xb1, 0xe5, 0xb1, 0x85, 0, 0 ]);

const buf4 = Buffer.from([0xe8, 0xa5, 0xbf, 0xe5, 0, 0, 0xb1, 0xb1, 0xe5, 0xb1, 0x85]);

buf1.toString();
// returns '西山居'
buf2.toString();
// returns '\u0000\u0000西山居'
buf3.toString();
// returns '西山居\u0000\u0000'
buf4.toString();
// returns '西�\u0000\u0000��居' (the trailing e5 b1 85 is still a valid '居')

const StringDecoder = require('string_decoder').StringDecoder;
const decoder = new StringDecoder('utf-8');
decoder.write(buf1);
// returns '西山居'
decoder.write(buf2);
// returns '\u0000\u0000西山居'
decoder.write(buf3);
// returns '西山居\u0000\u0000'
decoder.write(buf4);
// returns '西�\u0000\u0000��居'

decoder.end(buf1);
// returns '西山居'
decoder.end(buf2);
// returns '\u0000\u0000西山居'
decoder.end(buf3);
// returns '西山居\u0000\u0000'
decoder.end(buf4);
// returns '西�\u0000\u0000��居'

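At first glance, StringDecoder and Buffer.toString([encoding]) look identical, because every buffer above contains only complete byte sequences or outright invalid ones. The real difference is described below: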
When a Buffer instance is written to the StringDecoder instance, an internal buffer is used to ensure that the decoded string does not contain any incomplete multibyte characters. These are held in the buffer until the next call to stringDecoder.write() or until stringDecoder.end() is called.
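Put differently: when a Buffer is written to a StringDecoder, the decoder uses an internal buffer so that the string it returns never contains an incomplete multibyte character. The incomplete bytes stay in that buffer until the next call to stringDecoder.write() or stringDecoder.end().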
const StringDecoder = require('string_decoder').StringDecoder;
const decoder = new StringDecoder('utf-8');

decoder.write(Buffer.from([0xe8]));
// returns '' (first byte of '西' is buffered)
decoder.write(Buffer.from([0xa5]));
// returns '' (second byte is buffered)
decoder.end(Buffer.from([0xbf]));
// returns '西'
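For contrast, the same three single-byte chunks can be decoded both ways side by side. This is a minimal sketch; decodeWithToString and decodeWithStringDecoder are hypothetical helper names introduced here for illustration, not part of Node.js.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: decode each chunk on its own with Buffer.toString().
function decodeWithToString(chunks) {
  return chunks.map((chunk) => chunk.toString('utf8')).join('');
}

// Hypothetical helper: push every chunk through one shared StringDecoder.
function decodeWithStringDecoder(chunks) {
  const decoder = new StringDecoder('utf8');
  let text = '';
  for (const chunk of chunks) {
    text += decoder.write(chunk);
  }
  // Flush anything still buffered once the input is exhausted.
  return text + decoder.end();
}

// The three bytes of '西', delivered one byte per chunk.
const chunks = [Buffer.from([0xe8]), Buffer.from([0xa5]), Buffer.from([0xbf])];

const naive = decodeWithToString(chunks);
const streamed = decodeWithStringDecoder(chunks);

console.log(JSON.stringify(naive));
console.log(streamed); // prints 西
```

Buffer.toString() has no memory between calls, so each isolated byte becomes a replacement character. That is why StringDecoder is the safer choice when data arrives in chunks, for example from a stream, and a multibyte character may be split across chunk boundaries.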