CSV and Character Encoding - Avoiding Garbled Characters with UTF-8 with BOM
I am the representative, Nishimura! It's been a while since I wrote a blog.
To avoid garbled text in Excel, I added a BOM.
learningBOX is an LMS with multilingual support
Our e-learning system, learningBOX, is a multilingual LMS. The UI is currently available only in Japanese and English, but teaching materials and learners' answers can be stored not only in Japanese and English but also in Chinese, Korean, Vietnamese, and other languages from around the world.
In fact, teaching materials created in each country's language are used in training programs for foreigners.
Multilingual support in CSV was incomplete.
Despite the claim of multilingual support, there was a deficiency in the CSV support. In the currently released version (2.14.28), CSV encoding is fixed to Shift_JIS (Windows-31J).
As a result, any character outside Japanese, English, and a subset of Chinese and Latin characters is garbled.
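To make the problem concrete, here is a minimal sketch of what happens when text is forced through Shift_JIS. It assumes PHP's mbstring extension, where Windows-31J is available under the name "SJIS-win". The function name roundTripSjis is made up for this illustration and is not part of learningBOX.

```php
<?php
// Illustration only: convert UTF-8 text to Shift_JIS (Windows-31J) and back.
// Characters with no Shift_JIS representation are replaced by mbstring's
// substitute character during the first conversion, so they cannot be recovered.
function roundTripSjis(string $utf8): string
{
    $sjis = mb_convert_encoding($utf8, 'SJIS-win', 'UTF-8');
    return mb_convert_encoding($sjis, 'UTF-8', 'SJIS-win');
}

$japanese = '日本語';
$korean   = '한국어';

var_dump(roundTripSjis($japanese) === $japanese); // Japanese survives the round trip
var_dump(roundTripSjis($korean) === $korean);     // Hangul does not
```

The second comparison fails because Hangul simply has no code points in Shift_JIS, which is the kind of loss a fixed Shift_JIS CSV export causes.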
Made it possible to choose UTF-8
There was a proposal to switch the character encoding of CSV output from learningBOX to UTF-8 outright, but we judged that suddenly changing the specification would have too great an impact on users. Instead, we made the CSV character encoding selectable between UTF-8 and Shift_JIS. (The default is Shift_JIS.)
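A selectable encoding with a Shift_JIS default might look roughly like the sketch below. This is not the actual learningBOX code: the function name encodeCsv, its parameters, and the assumption that the CSV text is first built as a UTF-8 string are all hypothetical.

```php
<?php
// Hypothetical helper: encode CSV text (built as UTF-8) into the
// encoding the user selected. Shift_JIS stays the default so that
// existing users see no change in behavior.
function encodeCsv(string $csvUtf8, string $encoding = 'Shift_JIS'): string
{
    if ($encoding === 'UTF-8') {
        // Already UTF-8, nothing to convert.
        return $csvUtf8;
    }
    if ($encoding === 'Shift_JIS') {
        // "SJIS-win" is mbstring's name for Windows-31J.
        return mb_convert_encoding($csvUtf8, 'SJIS-win', 'UTF-8');
    }
    throw new InvalidArgumentException("Unsupported CSV encoding: {$encoding}");
}
```

Keeping the old encoding as the default argument means callers that do not pass an encoding behave exactly as before.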
BOM is added to prevent garbled characters in Excel
When a UTF-8 CSV without a BOM is opened in Excel, the characters are garbled. BOM stands for "Byte Order Mark"; placed at the very beginning of a file, it signals that the file is encoded in UTF-8.
Therefore, by using UTF-8 with BOM, you can now open the file in Excel without garbling the characters.
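If you want to check whether a given CSV already carries the mark, a small test on its first three bytes is enough. The helper below is a sketch with a hypothetical name (hasUtf8Bom); it relies only on the fact that the UTF-8 BOM is the byte sequence EF BB BF.

```php
<?php
// Hypothetical helper (not from learningBOX): report whether a byte string
// begins with the 3-byte UTF-8 BOM, EF BB BF.
function hasUtf8Bom(string $bytes): bool
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

var_dump(hasUtf8Bom("\xEF\xBB\xBFname,score\n")); // data saved as UTF-8 with BOM
var_dump(hasUtf8Bom("name,score\n"));             // plain UTF-8, no BOM
```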
A long time ago, it was common to avoid garbled characters by producing the file in UTF-16LE, but relatively recent versions of Excel work better with UTF-8 with BOM. In addition, character encoding on the Web is converging on UTF-8, so considering ease of handling outside of Excel, I chose UTF-8 rather than UTF-16LE.
Reference site: How to output Unicode csv that opens correctly in both Win and Mac Excel
How to add a BOM in PHP
The BOM itself is 3 bytes of data, written as "\xEF\xBB\xBF". In PHP, the BOM can be added as follows:
$csv = "\xEF\xBB\xBF".$csv;
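For context, here is a slightly fuller sketch of the same idea: building the CSV with PHP's standard fputcsv and then prepending the BOM to the finished string. The function name buildCsvWithBom and the sample rows are made up for illustration and are not the learningBOX implementation.

```php
<?php
// Sketch: build a CSV in memory with fputcsv, then prepend the UTF-8 BOM.
// Assumes every value in $rows is already a UTF-8 string.
function buildCsvWithBom(array $rows): string
{
    $stream = fopen('php://memory', 'w+');
    foreach ($rows as $row) {
        // Separator, enclosure, and escape are passed explicitly.
        fputcsv($stream, $row, ',', '"', '\\');
    }
    rewind($stream);
    $csv = stream_get_contents($stream);
    fclose($stream);

    // The BOM must be the very first bytes of the output.
    return "\xEF\xBB\xBF" . $csv;
}

// Hypothetical sample data mixing several languages.
$csv = buildCsvWithBom([
    ['name', 'answer'],
    ['西村', '한국어'],
    ['Nguyễn', 'Tiếng Việt'],
]);
```

The BOM is added once to the complete output rather than per row, because Excel only looks for it at the start of the file.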
Release planned for September
Depending on the results of verification, this feature is expected to be released in an update next week or the week after.
For more information on the status of learningBOX releases, please see the release notes.
We are looking for a learningBOX developer!
We are looking for learningBOX developers. We have various openings for backend engineers, frontend engineers, quality assurance engineers, and project managers.
We are mainly looking for development engineers at our headquarters in Tatsuno, but we are also actively recruiting infrastructure engineers in Tokyo, so please apply if you are interested. If you have experience operating Linux, or have studied Linux or networking at an infrastructure school, you are welcome to apply!
For more information about employment opportunities at learningBOX, Inc., please click here.