CSV and Character Encoding - Avoiding Garbled Text with UTF-8 with BOM


I'm Nishimura, the company representative! It's been a while since I last wrote a blog post.
This time, the topic is the BOM we added to our CSV output to avoid garbled text in Excel.

learningBOX is an LMS with multilingual support

Our e-learning system, learningBOX, is a multilingual LMS. The UI is currently available only in Japanese and English, but the data for teaching materials and learners' answers can flexibly handle not only Japanese and English but also Chinese, Korean, Vietnamese, and other languages from around the world.
In fact, teaching materials created in each country's language are used in training programs for foreigners.


Multilingual support in CSV was incomplete.

Despite our claim of multilingual support, the CSV support fell short. In the currently released version (2.14.28), the CSV encoding is fixed to Shift_JIS (Windows-31J).
As a result, every character other than Japanese, English, and some Chinese and Latin characters is garbled.
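To see this limitation concretely, here is a minimal sketch that checks whether a string survives a round trip through Shift_JIS. The helper name `survivesShiftJis` is made up for illustration and is not part of learningBOX. It assumes PHP 7 or later with the mbstring extension enabled.

```php
<?php
// Hypothetical helper (not from learningBOX): returns true when $text comes
// back unchanged after a UTF-8 -> Shift_JIS (Windows-31J) -> UTF-8 round trip.
function survivesShiftJis(string $text): bool
{
    // 'SJIS-win' is mbstring's name for Windows-31J (CP932).
    $sjis = mb_convert_encoding($text, 'SJIS-win', 'UTF-8');
    $back = mb_convert_encoding($sjis, 'UTF-8', 'SJIS-win');
    return $back === $text;
}

// Unicode escapes keep this file independent of the editor's encoding.
$japanese = "\u{65E5}\u{672C}\u{8A9E}"; // "Japanese" written in kanji
$korean   = "\u{D55C}\u{AD6D}\u{C5B4}"; // "Korean" written in Hangul

var_dump(survivesShiftJis($japanese)); // kanji exist in Shift_JIS
var_dump(survivesShiftJis($korean));   // Hangul does not, so the text is lost
```

Characters that Shift_JIS cannot represent are replaced during conversion, which is why they show up as garbled text in the exported CSV.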


We made UTF-8 selectable


There was a proposal to switch the character encoding of learningBOX's CSV output to UTF-8 outright, but we decided that suddenly changing the specification would affect users too much. Instead, we made it possible to choose between UTF-8 and Shift_JIS as the CSV character encoding. (The default is Shift_JIS.)
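As a rough illustration of such a switch, the sketch below converts CSV text that is held internally as UTF-8 into the chosen encoding, with Shift_JIS as the default. The function name `encodeCsvForDownload` and the option strings are assumptions made for this example, not the actual learningBOX implementation. It assumes the mbstring extension is enabled.

```php
<?php
// Hypothetical sketch: the function name and option values are illustrative.
// $csvUtf8 is assumed to be CSV text already encoded as UTF-8.
function encodeCsvForDownload(string $csvUtf8, string $encoding = 'Shift_JIS'): string
{
    switch ($encoding) {
        case 'UTF-8':
            // Already UTF-8, so nothing to convert.
            return $csvUtf8;
        case 'Shift_JIS':
            // 'SJIS-win' is mbstring's name for Windows-31J (CP932).
            return mb_convert_encoding($csvUtf8, 'SJIS-win', 'UTF-8');
        default:
            throw new InvalidArgumentException("Unsupported CSV encoding: {$encoding}");
    }
}

$csv = "id,name\n1,\u{65E5}\u{672C}\u{8A9E}\n";

$default = encodeCsvForDownload($csv);          // Shift_JIS unless told otherwise
$utf8    = encodeCsvForDownload($csv, 'UTF-8'); // explicit opt-in to UTF-8
```

Keeping Shift_JIS as the default means existing users see no change unless they explicitly pick UTF-8.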


Adding a BOM to prevent garbled text in Excel

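If you open a UTF-8 CSV without a BOM in Excel, the characters are garbled, because Excel does not assume UTF-8 for a CSV that lacks one. BOM stands for "Byte Order Mark"; placed at the very beginning of the file, it indicates that the file's character encoding is UTF-8.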

Therefore, by using UTF-8 with BOM, you can now open the file in Excel without garbling the characters.

A long time ago, it was common to avoid garbled text by producing the file in UTF-16LE, but relatively recent versions of Excel work better with UTF-8 with BOM. In addition, the web world is converging on UTF-8, so considering ease of handling outside of Excel, I chose UTF-8 rather than UTF-16LE.

Reference site: How to output Unicode CSV that opens correctly in both Win and Mac Excel


How to add a BOM in PHP

The BOM itself is 3 bytes of data, written as "\xEF\xBB\xBF". In PHP, you can add it by prepending those bytes to the CSV string:

$csv = "\xEF\xBB\xBF".$csv;
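For a slightly fuller picture, here is a minimal sketch that builds CSV text in memory with `fputcsv` and prepends the BOM only when it is not already present. The helper names `withUtf8Bom` and `rowsToCsv` and the constant `UTF8_BOM` are made up for this example and are not taken from learningBOX.

```php
<?php
// The UTF-8 BOM: exactly three bytes.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: prepends the BOM unless the string already starts with it,
// so calling it twice does not produce a double BOM.
function withUtf8Bom(string $csv): string
{
    if (strncmp($csv, UTF8_BOM, 3) === 0) {
        return $csv;
    }
    return UTF8_BOM . $csv;
}

// Hypothetical helper: turns an array of rows into CSV text using fputcsv.
function rowsToCsv(array $rows): string
{
    $stream = fopen('php://temp', 'r+');
    foreach ($rows as $row) {
        // Delimiter, enclosure and escape are passed explicitly.
        fputcsv($stream, $row, ',', '"', '\\');
    }
    rewind($stream);
    $csv = stream_get_contents($stream);
    fclose($stream);
    return $csv;
}

$rows = [
    ['id', 'name'],
    ['1', "\u{D55C}\u{AD6D}\u{C5B4}"], // Hangul, which Shift_JIS cannot hold
];

$csv = withUtf8Bom(rowsToCsv($rows));

// The first three bytes of the output are the BOM.
echo bin2hex(substr($csv, 0, 3)), "\n"; // efbbbf
```

The BOM must be the very first bytes of the output, so it is added once to the finished CSV text rather than to each row.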


September

Depending on the results of the verification, it is expected to be released in an update next week or the week after.
For more information on the release status of learningBOX, please see the release notes.


We are looking for a learningBOX developer!

We are looking for learningBOX developers. We have various openings for backend engineers, frontend engineers, quality assurance engineers, and project managers.
We are mainly hiring development engineers at our headquarters in Tatsu, but we are also actively recruiting infrastructure engineers in Tokyo, so please apply if you are interested. If you have experience operating Linux, or have studied Linux or networking at an infrastructure school, you are welcome too!
For more information about employment opportunities at learningBOX, Inc., see here.
