Sina Show Censorship Research

Jeffrey Knockel
Sina Show censorship warning
“系统过滤,你发送的信息含有非法字符,请重新输入!” (System filter, your message contains illegal characters, please re-enter!)

Check out the...

Censorship Analysis

Sina Show 3.4 has keywords built into many of its binary files, stored as plain GBK-encoded text:

Note that the lists in SinaShow.exe, ChatRoom.dll, and Props.dll are identical. Although these keywords are stored adjacently in each binary file, they may belong to separate categories originally intended for separate purposes (e.g., censoring usernames versus censoring chat). The only keywords that appear to be referenced in the program code are 108 keywords in SinaShow.exe (corresponding to lines 1114-1221, inclusive, of SinaShow.exe's list downloadable above), which are used to censor chat messages. They are also available separately below:

The remaining keywords appear to be presently unused.
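Working with these plain-text lists can be sketched in Python as follows. This is a minimal sketch, not the method used to produce the lists above. `gbk_strings` is a hypothetical helper that scans a binary for runs of GBK double-byte characters (like `strings`, but for GBK), so it will miss keywords containing ASCII and may report false positives from non-text data. `chat_keywords` is a hypothetical helper that assumes one keyword per line and selects lines 1114-1221 inclusive, which is 1221 - 1114 + 1 = 108 keywords. `contains_keyword` assumes plain substring matching; the exact matching rule Sina Show applies is not described here.

```python
import re


# Hypothetical helpers: assume one keyword per line and substring matching.
def gbk_strings(blob: bytes, min_chars: int = 2) -> list[str]:
    """Return runs of at least `min_chars` GBK double-byte characters in `blob`."""
    # A GBK double-byte character is a lead byte 0x81-0xFE followed by a
    # trail byte in 0x40-0x7E or 0x80-0xFE; ASCII bytes are not matched.
    pattern = re.compile(rb"(?:[\x81-\xfe][\x40-\x7e\x80-\xfe]){%d,}" % min_chars)
    return [m.group().decode("gbk", errors="replace") for m in pattern.finditer(blob)]


def chat_keywords(list_text: str, first: int = 1114, last: int = 1221) -> list[str]:
    """Return the entries on 1-based lines first..last (inclusive) of a list."""
    return list_text.splitlines()[first - 1:last]


def contains_keyword(message: str, keywords: list[str]) -> bool:
    """Assumed rule: a message is blocked if any keyword occurs as a substring."""
    return any(k and k in message for k in keywords)
```

Run against the raw bytes of SinaShow.exe, `gbk_strings` would surface candidate keywords for manual review rather than a clean list.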

Sina Show 3.4 also comes installed with a file named Word_410.ucw, for which it downloads updates from http://www.51uc.com/uc_interface/down_policy/Word_410.ucw. This file is a custom binary container storing sensitive GBK-encoded keywords encrypted with a non-standard implementation of Blowfish in ECB mode with the 8-byte key 'Dey,1blE'. Each keyword in this file has a category number, but only keywords in category 5 are presently used, and these are used to censor chat messages:

Code

decrypt.py is a Python script that extracts and decrypts the keywords from Word_410.ucw.
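For illustration, textbook (standard) Blowfish in ECB mode can be sketched in Python as below. This is not the non-standard variant Sina Show uses, whose deviations are not described here, so it should not be expected to decrypt Word_410.ucw as-is, and it does not parse the file's container format or category numbers. It shows the pieces such a script needs: the key schedule seeded from the hexadecimal digits of pi (computed here with Machin's formula rather than a hard-coded table), the 16-round Feistel network, and ECB mode, in which every 8-byte block is processed independently. The names `Blowfish`, `ecb_encrypt`, and `ecb_decrypt` are hypothetical and not taken from decrypt.py.

```python
import struct

MASK32 = 0xFFFFFFFF


def _pi_fraction_words(count: int) -> list[int]:
    """First `count` 32-bit words of pi's fractional part (Machin's formula)."""
    guard = 64  # extra bits absorb truncation error in the series
    bits = 32 * count
    scale = 1 << (bits + guard)

    def arctan_inv(x: int) -> int:
        # arctan(1/x) * scale = sum of (-1)^k * scale / ((2k+1) * x^(2k+1))
        total, term, n, sign = 0, scale // x, 1, 1
        while term:
            total += sign * (term // n)
            term //= x * x
            n += 2
            sign = -sign
        return total

    pi = 16 * arctan_inv(5) - 4 * arctan_inv(239)
    frac = (pi >> guard) & ((1 << bits) - 1)  # drop the integer part 3
    return [(frac >> (bits - 32 * (i + 1))) & MASK32 for i in range(count)]


class Blowfish:
    """Hypothetical sketch of standard Blowfish, NOT Sina Show's variant."""

    _INIT = _pi_fraction_words(18 + 4 * 256)  # P-array, then S-boxes 0..3

    def __init__(self, key: bytes):
        if not 4 <= len(key) <= 56:
            raise ValueError("Blowfish keys are 4 to 56 bytes long")
        self.p = self._INIT[:18]
        self.s = [self._INIT[18 + 256 * i:18 + 256 * (i + 1)] for i in range(4)]
        # XOR the key, cycled into big-endian 32-bit words, into the P-array.
        stream = (key * (72 // len(key) + 1))[:72]
        for i in range(18):
            self.p[i] ^= int.from_bytes(stream[4 * i:4 * i + 4], "big")
        # Overwrite P and the S-boxes with successive encryptions of zero.
        left = right = 0
        for i in range(0, 18, 2):
            left, right = self.encrypt_words(left, right)
            self.p[i], self.p[i + 1] = left, right
        for box in self.s:
            for j in range(0, 256, 2):
                left, right = self.encrypt_words(left, right)
                box[j], box[j + 1] = left, right

    def _f(self, x: int) -> int:
        s0, s1, s2, s3 = self.s
        h = (s0[x >> 24] + s1[(x >> 16) & 0xFF]) & MASK32
        return ((h ^ s2[(x >> 8) & 0xFF]) + s3[x & 0xFF]) & MASK32

    def encrypt_words(self, left: int, right: int) -> tuple[int, int]:
        for i in range(16):
            left ^= self.p[i]
            right ^= self._f(left)
            left, right = right, left
        left, right = right, left  # undo the final swap
        return left ^ self.p[17], right ^ self.p[16]

    def decrypt_words(self, left: int, right: int) -> tuple[int, int]:
        for i in range(17, 1, -1):  # same network, P-array in reverse
            left ^= self.p[i]
            right ^= self._f(left)
            left, right = right, left
        left, right = right, left
        return left ^ self.p[0], right ^ self.p[1]


def _ecb(block_fn, data: bytes) -> bytes:
    """ECB mode: transform each 8-byte block independently, no IV or chaining."""
    if len(data) % 8:
        raise ValueError("ECB input must be a multiple of 8 bytes")
    out = bytearray()
    for off in range(0, len(data), 8):
        left, right = struct.unpack(">II", data[off:off + 8])
        out += struct.pack(">II", *block_fn(left, right))
    return bytes(out)


def ecb_encrypt(cipher: Blowfish, data: bytes) -> bytes:
    return _ecb(cipher.encrypt_words, data)


def ecb_decrypt(cipher: Blowfish, data: bytes) -> bytes:
    return _ecb(cipher.decrypt_words, data)
```

With the key from above, `Blowfish(b"Dey,1blE")` sets up the cipher; any byte-order or round-structure differences in Sina Show's variant would need to be reflected in `encrypt_words`, `decrypt_words`, or the block packing in `_ecb`. Because ECB has no chaining, identical 8-byte plaintext blocks always produce identical ciphertext blocks.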