Strip \uXXXX From a String and Replace It With the Correct Unicode Character

About a month ago, while reading DBpedia data into a database, I discovered ‘\uXXXX’ escape sequences appearing within my strings where pretty Unicode characters should have been. The strings were to be compared to … other strings, which would contain the proper Unicode characters, so I had to replace each ‘\uXXXX’ in my strings with the character it stands for. I couldn’t find a class to do this, but found enough information to understand what needed to be done.

The function below is what I came up with.

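/**
 * Strips Unicode escapes of the form &#92;uXXXX (a backslash, a lowercase 'u' and four
 * hex digits) from a string and replaces each with the character it encodes
 * (for example, &#92;u1E09 becomes U+1E09, LATIN SMALL LETTER C WITH CEDILLA AND ACUTE).
 * Malformed escapes are left in the string unchanged.
 *
 * @param slashed string containing &#92;uXXXX escapes to be replaced with their Unicode characters
 * @return the string with every well-formed &#92;uXXXX escape converted into its character
 * @author Michael Robinson mike@pagesofinterest.net
 */
public String unslashUnicode(String slashed){

	StringBuilder result = new StringBuilder(slashed.length());
	int pos = 0; // index of the first character not yet copied into result

	while(true){ // while there is a backslash-u in the rest of the string

		int start = slashed.indexOf("\\u", pos);
		if(start == -1){
			break;
		}

		result.append(slashed, pos, start); // add the bit before the escape

		int end = start + 6; // backslash + 'u' + four hex digits
		String hex = (end <= slashed.length()) ? slashed.substring(start + 2, end) : "";

		if(hex.matches("[0-9a-fA-F]{4}")){
			result.append((char) Integer.parseInt(hex, 16)); // add the Unicode character
			pos = end;
		}
		else{
			// Malformed escape (too short or not hex): keep the backslash and carry on
			result.append('\\');
			pos = start + 1;
		}
	}

	result.append(slashed, pos, slashed.length()); // put humpty dumpty back together again

	return result.toString();
}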
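On Java 9 or later, the same replacement can also be written with java.util.regex. The sketch below is an alternative, not the function above; the class name UnslashRegex is hypothetical. It replaces only a backslash-u followed by exactly four hex digits and leaves anything else alone. Matcher.quoteReplacement is needed because a decoded character could itself be a backslash (\u005C) or a dollar sign (\u0024), which replaceAll would otherwise treat as replacement syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnslashRegex {

    // A backslash, a lowercase 'u', then exactly four hex digits (captured as group 1).
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unslashUnicode(String slashed) {
        // Each match is replaced by the char its hex digits encode; quoteReplacement
        // stops a decoded backslash or dollar sign from being read as replacement syntax.
        // Matcher.replaceAll(Function) requires Java 9 or later.
        return ESCAPE.matcher(slashed).replaceAll(m ->
                Matcher.quoteReplacement(String.valueOf((char) Integer.parseInt(m.group(1), 16))));
    }

    public static void main(String[] args) {
        // The right-hand literal uses a Java source escape, so it already holds the real character.
        System.out.println(unslashUnicode("Fran\\u00E7ois").equals("Fran\u00E7ois")); // prints true
    }
}
```

The regex version trades the explicit index bookkeeping for a pattern that states the escape format in one place, which makes it easier to extend.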

Note that my strings only ever contained Unicode escaped as ‘\uXXXX’, never as ‘\UXXXX’. The above function will therefore need some modification if it is to be used with capital-‘U’ escapes.
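If capital-‘U’ escapes do turn up, one possible modification is sketched below. It assumes they follow the N-Triples convention, in which \U is followed by eight hex digits rather than four; that is an assumption about the data, not something my strings contained. Eight digits can name characters above U+FFFF, which do not fit in a single Java char, so the sketch appends them with StringBuilder.appendCodePoint, producing a surrogate pair. The class name UnslashBothCases is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnslashBothCases {

    // Group 1: four hex digits after backslash-u.
    // Group 2: eight hex digits after backslash-U (assumed N-Triples convention).
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})|\\\\U([0-9a-fA-F]{8})");

    static String unslashUnicode(String slashed) {
        Matcher m = ESCAPE.matcher(slashed);
        StringBuilder result = new StringBuilder(slashed.length());
        int pos = 0; // index of the first character not yet copied
        while (m.find()) {
            result.append(slashed, pos, m.start()); // text before the escape
            if (m.group(1) != null) {
                // Four hex digits always fit in one char.
                result.append((char) Integer.parseInt(m.group(1), 16));
            } else {
                // Eight hex digits can exceed int range, so parse as long first.
                long codePoint = Long.parseLong(m.group(2), 16);
                if (codePoint <= Character.MAX_CODE_POINT) {
                    // Appends two chars (a surrogate pair) for code points above U+FFFF.
                    result.appendCodePoint((int) codePoint);
                } else {
                    result.append(m.group()); // not a real code point: keep the escape as-is
                }
            }
            pos = m.end();
        }
        result.append(slashed, pos, slashed.length()); // text after the last escape
        return result.toString();
    }

    public static void main(String[] args) {
        String decoded = unslashUnicode("smile \\U0001F600, cedilla \\u00E7");
        System.out.println(decoded.length()); // prints 19
    }
}
```

The length printed is 19 rather than 18 because U+1F600 occupies two chars in a Java String, a reminder that comparing such strings char by char works, but counting "characters" with length() does not.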
