Java Replace Unicode Characters in a String


I have a string that contains multiple Unicode characters. I want to identify all of these characters, e.g. \uF06C, and replace each one with a backslash followed by its four hex digits, without the "u".
Example:
Source String: "add \uF06Cd1 Clause"
Result String: "add \F06Cd1 Clause"
How can I achieve this in Java?
Edit:
The linked question, "Java Regex - How to replace a pattern or how to", is different from this one because my question deals with Unicode characters. Although the escape is written with several characters, the JVM treats it as one single character, so a regex won't work.
The correct way to do this is to use a regex that matches the entire Unicode escape and then use group replacement.
The regex to match the escape:
A Unicode escape looks like \uABCD, that is, \u followed by four hex digits. These can be matched with
\\u[A-Fa-f\d]{4}
But there's a problem with this:
In a string like "just some \\uabcd arbitrary text" the \u would still get matched, even though its backslash is itself escaped. So we need to make sure the \u is preceded by an even number of backslashes:
(?<!\\)(\\\\)*\\u[A-Fa-f\d]{4}
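To see the effect of the lookbehind, here is a minimal sketch that only checks whether the pattern finds a match. The class name, method name and sample strings are made up for illustration; the regex is the one shown above, written as a Java string literal.

```java
import java.util.regex.Pattern;

class EscapeMatchDemo {
    // The lookbehind regex from above, with every backslash doubled for the Java literal.
    static final Pattern UNICODE_ESCAPE =
            Pattern.compile("(?<!\\\\)(\\\\\\\\)*\\\\u[A-Fa-f\\d]{4}");

    // Returns true if the text contains an unescaped backslash-u sequence.
    static boolean containsEscape(String text) {
        return UNICODE_ESCAPE.matcher(text).find();
    }

    public static void main(String[] args) {
        // One backslash at runtime: a real escape sequence, so this prints true.
        System.out.println(containsEscape("just some \\uabcd arbitrary text"));
        // Two backslashes at runtime: an escaped backslash followed by plain text, so this prints false.
        System.out.println(containsEscape("just some \\\\uabcd arbitrary text"));
    }
}
```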
Now as output we want a backslash followed by the hex part. This can be done with group replacement, so let's start by grouping the parts. The backslash pairs go into one capturing group wrapped around a non-capturing repetition, because a repeated capturing group such as (\\\\)* keeps only its last repetition:
(?<!\\)((?:\\\\)*)(\\u)([A-Fa-f\d]{4})
As a replacement we want all backslashes from the group that matches the backslash pairs, followed by a backslash and the hex part of the Unicode literal:
$1\\$3
Now for the actual code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// test is the input string
String pattern = "(?<!\\\\)((?:\\\\\\\\)*)(\\\\u)([A-Fa-f\\d]{4})";
String replace = "$1\\\\$3";
Matcher match = Pattern.compile(pattern).matcher(test);
String result = match.replaceAll(replace);
That's a lot of backslashes! The reason is that a backslash has to be escaped twice: once for the Java string literal and once for the regex. So "\\\\" as a pattern string in Java source matches a single \ in the input.
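Put together, the regex approach might look like the following self-contained sketch. The class and method names are hypothetical. As a small variation, the backslash pairs are captured with a non-capturing repetition inside one group, so that every preceding pair is kept rather than only the last one, and the \u part is left ungrouped, which makes the hex digits group 2.

```java
import java.util.regex.Pattern;

class UnicodeEscapeStripper {
    // Group 1: any run of backslash pairs; group 2: the four hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("(?<!\\\\)((?:\\\\\\\\)*)\\\\u([A-Fa-f\\d]{4})");

    // Rewrites each unescaped backslash-u-XXXX sequence as backslash-XXXX.
    static String stripU(String text) {
        // Replacement: group 1, one literal backslash, group 2.
        return ESCAPE.matcher(text).replaceAll("$1\\\\$2");
    }

    public static void main(String[] args) {
        // The input holds the six literal characters backslash, u, F, 0, 6, C.
        System.out.println(stripU("add \\uF06Cd1 Clause")); // prints: add \F06Cd1 Clause
    }
}
```

This only affects text that literally contains a backslash followed by "u"; a string holding the actual character U+F06C passes through unchanged.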
EDIT:
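If the string contains the actual characters rather than the escape text, the characters need to be picked out and replaced by their hexadecimal code:
StringBuilder sb = new StringBuilder();
for (char c : in.toCharArray()) {
    if (c > 127) {
        // non-ASCII: backslash plus four uppercase hex digits, as in the question's example
        sb.append("\\").append(String.format("%04X", (int) c));
    } else {
        sb.append(c);
    }
}
String result = sb.toString();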
This assumes that by "unicode-character" you mean non-ASCII characters. The code outputs every ASCII character as is and every other character as a backslash followed by its code in hex. The term "unicode-character" is rather vague, though, as a char in Java always represents a Unicode (UTF-16) code unit. This approach preserves control characters such as "\n" and "\r", which is why I chose it over other definitions.
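As a runnable sketch of the loop above, wrapped in a method so it can be tried on the question's example. The class and method names are invented for illustration, and uppercase hex (%04X) is an assumption made to match the "\F06C" form shown in the question.

```java
class NonAsciiEscaper {
    // Replaces every char above 127 with a backslash and its four-digit hex code.
    static String escapeNonAscii(String in) {
        StringBuilder sb = new StringBuilder();
        for (char c : in.toCharArray()) {
            if (c > 127) {
                sb.append('\\').append(String.format("%04X", (int) c));
            } else {
                sb.append(c); // ASCII, including control characters, is copied unchanged
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Here the compiler turns the escape into the single character U+F06C.
        String source = "add \uF06Cd1 Clause";
        System.out.println(source.length());        // prints: 14
        System.out.println(escapeNonAscii(source)); // prints: add \F06Cd1 Clause
    }
}
```

Because the loop works on char values, a character outside the Basic Multilingual Plane would come out as two escapes, one per surrogate.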
Try using the String.replaceAll() method (backslashes must be escaped for both the Java string literal and the regex):
s = s.replaceAll("\\\\u", "\\\\");
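If counting backslashes by hand feels error-prone, one option is to let the library do the quoting. The sketch below uses Pattern.quote and Matcher.quoteReplacement so that both arguments are written as plain string literals; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplaceDemo {
    // Replaces the two literal characters backslash-u with a single backslash.
    static String dropU(String s) {
        // Pattern.quote makes the search text literal; quoteReplacement does the same for the replacement.
        return s.replaceAll(Pattern.quote("\\u"), Matcher.quoteReplacement("\\"));
    }

    public static void main(String[] args) {
        System.out.println(dropU("add \\uF06Cd1 Clause")); // prints: add \F06Cd1 Clause
    }
}
```

Unlike the lookbehind regex, this simple replacement does not check whether the backslash is itself escaped, and it has no effect on a string that holds the actual character instead of the escape text.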
