Java Replace Unicode Characters in a String


I have a string that contains multiple Unicode characters. I want to identify each of these characters, e.g. \uF06C, and replace it with a backslash followed by its four hex digits, without the "u".
Example:
Source String: "add \uF06Cd1 Clause"
Result String: "add \F06Cd1 Clause"
How can I achieve this in Java?
Edit:
The linked question, "Java Regex - How to replace a pattern or how to", is different from this one because my question deals with Unicode characters. Although such a character is written with several characters in source code, the JVM treats it as one single character, and hence a regex won't work.
The correct way to do this is to use a regex that matches the entire Unicode escape and then use group replacement.
The regex to match the Unicode escape:
A Unicode escape looks like \uABCD: \u followed by four hexadecimal digits. These can be matched using
\\u[A-Fa-f\d]{4}
But there's a problem with this:
In a string like "just some \\uabcd arbitrary text", the \u would still be matched, even though its backslash is itself escaped. So we need to make sure the \u is preceded by an even number of backslashes:
(?<!\\)(\\\\)*\\u[A-Fa-f\d]{4}
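To see what the lookbehind buys, here is a small sketch that runs the naive pattern and the guarded pattern against the same inputs. The class and method names (EscapeDetector, naiveFinds, guardedFinds) are made up for illustration; the two patterns are the ones shown above, written as Java string literals.

```java
import java.util.regex.Pattern;

class EscapeDetector {
    // Naive pattern: backslash, u, four hex digits.
    static final Pattern NAIVE = Pattern.compile("\\\\u[A-Fa-f\\d]{4}");

    // Guarded pattern: no backslash directly before the match, then any
    // number of backslash pairs, then backslash, u, four hex digits.
    static final Pattern GUARDED =
            Pattern.compile("(?<!\\\\)(\\\\\\\\)*\\\\u[A-Fa-f\\d]{4}");

    static boolean naiveFinds(String s) {
        return NAIVE.matcher(s).find();
    }

    static boolean guardedFinds(String s) {
        return GUARDED.matcher(s).find();
    }

    public static void main(String[] args) {
        // At runtime this text contains two backslashes followed by "uabcd".
        String escapedBackslash = "just some \\\\uabcd arbitrary text";
        System.out.println(naiveFinds(escapedBackslash));   // true
        System.out.println(guardedFinds(escapedBackslash)); // false
    }
}
```

The guarded pattern still matches a plain escape such as the one in "add \\uF06Cd1 Clause", so only the escaped-backslash case is filtered out.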
Now as output we want a backslash followed by the hex digits. This can be done with group replacement, so let's start by grouping the parts:
(?<!\\)((?:\\\\)*)(\\u)([A-Fa-f\d]{4})
The whole run of backslash pairs is wrapped in one capturing group, because a repeated group such as (\\\\)* keeps only its last repetition. As a replacement we want all backslashes from that group, followed by a backslash and the hex digits of the Unicode escape:
$1\\$3
Now for the actual code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String pattern = "(?<!\\\\)((?:\\\\\\\\)*)(\\\\u)([A-Fa-f\\d]{4})";
String replace = "$1\\\\$3";
// test is the input text that contains the escape as literal characters
Matcher match = Pattern.compile(pattern).matcher(test);
String result = match.replaceAll(replace);
That's a lot of backslashes! The reason is that a backslash has to be escaped twice: once for the Java string literal and once for the regex. So the Java literal "\\\\" is the regex \\, which matches a single \ in the input.
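Putting the pieces together, a self-contained sketch of the regex approach could look like this. The class and method names (EscapeTextRewriter, dropU) are hypothetical. Group 1 here wraps the whole run of backslash pairs, ((?:\\\\)*), so that $1 carries every pair and not only the last repetition.

```java
import java.util.regex.Pattern;

class EscapeTextRewriter {
    // Group 1: an even-length run of backslashes (escaped backslashes, kept as is).
    // Group 2: the backslash-u marker. Group 3: the four hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("(?<!\\\\)((?:\\\\\\\\)*)(\\\\u)([A-Fa-f\\d]{4})");

    static String dropU(String text) {
        // Replacement: group 1, then one literal backslash, then the hex digits.
        return ESCAPE.matcher(text).replaceAll("$1\\\\$3");
    }

    public static void main(String[] args) {
        // These literals hold the escape as plain text, not as one character.
        System.out.println(dropU("add \\uF06Cd1 Clause"));
        System.out.println(dropU("just some \\\\uabcd arbitrary text"));
    }
}
```

The first call prints add \F06Cd1 Clause. The second input is printed unchanged, because its \u is preceded by an odd number of backslashes and therefore belongs to an escaped backslash.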
EDIT:
If the string contains the actual characters rather than the escape text, each such character has to be picked out and replaced by its hexadecimal representation:
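// in is the input String
StringBuilder sb = new StringBuilder();
for (char c : in.toCharArray()) {
    if (c > 127) {
        // non-ASCII: backslash plus four uppercase hex digits, as in the question's example
        sb.append('\\').append(String.format("%04X", (int) c));
    } else {
        sb.append(c); // ASCII, including control characters, is copied as is
    }
}
String result = sb.toString();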
This assumes that by "Unicode character" you mean a non-ASCII character. The code copies every ASCII character as is and writes every other character as a backslash followed by its four-digit hexadecimal code. The term "Unicode character" is rather vague, though, since a char in Java is always a UTF-16 code unit. This approach preserves control characters such as "\n" and "\r", which is why I chose it over other definitions.
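As a runnable sketch of the loop above, wrapped in a hypothetical helper class (NonAsciiEscaper and escape are illustrative names). It assumes uppercase hex digits are wanted, to match the "\F06C" example in the question. Characters outside the Basic Multilingual Plane are stored as two surrogate chars and so come out as two escapes.

```java
class NonAsciiEscaper {
    static String escape(String in) {
        StringBuilder sb = new StringBuilder();
        for (char c : in.toCharArray()) {
            if (c > 127) {
                // Non-ASCII code unit: backslash plus four uppercase hex digits.
                sb.append('\\').append(String.format("%04X", (int) c));
            } else {
                // ASCII, including control characters, is copied unchanged.
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Here the literal holds the real character U+F06C, not its escape text.
        System.out.println(escape("add \uF06Cd1 Clause"));
    }
}
```

Running main prints add \F06Cd1 Clause, which is the result string asked for in the question.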
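Try using the String.replaceAll() method. The backslashes have to be escaped for both the Java string literal and the regex, and this only works if the string contains \u as literal text:
s = s.replaceAll("\\\\u", "\\\\");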
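A variant that avoids the regex escaping is String.replace, which treats both arguments as plain text. The sketch below uses hypothetical names (LiteralReplaceDemo, stripU) and also shows the limits of this shortcut: it removes the u after every backslash, whether or not four hex digits follow, and it does nothing when the string holds the real character.

```java
class LiteralReplaceDemo {
    static String stripU(String s) {
        // Plain-text replacement: backslash followed by u becomes a lone backslash.
        return s.replace("\\u", "\\");
    }

    public static void main(String[] args) {
        // Escape stored as literal text: the u is dropped.
        System.out.println(stripU("add \\uF06Cd1 Clause"));
        // Over-matching: any backslash followed by u is rewritten as well.
        System.out.println(stripU("user\\upload"));
    }
}
```

The first call prints add \F06Cd1 Clause, while the second prints user\pload, which is why the stricter pattern with four hex digits is the safer choice.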
