How to read and write non-English characters (e.g. Marathi, Tamil, Hindi) in Java?


I am reading non-English text (Marathi, for example) from an Excel file and then writing it to an XML file. When I read the Marathi text from Excel and inspect it in the Java code, it is shown correctly. But when I write it to the XML file from the same Java code, the file contains unrelated symbols instead of the Marathi text. Please suggest how to handle this situation. The code is below.
public void excelToXML(String path) {
    FileWriter fostream;
    PrintWriter out = null;
    String strOutputPath = "C:\\Temp\\";
    try {
        File file = new File(path);
        InputStream inputStream = new FileInputStream(file);
        Workbook wb = WorkbookFactory.create(inputStream);
        List<String> sheetNames = new ArrayList<String>();
        for (int i = 0; i < wb.getNumberOfSheets(); i++) {
            sheetNames.add(wb.getSheetName(i));
        }
        fostream = new FileWriter(strOutputPath + "\\" + "iTicker" + ".xml");
        out = new PrintWriter(new BufferedWriter(fostream));
        // out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.println("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>");
        out.println("<root xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">");
        for (String sheetName : sheetNames) {
            if (sheetName.equals("Sheet3")) {
                System.out.println(sheetName);
                break;
            }
            Sheet sheet = wb.getSheet(sheetName);
            boolean firstRow = true;
            ArrayList<String> myStringArray = new ArrayList<String>();
            Iterator<Cell> cells = sheet.getRow(0).cellIterator();
            while (cells.hasNext()) {
                myStringArray.add(cells.next().toString());
            }
            for (Row row : sheet) {
                if (firstRow == true) {
                    firstRow = false;
                    continue;
                }
                if (!sheetName.equals("Sheet1")) {
                    out.println("\t<element>");
                }
                for (int i = 0; i < myStringArray.size(); i++) {
                    if (row.getCell(i) != null && !(row.getCell(i)).toString().isEmpty()
                            && row.getCell(i).toString().length() > 0) {
                        if (!(myStringArray.get(i) != null && myStringArray.get(i).toString().equals("Start_Epoch_Time") || myStringArray.get(i).toString().equals("End_Epoch_Time"))) {
                            out.println(formatElement("\t\t", myStringArray.get(i), formatCell(row.getCell(i))));
                        } else {
                            long ePochValue = EpochConverter.getepochValue(row.getCell(i).toString());
                            out.println(formatElement("\t\t", myStringArray.get(i), String.valueOf(ePochValue)));
                        }
                    } else {
                        blankValues.add(sheetName + ":" + "coloumn header" + ":" + myStringArray.get(i) + ":" + "row no:" + row.getRowNum() + " " + "is blank.");
                    }
                }
                if (!sheetName.equals("Sheet1")) {
                    out.println("\t</element>");
                }
            }
        }
        out.write("</root>");
        out.flush();
        out.close();
        if (blankValues != null && blankValues.size() > 0) {
            FileUploadController.writeErrorLog(blankValues + "Please fill all the mandatory values.");
        }
    } catch (Exception e) {
        new DTHException(e.getMessage());
        e.printStackTrace();
    }
}
private static String formatCell(Cell cell) {
    if (cell == null) {
        return "";
    }
    switch (cell.getCellType()) {
        case Cell.CELL_TYPE_BLANK:
            return "";
        case Cell.CELL_TYPE_BOOLEAN:
            return Boolean.toString(cell.getBooleanCellValue());
        case Cell.CELL_TYPE_ERROR:
            return "*error*";
        case Cell.CELL_TYPE_NUMERIC:
            return df.format(cell.getNumericCellValue());
        case Cell.CELL_TYPE_STRING:
            return cell.getStringCellValue();
        default:
            return "<unknown value>";
    }
}
private static String formatElement(String prefix, String tag, String value) {
    StringBuilder sb = new StringBuilder(prefix);
    sb.append("<");
    sb.append(tag);
    if (value != null && value.length() > 0) {
        sb.append(">");
        sb.append(value);
        sb.append("</");
        sb.append(tag);
        sb.append(">");
    } else {
        sb.append("/>");
    }
    return sb.toString();
}
In the line below, row.getCell(i) holds the exact Marathi value when I inspect it in the debugger, but after the value is written to the file the output is different.
out.println(formatElement("\t\t", myStringArray.get(i), formatCell(row.getCell(i))));
Your code has two big problems.
1) You're obviously using Windows (path C:\\Temp) but - as Axel Richter already stated in the comment - you are using the default encoding for the output file. Creating a FileWriter directly with a file name gives you the platform's default encoding, which on Windows is typically a Windows ANSI code page. Such a code page cannot represent Marathi characters, so they are replaced on output. It also contradicts the XML declaration you write yourself, which states UTF-8 as the encoding.
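To see whether the default encoding is the culprit on a given machine, a quick check like the following may help. It is only a sketch: the class name DefaultCharsetCheck and the helper roundTrips are made up for illustration, and the Marathi sample is written with Unicode escapes so that the result does not depend on how the source file itself is encoded. Note that JDK 18 and later use UTF-8 as the default charset unless it is overridden, so the problem mostly shows up on older JDKs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {

    // Hypothetical helper: true if the text survives encoding to bytes
    // and decoding back with the given charset.
    static boolean roundTrips(String text, Charset cs) {
        return text.equals(new String(text.getBytes(cs), cs));
    }

    public static void main(String[] args) {
        // The word "Marathi" in Devanagari, spelled with Unicode escapes.
        String marathi = "\u092E\u0930\u093E\u0920\u0940";

        // This is the charset a FileWriter created from a file name uses.
        Charset platform = Charset.defaultCharset();
        System.out.println("Default charset: " + platform);
        System.out.println("Survives default charset: " + roundTrips(marathi, platform));
        System.out.println("Survives UTF-8: " + roundTrips(marathi, StandardCharsets.UTF_8));
    }
}
```

If the second line prints false, any writer that relies on the default charset will damage the text.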
You should never rely on the platform's default encoding. Always create the PrintWriter with an explicit encoding, via an OutputStreamWriter wrapped around a FileOutputStream, like so:
PrintWriter writer = new PrintWriter(new BufferedWriter(
        new OutputStreamWriter(
                new FileOutputStream("iTicker.xml"), StandardCharsets.UTF_8)));
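The effect of the explicit encoding can be checked with a small round trip. The sketch below is illustrative only: the class EncodingDemo, its helpers, and the temp-file names are hypothetical, and ISO-8859-1 merely stands in for a single-byte legacy code page such as the Windows ANSI default. It writes the same text once as UTF-8 and once with the legacy charset, then reads both files back as UTF-8.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class EncodingDemo {

    // Same writer chain as in the answer, with the charset as a parameter.
    static void write(Path file, String text, Charset cs) throws IOException {
        try (PrintWriter w = new PrintWriter(new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file.toFile()), cs)))) {
            w.print(text);
        }
    }

    // Reads the whole file the way an XML parser honoring "UTF-8" would decode it.
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "Marathi" in Devanagari, written with Unicode escapes.
        String marathi = "\u092E\u0930\u093E\u0920\u0940";

        Path good = Files.createTempFile("utf8-", ".xml");
        Path bad = Files.createTempFile("legacy-", ".xml");
        write(good, marathi, StandardCharsets.UTF_8);
        write(bad, marathi, StandardCharsets.ISO_8859_1); // stand-in for a legacy default

        System.out.println(marathi.equals(readUtf8(good))); // true
        System.out.println(marathi.equals(readUtf8(bad)));  // false: characters were replaced

        Files.delete(good);
        Files.delete(bad);
    }
}
```

The try-with-resources block also closes the writer if an exception occurs, which the original code does not guarantee.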
2) It is bad practice to write XML by hand as you do. If you do it anyway, you have to take care of special characters like "<", ">" and "&" yourself. It is always recommended to use a library that does the escaping automatically. The Java standard library already contains one, e.g. an implementation of the interface XMLStreamWriter.
Here is an example of how easy it is to use:
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class WriteXml {
    public static void main(String[] args) {
        try {
            File outFile = new File("iTicker.xml");
            // Outputstream for the XML document. The XMLStreamWriter should take care of the right encoding.
            OutputStream out = new BufferedOutputStream(new FileOutputStream(outFile));
            XMLStreamWriter xmlWriter =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xmlWriter.writeStartDocument("UTF-8", "1.0");
            xmlWriter.writeCharacters("\n");
            xmlWriter.writeStartElement("root");
            xmlWriter.writeNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
            xmlWriter.writeCharacters("\n ");
            xmlWriter.writeStartElement("element");
            // Some special characters and (I hope) some Marathi letters
            xmlWriter.writeCharacters("<>&\": मराठी वर्णमाला");
            xmlWriter.writeEndElement(); // element
            xmlWriter.writeCharacters("\n");
            xmlWriter.writeEndElement(); // root
            xmlWriter.writeEndDocument();
            xmlWriter.close(); // should be better in a finally block
            out.close(); // should be better handled automatically by try-with-resources
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
This creates the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <element>&lt;&gt;&amp;": मराठी वर्णमाला</element>
</root>
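Applied to the question's code, the hand-written formatElement helper could be replaced by something along these lines. This is a sketch under assumptions, not a drop-in replacement: the class ElementWriterSketch and the methods writeElement and toXml are hypothetical names, the output goes to an in-memory stream instead of iTicker.xml, and the encoding is passed to createXMLStreamWriter explicitly so that it does not depend on the implementation's default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class ElementWriterSketch {

    // Counterpart of formatElement: writes <tag>value</tag>, or <tag/> when empty.
    // The XMLStreamWriter escapes the value itself.
    static void writeElement(XMLStreamWriter xml, String tag, String value)
            throws XMLStreamException {
        if (value == null || value.isEmpty()) {
            xml.writeEmptyElement(tag);
        } else {
            xml.writeStartElement(tag);
            xml.writeCharacters(value);
            xml.writeEndElement();
        }
    }

    // Builds a tiny document with one element and returns it as a string.
    static String toXml(String tag, String value) throws XMLStreamException, IOException {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            XMLStreamWriter xml =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            try {
                xml.writeStartDocument("UTF-8", "1.0");
                xml.writeStartElement("root");
                writeElement(xml, tag, value);
                xml.writeEndElement(); // root
                xml.writeEndDocument();
            } finally {
                xml.close(); // XMLStreamWriter is not AutoCloseable, so close it here
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        // "Tom & Jerry" followed by "Marathi" in Devanagari (Unicode escapes).
        String value = "Tom & Jerry \u092E\u0930\u093E\u0920\u0940";
        // The ampersand is written as an entity reference; the Marathi text stays readable.
        System.out.println(toXml("title", value));
    }
}
```

In the real method the ByteArrayOutputStream would be replaced by a FileOutputStream for the target file, and writeElement would be called once per non-empty cell.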
