

How to read and write non-English characters (e.g. Marathi, Tamil, Hindi) in Java?


I am reading non-English text (for example, Marathi) from an Excel file and then writing it to an XML file. When I read the Marathi text from Excel and inspect it in the Java debugger, it shows exactly the right Marathi characters. But when I write it to the XML file from my Java code, the output contains other symbols in place of the Marathi text. How should I handle this? My code is below.
public void excelToXML(String path) {
    // blankValues, df, EpochConverter, FileUploadController and DTHException
    // are not shown in this snippet
    FileWriter fostream;
    PrintWriter out = null;
    String strOutputPath = "C:\\Temp\\";
    try {
        File file = new File(path);
        InputStream inputStream = new FileInputStream(file);
        Workbook wb = WorkbookFactory.create(inputStream);
        List<String> sheetNames = new ArrayList<String>();
        for (int i = 0; i < wb.getNumberOfSheets(); i++) {
            sheetNames.add(wb.getSheetName(i));
        }
        fostream = new FileWriter(strOutputPath + "\\" + "iTicker" + ".xml");
        out = new PrintWriter(new BufferedWriter(fostream));
        // out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.println("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>");
        out.println("<root xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">");
        for (String sheetName : sheetNames) {
            if (sheetName.equals("Sheet3")) {
                System.out.println(sheetName);
                break;
            }
            Sheet sheet = wb.getSheet(sheetName);
            boolean firstRow = true;
            ArrayList<String> myStringArray = new ArrayList<String>();
            Iterator<Cell> cells = sheet.getRow(0).cellIterator();
            while (cells.hasNext()) {
                myStringArray.add(cells.next().toString());
            }
            for (Row row : sheet) {
                if (firstRow == true) {
                    firstRow = false;
                    continue;
                }
                if (!sheetName.equals("Sheet1")) {
                    out.println("\t<element>");
                }
                for (int i = 0; i < myStringArray.size(); i++) {
                    if (row.getCell(i) != null && !(row.getCell(i)).toString().isEmpty()
                            && row.getCell(i).toString().length() > 0) {
                        if (!(myStringArray.get(i) != null && myStringArray.get(i).toString().equals("Start_Epoch_Time") || myStringArray.get(i).toString().equals("End_Epoch_Time"))) {
                            out.println(formatElement("\t\t", myStringArray.get(i), formatCell(row.getCell(i))));
                        } else {
                            long ePochValue = EpochConverter.getepochValue(row.getCell(i).toString());
                            out.println(formatElement("\t\t", myStringArray.get(i), String.valueOf(ePochValue)));
                        }
                    } else {
                        blankValues.add(sheetName + ":" + "column header" + ":" + myStringArray.get(i) + ":" + "row no:" + row.getRowNum() + " " + "is blank.");
                    }
                }
                if (!sheetName.equals("Sheet1")) {
                    out.println("\t</element>");
                }
            }
        }
        out.write("</root>");
        out.flush();
        out.close();
        if (blankValues != null && blankValues.size() > 0) {
            FileUploadController.writeErrorLog(blankValues + "Please fill all the mandatory values.");
        }
    } catch (Exception e) {
        new DTHException(e.getMessage()); // note: creates the exception but never throws it
        e.printStackTrace();
    }
}
private static String formatCell(Cell cell) {
    if (cell == null) {
        return "";
    }
    switch (cell.getCellType()) {
        case Cell.CELL_TYPE_BLANK:
            return "";
        case Cell.CELL_TYPE_BOOLEAN:
            return Boolean.toString(cell.getBooleanCellValue());
        case Cell.CELL_TYPE_ERROR:
            return "*error*";
        case Cell.CELL_TYPE_NUMERIC:
            return df.format(cell.getNumericCellValue());
        case Cell.CELL_TYPE_STRING:
            return cell.getStringCellValue();
        default:
            return "<unknown value>";
    }
}
private static String formatElement(String prefix, String tag, String value) {
    StringBuilder sb = new StringBuilder(prefix);
    sb.append("<");
    sb.append(tag);
    if (value != null && value.length() > 0) {
        sb.append(">");
        sb.append(value);
        sb.append("</");
        sb.append(tag);
        sb.append(">");
    } else {
        sb.append("/>");
    }
    return sb.toString();
}
On the line below, inspecting row.getCell(i) shows the exact Marathi value, but once that value is written, the output is different:
out.println(formatElement("\t\t", myStringArray.get(i), formatCell(row.getCell(i))));
Your code has two big problems.
1) You are obviously using Windows (path C:\\Temp), but, as Axel Richter already pointed out in the comments, you are using the default encoding for the output file. Creating a FileWriter directly from a file name gives you the platform's default encoding, which on Windows is the Windows ANSI code page. That is not what you want, because you then write an XML declaration that names UTF-8 as the encoding.
Never rely on the platform's default encoding. Always create the PrintWriter with an explicit encoding, using an OutputStreamWriter around a FileOutputStream, like so:
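To see why the default encoding loses the text, here is a small self-contained sketch (the class name EncodingDemo is made up for illustration). It encodes the word मराठी once with windows-1252, a common Windows ANSI code page, and once with UTF-8. Windows-1252 has no Devanagari characters, so String.getBytes substitutes the replacement byte '?' for each of them, while UTF-8 round-trips the text intact. Which ANSI code page a given machine uses depends on its locale, so windows-1252 is an assumption here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // "मराठी" written as Unicode escapes, so the demo does not depend on the source file encoding
    static final String MARATHI = "\u092E\u0930\u093E\u0920\u0940";

    // Encodes the text with the given charset, then decodes the bytes with the same charset.
    static String roundTrip(String text, Charset charset) {
        // getBytes replaces characters the charset cannot represent with its replacement byte
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        Charset ansi = Charset.forName("windows-1252"); // assumed ANSI code page
        System.out.println(roundTrip(MARATHI, ansi));                                    // ?????
        System.out.println(roundTrip(MARATHI, StandardCharsets.UTF_8).equals(MARATHI)); // true
        System.out.println(MARATHI.getBytes(StandardCharsets.UTF_8).length);            // 15
    }
}
```

The information is lost at the moment of encoding, not when the file is read later, which is why the String looks correct in the debugger but not in the file.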
PrintWriter writer = new PrintWriter(new BufferedWriter(
new OutputStreamWriter(
new FileOutputStream("iTicker.xml"), StandardCharsets.UTF_8)));
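On Java 7 and later, java.nio.file.Files.newBufferedWriter(Path, Charset) is a shorter way to get a writer with an explicit encoding. The sketch below is illustrative only (the class and method names are made up, and it writes to a temporary file instead of the question's output path). It writes a tiny XML document as UTF-8 and reads it back with the same charset to check that the Marathi text survives:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class Utf8WriterDemo {
    // Writes a tiny XML document through an explicit UTF-8 writer,
    // then reads the lines back using the same explicit charset.
    static List<String> writeAndReadBack(Path file, String value) throws IOException {
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
            out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
            out.println("<root>" + value + "</root>");
        }
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("iTicker", ".xml");
        String marathi = "\u092E\u0930\u093E\u0920\u0940"; // "मराठी"
        List<String> lines = writeAndReadBack(file, marathi);
        System.out.println(lines.get(1).equals("<root>" + marathi + "</root>")); // true
        Files.delete(file);
    }
}
```

The reading side is given the same charset on purpose: if a file written as UTF-8 is opened with a different encoding, it will look garbled even though it was written correctly.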
2) Writing XML by hand, as you do, is bad practice. If you do it anyway, you must escape special characters such as "<", ">" and "&". It is better to use a library that does the escaping automatically; the Java standard library, for example, includes an implementation of the XMLStreamWriter interface.
Here is an example of how easy it is to use:
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class WriteXml {
    public static void main(String[] args) {
        // try-with-resources closes the output stream automatically
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream("iTicker.xml"))) {
            // Pass the encoding explicitly so the bytes match the XML declaration
            XMLStreamWriter xmlWriter =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            try {
                xmlWriter.writeStartDocument("UTF-8", "1.0");
                xmlWriter.writeCharacters("\n");
                xmlWriter.writeStartElement("root");
                xmlWriter.writeNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
                xmlWriter.writeCharacters("\n  ");
                xmlWriter.writeStartElement("element");
                // Some special characters and (I hope) some Marathi letters
                xmlWriter.writeCharacters("<>&\": मराठी वर्णमाला");
                xmlWriter.writeEndElement(); // element
                xmlWriter.writeCharacters("\n");
                xmlWriter.writeEndElement(); // root
                xmlWriter.writeEndDocument();
                xmlWriter.flush();
            } finally {
                xmlWriter.close(); // does not close the underlying stream
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
This creates the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <element>&lt;&gt;&amp;": मराठी वर्णमाला</element>
</root>
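If you would rather keep the hand-written approach from the question, every value must at least be escaped before it goes into the element. The following is a minimal sketch with a hypothetical escapeXml helper (not part of any library) and a variant of the question's formatElement that uses it. It only handles escaping: the file still has to be written through a UTF-8 writer, and tag names are not escaped, so the Excel header cells must already be valid XML names.

```java
public class XmlEscape {
    // Hypothetical helper: escapes the five characters XML reserves,
    // so a value can be placed safely in element content or attribute values.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c); // non-ASCII text such as Marathi passes through unchanged
            }
        }
        return sb.toString();
    }

    // Same output shape as the question's formatElement, but the value is escaped.
    static String formatElement(String prefix, String tag, String value) {
        if (value == null || value.isEmpty()) {
            return prefix + "<" + tag + "/>";
        }
        return prefix + "<" + tag + ">" + escapeXml(value) + "</" + tag + ">";
    }

    public static void main(String[] args) {
        // prints two tabs followed by <name>A &amp; B &lt;x&gt;</name>
        System.out.println(formatElement("\t\t", "name", "A & B <x>"));
    }
}
```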
