Character Encoding issue with Tomcat 5.5.9

 
Greenhorn
Posts: 6
Hi,
These are the steps that I followed to implement character encoding.

We are upgrading the application to support data entered in French.
The application is supposed to store and retrieve data entered in French.
Even before I made any changes for character encoding, everything except the characters � � and � was stored and retrieved properly.
French alphabet: A a <code>(� �, � �), (� �), B b, C c (� �), D d, E e (� �, � �, � �,
� �), F f, G g, H h, I i (� �, � �), J j, K k, L l, M m, N n
(� �), O o (� �), (� �), P p, Q q, S s, T t, ),
V v, W w, X x, (� �), Z </code>

When control passes from the JSP to the servlet, these three characters are not preserved: they arrive as rectangles, and when written to the database they are stored as inverted question marks.
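The symptom described above is consistent with an encoding that simply has no slot for those characters. As a minimal sketch, assuming the three characters are Œ, œ and Ÿ (an assumption on my part; the class and method names are only illustrative), this shows that ISO-8859-1 substitutes '?' for them while ISO-8859-15 and UTF-8 round-trip them:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetCoverageDemo {
    // Assumed problem characters: Œ (U+0152), œ (U+0153), Ÿ (U+0178).
    static final String SAMPLE = "\u0152\u0153\u0178";

    // Encode the text with the given charset and decode it again.
    // Characters the charset cannot represent come back as '?'.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // ISO-8859-1 has no mapping for these characters.
        System.out.println(roundTrip(SAMPLE, StandardCharsets.ISO_8859_1)); // prints ???
        // ISO-8859-15 and UTF-8 both return the text unchanged.
        System.out.println(SAMPLE.equals(roundTrip(SAMPLE, Charset.forName("ISO-8859-15"))));
        System.out.println(SAMPLE.equals(roundTrip(SAMPLE, StandardCharsets.UTF_8)));
    }
}
```

If the assumption holds, any step in the chain (browser, container, JDBC driver, database) that falls back to ISO-8859-1 would be enough to lose exactly these characters.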

To handle this, I took the following steps:

1. The following filter and response wrapper classes were added.
<code>
import java.io.IOException;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

public class UTF8EncodingFilter implements javax.servlet.Filter
{
    public void init( FilterConfig filterConfig ) throws ServletException
    {
        // This would be a good place to collect a parameterized
        // default encoding type. For brevity, we're going to
        // use a hard-coded value in this example.
    }

    public void doFilter( ServletRequest request,
                          ServletResponse response,
                          FilterChain filterChain )
        throws IOException, ServletException
    {
        // Wrap the response object. You should create a mechanism
        // to ensure the response object only gets wrapped once.
        // In this example, the response object will inappropriately
        // get wrapped multiple times during a forward.
        response = new UTF8EncodingServletResponse( (HttpServletResponse) response );
        // Specify the encoding to assume for the request so
        // the parameters can be properly decoded.
        request.setCharacterEncoding( "UTF-8" );
        // "UTF-8" is a charset name, not a MIME type, so it is set as the
        // response character encoding (Servlet 2.4+) and not as the content type.
        response.setCharacterEncoding( "UTF-8" );
        System.out.println( "UTF8EncodingFilter : doFilter() -> Both request & response are in UTF-8 format." );
        filterChain.doFilter( request, response );
    }

    public void destroy()
    {
        // no-op
    }
}

------------------------------------------------------------------------------------------------------------

/*
* Created on Oct 30, 2007
*
* To change the template for this generated file go to
* Window>Preferences>Java>Code Generation>Code and Comments
*/


import javax.servlet.http.HttpServletResponse;

public class UTF8EncodingServletResponse
    extends javax.servlet.http.HttpServletResponseWrapper
{
    // Becomes true once the app has set a content type that names a charset.
    private boolean encodingSpecified = false;

    public UTF8EncodingServletResponse( HttpServletResponse response )
    {
        super( response );
    }

    public void setContentType( String type )
    {
        String explicitType = type;
        // If a specific encoding has not already been set by the app,
        // let's see if this is a call to specify it. If the content
        // type doesn't explicitly set an encoding, make it UTF-8.
        // A null type is passed through untouched to avoid a NullPointerException.
        if (!encodingSpecified && type != null)
        {
            String lowerType = type.toLowerCase();
            // See if this is a call to explicitly set the character encoding.
            if (lowerType.indexOf( "charset" ) < 0)
            {
                // If no character encoding is specified, we still need to
                // ensure the app is specifying text content.
                if (lowerType.startsWith( "text/" ))
                {
                    // App is sending a text response, but no encoding
                    // is specified, so we'll force it to UTF-8.
                    explicitType = type + "; charset=UTF-8";
                }
            }
            else
            {
                // App picked a specific encoding, so let's make
                // sure we don't override it.
                encodingSpecified = true;
            }
        }
        // Delegate to supertype to record encoding.
        super.setContentType( explicitType );
    }
}

</code>
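The charset-appending rule in the wrapper's setContentType can be checked outside a servlet container. The following is only a sketch of that rule as a pure function; the class name ContentTypeHelper and the method withUtf8Charset are hypothetical and not part of the posted code:

```java
import java.util.Locale;

class ContentTypeHelper {
    // Appends "; charset=UTF-8" to text/* content types that do not already
    // name a charset. Everything else, including null, is returned unchanged.
    static String withUtf8Charset(String type) {
        if (type == null) {
            return null;
        }
        String lower = type.toLowerCase(Locale.ROOT);
        if (lower.startsWith("text/") && !lower.contains("charset")) {
            return type + "; charset=UTF-8";
        }
        return type;
    }

    public static void main(String[] args) {
        System.out.println(withUtf8Charset("text/html"));                       // gets a charset appended
        System.out.println(withUtf8Charset("text/html; charset=ISO-8859-15"));  // left as is
        System.out.println(withUtf8Charset("UTF-8"));                           // not a text/* type, left as is
    }
}
```

The last call illustrates why passing a bare charset name such as "UTF-8" as the content type does not help: it is neither a text/* type nor a charset parameter, so it goes through unchanged.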
web.xml changes
-----------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app id="WebApp">
<display-name>App Name</display-name>
<filter>
<filter-name>UTF8 Filter</filter-name>
<filter-class>com.vie.remoteDiagnostics.view.servlets.UTF8EncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>

<filter-mapping>
<filter-name>UTF8 Filter</filter-name>
<servlet-name>UploadServlet</servlet-name>
</filter-mapping>


-UploadServlet is the servlet to which the request is passed when the input page is submitted.

-If <url-pattern>/*</url-pattern> is used instead of <servlet-name>UploadServlet</servlet-name>, the characters get converted to something like this: �? ��), F f, G g, H h, I i (�? ��, �? ��), J j, K k, L l, M m, N n (�? ��), O o (�? ��), (�? �?),

JSP page changes:
-----------------

<HEAD>
<META http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
(The above page directive is added only on the page where the data is displayed, not on the page where the user enters the input. If this directive is added to the input page, the entered characters get converted to the same kind of format as shown above, like �? ��), O o (�? ��), (�? �. )
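One plausible cause of output where each accented letter turns into two or more odd characters is that the browser sent UTF-8 bytes and some later step decoded them with a single-byte charset. The sketch below reproduces that effect with plain JDK calls; it is an illustration under that assumption, not a diagnosis of this application, and the class name is made up:

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Decodes UTF-8 bytes as if they were ISO-8859-1, simulating a server
    // that ignores the encoding the browser actually used.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake. This works because ISO-8859-1 maps every byte
    // value to exactly one character, so no information was lost.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00E9";               // é
        String garbled = misdecode(original);     // two characters: U+00C3 U+00A9
        System.out.println(garbled.length());     // prints 2
        System.out.println(original.equals(repair(garbled))); // prints true
    }
}
```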

Tomcat: 5.5.9 changes
---------------------

In server.xml in the conf directory, URIEncoding="UTF-8" was added to both connectors:

<Connector port="8080" maxHttpHeaderSize="8192" maxThreads="150" minSpareThreads="25" maxSpareThreads="75" enableLookups="false" redirectPort="8443" acceptCount="100" connectionTimeout="20000" disableUploadTimeout="true" URIEncoding="UTF-8" />

<Connector port="8009" enableLookups="false" redirectPort="8443" URIEncoding="UTF-8" protocol="AJP/1.3" />
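URIEncoding controls the charset the connector uses to decode percent-escapes in the request URI, which is where GET parameters travel. The sketch below uses java.net.URLDecoder to mimic that decoding step (it is not Tomcat's own code, and the class name is hypothetical) and shows how the same escaped bytes yield different text depending on the charset:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class QueryDecodingDemo {
    // Decodes a percent-encoded value with the named charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // %C5%93 is the UTF-8 encoding of œ (U+0153).
        String escaped = "%C5%93";
        // Decoded as UTF-8: one character, U+0153.
        System.out.println(decode(escaped, "UTF-8").length());      // prints 1
        // Decoded as ISO-8859-1: two characters, U+00C5 and U+0093.
        System.out.println(decode(escaped, "ISO-8859-1").length()); // prints 2
    }
}
```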


Copied the "Catalina.bat" and Catalina.xml files to the bin folder because they were not there initially in Tomcat 5.5.9.

-Dfile.encoding=UTF-8 was added to Catalina.bat as below:

%_EXECJAVA% %JAVA_OPTS% %CATALINA_OPTS% %DEBUG_OPTS% -Dfile.encoding=UTF-8 -Djava.endorsed.dirs="%JAVA_ENDORSED_DIRS%" -classpath "%CLASSPATH%" -Dcatalina.base="%CATALINA_BASE%" -Dcatalina.home="%CATALINA_HOME%" -Djava.io.tmpdir="%CATALINA_TMPDIR%" %MAINCLASS% %CMD_LINE_ARGS% %ACTION%
goto end
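To confirm that the -Dfile.encoding option actually reached the JVM, the effective default charset can be printed at runtime. This is a small check one could run with the same JVM options (or call from a test servlet); the class name is illustrative. Note that the default charset only affects APIs called without an explicit charset, such as new String(bytes) or String.getBytes(), and is a separate setting from the request encoding applied in the filter:

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    // Reports the file.encoding system property and the charset the JVM
    // uses when no charset is given explicitly.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
             + ", default charset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```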

After making all these changes, the same three characters are still stored as inverted question marks.

When I searched the Tomcat bug list, I found a patch (http://issues.apache.org/jira/browse/OFBIZ-281) that supposedly has to be applied for encoding in Tomcat 5.5.9. But the patch points to the CatalinaContainer.java class, which is in the OFBiz framework (an open-source framework). I couldn't find this class in the Tomcat 5.5.9 installation.

It would be really great if someone could throw some light on this.
Is there anything wrong with the steps I followed, or
is this really a Tomcat 5.5.9 issue?
Please help.
 
Greenhorn
Posts: 1
Hi !

I'm French, and I had the same problem with my own language.
To get the � and � characters working I used a filter:

org.springframework.web.filter.CharacterEncodingFilter

I set the encoding to ISO-8859-15, the one used for French, to get these characters working.

However, I had to implement my own filter (see http://java.sun.com/products/servlet/Filters.html) to get the special characters working on POST requests.
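The filter itself is not shown in the post, so the following is not that code. It is only a sketch of what changing the request encoding means for a POST body: the same form-encoded bytes decode to different text under ISO-8859-15 and ISO-8859-1. The class FormBodyDecoder and the sample parameter name are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBodyDecoder {
    // Parses an application/x-www-form-urlencoded body using the named charset.
    static Map<String, String> parse(String body, String charsetName) {
        Map<String, String> params = new LinkedHashMap<>();
        try {
            for (String pair : body.split("&")) {
                if (pair.isEmpty()) {
                    continue; // skip empty segments such as a trailing '&'
                }
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                params.put(URLDecoder.decode(name, charsetName),
                           URLDecoder.decode(value, charsetName));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return params;
    }

    public static void main(String[] args) {
        // Byte 0xBD is œ in ISO-8859-15 but ½ in ISO-8859-1.
        String body = "mot=c%BDur";
        System.out.println("c\u0153ur".equals(parse(body, "ISO-8859-15").get("mot"))); // prints true
        System.out.println("c\u00BDur".equals(parse(body, "ISO-8859-1").get("mot")));  // prints true
    }
}
```

In a servlet container the equivalent step is calling request.setCharacterEncoding before any parameter is read, which is what an encoding filter does.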

Hope it will help !
[ December 13, 2007: Message edited by: NhyMbuS ]
 
Greenhorn
Posts: 15
Hi,

I am having problems with the following 4 characters:

� � � �. When these characters are entered in a text box and sent to Oracle, they are stored as S, R, x and a box character respectively. When they are retrieved, they come back in the same form.

I already have the filter in place that sets the request encoding to UTF-8; what I do not have, however, is the response.setContentType call.

My question is: will UTF-8 work, or do I have to change the encoding to ISO-8859-15 for these characters alone? That would be a problem for me, since other characters work fine with UTF-8. Any help would be greatly appreciated.
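UTF-8 can represent every Unicode character, so the encoding itself should not need to change for a handful of characters. One explanation that would match the "S R x" symptom is a step that keeps only the low byte of each character. The sketch below assumes, purely as a guess, that the first three characters are œ, Œ and Ÿ (the fourth is not identified here); the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class LowByteTruncationDemo {
    // Keeps only the low 8 bits of each UTF-16 code unit, as a faulty
    // char-to-byte conversion would.
    static String truncateToLowByte(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            sb.append((char) (text.charAt(i) & 0xFF));
        }
        return sb.toString();
    }

    // Encodes to UTF-8 and decodes again; the text comes back unchanged.
    static String utf8RoundTrip(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed characters: œ (U+0153), Œ (U+0152), Ÿ (U+0178).
        String sample = "\u0153\u0152\u0178";
        System.out.println(truncateToLowByte(sample));            // prints SRx
        System.out.println(sample.equals(utf8RoundTrip(sample))); // prints true
    }
}
```

If that guess is right, the place to look is the conversion between the application and the database rather than the page or request encoding.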
 
Ranch Hand
Posts: 473
The solution is to put the following in your form JSPs:

<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

and to add the attribute

URIEncoding="UTF-8"

to the <Connector port="8080" ....... /> tag in Tomcat's server.xml file.


Thanks


Maki Java
 