db2 port

 
Ranch Hand
Posts: 17424
Hi, I'm posting anonymously because I'm having a lot of trouble logging in right now. I even made a new account, quit my browser, and then tried logging in, and had no luck. Also, your "change password" component seems to ask for the user's e-mail address once the notification link is clicked... shouldn't that be predetermined by the link the user clicks in the notification e-mail?

Anyway, I'm writing because I'm a developer with a web development group at a US university that's looking at using JForum as a component for intranets and extranets we develop for our clients at the university. We primarily use DB2 as our database, so I'm partway through doing a naive implementation of a DB2 port for Jforum. I say 'naive implementation,' because I'm basically copying your PostgreSQL stuff and changing the syntax to be DB2-friendly.

One question I had was whether you're conducting actions that require getting a new ID from a sequence and using it in INSERT statements as one transaction. Our application servers are load balanced, so we're in need of software that's transaction-safe. Is JForum going to be written in a manner that takes into consideration that multiple implementations will be accessing the same database?

I'll likely have other questions as I finish up translating things for DB2 and begin testing. I'm planning on conducting this stuff alongside other projects. My department is hoping that this goes well, since we're in need of a decent Java-based forum package, and upgrading our Jive license so it will work with our load balanced application servers is cost prohibitive.
[originally posted on jforum.net by Anonymous]
 
Migrated From Jforum.net

Anonymous wrote:Hi, I'm posting anonymously because I'm having a lot of trouble logging in right now. I even made a new account, quit my browser, and then tried logging in, and had no luck. Also, your "change password" component seems to ask for the user's e-mail address once the notification link is clicked... shouldn't that be predetermined by the link the user clicks in the notification e-mail?



Hmm... see, the biggest mistake I made in JForum was not writing unit tests from the beginning. Now these nasty bugs are only noticed when users like you have trouble with the system :? . Sorry about the inconvenience. I registered the bug in Jira: http://www.jforum.net/jira/browse/JF-123 . I'm going to fix it ASAP.


Anonymous wrote:One question I had was whether you're conducting actions that require getting a new ID from a sequence and using it in INSERT statements as one transaction. Our application servers are load balanced, so we're in need of software that's transaction-safe. Is JForum going to be written in a manner that takes into consideration that multiple implementations will be accessing the same database?



You aren't the first to ask about that. As I have said before, I currently don't synchronize the calls that run the INSERT statements. I registered it at http://www.jforum.net/jira/browse/JF-124. But I must admit that I'm not very experienced with concurrent programming, so any help or tips about that are very welcome.
For now, all I have in mind is to synchronize the call to the statement that executes the INSERT and the one that gets the latest ID. Is that enough?


Anonymous wrote:I'll likely have other questions as I finish up translating things for DB2 and begin testing. I'm planning on conducting this stuff alongside other projects. My department is hoping that this goes well, since we're in need of a decent Java-based forum package, and upgrading our Jive license so it will work with our load balanced application servers is cost prohibitive.



As I said before, the lack of unit testing in JForum is really inexcusable. A user, Marc, started writing regression tests using jWebUnit, but that work is still at the beginning. I'll start writing test cases for the core classes in the next few days if I can finish some other priority tasks.

As JForum is only now getting well known (and is a recent project, anyway), I think you will be the first to (try to) run JForum in a cluster.
Please let me know what I can do to help you.

Rafael
[originally posted on jforum.net by Rafael Steil]
 
As the anonymous user said, it should be handled with database transactions. Synchronization only helps within a single JVM; in a cluster, the database engine has to synchronize DB access.
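
For what it's worth, here is a minimal JDBC sketch of the "sequence plus INSERT in one transaction" idea. It is only an illustration: the sequence `jforum_posts_seq`, the table `jforum_posts`, and its columns are made-up names rather than JForum's real schema, and the `VALUES NEXT VALUE FOR ...` form assumes DB2; other databases spell the sequence call differently.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class TransactionalInsert {
    // Hypothetical DB2 statements; adjust the names to the real schema.
    private static final String NEXT_ID =
        "VALUES NEXT VALUE FOR jforum_posts_seq";
    private static final String INSERT =
        "INSERT INTO jforum_posts (post_id, subject) VALUES (?, ?)";

    private TransactionalInsert() {}

    /** Fetches a new ID and inserts the row in one transaction; returns the ID. */
    static long insertPost(Connection conn, String subject) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // start an explicit transaction
        try {
            long id;
            // The database, not the JVM, hands out the next sequence value.
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(NEXT_ID)) {
                if (!rs.next()) {
                    throw new SQLException("sequence returned no value");
                }
                id = rs.getLong(1);
            }
            try (PreparedStatement ps = conn.prepareStatement(INSERT)) {
                ps.setLong(1, id);
                ps.setString(2, subject);
                ps.executeUpdate();
            }
            conn.commit();
            return id;
        } catch (SQLException e) {
            conn.rollback(); // undo the partial work
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Because the database hands out the sequence values, two application servers calling this at the same moment get different IDs without any `synchronized` block in the Java code; the commit/rollback pair keeps a failed INSERT from leaving partial work behind.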

I think it will require some refactoring to move the MySQL-specific DB stuff out of the generic SQL models and into the mysql package. I am not a MySQL user, but as far as I know the driver will throw an exception if a transaction is started. So the transaction handling belongs in the generic part, and the MySQL part may be without transaction handling.

Committers who only use MySQL in deployment will have to develop their transaction handling against HSQLDB.
[originally posted on jforum.net by marc]
 

marc wrote:As the anonymous user said, it should be handled with database transactions. Synchronization only helps within a single JVM; in a cluster, the database engine has to synchronize DB access.



hm, right. But, using the same logic, to have jforum running on multiple machines without any problem, other pieces of code should be refactored as well. Jira uses an expensive, proprietary component to do that.
Again, I have no experience developing applications that run correctly in a cluster, so any tips or suggested reading are welcome.

marc wrote:
I think it will require some refactoring to move the MySQL-specific DB stuff out of the generic SQL models and into the mysql package.



I can do that. Let me just finish the SafeHtml implementation.

marc wrote:
I am not a MySQL user, but as far as I know the driver will throw an exception if a transaction is started.



Actually, the newest versions of MySQL do support transactions.

Rafael
[originally posted on jforum.net by Rafael Steil]
 
I noticed the cache today when I changed the order of my categories and the update to the DB only became visible after I restarted Tomcat.

Wouldn't it make sense to use an open source cache solution like OSCache or JCache?

These cache solutions already offer a lot of options like clustering or timeouts.

I don't know whether a fully clustered cache is really needed for the view part of the forum. A time out to refresh the cache from the db every minute or so should be sufficient for most installations. (Of course it would require sticky sessions, to make sure a user sees his own posting the very moment after pressing submit.)
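
A rough sketch of the time-out idea, in plain Java rather than OSCache or JCache: a value is reloaded from its source once it is older than a configurable age. The class name `ExpiringValue` and the injectable clock are assumptions made for the illustration, not part of JForum or of any cache library.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class ExpiringValue<T> {
    private final Supplier<T> loader;   // e.g. a call that reads from the database
    private final long ttlMillis;       // maximum age before a reload
    private final LongSupplier clock;   // injectable so expiry can be tested
    private T value;
    private long loadedAt;
    private boolean loaded;

    ExpiringValue(Supplier<T> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, reloading it first if it is missing or too old. */
    synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= ttlMillis) {
            value = loader.get();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```

With `new ExpiringValue<>(loader, 60_000L, System::currentTimeMillis)`, where `loader` is whatever reads the category list from the database, a change made on another node would show up within about a minute without restarting anything.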

With a nice cache solution the admin will be able to configure it to suit his needs.
[originally posted on jforum.net by marc]
 

marc wrote:
Wouldn't it make sense to use an open source cache solution like OSCache or JCache?

These cache solutions already offer a lot of options like clustering or timeouts.



Really? I read about OSCache, but didn't know it had cluster support. Cool.

marc wrote:A time out to refresh the cache from the db every minute or so should be sufficient for most installations.



I think a "restart cache" button is better. It's not hard to add this to the admin panel.

Rafael
[originally posted on jforum.net by Rafael Steil]
 
Hi, me again (same guy as before... my account nick is mdellabitta).

Glad to hear that you're on the bugs I reported... Maybe posting in the contributors forum isn't the correct way to notify you. Sorry about that... I know what it's like to receive various intersecting bug reports from different channels. It can become overwhelming.

In terms of concurrency, I think Marc is correct in that this should be handled by the DB. The developers of whatever DB one is using probably have spent a lot of time working on the problem, and unless you're going to deploy in a distributed environment where the web tier is separate from the application tier, why duplicate their effort?

Jive's non-enterprise version (at least, the one we're running) maintains some sort of Apache HTTPD-style "scoreboard" type file on the application server that caches runtime data, which prevents clustering. Who knows if this was an architectural optimization, or a ploy to get people to upgrade to the enterprise version (which runs on a cluster). Anyway, that's preventing us from moving it over to our new servers. Right now, it's running on one of a cluster of web servers in a configuration that short circuits the load balancer so all requests for that particular port point to the server that Jive's running on. Not ideal for us.

I'm not sure if you're maintaining any data in memory or in local storage that prohibits clustering. I noticed that you're using ThreadLocal stuff at one point, which I'm familiar with as a pattern that's recommended for using Hibernate in a web application environment. It seems like that's the right way to go with anything that's maintaining state throughout the lifetime of a request, but I didn't delve too deeply into the request lifecycle to double check your stuff (and who knows if I'm qualified to check up on your code to begin with).
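
A minimal sketch of that per-request ThreadLocal pattern, assuming a hypothetical `RequestContext` holder (this is not JForum's actual code): the value is bound to the thread that handles the request and removed in a `finally` block, so a pooled thread does not leak it into the next request.

```java
final class RequestContext {
    // One slot per thread; other request threads never see this value.
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    private RequestContext() {}

    /** Runs the work with the given user bound to the current thread. */
    static void runAs(String user, Runnable work) {
        CURRENT_USER.set(user);
        try {
            work.run();
        } finally {
            CURRENT_USER.remove(); // always clean up, even if the work throws
        }
    }

    /** Returns the user bound to the current thread, or null outside a request. */
    static String currentUser() {
        return CURRENT_USER.get();
    }
}
```

Since this state lives only for the duration of one request on one thread, it should not get in the way of clustering the way long-lived static caches do.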

Anyway, DB transactions that grab a primary key from a sequence and use it to write a record in the same transaction are probably key (no pun intended). I have experience as an end user of another open source library that used MySQL as a development platform and didn't test against any other DB, and it was sort of shaky. MySQL is definitely coming along in terms of industrial strength features, but sometimes it takes a diverging road in terms of behavior when encountering edge cases. I guess as long as you rely on users of other databases for bug reports, and you try out your code against PostgreSQL to keep you honest, you'll be doing fine though.

Anyway, looking forward to working with you on this one.
[originally posted on jforum.net by Anonymous]
 

Rafael Steil wrote:
hm, right. But, using the same logic, to have jforum running on multiple machines without any problem, other pieces of code should be refactored as well.



Is there anything else you are aware of besides the avatar upload? I see a configuration option called cache.dir, but it doesn't seem to do anything. If there is no cache in jforum it shouldn't be too difficult to make it ready for clustering.

As for the avatar upload I think enterprise users are likely to disable it anyhow, and if someone really needs it, they can put the avatars on shared disk or they can add an rsync call in the upload method.
[originally posted on jforum.net by Anonymous]
 

Anonymous wrote: I see a configuration option called cache.dir, but it doesn't seem to do anything. If there is no cache in jforum it shouldn't be too difficult to make it ready for clustering.



Actually, I do a lot of caching in memory using flat static instances. I know that this is bad. Would putting those data structures in the ServletContext solve the problem?

The cache.dir option is intended to be used to store HTML files, like posts and topics, to reduce database access and so on. But so far I haven't written the code.


Anonymous wrote:As for the avatar upload I think enterprise users are likely to disable it anyhow, and if someone really needs it, they can put the avatars on shared disk or they can add an rsync call in the upload method.



Well, disabling avatar upload is easy, and shouldn't be a problem.

Rafael
[originally posted on jforum.net by Rafael Steil]
 
Ahhh... able to login now. Feels so much better!

Would an option to disable caching be feasible as a stopgap measure? Or would that negate the performance gains from clustering?
[originally posted on jforum.net by mdellabitta]
 

mdellabitta wrote:Ahhh... able to login now. Feels so much better!

Would an option to disable caching be feasible as a stopgap measure? Or would that negate the performance gains from clustering?



Well... it's an option, but it will have a performance penalty, especially in very large forums. I think that refactoring the code to work right in a cluster is the best thing to do. The problem is that I personally don't have much experience with this. Do you guys?

We can refactor some pieces of code without problems; the hard part is testing the whole system running on multiple machines.

Rafael
[originally posted on jforum.net by Rafael Steil]
 
I would also like to have an option to disable the cache. A well tuned database can be incredibly fast.

[originally posted on jforum.net by marc]
 

marc wrote:I would also like to have an option to disable the cache. A well tuned database can be incredibly fast.



Not as fast as reading directly from RAM ;)

Rafael
[originally posted on jforum.net by Rafael Steil]
 

Rafael Steil wrote: The problem is that I personally don't have much experience with this. Do you guys?



I happen to be in the c-jdbc (clustered jdbc) team. However, I do not have experience with any of the cache projects I have mentioned above.

http://c-jdbc.objectweb.org/

Rafael Steil wrote:Not as fast as reading directly from RAM ;)


Databases normally do use caching; they don't read from disk for every request. There is an interprocess communication overhead, though.

I didn't look at the jforum cache, but are you sure the cache will not run out of memory for a big forum? It is pretty hard to write a bullet-proof cache, and I think it is outside the scope of forum software.
[originally posted on jforum.net by marc]
 

marc wrote:
I happen to be in the c-jdbc (clustered jdbc) team. However, I do not have experience with any of the cache projects I have mentioned above.

http://c-jdbc.objectweb.org/



Yeah, I read about this project some time ago. Looks great.

marc wrote:
I didn't look at the jforum cache, but are you sure the cache will not run out of memory for a big forum? It is pretty hard to write a bullet-proof cache, and I think it is outside the scope of forum software.



No, it will not, because the data structures are very tiny. I don't cache everything. The cached parts are:

  • The forums info ( /forums/list.page ). Cache: net.jforum.repository.ForumRepository
  • The first page of each forum ( /forums/show/ID.page ). Cache: net.jforum.repository.TopicRepository
  • BB tags. Cache: net.jforum.repository.BBCodeRepository
  • Permissions for the logged-in users, with a limit of 50 controlled by an LRU cache. Cache: net.jforum.repository.SecurityRepository
  • The user sessions, controlled by net.jforum.SessionFacade
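
For illustration, an LRU cache with a fixed limit like the one described for permissions can be sketched on top of `java.util.LinkedHashMap`. This shows the general technique only and is not the code in `SecurityRepository`; the class name `LruCache` is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insert; true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

`LinkedHashMap` is not thread-safe, so a shared instance would need wrapping, for example with `Collections.synchronizedMap(new LruCache<>(50))`.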

Rafael
[originally posted on jforum.net by Rafael Steil]
 
Wow, there are quite a few unclear topics here.

Okay, first let's say that database clustering should be transparent from the DB client's point of view, acting exactly like a single DB. I don't know the details, nor whether all databases work this way.

One of the reasons I like Firebird is that it is quite a standard database: when you write an application that works on it, it is often very easy to get it running on another DB too.

I still agree that synchronization should be done on the database level. Having no cache and having an application working this way simply allows it to be clusterable. Anyway, computers are fast enough that jforum would run very fast without any form of caching. With fairly optimized SQL queries, I think a P3-600 could handle maybe 500 or 1000 posts an hour. Think of that: this is only about one post every four to seven seconds (very little work for the computer), and that would be a forum with a lot of users.

Clustering on the app server level is impossible at the moment because of the static instances and the primary key fetching strategies.

I also think that caching should be completely left out at the moment. It is quite complex to get a lasting performance gain from caching when your architecture changes quite often. Keeping the cache in sync often becomes a pain, and in the end there are often no benefits anymore.

I think refactoring the code to use an O/R mapper would solve all these issues at once. It would enable support for most databases for free, ease code maintenance, make it easy to plug in a clustered cache, and ease the adoption of DB transactions.

Just my 2 cents added to the thread...
[originally posted on jforum.net by Anonymous]
 
Well, disabling the cache will have a performance penalty. And, again, nothing is faster than in-memory data. OSCache does clustering using JGroups or JMS, so each node has its own cache (i.e., OSCache does not send objects through the network).
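
To make the "each node has its own cache" idea concrete, here is a small in-process sketch in which nodes keep local copies and only invalidation messages travel between them. `InvalidationBus` stands in for a JGroups or JMS channel, and all class and method names are hypothetical; this does not use OSCache's API, and whether OSCache works exactly this way internally is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

/** Stand-in for a cluster channel: delivers a key to every registered node. */
final class InvalidationBus {
    private final List<NodeCache> nodes = new CopyOnWriteArrayList<>();

    void register(NodeCache node) {
        nodes.add(node);
    }

    void broadcast(String key) {
        for (NodeCache node : nodes) {
            node.evictLocal(key); // only the key travels, never the cached object
        }
    }
}

/** One node's private cache in front of a shared data source. */
final class NodeCache {
    private final Map<String, String> local = new ConcurrentHashMap<>();
    private final InvalidationBus bus;
    private final Function<String, String> loader; // reads from the shared database

    NodeCache(InvalidationBus bus, Function<String, String> loader) {
        this.bus = bus;
        this.loader = loader;
    }

    /** Serves from local memory, loading from the source on a miss. */
    String get(String key) {
        return local.computeIfAbsent(key, loader);
    }

    void evictLocal(String key) {
        local.remove(key);
    }

    /** Call after writing to the database so every node drops its stale copy. */
    void invalidate(String key) {
        bus.broadcast(key);
    }
}
```

Each node is registered once with `bus.register(node)`. After that, reads stay in local memory, and a write on any node costs one small message per key instead of shipping the cached objects around.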

I agree with many of your points, and it is not a problem to have a "Disable Caching" option...
Supporting an O/R tool is quite acceptable, but it will not solve all the possible problems anyway. Also, it should not be intrusive or require huge modifications to the core code (i.e., the system should not be dependent on the O/R tool).

And, last but not least, help with the code is welcome ;). The CVS repository is publicly accessible.

Rafael
PS: 1000 posts / hour means 24,000 posts / day. Most sites don't even get close to that.
[originally posted on jforum.net by Rafael Steil]
 