Hibernate and GeoServer: seeking scalability and robustness

I thought it would be worth spending a few minutes to let people know about a development we are carrying out at GeoSolutions.
Being not only GeoServer developers but also heavy GeoServer users, we have been somewhat unhappy in the past with some of the scalability problems it showed, due to the fact that:

  1. GeoServer was keeping all its configuration in memory
  2. GeoServer was using XML files to handle its internal configuration

A lot of work has been done lately for the upcoming 2.0 version of GeoServer to cope with point 1 above; point 2, however, has not been touched yet.
If you use GeoServer the way we do, with thousands of layers and 10 to 100 new layers added daily (usually remote sensing data), you might agree with us that we need to:

  1. Not load and keep the entire configuration in memory
  2. Use a database to store the configuration

In a few words, we need to improve scalability and robustness while trying not to jeopardize performance. We need to be enterprise-ready.

At GeoSolutions we have decided to tackle these problems by implementing a new GeoServer internal catalog that uses Hibernate as its persistence engine and that does not bring the whole configuration into memory. Our goal is to support at least PostGIS and Oracle as target databases, but as you know, Hibernate supports many more (SpatiaLite is on the radar as well).
The range of features this work would open up is pretty wide: just think about Hibernate distributed caching, simplified GeoServer replication, etc.
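
To make the idea of a catalog that does not hold the whole configuration in memory a bit more concrete, here is a minimal Java sketch. It is not the GeoServer or Hibernate code: `LayerConfig`, `LayerConfigStore`, `InMemoryLayerConfigStore` and `OnDemandCatalog` are hypothetical names invented for illustration, and the in-memory store only stands in for the database so that the example runs without dependencies. In a real implementation the store would presumably delegate to Hibernate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, heavily simplified layer description (assumption, not a GeoServer class).
final class LayerConfig {
    final String name;
    final String storeType;

    LayerConfig(String name, String storeType) {
        this.name = name;
        this.storeType = storeType;
    }
}

// Hypothetical persistence boundary: one lookup per requested layer,
// instead of reading every layer at startup.
interface LayerConfigStore {
    Optional<LayerConfig> findByName(String name);

    void save(LayerConfig layer);
}

// Stand-in for the database. A database-backed version would run a query
// in findByName and an insert/update in save.
final class InMemoryLayerConfigStore implements LayerConfigStore {
    private final Map<String, LayerConfig> rows = new HashMap<>();
    private int lookups = 0;

    @Override
    public Optional<LayerConfig> findByName(String name) {
        lookups++; // counts how often the "database" is hit
        return Optional.ofNullable(rows.get(name));
    }

    @Override
    public void save(LayerConfig layer) {
        rows.put(layer.name, layer);
    }

    int lookupCount() {
        return lookups;
    }
}

// Catalog facade that keeps no layer state of its own:
// every request is answered by asking the store.
final class OnDemandCatalog {
    private final LayerConfigStore store;

    OnDemandCatalog(LayerConfigStore store) {
        this.store = store;
    }

    Optional<LayerConfig> getLayer(String name) {
        return store.findByName(name);
    }

    void addLayer(LayerConfig layer) {
        store.save(layer);
    }
}
```

With this shape, adding a layer is a single write to the store, and the catalog itself never grows with the number of configured layers. The price is one store lookup per request, which is the performance cost that has to be weighed against scalability.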

The work is in progress, and we have started to describe the details on the GeoServer wiki.
If you are interested in supporting this effort in some way (funding or human resources), please drop me a few lines at simone.giannecchiniATgeo-solutions.it.

5 Comments

  1. Amos
    Posted August 28, 2009 at 11:27 am

    That’s awesome Simone. That will really open up possibilities for dynamically altering configurations without having to teach back-end applications how to do REST. I assume that GeoServer will notice and respond to changes in its configuration?

  2. Andrea Aime
    Posted August 28, 2009 at 4:32 pm

    Amos, it depends. To avoid disastrous performance outcomes some data will have to be cached: an LRU cache of the most recently used data stores, styles, and other resources that are heavy to create and set up will have to be kept.
    This also means GS won’t query every single object each time; if we don’t cache, we’ll get scalability at the expense of performance.
    I guess some REST call to force GS to reload from the DB (full or selective) will be required.

  3. Posted August 31, 2009 at 4:57 pm

    Ciao Amos,
    the use case you are talking about is exactly one of the use cases that is driving this work.
    At this first stage, my goal is not pure speed but robustness and scalability.
    Once we get the first bit working, the focus will be on replication and on exploiting distributed Hibernate caches for performance.
    I am not running after the user who wants to upload and serve 100 shapefiles or 100 GeoTIFFs; I am trying to lay down a path for using GeoServer to serve thousands of layers in a dynamic environment where new data is configured in an unsupervised fashion, etc.

  4. Posted September 8, 2009 at 1:03 pm

    +1 for Amos’s idea: automatically detect changes in configuration.

  5. Posted November 25, 2009 at 12:35 pm

    As we move rapidly towards a production environment with GeoServer (2.x), we too are interested in an RDBMS (Oracle) based configuration. I would enjoy participating in that endeavor. Let me know how I can help.
    Peter
