Developer’s corner


SAVOT - Simple Access to VOTable - Release 2.6

Read before any use of available Java classes

You can send any idea or comment concerning SAVOT to question@astro.unistra.fr

Introduction

The main goals of this work are:

  • to provide Java VOTable parsers based on different parsing models
  • to provide small Java VOTable parsers which can be included in applets

These parsers are based on pull and SAX-like parsing methods:

  • SAVOT is a generic name for our work
  • SAVOT Pull is the parser based on the pull parsing method
  • SAVOT SAX is the parser based on the SAX parsing method

The Data model

The data model has been created to be able to load a VOTable document into memory.
The data model is independent from the parsers.
This model is based on the following VOTable schema (Documentation):


The Pull parsing mode

(packages: cds.savot.pull, cds.savot.model and cds.savot.common)

SAVOT pull parsing can be used in two modes: FULL or SEQUENTIAL.

  • The FULL mode loads the whole XML document into memory (internal data model cds.savot.model).
    • DOM parsers are often unable to load large XML documents because they use too much memory.
    • SAVOT has been designed to load very large documents into memory.
    • In this mode, all the data can be manipulated in memory through the internal data model API (cds.savot.model).
    • After modification, the VOTable document can be saved with the writer (cds.savot.writer).
  • The SEQUENTIAL mode loads an XML document into memory one RESOURCE at a time, unloading the previous RESOURCE.
    • Memory needs are limited to the size of the largest RESOURCE in the VOTable document.
    • The internal data model API (cds.savot.model) can also be used to create VOTable documents from scratch.
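To illustrate the SEQUENTIAL idea outside SAVOT, here is a minimal sketch that uses the JDK's StAX pull API (javax.xml.stream) rather than SAVOT itself. It streams through a VOTable and keeps only a row counter for the RESOURCE currently being read, so memory use does not grow with the number of resources. The class and method names are hypothetical, nested RESOURCE elements are counted as part of their top-level RESOURCE, and Java 7 or later is assumed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration using the JDK StAX pull API, not SAVOT.
public class SequentialResourceSketch {

    /**
     * Returns the number of TR rows in each top-level RESOURCE, in document order.
     * Only the counter of the RESOURCE being read is kept while streaming.
     */
    public static List<Integer> rowsPerResource(String xml) {
        List<Integer> counts = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int depth = 0; // RESOURCE nesting depth
            int rows = 0;  // TR count for the current top-level RESOURCE
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("RESOURCE")) {
                        if (depth == 0) {
                            rows = 0; // a new top-level RESOURCE starts
                        }
                        depth++;
                    } else if (name.equals("TR") && depth > 0) {
                        rows++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("RESOURCE")) {
                    depth--;
                    if (depth == 0) {
                        counts.add(rows); // RESOURCE finished: keep only its row count
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed VOTable", e);
        }
        return counts;
    }
}
```

In SAVOT itself, the corresponding loop is the getNextResource() loop shown in Example 2 below.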

Useful information about the work done around the pull parsing method.

The SAXLike parsing mode

(packages: cds.savot.sax and cds.savot.common; cds.savot.model is optional)

In some use cases a SAX parsing mode is preferable, because actions can be executed at each step of the parsing.

In this mode SAVOT does not keep the data in memory; the developer has to manage part of the process.
This mode is also a good solution if available memory is limited or if the VOTable files are very large.
Compared to the Pull mode, it often requires more work on the developer side.
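The callback style of this mode can be sketched with the JDK's standard SAX API (org.xml.sax). This is not SAVOT's SavotSAXConsumer interface, only an illustration of the same principle: the handler runs an action when a FIELD starts and when the content of a TD ends, and collects only the FIELD names and TD values. The class name FieldAndTdHandler is hypothetical, the sketch assumes unprefixed element names, and Java 7 or later is assumed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration using the JDK SAX API, not SAVOT's SavotSAXConsumer.
public class FieldAndTdHandler extends DefaultHandler {
    public final List<String> fieldNames = new ArrayList<>();
    public final List<String> cells = new ArrayList<>();
    private StringBuilder td; // non-null only while inside <TD>...</TD>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("FIELD")) {
            fieldNames.add(atts.getValue("name")); // action run when a FIELD starts
        } else if (qName.equals("TD")) {
            td = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (td != null) {
            td.append(ch, start, length); // TD text may be delivered in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("TD") && td != null) {
            cells.add(td.toString()); // action run on the content between <TD> and </TD>
            td = null;
        }
    }

    /** Parses a VOTable given as a string and returns the filled handler. */
    public static FieldAndTdHandler parse(String xml) {
        FieldAndTdHandler handler = new FieldAndTdHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("VOTable parsing failed", e);
        }
        return handler;
    }
}
```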

Quick start with SAVOT Pull!

How to start with the Pull parser?

* The usual questions…

Q - Which packages?
  • cds.savot.model to put information in the memory
  • cds.savot.pull for the VOTable parsing
  • cds.savot.common common things used for all modes
  • kxml2-min for the XML parsing
  • cds.savot.writer (optional) if you want to create VOTable documents (editing or creation from scratch)

You can download these packages from the Download corner.

Q - And the CLASSPATH?

Put the four required packages above in the CLASSPATH (plus cds.savot.writer if you use it).

Q - Does it work?

Download one of the samples and execute it

If it works, cheers! If not, go back to *

Before writing your code, you must choose the mode, FULL or SEQUENTIAL, in which to parse the VOTable file.

Example 1 : FULL mode

In this example we show how to create an object which contains the whole VOTable document (FULL mode).

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import cds.savot.model.*;
import cds.savot.pull.*;

      // ... inside a method where source (VOTable file name) and target (output file name, or null) are defined

      // the whole VOTable file is put into memory (FULL mode)
      SavotPullParser sb = new SavotPullParser(source, SavotPullEngine.FULL); // parses the whole source

      // get the VOTable object
      SavotVOTable sv = sb.getVOTable(); // sv is now a reference to a VOTable object

      System.out.println("Resource name : " + ((SavotResource) sv.getResources().getItemAt(0)).getName());

      BufferedWriter bw = null;
      try {
        // rows are written to target only if a non-empty file name is given
        if (target != null && target.length() > 0) {
          bw = new BufferedWriter(new FileWriter(target));
        }

        // for each resource
        for (int l = 0; l < sb.getResourceCount(); l++) {

          SavotResource currentResource = (SavotResource) sv.getResources().getItemAt(l);

          // for each table of the current resource
          for (int m = 0; m < currentResource.getTableCount(); m++) {

            // get all the rows of the table (skip tables without data)
            TRSet tr = currentResource.getTRSet(m);
            if (tr == null) {
              continue;
            }

            System.out.println("Number of items in TRSet (= number of <TR></TR>) : " + tr.getItemCount());

            // for each row
            for (int i = 0; i < tr.getItemCount(); i++) {

              // get all the data of the row
              TDSet theTDs = tr.getTDSet(i);
              String currentLine = "";
              System.out.println("Number of items in TDSet for the index " + (i + 1) + " tr (= number of <TD></TD>) : " + theTDs.getItemCount());

              // for each data of the row
              for (int j = 0; j < theTDs.getItemCount(); j++) {
                currentLine = currentLine + theTDs.getContent(j);
                System.out.println("<" + theTDs.getContent(j) + ">");
              }

              // write the row to the output file, or print it if no file was opened
              if (bw != null) {
                bw.write(currentLine);
                bw.newLine();
              } else {
                System.out.println(currentLine);
              }
            }
          }
        }
      } catch (IOException e) {
        e.printStackTrace();
      } finally {
        // flush and close the output file, if one was opened
        if (bw != null) {
          try {
            bw.close();
          } catch (IOException e) {
            e.printStackTrace();
          }
        }
      }
      ...

If you want to try this example, execute the PullFullSample2 class

Example 2 : SEQUENTIAL mode

This example shows how to use the SEQUENTIAL mode.

import cds.savot.model.*;
import cds.savot.pull.*;

    // ... inside a method where source (VOTable file name) is defined

    // begin the parsing (SEQUENTIAL mode)
    SavotPullParser sb = new SavotPullParser(source, SavotPullEngine.SEQUENTIAL); // parsing starts

    // get the first resource of the VOTable file
    SavotResource currentResource = sb.getNextResource();

    // while a resource is available
    while (currentResource != null) {

      // for each table of this resource
      for (int i = 0; i < currentResource.getTableCount(); i++) {
        TRSet tr = currentResource.getTRSet(i);

        if (tr != null) {
          System.out.println("Number of items in TRSet (= number of <TR></TR>) : " + tr.getItemCount());

          // for each row of the table
          for (int j = 0; j < tr.getItemCount(); j++) {

            // get all the data of the row
            TDSet theTDs = tr.getTDSet(j);

            String currentLine = "";

            System.out.println("Number of items in TDSet for the index " + (j + 1) + " tr (= number of <TD></TD>) : " + theTDs.getItemCount());

            // for each data of the row
            for (int k = 0; k < theTDs.getItemCount(); k++) {
              currentLine = currentLine + theTDs.getContent(k);
              System.out.println("<" + theTDs.getContent(k) + ">");
            }

            // print the whole row
            System.out.println(currentLine);
          }
        }
      }
      // get the next resource (the previous one is unloaded)
      currentResource = sb.getNextResource();
    }

If you want to try this example, execute the PullSeqSample class

Quick start with SAVOT SAXLike!

* The usual questions…

Q - Which packages?
  • cds.savot.sax for the VOTable parsing
  • cds.savot.common common things used for all modes
  • kxml2-min for the XML parsing
  • cds.savot.model (optional) to put information in the memory (you can use your own data model)

You can download these packages from the Download corner.

Q - And the CLASSPATH?

Put the above packages in the CLASSPATH

In this mode the developer must implement the SavotSAXConsumer interface, which declares all the methods called during parsing.

The developer decides what is done when, for instance:

  • a FIELD element starts
  • content is found between <TD> and </TD>


See the following example (SavotSAXSample).

Example : SAVOT SAX

In this trivial example, the <VOTABLE …> attributes, the <RESOURCE …> attributes and the <TD>…</TD> content are printed on the standard output.

import java.util.Vector;
import cds.savot.sax.*;

public class SavotSAXSample implements SavotSAXConsumer {

  public SavotSAXSample() {
  }

  // attributes is a Vector containing couples of (attribute name, attribute value)
  // example : (attributes.elementAt(0), attributes.elementAt(1)), (attributes.elementAt(2), attributes.elementAt(3)), ...

  /**
   * Prints each (name, value) attribute couple on the standard output.
   *
   * @param attributes Vector of alternating attribute names and values
   */
  public void showAttributes(Vector attributes) {
    for (int i = 0; i < attributes.size(); i = i + 2) {
      System.out.println("Attribute name : " + attributes.elementAt(i) + " Attribute value : " + attributes.elementAt(i + 1));
    }
  }

  // start elements
  public void startVotable(Vector attributes) {
    showAttributes(attributes);
  }

  public void startDescription() {
  }

  public void startResource(Vector attributes) {
    showAttributes(attributes);
  }

  public void startTable(Vector attributes) {
  }

  ...

  // end elements
  public void endVotable() {}
  public void endDescription() {}
  public void endResource() {}
  public void endTable() {}

  ...

  // TEXT
  public void textTD(String text) {
    System.out.println(text);
  }
  public void textMin(String text) {}
  public void textMax(String text) {}

  ...

  // document
  public void startDocument() {}
  public void endDocument() {}
}
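Since each callback receives its attributes as a flat Vector of alternating names and values (as described in the comment of the sample above), a small helper can turn them into a map so that a value can be looked up by name. The class AttributeUtil and its toMap method are hypothetical helpers, not part of SAVOT; Java 7 or later is assumed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Vector;

// Hypothetical helper, not part of SAVOT.
public final class AttributeUtil {

    private AttributeUtil() {
    }

    /**
     * Converts a Vector of alternating (name, value) entries into an
     * insertion-ordered map. A trailing name without a value is ignored.
     */
    public static Map<String, String> toMap(Vector<?> attributes) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < attributes.size(); i += 2) {
            map.put(String.valueOf(attributes.elementAt(i)),
                    String.valueOf(attributes.elementAt(i + 1)));
        }
        return map;
    }
}
```

Inside startResource, for example, String resourceName = AttributeUtil.toMap(attributes).get("name"); would retrieve the RESOURCE name attribute, or null if it is absent.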

The following lines must be included in your application:

...
    SavotSAXSample consumer = new SavotSAXSample();
    SavotSAXParser sb = new SavotSAXParser(consumer, file);
...

The methods of the SavotSAXSample consumer are then called during the parsing process.

FAQ

Q: Why not DOM?

Parsers that implement DOM often need a very large amount of memory (20 times the XML document size, and this is not a joke!), so we decided to use a pull parser to load the document into our own data model.

Q: Why different parsers?

We need parsers for different use cases, sometimes for very small applications, where we cannot use a 2 MB parser that would be 10 times bigger than the application itself! Sometimes it is best to load the whole document into memory, and sometimes it is better to use a SAX parser. It depends on what you really want to do.

Statistics (SAVOT Pull)

Test hardware configuration :

  • Laptop Intel Pentium 4M 2.2GHz
  • 512 MB DDR
  • 40 GB HD

Test software configuration :

  • Microsoft Windows XP SP2 (Windows XP is a trademark of Microsoft)
  • Sun JDK 1.4.2 release 6 (Sun is a trademark of Sun Microsystems)

These tests were run with the kXML pull parser.

The whole VOTable document is loaded into the SAVOT internal data model and is available in memory through the API.

VOTable files from Simbad database

File | Size (KBytes) | Resources | Tables | Data cells | Parsing time (seconds)
simbad1.xml | 9 | 2 | 8 | 64 | 0.32
simbad2.xml | 70 | 20 | 109 | 1009 | 0.37
simbad3.xml | 398 | 200 | 747 | 6831 | 0.5
simbad4.xml | 2854 | 2000 | 5821 | 54515 | 1.3
simbad5.xml | 29360 | 20000 | 61747 | 557944 | 10.45

VOTable files from VizieR database

File | Size (KBytes) | Resources | Tables | Data cells | Parsing time (seconds)
m31.xml | 3260 | 135 | 166 | 189020 | 1.68
3c273.xml | 9634 | 1 | 1 | 639991 | 3.6
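Dividing each file size by its parsing time gives a rough throughput figure, which makes the tables easier to compare. The sketch below only recomputes values from the two tables above; the class name ParsingThroughput is hypothetical and no new measurements are involved.

```java
// Derives parsing throughput (KBytes per second) from the benchmark tables above.
public class ParsingThroughput {

    /** Throughput in KBytes per second for a file of sizeKb KBytes parsed in the given seconds. */
    public static double kbPerSecond(double sizeKb, double seconds) {
        return sizeKb / seconds;
    }

    public static void main(String[] args) {
        // values copied from the Simbad and VizieR tables above
        String[] files = {"simbad1.xml", "simbad2.xml", "simbad3.xml", "simbad4.xml",
                          "simbad5.xml", "m31.xml", "3c273.xml"};
        double[] sizesKb = {9, 70, 398, 2854, 29360, 3260, 9634};
        double[] seconds = {0.32, 0.37, 0.5, 1.3, 10.45, 1.68, 3.6};
        for (int i = 0; i < files.length; i++) {
            System.out.printf("%-12s %7.0f KB/s%n", files[i], kbPerSecond(sizesKb[i], seconds[i]));
        }
    }
}
```

For example, simbad5.xml works out to 29360 / 10.45, about 2810 KB/s, whereas simbad1.xml gives only about 28 KB/s, which suggests that a roughly constant startup cost dominates for small files.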

Download and links

Download corner

Here are links to some XML parsers which have been tested:

kXML TinyXML Xerces Crimson NanoXML

savot.txt · Last modified: 2010/11/15 10:15 by administrator

Thanks for acknowledging the CDS developer’s resources (libraries, source code, etc.)

© UDS/CNRS
