Metadata extraction: from FITS files to databases with MEx

Note: this hands-on session runs only in the morning; in the afternoon you will deploy DALToolKit to create data access services using your database.

Abstract

Publishing images and spectra in a VO-compliant format is a two-step procedure: first making sure all data is valid and described in a common, homogeneous way, and then providing query interfaces to the data, using the validated data descriptions.

MEx is a tool to aid the first step (metadata ingestion), where the data descriptions are extracted from FITS headers into a data repository. A set of required and optional metadata items is defined for each type of data; their values are extracted from the FITS files through mapping rules. The data repository (most commonly a database) separates the metadata ingestion from the second step, the query service.

The goal of this session is to load a simple MySQL database and deploy DALToolKit for providing SIA and SSA interfaces to the data.

Participants should bring their own data to ingest, preferably data they understand well (in particular the meaning of the FITS keywords). If a custom database structure has to be supported, a small amount of programming might be required.

External References

Advisors (ESO)

  • Remco Slijkhuis
  • Bruno Rino
  • Jean-Christophe Malapert

Software Requirements


Download

http://www.euro-vo.org/dcaworkshop2008/HandsOn/mex_daltoolkit/

Bug reports and feature requests

http://mex-eso.blogspot.com/

Session outline

Publishing data in the VO as a two-step procedure:

  • gathering metadata (knowledge of the data, e.g. FITS keywords) in a homogeneous fashion: MEx (ESO)
  • building a (web) service that searches data using the metadata and allows access to the data: DALToolKit (ESA)

The session is organized in two parts:

  • a first one using ESO data
  • a second one using your own data.

1st run: demo data, demo database

  • (one time only) set up a database
    • create SQL tables
  • gather metadata into the database
    • map FITS keywords to "concepts" AKA model items: Mapping Editor
      • get a sample FITS header
      • define mappings for (at least) the required metadata
      • test mappings against sample FITS file header
    • ingest metadata into database: MEx
      • execute mappings against all the FITS files
  • build service
    • MEx creates default DALToolKit configuration

2nd run: your data, your database

  • your data:
    • the Mapping Editor, revisited
    • several types of data in one package: Catalogue Builder
  • your data definition
    • model items configuration
    • your own mapedit
  • your database:
    • MEx scripting
    • adapt DALToolKit configuration

Steps for using Simple Image Access Protocol

Set up a database

Make sure the database server is running, then connect to it as the "superuser":
mysql -u root -p

Now, create a database:

create database esodata;

And finally, create a SIA table in the esodata database:

use esodata;
source samples/db/sia.sql

Check the table structure:

show tables;
desc SIA;

Get a FITS header

Extract a FITS header using the following command:
java -jar lib/fitshead-1.0.jar -x 0 samples/images/GOODS_ISAAC_03_H_V1.5.fits > sampleheader.txt
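
For orientation, a FITS header is a sequence of 80-character "cards": the keyword sits in columns 1-8, value cards carry "= " in columns 9-10, and the header ends with an END card. The Python sketch below reads such cards into a keyword/value dictionary. It only illustrates the file format and is not how fitshead-1.0.jar works; the function names are hypothetical, and corner cases such as CONTINUE cards are ignored.

```python
def split_value(field: str) -> str:
    """Return the value part of a card, dropping any trailing '/ comment'."""
    field = field.strip()
    if not field.startswith("'"):
        return field.split("/", 1)[0].strip()
    # Quoted string: it ends at the first quote that is not doubled ('').
    chars = []
    i = 1
    while i < len(field):
        if field[i] == "'":
            if field[i + 1:i + 2] == "'":
                chars.append("'")
                i += 2
                continue
            break
        chars.append(field[i])
        i += 1
    return "".join(chars).rstrip()


def parse_header_cards(raw: bytes) -> dict:
    """Parse consecutive 80-byte cards into {keyword: value as string}."""
    header = {}
    for offset in range(0, len(raw) - 79, 80):
        card = raw[offset:offset + 80].decode("ascii", errors="replace")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        # Only value cards are kept; COMMENT, HISTORY and blank cards are skipped.
        if card[8:10] == "= ":
            header[keyword] = split_value(card[10:])
    return header


def make_card(text: str) -> bytes:
    """Pad a card to the fixed 80-byte width (used only for the demo)."""
    return text.ljust(80).encode("ascii")


# Synthetic header for illustration; real files are read in binary mode.
demo = b"".join(make_card(c) for c in [
    "SIMPLE  =                    T / standard FITS",
    "NAXIS1  =                 1024",
    "OBJECT  = 'GOODS / field'      / target name",
    "COMMENT free text is ignored",
    "END",
])
print(parse_header_cards(demo))
```

Values are kept as strings on purpose: deciding whether a keyword is a number, a date or free text is left to the mapping rules.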

Define/test mappings

  • Open your favorite browser and go to the mapping editor interface:
http://vops1.hq.eso.org:8080/mapedit

  • upload the model item list specific for this workshop: config/modelitem_definitions.txt
  • select data type: image.reduced
  • upload your FITS header using upload (text)
  • click "Validate" to use the default rules
  • edit the mapping rules that generate errors or warnings until no errors remain
  • Note: min and max RA and Dec are mandatory for DALToolKit
  • when all are valid, download mappings file
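
The validation step above can be pictured as follows. This Python sketch is not the Mapping Editor's rule syntax, which is not described here: the model item names, the rule table and the helper functions are all hypothetical. Each rule derives one model item from the header, failures on required items are reported as errors and failures on optional items as warnings. The declination bounds use a linear, unrotated, small-field approximation of the WCS keywords, which is an assumption made only for this example.

```python
# Hypothetical names for the items that DALToolKit needs.
REQUIRED_ITEMS = ["ra_min", "ra_max", "dec_min", "dec_max"]


def dec_edge(header: dict, pixel: float) -> float:
    """Declination at a pixel coordinate on axis 2 (linear approximation)."""
    return (float(header["CRVAL2"])
            + (pixel - float(header["CRPIX2"])) * float(header["CDELT2"]))


def dec_bounds(header: dict) -> tuple:
    """Approximate (min, max) declination from the two image edges."""
    edges = [dec_edge(header, 0.5),
             dec_edge(header, float(header["NAXIS2"]) + 0.5)]
    return min(edges), max(edges)


# Hypothetical rule table: model item -> function of the header dictionary.
RULES = {
    "title": lambda h: h["OBJECT"],
    "instrument": lambda h: h["INSTRUME"],
    "dec_min": lambda h: dec_bounds(h)[0],
    "dec_max": lambda h: dec_bounds(h)[1],
    # "ra_min" and "ra_max" are left unmapped to show the error report.
}


def apply_rules(header: dict, rules: dict, required: list):
    """Return (values, problems); problems are (level, item, message) tuples."""
    values, problems = {}, []
    for item, rule in rules.items():
        try:
            values[item] = rule(header)
        except (KeyError, ValueError) as exc:
            level = "error" if item in required else "warning"
            problems.append((level, item, f"{type(exc).__name__}: {exc}"))
    for item in required:
        if item not in rules:
            problems.append(("error", item, "no mapping rule defined"))
    return values, problems


# Made-up header values for illustration only.
sample = {"OBJECT": "GOODS", "CRVAL2": "-27.8", "CRPIX2": "512.5",
          "CDELT2": "0.001", "NAXIS2": "1024"}
values, problems = apply_rules(sample, RULES, REQUIRED_ITEMS)
for level, item, message in problems:
    print(level, item, message)
```

With this sample header the missing INSTRUME keyword yields a warning, while the two unmapped right ascension items yield errors, mirroring the distinction between optional and required metadata.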

Ingest

Set up a directory for sharing in Tomcat: unzip samples/files.zip into $CATALINA_BASE/webapps

Ingest data into the database:
  • edit the MEx configuration config/mex-daltookit.properties (e.g. db password, folder to copy data to)
  • run MEx on the files and the mappings file
java -jar mex-java.jar --type SIA -d samples/images -m samples/mappings/isaac.txt
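
Conceptually, ingestion writes one row of mapped metadata per FITS file into the repository. The Python sketch below illustrates that idea with an in-memory SQLite database standing in for MySQL. The table sia_demo, its columns, the file names and the values are all hypothetical and do not reflect samples/db/sia.sql or what MEx does internally; "INSERT OR REPLACE" is SQLite syntax.

```python
import sqlite3

# Hypothetical table layout, for illustration only.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS sia_demo (
    filename TEXT PRIMARY KEY,
    title    TEXT,
    ra_min   REAL NOT NULL,
    ra_max   REAL NOT NULL,
    dec_min  REAL NOT NULL,
    dec_max  REAL NOT NULL
)
"""

# Named placeholders keep values out of the SQL text.
INSERT_SQL = """
INSERT OR REPLACE INTO sia_demo
VALUES (:filename, :title, :ra_min, :ra_max, :dec_min, :dec_max)
"""


def ingest(conn: sqlite3.Connection, rows: list) -> int:
    """Insert one row per file and return the resulting row count."""
    conn.execute(CREATE_SQL)
    with conn:  # commits on success, rolls back on error
        conn.executemany(INSERT_SQL, rows)
    return conn.execute("SELECT COUNT(*) FROM sia_demo").fetchone()[0]


# Made-up metadata rows, as a mapping step might produce them.
rows = [
    {"filename": "image_a.fits", "title": "field A",
     "ra_min": 53.0, "ra_max": 53.2, "dec_min": -27.9, "dec_max": -27.7},
    {"filename": "image_b.fits", "title": "field B",
     "ra_min": 53.2, "ra_max": 53.4, "dec_min": -27.9, "dec_max": -27.7},
]

conn = sqlite3.connect(":memory:")
print(ingest(conn, rows))
print(ingest(conn, rows))  # re-running replaces rows instead of duplicating them
```

Keying the table on the file name makes a repeated run idempotent, which is convenient while the mappings are still being adjusted.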

Build service

  • edit DALToolKit configuration if needed
  • deploy DALToolKit service
cd DALToolkit
vim build.properties.local
ant deploy
  • DALToolKit service config:
SIAP-v1.0-mex
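
Once the service is deployed, it can be tried with a Simple Image Access v1.0 query, which passes a position (POS, as "ra,dec" in decimal degrees), a region size (SIZE, in degrees) and a FORMAT parameter. The Python sketch below only builds such a query URL. The base URL is a placeholder, since the actual endpoint depends on your Tomcat and DALToolKit configuration, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def sia_query_url(base_url: str, ra: float, dec: float, size: float,
                  fmt: str = "image/fits") -> str:
    """Build a SIA v1.0 style query URL from POS, SIZE and FORMAT."""
    params = {
        "POS": f"{ra},{dec}",  # right ascension and declination, degrees
        "SIZE": f"{size}",     # size of the search region, degrees
        "FORMAT": fmt,         # requested image format
    }
    # Keep ',' and '/' readable instead of percent-encoding them.
    return base_url + "?" + urlencode(params, safe=",/")


# Placeholder endpoint: replace it with the URL of your deployed service.
print(sia_query_url("http://localhost:8080/hypothetical-service/sia",
                    53.12, -27.8, 0.1))
```

Opening the resulting URL in a browser should return a VOTable listing the images that overlap the requested region, provided the ingested metadata includes the min and max RA and Dec values.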

Tips for using Simple Spectral Access Protocol v0.1

  • Database creation script:
samples/db/ssa-0.1.sql

  • Example FITS header extraction:
java -jar lib/fitshead-1.0.jar -x 0 samples/spectra/esa/3C273_0136550101.fits > sampleheader.txt

  • Data type for mapedit:
    • spectrum.1d.ssa01

  • Ingestion using provided mappings:
java -jar mex-java.jar --type SSA-0.1 -d samples/spectra/esa -m samples/mappings/esa-0.1.txt

  • DALToolKit service config:
SSAP-v0.1-mex

Tips for using Simple Spectral Access Protocol v1.0

  • Database creation script:
samples/db/ssa.sql

  • Example FITS header extraction:
java -jar lib/fitshead-1.0.jar -x 0 samples/spectra/fors2/GOODS_FORS2_GDS_J033202.99-274301.2_904509_V2.0.fits > sampleheader.txt

  • Data type for mapedit:
    • spectrum.1d.ssa10

  • Ingestion using provided mappings:
java -jar mex-java.jar --type SSA-1.0 -d samples/spectra/fors2 -m samples/mappings/fors2.txt

  • DALToolKit service config:
SSAP-v1.0-mex
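
A deployed SSA v1.0 service is queried with REQUEST=queryData plus constraints such as POS ("ra,dec" in decimal degrees), SIZE (search diameter in degrees) and, optionally, BAND (a wavelength range in metres written as "low/high"). The Python sketch below builds such a URL. The base URL is a placeholder that depends on your deployment, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def ssa_query_url(base_url: str, ra: float, dec: float, size: float,
                  band: tuple = None) -> str:
    """Build an SSA v1.0 style queryData URL."""
    params = {
        "REQUEST": "queryData",
        "POS": f"{ra},{dec}",  # right ascension and declination, degrees
        "SIZE": f"{size}",     # search diameter, degrees
    }
    if band is not None:
        low, high = band
        params["BAND"] = f"{low}/{high}"  # wavelength range, metres
    # Keep ',' and '/' readable instead of percent-encoding them.
    return base_url + "?" + urlencode(params, safe=",/")


# Placeholder endpoint: replace it with the URL of your deployed service.
print(ssa_query_url("http://localhost:8080/hypothetical-service/ssa",
                    187.28, 2.05, 0.05, band=(5e-7, 8e-7)))
```

Leaving out the band argument queries all spectra around the position regardless of wavelength coverage.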

Topic revision: r16 - 26 Jun 2008 - 10:26:17 - BrunoRino
 