Data Management with gLite LFC

The aim of this document is to introduce file management in gLite. In the current gLite middleware, operations on the LCG File Catalog (LFC) and interactions with the storage elements (SEs) are kept separate, and specialized CLI tools exist for each:

  • "lfc-" commands interact with the LFC catalogue server, which maps logical file names (LFNs) to the storage URLs (SURLs) of the file replicas.
  • "lcg-" commands include those used to copy files to and from an SE and to replicate files.

Many lcg-* commands interact with both the SEs and the catalogue server; lcg-cr, for instance, copies a file to an SE and registers it in the catalogue.

Preliminary Operations

Before you start, several environment variables must be set so that the correct information system and catalogue service are used. The defaults in your account should already be correct, but check that the following variables have these values:

Variable           Value                          Description
LCG_GFAL_INFOSYS   egee-bdii.cnaf.infn.it:2170    Reference BDII (host:port)
LCG_CATALOG_TYPE   lfc                            Catalogue type
LFC_HOST           lfcserver.cnaf.infn.it         LFC server fully qualified host name
LFC_HOME           /grid/yourvo/yourname          LFC home directory
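The check can be scripted. The following Python sketch compares an environment with the table above. The expected values are those of this page (the planck VO at CNAF), and since LFC_HOME is personal, it only checks that it lies under /grid/, an assumption based on the examples here.

```python
import os
from typing import Mapping

# Values from the table above (planck VO at CNAF); adjust them for your VO.
EXPECTED = {
    "LCG_GFAL_INFOSYS": "egee-bdii.cnaf.infn.it:2170",
    "LCG_CATALOG_TYPE": "lfc",
    "LFC_HOST": "lfcserver.cnaf.infn.it",
}


def check_lfc_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of problems found; an empty list means the settings look right."""
    problems = []
    for name, expected in EXPECTED.items():
        value = env.get(name)
        if value is None:
            problems.append(f"{name} is not set")
        elif value != expected:
            problems.append(f"{name}={value!r}, expected {expected!r}")
    # LFC_HOME is personal, so only check that it lies under /grid/ (assumption).
    home = env.get("LFC_HOME")
    if home is None:
        problems.append("LFC_HOME is not set")
    elif not home.startswith("/grid/"):
        problems.append(f"LFC_HOME={home!r} is not under /grid/")
    return problems


if __name__ == "__main__":
    for problem in check_lfc_env():
        print("WARNING:", problem)
```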

Getting started

For each supported VO, a separate top-level directory exists under /grid, and you can list all the files stored for your VO there; the examples below use the planck VO (/grid/planck). Before starting, make sure you have a valid VOMS proxy (e.g. created with voms-proxy-init --voms planck).

Listing the catalogue

To list the catalogue, use lfc-ls:
[morgan@localhost ~]$ lfc-ls -l 
drwxrwxr-x   2 168      113                       0 Mar 05  2007 GAUSS
drwxrwxr-x   2 168      113                       0 Jan 14  2008 IFCA
drwxr-xr-x   1 root     113                       0 Feb 09  2006 PIPPO
drwxrwxr-x   4 307      113                       0 Jan 29  2008 ceballos
drwxrwxr-x   4 186      113                       0 Jan 14  2007 curto

lfc-ls accepts many of the options of the Unix ls command.

Since LFC_HOME is set (here to /grid/planck), lfc-ls without arguments lists only the contents of that directory. Use an absolute path to list any other directory, e.g. lfc-ls /grid/planck/GAUSS.
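If you want to post-process such listings in a script, the following Python sketch parses one line of lfc-ls -l output into its fields. It assumes the column layout shown above (mode, link count, numeric owner and group, size, three date fields, name) and treats "name -> target" as a symbolic link; LfcEntry and parse_lfc_ls_line are hypothetical helpers, not part of the LFC tools.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LfcEntry:
    mode: str            # e.g. "drwxrwxr-x"
    nlink: int
    owner: str           # numeric uid as printed by lfc-ls
    group: str           # numeric gid as printed by lfc-ls
    size: int
    date: str            # e.g. "Aug 18 20:38", kept as text
    name: str
    target: Optional[str] = None  # set only for symbolic links

    @property
    def is_dir(self) -> bool:
        return self.mode.startswith("d")

    @property
    def is_link(self) -> bool:
        return self.mode.startswith("l")


def parse_lfc_ls_line(line: str) -> LfcEntry:
    # Eight whitespace-separated fields, then the name (which may contain " -> ").
    mode, nlink, owner, group, size, month, day, time_or_year, rest = line.split(None, 8)
    name, target = rest, None
    if mode.startswith("l") and " -> " in rest:
        name, target = rest.split(" -> ", 1)
    return LfcEntry(mode, int(nlink), owner, group, int(size),
                    f"{month} {day} {time_or_year}", name, target)
```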

Before creating and uploading your own files, it is common to create a personal directory with lfc-mkdir and make it your LFC_HOME (replace USER with your own name; the examples below use morgan):

[morgan@localhost ~]$ lfc-mkdir /grid/planck/USER
[morgan@localhost ~]$ export LFC_HOME=/grid/planck/USER
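Setting LFC_HOME lets you refer to catalogue entries with shorter paths. The Python sketch below shows how a relative catalogue path would be resolved, assuming that lfc-* commands treat relative paths as relative to LFC_HOME, as lfc-ls without arguments does above. resolve_lfc_path is a hypothetical helper, not an LFC function.

```python
import posixpath


def resolve_lfc_path(path: str, lfc_home: str) -> str:
    """Turn a path typed on the command line into an absolute LFC path.

    Assumption: relative paths are interpreted relative to LFC_HOME;
    absolute paths are used as they are (only normalized).
    """
    if path.startswith("/"):
        return posixpath.normpath(path)
    if not lfc_home.startswith("/"):
        raise ValueError("LFC_HOME must be an absolute path")
    return posixpath.normpath(posixpath.join(lfc_home, path))
```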

Upload a file to an SE

The next step is to upload a file into the directory you have just created. First create a simple text file locally, then copy it to an SE with lcg-cr (LCG copy and register). The following stores the file on the grid2.fe.infn.it Storage Element (use lcg-infosites --vo planck se to get the list of available SEs):

[morgan@localhost ~]$ echo "delme" > file.txt
[morgan@localhost ~]$ lcg-cr --vo planck -l lfn:/grid/planck/morgan/file.txt -d grid2.fe.infn.it file:/home/morgan/file.txt 
guid:ecbf74c6-5257-4ffa-98f2-fbc5c20e0e6d
[morgan@localhost ~]$ lcg-ls -l lfn:/grid/planck/morgan/
-rw-rw-r--   1   122   113       6 file.txt

The GUID you get will be different, since it uniquely identifies each file; only replicas of a file share the same GUID, as we will see. The lcg-ls command above checks that the file is there by listing the contents of your directory.
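When uploads are scripted, the lcg-cr call and the parsing of its output can be wrapped as in the following Python sketch. It reuses only the options shown above (--vo, -l, -d) in the same order; the helper names are hypothetical, and actually running copy_and_register requires the lcg-utils client and a valid proxy.

```python
import os
import subprocess
import uuid


def lcg_cr_command(vo: str, local_file: str, lfn: str, se: str) -> list[str]:
    """Build the lcg-cr invocation used in the session above."""
    # The source is given as a file: URL with an absolute path, as above.
    return ["lcg-cr", "--vo", vo, "-l", f"lfn:{lfn}", "-d", se,
            f"file:{os.path.abspath(local_file)}"]


def parse_guid(output: str) -> str:
    """Extract the GUID from lcg-cr output such as 'guid:ecbf74c6-...'."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("guid:"):
            value = line[len("guid:"):]
            uuid.UUID(value)  # raises ValueError if it is not a well-formed UUID
            return value
    raise ValueError("no guid: line in lcg-cr output")


def copy_and_register(vo: str, local_file: str, lfn: str, se: str) -> str:
    # Requires the lcg-utils client and a valid VOMS proxy.
    result = subprocess.run(lcg_cr_command(vo, local_file, lfn, se),
                            capture_output=True, text=True, check=True)
    return parse_guid(result.stdout)
```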

Before continuing, it is worth noting the difference between storing the file and creating the directory in the previous step. The directory is purely virtual and exists only in the catalogue of LFNs. The file, on the other hand, physically exists on an SE and additionally has a "virtual" file name in the catalogue. This is why the commands that only handle the LFN namespace start with "lfc", whilst the commands that manipulate the file itself start with "lcg".
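The relation between the two namespaces can be pictured with a toy in-memory model: an LFN (or any alias of it) points to a GUID, and a GUID points to one SURL per replica. This Python sketch is purely illustrative; ToyCatalogue is hypothetical and does not reflect the real LFC database schema.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalogue:
    """Illustrative model: LFN -> GUID, GUID -> list of SURLs (one per replica)."""
    lfn_to_guid: dict[str, str] = field(default_factory=dict)
    guid_to_surls: dict[str, list[str]] = field(default_factory=dict)

    def register(self, lfn: str, guid: str, surl: str) -> None:
        # Conceptually what lcg-cr does once the file has been stored on an SE.
        self.lfn_to_guid[lfn] = guid
        self.guid_to_surls.setdefault(guid, []).append(surl)

    def add_alias(self, guid: str, lfn: str) -> None:
        # A second name for the same GUID; no data is copied.
        self.lfn_to_guid[lfn] = guid

    def add_replica(self, guid: str, surl: str) -> None:
        # A new physical copy; the GUID, and hence every LFN, is unchanged.
        self.guid_to_surls[guid].append(surl)

    def list_replicas(self, lfn: str) -> list[str]:
        return list(self.guid_to_surls[self.lfn_to_guid[lfn]])
```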

For several purposes, for instance transfers with FTS, it is useful to know the Storage URL (SURL) of a file; a file has one SURL per replica. The command for this is lcg-lr (list replicas), which accepts either an LFN or a GUID:

[morgan@localhost ~]$ lcg-lr --vo planck  lfn:/grid/planck/morgan/file.txt
srm://grid2.fe.infn.it/planck/generated/2008-08-18/file54ba4262-f1ce-479a-a5a9-0ac7655f5eda

It is also possible to create a symbolic link to this file with lfc-ln -s:

[morgan@localhost ~]$ lfc-ln -s /grid/planck/morgan/file.txt /grid/planck/morgan/newfile.txt
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan/
-rw-rw-r--   1 122      113                       6 Aug 18 20:38 file.txt
lrwxrwxrwx   1 122      113                       0 Aug 18 20:46 newfile.txt -> /grid/planck/morgan/file.txt
[morgan@localhost ~]$ 

Since the link points to the same file, it has the same Storage URL as its target:

[morgan@localhost ~]$ lcg-lr --vo planck  lfn:/grid/planck/morgan/newfile.txt
srm://grid2.fe.infn.it/planck/generated/2008-08-18/file54ba4262-f1ce-479a-a5a9-0ac7655f5eda

Having uploaded a file, the next step is to download it. To download the file through the link you have just created, use lcg-cp:
[morgan@localhost ~]$ lcg-cp --vo planck lfn:/grid/planck/morgan/newfile.txt file:$HOME/test.txt
[morgan@localhost ~]$ cat test.txt 
delme
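Downloads can be scripted in the same way. This Python sketch builds the lcg-cp call used above from an lfn: or guid: source and a local path, converting the latter to the absolute file: URL form used in the session; the helper names are hypothetical.

```python
import os
import subprocess


def lcg_cp_command(vo: str, source: str, local_file: str) -> list[str]:
    """Build an lcg-cp call like the one above for an lfn: or guid: source."""
    if not source.startswith(("lfn:", "guid:")):
        raise ValueError(f"expected an lfn: or guid: source, got {source!r}")
    # The destination is given as a file: URL with an absolute path.
    return ["lcg-cp", "--vo", vo, source, "file:" + os.path.abspath(local_file)]


def download(vo: str, source: str, local_file: str) -> None:
    # Needs the lcg-utils client and a valid VOMS proxy; raises on failure.
    subprocess.run(lcg_cp_command(vo, source, local_file), check=True)
```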

Replicate a file between SEs

gLite supports file replication. A file can be stored on multiple SEs, so that a running job can access the closest SE holding the file, giving faster access to the data. Replication also protects against failures of, or access difficulties with, a particular SE. Below, lcg-rep replicates the file and lcg-lr then shows the replicas associated with it:

[morgan@localhost ~]$ lcg-rep --vo planck lfn:/grid/planck/morgan/file.txt -d grid003.ca.infn.it
[morgan@localhost ~]$ lcg-lr --vo planck  lfn:/grid/planck/morgan/file.txt
srm://grid003.ca.infn.it/dpm/ca.infn.it/home/planck/generated/2008-08-18/fileaab26271-0424-45c2-8f01-a1f5d2049c14
srm://grid2.fe.infn.it/planck/generated/2008-08-18/file54ba4262-f1ce-479a-a5a9-0ac7655f5eda

Note that each replica is stored under a different path. Using an LFN frees you from knowing the local filesystem layout where each replica is actually stored.
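Since lcg-lr prints one SURL per line, a script can tell which SEs already hold a replica before calling lcg-rep. This Python sketch groups SURLs by SE host with urllib.parse. It assumes the simple srm://host/path form shown above (SURLs carrying a port or an ?SFN= part would need extra handling), and needs_replica is a hypothetical helper.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def replicas_by_se(surls: list[str]) -> dict[str, list[str]]:
    """Group the SURLs printed by lcg-lr by the SE host that holds them."""
    groups: dict[str, list[str]] = defaultdict(list)
    for surl in surls:
        parts = urlsplit(surl.strip())
        if parts.scheme != "srm" or not parts.hostname:
            raise ValueError(f"not an srm:// SURL: {surl!r}")
        groups[parts.hostname].append(parts.path)
    return dict(groups)


def needs_replica(surls: list[str], se_host: str) -> bool:
    """True if no replica is on se_host yet, so lcg-rep -d se_host would add one."""
    return se_host not in replicas_by_se(surls)
```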

Remove files and directories

You can delete a file from the SEs with lcg-del; the -a option deletes all replicas, after which the entry also disappears from the catalogue, as the empty listing shows:

[morgan@localhost ~]$ lcg-del -a lfn:/grid/planck/morgan/file.txt
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan/

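It is also possible to remove a directory from the file catalogue with lfc-rm -r. Note that the last command below removes the whole /grid/planck/morgan directory, including the newly created test subdirectory; use lfc-rm -r /grid/planck/morgan/test to remove only the latter: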

[morgan@localhost ~]$ lfc-mkdir  /grid/planck/morgan/test
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan/
drwxrwxr-x   0 122      113                       0 Aug 18 20:57 test
[morgan@localhost ~]$ lfc-rm -r /grid/planck/morgan/

Advanced operations on LFC

Suppose you have just created an entry, for example by uploading a file with a valid LFN, and you want to change its logical file name. This is done with lfc-rename; in the session below the file is registered as test.txt and then renamed to new.txt:

[morgan@localhost ~]$ lcg-cr --vo planck -l lfn:/grid/planck/morgan/test.txt file:/home/morgan/delme.txt -d $VO_PLANCK_DEFAULT_SE 
guid:28a7476b-0205-4475-be9b-f12eff0d1fca
[morgan@localhost ~]$ lfc-rename /grid/planck/morgan/test.txt /grid/planck/morgan/new.txt
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan/
-rw-rw-r--   1 122      113                       6 Aug 19 12:35 new.txt

The variable VO_PLANCK_DEFAULT_SE holds the default SE of the planck VO; it is particularly useful when working on a worker node (WN).
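A job can look the default SE up from its environment. This Python sketch derives the variable name from the VO name following the VO_<VO>_DEFAULT_SE pattern above. Replacing characters such as '.' or '-' with '_' is an assumption for VO names that are not valid shell identifiers; both helpers are hypothetical.

```python
import os
import re
from typing import Mapping, Optional


def default_se_variable(vo: str) -> str:
    # e.g. "planck" -> "VO_PLANCK_DEFAULT_SE"; non-alphanumerics become "_" (assumption).
    return "VO_" + re.sub(r"[^A-Za-z0-9]", "_", vo).upper() + "_DEFAULT_SE"


def default_se(vo: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the VO's default SE from the environment, or None if it is unset."""
    return env.get(default_se_variable(vo))
```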

Sometimes a file stored on an SE is not registered in the catalogue under a meaningful name, for instance when lcg-cr is run without the -l option. In that case you can use lcg-aa (add alias) to attach a new LFN to the file's GUID:

[morgan@localhost ~]$ lcg-cr --vo planck -d grid2.fe.infn.it file:/home/morgan/file.txt 
guid:0adcb83f-ebb7-4508-8aa0-14fc57c6c583
[morgan@localhost ~]$ lcg-aa --vo planck guid:0adcb83f-ebb7-4508-8aa0-14fc57c6c583 lfn:/grid/planck/morgan/test.txt
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan
lrwxrwxrwx   1 122      113                       0 Aug 19 12:20 test.txt -> /grid/planck/generated/2008-08-19/file-faf5e641-c9ea-485f-aeea-d59e7450d7e8
[morgan@localhost ~]$ lcg-la --vo planck guid:0adcb83f-ebb7-4508-8aa0-14fc57c6c583
lfn:/grid/planck/generated/2008-08-19/file-faf5e641-c9ea-485f-aeea-d59e7450d7e8
lfn:/grid/planck/morgan/test.txt

Because no LFN was specified at registration time, a default one under /grid/planck/generated/ was assigned, and the alias added with lcg-aa shows up as a symbolic link to it. As the lcg-la (list aliases) output shows, both LFNs refer to the same GUID.
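The output of lcg-la can be checked in a script, for instance to confirm that an alias added with lcg-aa is registered. This Python sketch extracts the LFNs and, assuming automatically assigned names live under /grid/<vo>/generated/ as in the example above, separates them from user-chosen aliases; both helpers are hypothetical.

```python
def parse_aliases(lcg_la_output: str) -> list[str]:
    """Return the LFNs printed by lcg-la, without the 'lfn:' prefix."""
    return [line.strip()[len("lfn:"):]
            for line in lcg_la_output.splitlines()
            if line.strip().startswith("lfn:")]


def user_aliases(lcg_la_output: str, vo: str) -> list[str]:
    # Assumption: names assigned automatically live under /grid/<vo>/generated/.
    generated = f"/grid/{vo}/generated/"
    return [lfn for lfn in parse_aliases(lcg_la_output)
            if not lfn.startswith(generated)]
```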

It may be useful, mostly for applications, to know the transport URL (TURL) of a file, i.e. the URL used to access it with a given protocol. This is done with lcg-gt (get TURL), which takes the SURL of the file and the protocol name:

[morgan@localhost ~]$ lcg-lr --vo planck guid:0adcb83f-ebb7-4508-8aa0-14fc57c6c583
srm://grid2.fe.infn.it/planck/generated/2008-08-19/filed7cda750-40a3-43af-870d-55e773d4524b
[morgan@localhost ~]$ lcg-gt srm://grid2.fe.infn.it/planck/generated/2008-08-19/filed7cda750-40a3-43af-870d-55e773d4524b gsiftp
gsiftp://grid2.fe.infn.it:2811/storage/planck/generated/2008-08-19/filed7cda750-40a3-43af-870d-55e773d4524b
02889f57-2afd-410a-86c8-605f1be3d88e
[morgan@localhost ~]$ lcg-gt srm://grid2.fe.infn.it/planck/generated/2008-08-19/filed7cda750-40a3-43af-870d-55e773d4524b rfio
rfio://grid2.fe.infn.it:5001/storage/planck/generated/2008-08-19/filed7cda750-40a3-43af-870d-55e773d4524b
e237f3a2-a6cb-4341-8f84-cc5ec4919ed7

Notice how the TURL of the same file differs according to the requested transport protocol (gsiftp or rfio). The second line of each lcg-gt output is the identifier of the SRM request made to the SE.
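Applications that receive lcg-gt output can split it as in this Python sketch. It assumes, as in the session above, that the TURL is on the first line and the request identifier on the second, and uses urllib.parse to show that the two TURLs of the file differ only in scheme and port; the helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def parse_lcg_gt(output: str) -> tuple[str, str]:
    """Split lcg-gt output into (TURL, request identifier).

    Assumption: the TURL is on the first non-empty line and the SRM
    request identifier on the second, as in the session above.
    """
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected a TURL line followed by a request identifier")
    return lines[0], lines[1]


def describe_turl(turl: str) -> tuple[str, Optional[str], Optional[int], str]:
    """Return (protocol, host, port, path) of a TURL."""
    parts = urlsplit(turl)
    return parts.scheme, parts.hostname, parts.port, parts.path
```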

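The LFC lets you attach a comment to an existing entry, which helps make entries more meaningful. A comment is set with lfc-setcomment (for example, lfc-setcomment /grid/planck/morgan/new.txt "Simulation done for tests") and is shown only when the --comment option is given to lfc-ls: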

[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan
-rw-rw-r--   1 122      113                       6 Aug 19 12:35 new.txt
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan/new.txt
-rw-rw-r--   1 122      113                       6 Aug 19 12:35 /grid/planck/morgan/new.txt
[morgan@localhost ~]$ lfc-ls -l --comment  /grid/planck/morgan/new.txt
-rw-rw-r--   1 122      113                       6 Aug 19 12:35 /grid/planck/morgan/new.txt Simulation done for tests

As you can see, the comment is displayed by the --comment option of lfc-ls. You can also delete it with lfc-delcomment:

[morgan@localhost ~]$ lfc-delcomment  /grid/planck/morgan/new.txt
[morgan@localhost ~]$ 

As on a UNIX filesystem, you can change the permissions of an entry with lfc-chmod:

[morgan@localhost ~]$ lfc-ls -l  /grid/planck/morgan/new.txt
-rw-rw-r--   1 122      113                       6 Aug 19 12:35 /grid/planck/morgan/new.txt
[morgan@localhost ~]$ lfc-chmod  750  /grid/planck/morgan/new.txt
[morgan@localhost ~]$ lfc-ls -l  /grid/planck/morgan/new.txt
-rwxr-x---   1 122      113                       6 Aug 19 12:35 /grid/planck/morgan/new.txt
[morgan@localhost ~]$ lfc-chmod  770  /grid/planck/morgan/new.txt
[morgan@localhost ~]$ lfc-ls -l  /grid/planck/morgan/new.txt
-rwxrwx---   1 122      113                       6 Aug 19 12:35 /grid/planck/morgan/new.txt
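The octal modes given to lfc-chmod follow the usual UNIX convention. This Python sketch uses the standard stat.filemode function to render a mode the way lfc-ls -l prints it; mode_string is a hypothetical helper.

```python
import stat


def mode_string(octal_mode: str, kind: str = "file") -> str:
    """Render an lfc-chmod style octal mode (e.g. "750") as lfc-ls -l shows it."""
    type_bits = {"file": stat.S_IFREG, "dir": stat.S_IFDIR, "link": stat.S_IFLNK}[kind]
    perms = int(octal_mode, 8)
    if not 0 <= perms <= 0o777:
        raise ValueError(f"expected at most three octal digits, got {octal_mode!r}")
    return stat.filemode(type_bits | perms)
```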

The LFC supports access control lists (ACLs), a powerful tool that lets the owner of a file or directory grant fine-grained access rights on that entry to other users. For example, let's create a new directory and inspect its default access rights with lfc-getacl:

[morgan@localhost ~]$ lfc-mkdir /grid/planck/morgan/accesso
[morgan@localhost ~]$ lfc-getacl !$
lfc-getacl /grid/planck/morgan/accesso
# file: /grid/planck/morgan/accesso
# owner: /C=IT/O=INFN/OU=Personal Certificate/L=INAF Trieste/CN=Giuliano Taffoni
# group: planck
user::rwx
group::rwx              #effective:rwx
other::r-x
default:user::rwx
default:group::rwx
default:other::r-x

Note that the owner is identified by the DN of their certificate, while the group corresponds to VO membership. The output then shows the current ACL of the entry:

  • user and group have full privileges (rwx);
  • others can only read and traverse the directory (r-x).
Finally, the default: lines show the default ACL, which is applied to each new entry created within this directory.
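For scripting, the lfc-getacl output above can be turned into a dictionary. This Python sketch assumes the layout shown: "#" header lines for file, owner and group, one type:qualifier:perms entry per line, optional "#effective:" notes, and "default:" prefixes for the default ACL. parse_getacl is a hypothetical helper.

```python
def parse_getacl(output: str) -> dict:
    """Parse lfc-getacl output into owner, group, ACL and default ACL entries."""
    result = {"owner": None, "group": None, "acl": {}, "default": {}}
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header lines such as "# owner: <DN>" or "# group: planck".
            key, _, value = line[1:].partition(":")
            if key.strip() in ("owner", "group"):
                result[key.strip()] = value.strip()
            continue
        entry = line.split("#", 1)[0].strip()  # drop "#effective:..." notes
        table = result["acl"]
        if entry.startswith("default:"):
            table = result["default"]
            entry = entry[len("default:"):]
        kind, rest = entry.split(":", 1)        # "user::rwx" -> "user", ":rwx"
        qualifier, perms = rest.rsplit(":", 1)  # ":rwx" -> "", "rwx"
        table[f"{kind}:{qualifier}" if qualifier else kind] = perms
    return result
```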

Now let's change the default ACL with lfc-setacl, giving read/write permission to user and group and no privileges to others. The syntax is -m (modify) with d: (default) entries for the user (u::), and likewise for group and others; permissions use the usual UNIX octal convention (7 = rwx, 6 = rw-, ...). The session below shows a file uploaded after the change:

[morgan@localhost ~]$ lcg-cr --vo planck  -n 8 -l lfn:/grid/planck/morgan/accesso/due.txt file:/home/morgan/delme.txt -d grid2.fe.infn.it
guid:4a5b5c0a-98c9-4fa5-9360-f0350f3e89e8
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan/accesso
-rw-rw----   1 122      113                       6 Aug 19 12:53 due.txt

Notice that the new default ACL of the directory is applied to every file subsequently registered in it, which is why due.txt gets rw-rw----.
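The resulting rw-rw---- can be understood with POSIX ACL semantics, where a new entry's permissions are the creation mode requested by the client masked by the directory's default ACL. This Python sketch reproduces the listing above under two assumptions: that the LFC follows these POSIX rules, and that lcg-cr requests mode 664 (the mode file.txt received earlier). inherited_mode is a hypothetical helper.

```python
import stat


def inherited_mode(default_acl: str, requested_mode: int = 0o664) -> str:
    """Permissions a new file would get under a directory's default ACL.

    Assumptions: POSIX-like ACL semantics (requested creation mode masked by
    the default user/group/other entries) and a requested mode of 0o664.
    """
    return stat.filemode(stat.S_IFREG | (requested_mode & int(default_acl, 8)))
```

With the default ACL set above (660) this gives -rw-rw----, and with a default of 775 it gives -rw-rw-r--, matching file.txt.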

Notice also that lcg-cr supports parallel data streams (as GridFTP does): the -n option sets the number of streams, 8 in the example above.

Gilda LFC recursive access tools

The Gilda group also provides a set of tools that operate recursively on a catalogue directory. The tools can be downloaded here. They can:
  • copy a whole catalogue directory tree from the file catalogue to the local filesystem;
  • delete a whole catalogue directory tree, removing the related files (all replicas) from the storage elements as well;
  • copy a whole local directory tree to an SE and register it in the file catalogue.

lcg-rec-cr - Recursive copy and register

This command copies the whole source directory structure from the local filesystem to a storage element and registers it in the file catalogue.

[morgan@localhost ~]$ lcg-rec-cr 
lcg-rec-cr
----------
LCG utils based recursive copy and register to file catalog utility
 
Usage: lcg-rec-cr -vo  -lp  -cp   [-fc ] [-se ] [-v]

The mandatory parameters are:

  • -vo The user needs to specify the virtual organization.
  • -lp The local path from where the user wants to copy recursively to file catalog/SE.
  • -cp The catalog path that will be used as a starting point in the file catalog.

Optional parameters are:

  • -fc Changes the file catalogue, which by default is given by the $LFC_HOST environment variable.
  • -se Specifies a destination SE for the files; if omitted, the VO's default SE is used (e.g. the value of VO_COMETA_DEFAULT_SE for the COMETA VO).
  • -v Switches the command to verbose mode: summary information is printed at the top, and all lcg-*/lfc-* commands used are shown.
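Conceptually, a recursive copy and register boils down to creating the catalogue directories and running lcg-cr once per file. The following Python sketch plans those calls. It is a hypothetical illustration, not the code of lcg-rec-cr; it uses only the lfc-mkdir and lcg-cr options shown earlier on this page and does not handle catalogue directories that already exist.

```python
import os
import posixpath


def local_files(local_root: str) -> list[str]:
    """Relative paths ('/'-separated) of all files under local_root, sorted."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), local_root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


def plan_rec_cr(vo: str, local_root: str, catalog_root: str, se: str,
                rel_files: list[str]) -> list[list[str]]:
    """Hypothetical plan: lfc-mkdir per catalogue directory, then lcg-cr per file."""
    dirs = {catalog_root}
    for rel in rel_files:
        parts = rel.split("/")[:-1]  # directory components of the file
        for i in range(1, len(parts) + 1):
            dirs.add(posixpath.join(catalog_root, *parts[:i]))
    # A parent path is a prefix of its children, so sorting creates parents first.
    commands = [["lfc-mkdir", d] for d in sorted(dirs)]
    root = os.path.abspath(local_root)
    for rel in rel_files:
        commands.append(["lcg-cr", "--vo", vo,
                         "-l", "lfn:" + posixpath.join(catalog_root, rel),
                         "-d", se,
                         "file:" + os.path.join(root, *rel.split("/"))])
    return commands
```

For example, plan_rec_cr("planck", "data", "/grid/planck/morgan/data", "grid2.fe.infn.it", local_files("data")) lists the commands for a local data directory without running them.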

lcg-rec-cp - Recursive copy from file catalog/SE to local filesystem

This command copies files from the file catalogue/storage elements to the local filesystem, preserving the file hierarchy registered in the catalogue. For help on its usage, use the -h (--help) option.

[morgan@localhost ~]$ lcg-rec-cp
lcg-rec-cp
----------
LCG utils based recursive copy from file catalog utility
 
Usage: lcg-rec-cp -vo  -cp  -lp  [-fc ] [-v]

The mandatory parameters are:

  • -vo The user needs to specify the virtual organization.
  • -lp The local path into which the catalogue tree is copied recursively.
  • -cp The catalog path that will be used as a starting point in the file catalog.

Optional parameters are:

  • -fc Changes the file catalogue, which by default is given by the $LFC_HOST environment variable.
  • -v Switches the command to verbose mode: summary information is printed at the top, and all lcg-*/lfc-* commands used are shown.
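The reverse of a recursive copy and register maps each LFN under the catalogue path to a local file and downloads it with lcg-cp. This Python sketch is hypothetical, not the code of lcg-rec-cp: it takes the list of LFNs as input, rejects entries outside the catalogue path, and creates the local directories before each copy.

```python
import os
import posixpath
import subprocess


def plan_rec_cp(vo: str, catalog_root: str, local_root: str,
                lfns: list[str]) -> list[tuple[str, list[str]]]:
    """Return (local directory to create, lcg-cp command) pairs, one per LFN."""
    root = posixpath.normpath(catalog_root)
    local_root = os.path.abspath(local_root)
    plan = []
    for lfn in lfns:
        rel = posixpath.relpath(posixpath.normpath(lfn), root)
        if rel == ".." or rel.startswith("../"):
            raise ValueError(f"{lfn} is not under {root}")
        target = os.path.join(local_root, *rel.split("/"))
        plan.append((os.path.dirname(target),
                     ["lcg-cp", "--vo", vo, "lfn:" + lfn, "file:" + target]))
    return plan


def run_plan(plan: list[tuple[str, list[str]]]) -> None:
    # Requires lcg-utils and a valid VOMS proxy.
    for local_dir, command in plan:
        os.makedirs(local_dir, exist_ok=True)  # keep the catalogue hierarchy
        subprocess.run(command, check=True)
```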

lcg-rec-del - Recursive deletion from file catalog/SE

This command recursively deletes the files and directories registered in the file catalogue and removes all replicas of the files from the storage elements. For help on its usage, use the -h (--help) option.

[morgan@localhost ~]$ ./lcg-rec-del -h
lcg-rec-del
----------
LCG utils based recursive delete from file catalog utility and related replicas
 
Usage: lcg-rec-del -vo  -cp  [-fc ] [-v]

The mandatory parameters are:

  • -vo The user needs to specify the virtual organization.
  • -cp The catalog path that will be used as a starting point in the file catalog.

Optional parameters are:

  • -fc Changes the file catalogue, which by default is given by the $LFC_HOST environment variable.
  • -v Switches the command to verbose mode: summary information is printed at the top, and all lcg-*/lfc-* commands used are shown.
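The deletion can be sketched with the commands used earlier on this page: lcg-del -a for every file, which removes all replicas and the catalogue entry, followed by lfc-rm -r on the catalogue directory, which by then contains only directories. This Python sketch is hypothetical and is not the code of lcg-rec-del; it takes the list of LFNs as input and refuses entries outside the tree.

```python
import posixpath


def plan_rec_del(catalog_root: str, lfns: list[str]) -> list[list[str]]:
    """Hypothetical plan: lcg-del -a for every file, then lfc-rm -r on the tree."""
    root = posixpath.normpath(catalog_root)
    commands = []
    for lfn in sorted(lfns):
        if not posixpath.normpath(lfn).startswith(root + "/"):
            raise ValueError(f"{lfn} is not under {root}")
        # Removes every replica; the catalogue entry goes with the last one.
        commands.append(["lcg-del", "-a", "lfn:" + lfn])
    # Only directories remain, so lfc-rm -r can remove the whole tree.
    commands.append(["lfc-rm", "-r", root])
    return commands
```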

-- TaffoniGiuliano - 18 Aug 2008

Topic revision: r2 - 19 Aug 2008 - TaffoniGiuliano