Using DRMAA with Unicluster Express

The Distributed Resource Management Application API (DRMAA) is a high-level API that allows grid applications to submit, monitor, and control jobs on one or more DRM systems. Grid Engine ships with C/C++ and Java bindings, and bindings for Ruby and Python can also be downloaded. There is also a nice collection of HowTos that should provide a great start for anyone looking to begin writing DRMAA applications. The latest version of Unicluster Express (UCE) bundles Grid Engine 6.1u3, which is installed under $GLOBUS_LOCATION/sge. $GLOBUS_LOCATION refers to the UCE installation directory (/usr/local/unicluster by default), and all of the DRMAA libraries and Java archives are located in the $GLOBUS_LOCATION/sge/lib directory. In order to run DRMAA applications, one has to set $LD_LIBRARY_PATH to point to the appropriate (architecture-dependent) directory. For my development cluster (64-bit Linux) with a default UCE installation, I used the following setup:

$ source /usr/local/unicluster/unicluster-user-env.sh
$ export LD_LIBRARY_PATH=/usr/local/unicluster/sge/lib/lx24-amd64
$ export JAVA_HOME=/opt/jdk
$ export PATH=$JAVA_HOME/bin:$PATH

A very simple example of a Java DRMAA application that submits a job to Grid Engine is shown below:

$ cat SimpleJob.java 
import org.ggf.drmaa.DrmaaException;
import org.ggf.drmaa.JobTemplate;
import org.ggf.drmaa.Session;
import org.ggf.drmaa.SessionFactory;

public class SimpleJob {
  public static void main(String[] args) {
    SessionFactory factory = SessionFactory.getFactory();
    Session session = factory.getSession();
    try {
      // Initialize the DRMAA session using the default contact string.
      session.init("");
      JobTemplate jt = session.createJobTemplate();
      jt.setRemoteCommand("/home/veseli/simple_job.sh");
      String id = session.runJob(jt);
      System.out.println("Your job has been submitted with id " + id);
      // Release the job template and close the session when done.
      session.deleteJobTemplate(jt);
      session.exit();
    }
    catch (DrmaaException e) {
      System.out.println("Error: " + e.getMessage());
    }
  }
}

One can compile and run the above example using something like the following:

$ javac -classpath /usr/local/unicluster/sge/lib/drmaa.jar SimpleJob.java 
$ java -classpath .:/usr/local/unicluster/sge/lib/drmaa.jar SimpleJob
Your job has been submitted with id 14
$ qstat -f 
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@horatio.psvm.univa.com   BP    1/1       0.36     lx24-amd64  
     14 0.55500 simple_job veseli       r     06/20/2008 12:24:59     1
----------------------------------------------------------------------------
all.q@romeo.psvm.univa.com     BP    0/1       0.39     lx24-amd64    
----------------------------------------------------------------------------
all.q@yorick.psvm.univa.com    BP    0/1       0.45     lx24-amd64    
----------------------------------------------------------------------------
headnodes.q@petruchio.psvm.uni IP    0/1       0.15     lx24-amd64    
----------------------------------------------------------------------------
special.q@horatio.psvm.univa.c BIP   0/1       0.36     lx24-amd64    
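
Submission is only part of what DRMAA covers; the same session can monitor and control jobs as well. Below is a minimal sketch (assuming the same classpath and job script as above) that blocks until the submitted job finishes and reports its exit status via Session.wait():

import org.ggf.drmaa.DrmaaException;
import org.ggf.drmaa.JobInfo;
import org.ggf.drmaa.JobTemplate;
import org.ggf.drmaa.Session;
import org.ggf.drmaa.SessionFactory;

public class WaitForJob {
  public static void main(String[] args) {
    Session session = SessionFactory.getFactory().getSession();
    try {
      session.init("");
      JobTemplate jt = session.createJobTemplate();
      jt.setRemoteCommand("/home/veseli/simple_job.sh");
      String id = session.runJob(jt);
      // Block until the job finishes; JobInfo carries the exit information.
      JobInfo info = session.wait(id, Session.TIMEOUT_WAIT_FOREVER);
      if (info.hasExited()) {
        System.out.println("Job " + info.getJobId() + " exited with status " + info.getExitStatus());
      }
      session.deleteJobTemplate(jt);
      session.exit();
    }
    catch (DrmaaException e) {
      System.out.println("Error: " + e.getMessage());
    }
  }
}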

I should point out that DRMAA is designed to be independent of any particular DRM. Users who need job submission features or flags specific to Grid Engine can either use the "native specification" attribute, or they can use the "job category" attribute together with "qtask" files. In order to set the native specification attribute in Java, one would use the setNativeSpecification() method of the JobTemplate class (before the job submission line in the code):

jt.setNativeSpecification("-q special.q");

This method, however, makes your application dependent on the specific DRM you are working with at the moment. The above line will be interpreted correctly by Grid Engine, but it may not be understood by other DRMs. In most cases a better solution is to use the job category attribute instead, and to specify the DRM-dependent flags in a qtask file. For example, in order to submit your job to a particular Grid Engine queue, the Java code would contain something like

jt.setJobCategory("special");

and use the qtask file to translate the “special” job category into appropriate Grid Engine flags:

$ cat ~/.qtask
special -q special.q

The cluster-global qtask file (which defines cluster-wide defaults) in UCE resides at $GLOBUS_LOCATION/sge/default/common/qtask. As shown above, user-specific qtask files that override and extend the cluster-wide definitions are found at ~/.qtask.

Source: http://gridgurus.typepad.com

 

Aromatic Clouds?


If you weren’t at OSGC you missed a number of interesting presentations. From my perspective, one of the most intriguing technologies was EUCALYPTUS: Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems.

Before I go on, I would like you to notice that anybody who is able to make an acronym out of eucalyptus has some time on their hands. Fortunately, they used this time to implement an open-source infrastructure for Elastic Computing. In particular, the goal of the project is to, "foster community research and development of Elastic/Utility/Cloud service implementation technologies, resource allocation strategies, service level agreement (SLA) mechanisms and policies, and usage models."

In my opinion, the most interesting facets of this project are:

  • It is compatible with the Amazon EC2 tools out of the box, yet it is interface-agnostic and thus capable of supporting any number of client interfaces;
  • Any team can assemble a development environment for tools that they wish to deploy to the EC2 Cloud;
  • A group could create their own Cloud system which could use EC2 for Utility computing resources;
  • It is the first step towards creating an open-standard for Cloud computing.

My hope is that this project will not only get us all thinking about what we really need from a Cloud but also what we could improve... I plan to start working with this software as soon as it is available later this month.

Source: http://gridgurus.typepad.com

 

A Supercomputer Helps Diagnose Osteoporosis

Tehran, Islamic Republic News Agency (IRNA): Swiss scientists have reported a new supercomputer simulation achievement that could go a long way toward helping diagnose and treat osteoporosis.

According to a United Press report, while osteoporosis (a loss of bone density accompanied by an increased risk of fracture) is usually diagnosed only at an advanced stage, researchers from the "Swiss Institute of Technology" and the "IBM Zurich Research Laboratory" said they have produced the most comprehensive simulation of human bone structures to date in only a few minutes.

The scientists said their achievement could lead to better clinical tools for diagnosing and treating osteoporosis, the most common bone disease.

IBM also said in a statement that with these simulations researchers can produce a dynamic "heat map" of bone strength that shows precisely where the bone structure is damaged and what load would likely cause a fracture.

According to the statement, such powerful simulations could be used routinely in future computed tomography, increasing physicians' ability to analyze fracture risk and thereby improving treatment.

 

About Grid Engine Advanced Reservations

Advanced reservation (AR) capability is one of the most important new features of the upcoming Grid Engine 6.2 release. New command-line utilities allow users and administrators to submit resource reservations (qrsub), view granted reservations (qrstat), or delete reservations (qrdel). Also, some of the existing commands are getting new switches; for example, the "-ar" option for qsub indicates that the submitted job is part of an existing advance reservation. Given that AR is new functionality, I thought it might be useful to describe how it works with a simple example (using the 6.2 beta software). Advance resource reservations can be submitted to Grid Engine by queue operators and managers, and also by a designated set of privileged users. Those users are defined in the ACL "arusers", which by default looks as follows:

$ qconf -sul
arusers
deadlineusers
defaultdepartment
$ qconf -su arusers
name    arusers
type    ACL
fshare  0
oticket 0
entries NONE

The “arusers” ACL can be modified via the “qconf -mu” command:

$ qconf -mu arusers
veseli@tolkien.ps.uud.com modified "arusers" in userset list
$ qconf -su arusers
name    arusers
type    ACL
fshare  0
oticket 0
entries veseli

Once designated as a member of this list, the user is allowed to submit ARs to Grid Engine:

[veseli@tolkien]$ qrsub -e 0805141450.33 -pe mpi 2
Your advance reservation 3 has been granted
[veseli@tolkien]$ qrstat
ar-id   name       owner        state start at             end at               duration
-----------------------------------------------------------------------------------------
      3            veseli       r     05/14/2008 14:33:08  05/14/2008 14:50:33  00:17:25
[veseli@tolkien]$ qstat -f 
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@tolkien.ps.uud.com       BIP   2/0/4          0.04     lx24-x86      

For the sake of simplicity, in the above example we have a single queue (all.q) that has 4 job slots and a parallel environment (PE) named mpi assigned to it. After reserving 2 slots for the mpi PE, there are only 2 slots left for running regular jobs until the above AR expires. Note that the "-e" switch for qrsub designates the requested reservation end time in the format YYMMDDhhmm.ss. It is also worth pointing out that the qstat output has changed slightly with respect to previous software releases in order to accommodate the display of existing reservations. If we now submit several regular jobs, only 2 of them will be able to run:

[veseli@tolkien]$ qsub regular_job.sh 
Your job 15 ("regular_job.sh") has been submitted
...
[veseli@tolkien]$ qsub regular_job.sh 
Your job 19 ("regular_job.sh") has been submitted
[veseli@tolkien]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@tolkien.ps.uud.com       BIP   2/2/4          0.03     lx24-x86      
     15 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
     16 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     17 0.55500 regular_jo veseli       qw    05/14/2008 14:34:22     1        
     18 0.55500 regular_jo veseli       qw    05/14/2008 14:34:23     1        
     19 0.55500 regular_jo veseli       qw    05/14/2008 14:34:24     1        

However, if we submit jobs that are part of the existing AR, those are allowed to run, while jobs submitted earlier are still pending:

[veseli@tolkien]$ qsub -ar 3 reserved_job.sh 
Your job 20 ("reserved_job.sh") has been submitted
[veseli@tolkien]$ qsub -ar 3 reserved_job.sh 
Your job 21 ("reserved_job.sh") has been submitted
[veseli@tolkien]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@tolkien.ps.uud.com       BIP   2/4/4          0.02     lx24-x86      
     15 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
     16 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
     20 0.55500 reserved_j veseli       r     05/14/2008 14:35:02     1        
     21 0.55500 reserved_j veseli       r     05/14/2008 14:35:02     1        
############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     17 0.55500 regular_jo veseli       qw    05/14/2008 14:34:22     1        
     18 0.55500 regular_jo veseli       qw    05/14/2008 14:34:23     1        
     19 0.55500 regular_jo veseli       qw    05/14/2008 14:34:24     1        

The above example illustrates how ARs work. As long as a particular reservation is valid, only jobs that are designated as part of it can utilize the resources that have been reserved. I think that AR will prove to be an extremely valuable tool for planning grid resource usage, and I'm very pleased to see it in the new Grid Engine release.

Source: http://gridgurus.typepad.com

 

Steaming Java


When Rich asked us to walk through a software development process, I immediately thought back to a conversation I had with my friend Leif Wickland about building high-performance Java applications, so I emailed him asking for his best practices. We have both produced code that is as fast as, if not faster than, C compiled with optimization (for me it was using a 64-bit JRE on an x86_64 architecture with multiple cores).

That is not to say that the equivalent C code could not be made to go faster if you spent time optimizing it. Rather, the main point is that Java is a viable HPC language. On a related note, Brian Goetz of Sun has a very interesting discussion on IBM's developerWorks, "Urban performance legends, revisited," on how garbage collection allows faster raw allocation performance.

However, I digress… Here is a summary of what we both came up with (in no particular order):


  1. It is vitally important to "measure, measure, measure" everything you do. We can offer any number of helpful hints, but the likelihood that all of them should be applied to your program is extremely low.
  2. It is equally important to optimize only the areas of the program that are actual bottlenecks; anything else is a waste of development time for no real gain.
  3. One of the simplest and most overlooked things that can help your application is to overtly mark read-only method parameters with the final modifier. Not only can it help the compiler with optimization, but it is also a good way of communicating your intentions to your teammates. One thing to be aware of is that not all things declared final behave as expected (see Is that your final answer? for more detail).
  4. If you have state shared between threads, make whatever you can final so that the VM takes no steps to ensure consistency. This is not something that we would have expected to make a difference, but it seems to help.
  5. An equally ignored practice is using the finally clause. It is very important to clean up resources created in a try block; otherwise you could leave open streams, SQL statements, or other objects lying around taking up space.
  6. Create your data structures and declare your variables early. A core goal is to avoid allocating short-lived variables. While it is true that the garbage collector may reserve memory for variables that are declared often, why make it guess your intentions? For example, if a loop is called repeatedly, there is no need to write for (int i = 0; … when you could have declared i earlier. Of course, you have to be careful not to reset counters from inside loops.
  7. Use static for values that are constants. This may seem obvious, but not everybody does it.
  8. For loops embedded within other loops:                


    • Replace your outer loop with a fixed-size pool of threads. In the next release of Java this will be even easier using the fork/join framework. This has become increasingly important now that processors have many cores.
    • Make sure that your innermost loop is the longest, even if that doesn't map directly to the business goals. You shouldn't force the program to set up a new loop any more often than necessary, as it wastes cycles.
    • Unroll your inner loops. This can save an enormous amount of time even if it isn't pretty; the quick test I just ran was 300% faster. If you haven't unrolled a loop before, it is pretty simple:
              unrollRemainder = count % LOOP_UNROLL_COUNT;

              // handle the leftover iterations that don't fill a whole unrolled pass
              for( n = 0; n < unrollRemainder; n++ ) {
                  // do some stuff here.
              }

              // main pass: LOOP_UNROLL_COUNT iterations of work per loop test
              for( n = unrollRemainder; n < count; n += LOOP_UNROLL_COUNT ) {
                  // do stuff for n here
                  // do stuff for n+1 here
                  // do stuff for n+2 here
                  // ...
                  // do stuff for n+LOOP_UNROLL_COUNT - 1 here
              }
              Notice that both n and unrollRemainder were declared earlier, as recommended previously.
  9. Preload all of your input data and then operate on it later. There is absolutely no reason that you should be loading data of any kind inside of your main calculation code. If the data doesn't fit or belong on one machine, use a Map-Reduce approach to distribute it across the Grid.        
  10. Use the factory pattern to create objects.                


    • Data structures can be created ahead of time and only the necessary pieces are passed to the new object.                         
    • Any preloaded data can also be segmented so that only the necessary parts are passed to the new object.                         
    • You can avoid the allocation of short-lived variables by using constructors with the final keyword on their parameters.
    • The factory can perform some heuristic calculations to see if a particular object should even be created for future processing.
  11. When doing calculations on a large number of floating-point values, use a byte array to store the data and a ByteWrapper to convert it to floats (see the sketch after this list). This should primarily be used for read-only (input) data. If you are writing floating-point values, you should do this with caution, as it may take more time than using a float array. One major advantage Java has when you use this approach is that you can switch between big- and little-endian data rather easily.
  12. Pass fewer parameters to methods. This results in less overhead. If you can use a static value, that is one less parameter to pass.
  13. Use static methods where possible. For example, a FahrenheitToCelsius(float fahrenheit) method could easily be made static. The main advantage here is that the compiler will likely inline the function.
  14. There is some debate whether you should make particular methods final if they are called often. There is a strong argument against doing so because the enhancement is small or nonexistent (see Urban Performance Legends or, once again, Is that your final answer?). However, my experience is that a small enhancement on a calculation that runs thousands of times can make a significant difference. Both Leif and I have seen measurable differences here. The key is to benchmark your code to be certain.
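
Regarding item 11: a minimal sketch of the byte-array approach is shown below. I am assuming the "ByteWrapper" mentioned above corresponds to wrapping a byte[] in a java.nio.ByteBuffer; the array contents and sizes here are purely illustrative.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class FloatView {
  public static void main(String[] args) {
    // Pretend these 12 bytes arrived from disk or the network.
    byte[] raw = new byte[4 * 3];
    ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN)
              .asFloatBuffer().put(new float[] {1.0f, 2.5f, -3.25f});

    // View the same raw bytes as floats; switching between big- and
    // little-endian input is a single ByteOrder change.
    FloatBuffer floats = ByteBuffer.wrap(raw)
                                   .order(ByteOrder.LITTLE_ENDIAN)
                                   .asFloatBuffer();
    for (int i = 0; i < floats.limit(); i++) {
      System.out.println(floats.get(i));
    }
  }
}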

Source: http://gridgurus.typepad.com

 

Grid Interoperability and Interoperation


The high expectations raised by grid computing have favored the development and deployment of a growing number of grid infrastructures and middlewares. However, the interaction between these grids is still limited, which reduces the potential for large-scale application of grid technology in spite of the efforts made by the grid community. In this sense, the Open Grid Forum (OGF) is developing open standards for grid software interoperability, while the OGF's Grid Interoperation Now Community Group (GIN-CG) is coordinating a set of interoperation efforts among production grids. It is therefore clear that, according to OGF (as Laurence Field explains in his article entitled "Getting Grids to work together: interoperation is key to sharing"), there is a big difference between these two terms:

  • Interoperability is the native ability of grids and grid technologies to interact directly via common open standards.
  • Interoperation is a set of techniques to get production grid infrastructures to work together in the short term.

Since most common open standards to provide grid interoperability are still being defined and only a few have been consolidated, grid interoperation techniques, like adapters and gateways, are needed. An adapter is, according to different dictionaries of computer terms, “a device that allows one system to connect to and work with another”. On the other hand, a gateway is conceptually similar to an adapter, but it is implemented as an independent service, acting as a bridge between two systems. The main drawback of adapters is that grid middleware or tools must be modified to insert the adapters. Gateways can be accessed without changes on grid middleware or tools, but they can become a single point of failure or a scalability bottleneck.

GridWay provides support for some of the few established standards like DRMAA, JSDL or WSRF to achieve interoperability but, in the meanwhile, it also provides components to allow interoperation, like Middleware Access Drivers (MADs) acting as adapters for different grid services, and the GridGateWay, which is a WSRF GRAM service encapsulating an instance of GridWay, thus providing a gateway for resource management services.

GridWay 4.0.2, coinciding with the release of Globus Toolkit 4 and its new WS GRAM service, introduced an architecture for the execution manager module based on a MAD (Middleware Access Driver) to interface several grid execution services, like pre-WS GRAM and WS GRAM, even simultaneously. That architecture was presented in the paper entitled "A modular meta-scheduling architecture for interfacing with pre-WS and WS Grid resource management services" (E. Huedo, R. S. Montero and I. M. Llorente). GridWay 5.0 took advantage of this modular architecture to implement an information manager module with a MAD to interface several grid information services, and a transfer manager module with a MAD to interface several grid data services. Moreover, the scheduling process was decoupled from the dispatch manager through the use of an external and selectable scheduler module.

[Figure: GridWay components]

The resulting architecture, which is shown above, provides direct interoperation between different middleware stacks. In fact, we demonstrated at OGF22 the interoperation of three important grid infrastructures, namely EGEE (gLite-based), TeraGrid and OSG (both Globus-based), being coordinately used through a single GridWay instance by means of the appropriate adapters. To set an example, the application was written using the DRMAA OGF standard. GridWay documentation provides a lot of information on how to integrate GridWay in the main middleware stacks, like gLite, pre-WS and WS Globus, or ARC, and provides information on how to develop new drivers for other middlewares.

[Figure: OGF22 interoperation demo]

Regarding the GridGateWay, it is being used for provisioning resources from several infrastructures. For example, the German Astronomy Community Grid (GACG or AstroGrid-D) uses a GridGateWay as a central resource broker, providing metascheduling functionality to Globus-based submission tools (e.g. for workflow execution) without modification. GridAustralia also uses a GridGateWay as a WSRF interface for its central GridWay Metascheduler instance, allowing reliable, remote job submission.

[Figure: AstroGrid-D metascheduling architecture (picture by AstroGrid-D)]

More information about the GridGateWay component is provided on its web page, as well as in this blog entry, which shows how to build utility computing infrastructures with this Globus-based gateway technology.


Eduardo Huedo

Reprinted from blog.dsa-research.org

Source: http://gridgurus.typepad.com

Grid Engine 6.2 Beta Release

Grid Engine 6.2 will come with some interesting new features. In addition to advance resource reservations and array job interdependencies, this release will also contain a new Service Domain Manager (SDM) module, which will allow computational resources to be distributed between different services, such as different Grid Engine clusters or application servers. For example, SDM will be able to withdraw unneeded machines from one cluster (or application server) and assign them to a different one, or keep them in its "spare resource pool". It is also worth mentioning that Grid Engine (and SDM) documentation is moving to Sun's wiki. The 6.2 beta release is available for download here.

Source: http://gridgurus.typepad.com

 

About Parallel Environments in Grid Engine

Support for parallel jobs in distributed resource management software is probably one of those features that most people do not use, but those who do appreciate it a lot. Grid Engine supports parallel jobs via parallel environments (PE) that can be associated with cluster queues. A new parallel environment is created using the qconf -ap command and editing the configuration file that pops up. Here is an example of a PE slightly modified from the default configuration:

$ qconf -sp simple_pe
pe_name           simple_pe
slots             4
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $round_robin
control_slaves    FALSE
job_is_first_task FALSE
urgency_slots     min

In the above example, "slots" defines the number of parallel tasks that can run concurrently. The "user_lists" ("xuser_lists") parameter should be a comma-separated list of user names that are allowed (denied) use of the given PE. If "user_lists" is set to NONE, any user not explicitly disallowed via the "xuser_lists" parameter may use the PE.

The "start_proc_args" and "stop_proc_args" parameters represent the command lines of the startup and shutdown procedures for the parallel environment. These commands are usually scripts customized for the specific parallel library intended for a given PE. They get executed for each parallel job and are used, for example, to start any daemons that enable parallel job execution. The standard output (error) of these commands is redirected into .po (.pe) files in the job's working directory, which is usually the user's home directory. It is worth noting that the customized PE startup and shutdown scripts can make use of several internal variables, such as $pe_hostfile and $job_id, that are relevant for the parallel job. The $pe_hostfile variable in particular points to a temporary file that contains the list of machines and parallel slots allocated for the given job. For example, setting "start_proc_args" to "/bin/cp $pe_hostfile /tmp/machines.$job_id" would copy $pe_hostfile to the /tmp directory. Some of those internal variables are also available to job scripts as environment variables; in particular, the $PE_HOSTFILE and $JOB_ID environment variables correspond to $pe_hostfile and $job_id, respectively.

The "allocation_rule" parameter helps the scheduler decide how to distribute parallel processes among the available machines. It can take an integer that fixes the number of processes per host, or special rules like $pe_slots (all processes have to be allocated on a single host), $fill_up (start filling up slots on the best suitable host, and continue until all slots are allocated), and $round_robin (allocate slots one by one on each allocated host in a round-robin fashion until all slots are filled).

The "control_slaves" parameter is slightly confusing. It indicates whether or not the Grid Engine execution daemon creates parallel tasks for a given application. In most cases (e.g., for MPI or PVM) this parameter should be set to FALSE, as custom Grid Engine PE interfaces are required for getting control of parallel tasks to work. Similarly, the "job_is_first_task" parameter is only relevant if control_slaves is set to TRUE. It indicates whether or not the original job script submitted for execution is part of the parallel program.

The "urgency_slots" parameter is used for jobs that request a range of parallel slots. If an integer value is specified, that number is used as the prospective slot amount. If "min", "max", or "avg" is specified, the prospective slot amount is determined as the minimum, maximum, or average of the requested slot range, respectively.

After a parallel environment is configured and added to the system, it can be associated with any existing queue by setting the "pe_list" parameter in the queue configuration, and at this point users should be able to submit parallel jobs. On the GE project site one can find a number of nice How-To documents related to integrating various parallel libraries. If you do not have the patience to build and configure one of those, but would still like to see how stuff works, you can try adding a simple PE (like the one shown above) to one of your queues, and use a simple ssh-based master script to spawn and wait on the slave tasks:

#!/bin/sh
#$ -S /bin/sh
# spawn one ssh-based slave task for every slot listed in $PE_HOSTFILE
slaveCnt=0
while read host slots q procs; do
  slotCnt=0
  while [ $slotCnt -lt $slots ]; do
    slotCnt=`expr $slotCnt + 1`
    slaveCnt=`expr $slaveCnt + 1`
    ssh $host "/bin/hostname; sleep 10" > /tmp/slave.$slaveCnt.out 2>&1 &
  done
done < $PE_HOSTFILE
# wait for all background slave tasks to finish
while [ $slaveCnt -gt 0 ]; do
  wait
  slaveCnt=`expr $slaveCnt - 1`
done
echo "All done!"

After saving this script as "master.sh" and submitting your job using something like "qsub -pe simple_pe 3 master.sh" (where 3 is the number of parallel slots requested), you should be able to see your "slave" tasks running on the allocated machines. Note, however, that you must have password-less ssh access to the designated parallel compute hosts in order for the above script to work.

Source: http://gridgurus.typepad.com

 

The Role of Open Source in Grid Computing

Grid Guru Ian Foster has a great piece in International Science Grid This Week. He talks about the significance of choosing open source licenses in the history of Globus, leading to a field dominated by open source software.

Source: http://gridgurus.typepad.com

The MapReduce Panacea Myth?


Everywhere I go I read about how the MapReduce algorithm will change, and continues to change, the world with its pure simplicity… Parallel programming is hard, but MapReduce makes it easy… MapReduce: ridiculously easy distributed programming… Perhaps one day programming tools and languages will catch up with our processing capability, but until then MapReduce will allow us all to process very large datasets on massively parallel systems without having to bother with complicated interprocess communication using MPI.

I am a skeptic, which is not to say I have anything against a generalized framework for distributing data to a large number of processors. Nor does it imply that I enjoy MPI and its coherence arising from cacophonous chatter (if all goes well). I just don't think MapReduce is particularly "simple". The key promoters of this algorithm, such as Yahoo and Google, have serious experts MapReducing their particular problem sets, and thus they make it look easy. You and your colleagues need to understand your data in some detail as well. I can think of a number of examples of why this is so.

First, let's say that you are tasked with processing thousands of channels of continuously recorded broadband data from a VLBI-based radio telescope (or any other processing using beam-forming techniques, for that matter). You cannot simply chop the data into nice time-based sections and send it off to be processed. Any signal processing applied to the data will produce terrible edge effects at each of the abrupt boundaries. Your file splits must do something to avoid this behavior, such as padding additional data on either side of the cut. This in turn complicates the append phase after the processing is done: you need to properly remove the padded data, because if the samples do not align in a coherent way, you will introduce a spike filled with energy into your result.
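
To make the padding idea concrete, here is a minimal sketch of the splitting step; the sample counts and pad length are illustrative stand-ins rather than values from any real system. Each map task receives its chunk plus padding on both sides, and only the unpadded middle is kept when results are appended:

public class OverlapSplit {
  public static void main(String[] args) {
    int total = 10_000_000; // total recorded samples (stand-in value)
    int chunk = 1_000_000;  // samples processed by one map task
    int pad = 4096;         // extra samples per side to absorb filter edge effects
    for (int start = 0; start < total; start += chunk) {
      int from = Math.max(0, start - pad);           // padded window start
      int to = Math.min(total, start + chunk + pad); // padded window end
      int keepTo = Math.min(start + chunk, total);
      // A map task would filter samples [from, to) but emit only [start, keepTo),
      // so the appended result contains no edge-contaminated samples.
      System.out.printf("task: filter [%d, %d), keep [%d, %d)%n", from, to, start, keepTo);
    }
  }
}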

Alternatively, you might be tasked with solving a large system of linear equations. For example, say you are asked to produce a regional seismic tomography map with a resolution down to a few hundred meters using thousands of earthquakes, each with tens of observations. You could easily produce a sparse system of equations whose matrix has on the order of one million columns and several tens, if not hundreds, of thousands of rows. Distributed algorithms for solving such a system are well known but require our cranky friend MPI. However, we can map this problem to several independent calculations as long as we are careful not to bias the input data, as in the previous example. I will not bore you with the possibilities, but suffice it to say that researchers have been producing tomographic maps for many years by carefully selecting the data and model calculated at any one time.

I know what many of you are thinking, since I've read it before: MapReduce is meant for "non-scientific" problems. But is a sophisticated search engine any different? What makes it any less "scientific" than the examples I provided? Consider a search engine that maintains several (n) different document indexes distributed throughout the cloud. A user issues a query, which is mapped to n servers. Let's assume that, for the sake of time, each node returns its top m results to the reduce phase. These results are then sorted and returned to the user. The assumption here is that there is no bias in the distribution of indexed documents relevant to a user's query. Perhaps one or more documents beyond the first m found in one particular index are far more relevant than the other (n-1) * m results from the other indexes, but the user will never know. Should the search engine return every single result to the reduce phase at the expense of response time? Is there a way to distribute documents to the individual indexes to avoid well-known (but not all) biases? I suggest that these questions are the sorts of things that give one search engine an edge over another. Approaches to these sorts of issues might well be publishable in refereed journals. In other words, it sounds scientific to me.
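
For concreteness, here is a minimal sketch of the reduce step just described; the Hit type, scores, and sizes are illustrative. Note that anything a shard ranked below its own top m never reaches this code, which is exactly where the bias creeps in:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopMReduce {
  static class Hit {
    final String doc;
    final double score;
    Hit(String doc, double score) { this.doc = doc; this.score = score; }
  }

  // Merge each shard's top-m list and keep the overall top m by score.
  static List<Hit> reduce(List<List<Hit>> perShardTopM, int m) {
    PriorityQueue<Hit> best = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
    for (List<Hit> shard : perShardTopM) {
      for (Hit h : shard) {
        best.add(h);
        if (best.size() > m) {
          best.poll(); // evict the current worst of the kept hits
        }
      }
    }
    List<Hit> out = new ArrayList<>(best);
    out.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
    return out;
  }
}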

I hope that by now you can see why I say that using MapReduce is only simple if you know how to work with (map) your data (especially if it is wonderfully wacky). There is an inherent risk of bias in any map-reduce algorithm. Sadly, this implies that processing data in parallel is still hard, no matter how good a programmer you are or how sophisticated your programming language is.

Source: http://gridgurus.typepad.com

Open Evolution


Proprietary standards can bring success at first, but that success cannot last. At least that is the conclusion we are forced to draw from two interesting articles in the 22 March issue of the Economist: "Break down these walls" and "Everywhere but nowhere". I highly recommend that you read them, particularly if you think that Ian's Grid definition requiring open standards is debatable.

The core lesson comes from the original big players in the nascent internet such as AOL, CompuServe, and Prodigy. These companies provided their users with electronic mail (not necessarily what we consider email today), chat rooms, discussion boards, and access to a wide-range of information. However these services were restricted to users of each particular service. You simply could not access information from one provider if you subscribed to another.

However, it was not long before products based upon open standards that provided these same services (and more) became more attractive to users simply because they allowed people to venture outside of the closed communities to which they subscribed. Once these users got out, they never turned back. The original content-providers became nothing more than access points to the web. Consequently these service providers quickly lost their luster and thus their valuation. Only AOL was able to (and still struggles to) survive, having redefined itself as a web-portal with paid advertising – just like the services that nearly killed it.

Today, the hottest products in the digital world are social-networking sites like Facebook and MySpace, as well as virtual worlds such as Second Life. Their popularity and usefulness to individuals have given them significant momentum in the marketplace as the "next big thing". Consequently these companies have been given enormous valuations despite having no business model beyond their hordes of captive users. While these products typically come with an API so that users can add useful and interesting features, that is no substitute for true operational freedom. People want to interact with others without having to switch systems or maintain two distinct profiles.

How long will it be before social-networking products appear that are not only based upon open standards but also offer better features and more accessibility? You can bet that it will be soon, given the amount of potential money involved. Then the reckoning will come, and these companies, once flying high, will either be forced to adapt or perish.

What does this teach us about computing beyond the desktop, however you wish to define it, be that Grid, Cloud, or whatnot? Personally, I think it is clear: we must develop to open standards or perish. I cannot see how the Grid market is immune to the pressures of interoperability and freedom of choice. To paraphrase the Economist: why stay within a closed community when you can roam outside its walled garden, into the wilds of open computing!

I hope to see you all at the Open Source Grid and Cluster Conference.

Source: http://gridgurus.typepad.com

 

There's an Analyst Lurking in that Business


I recently read an editorial from Grid Today (GT), based upon conversations with Forrester's Frank Gillett, suggesting that interest in grid computing is waning. I will not dispute the veracity of this claim; rather, I will leave that to people such as the HPC Today editorial staff who have access to the Forrester report. Irrespective of the actual level of interest that buyers have in the Grid, I was rather baffled by the reasons that Grid Today provided for the general "malaise".

The first reason that GT offers is that "grid computing is, in general, beneficial to vertically specific applications." More specifically, they indicate that only a limited set of applications can benefit from grid computing. I am assuming that the set of applications they are referring to is those that require high-performance parallel calculations, as well as any algorithm that can use the Map-Reduce pattern to distribute the computational load across many servers.

So which classes of applications do not work well on the grid? Clearly Service Oriented Architectures (SOA) work well on the Grid. In fact the Globus Toolkit, a popular software toolkit for building grids, uses SOA at its core.

Yet I believe that any n-tier application run on a Grid has many advantages. For example, imagine a web-based application with a supporting relational database that is required to scale under significant user loads including the number of connections but also the complexity of the requested services. Also imagine that clusters of users in different regions will use this application.

First of all, it would be nice for us to provide the data-services of this application using a SOA.  Doing so allows us to expose the data through a single access-layer.  Thus any program can access the data using the same business rules without tying it to a single-application interface.  Secondly, if users require any complex reports or other heavy-duty calculations, a single web-server might easily be overwhelmed and thus forced out of the rotation until the process completes.  A better solution would be for the web-server to farm these sorts of operations out to the Grid – maybe even using a Map-Reduce pattern.  Furthermore adding Grid capacity is an easy way to handle high-peak loads of the application.  These resources could be used by other projects during the off-peak periods.  Lastly, the grid could coordinate resources that are proximate to the regional user-clusters and thus reduce communication latency for any data that needs to be exchanged without having to keep copies of the web or data-infrastructure throughout the enterprise.
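
To make the "farm it out" idea concrete, here is a minimal sketch that reuses the DRMAA API from the first article to push a heavy report calculation from the web tier onto the grid; the class name, method, and script path are illustrative assumptions rather than anything from the editorial:

import org.ggf.drmaa.DrmaaException;
import org.ggf.drmaa.JobTemplate;
import org.ggf.drmaa.Session;
import org.ggf.drmaa.SessionFactory;

public class ReportSubmitter {
  // Called by the web tier: submit the heavy calculation as a batch job and
  // return the job id immediately so the web server stays in the rotation.
  public static String submitReport(String reportScript) throws DrmaaException {
    Session session = SessionFactory.getFactory().getSession();
    session.init("");
    JobTemplate jt = session.createJobTemplate();
    jt.setRemoteCommand(reportScript); // e.g. a script that renders the report
    String id = session.runJob(jt);
    session.deleteJobTemplate(jt);
    session.exit();
    return id; // the web tier can poll the DRM for completion under this id
  }
}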

If there are advantages to running your n-tier applications on the grid, it is not much of a stretch architecturally to extend that to other classes of application.  I could not imagine implementing a SaaS (Software as a Service) application on anything but a grid.  Having said that, I don’t believe that an application needs to be complicated to run better on a Grid.  Rather, I think any application that users rely on is a good candidate.

Many "desktop" applications not only can be run on the grid but also are more appropriate to do so.  Data centric applications are the prime candidates that come to mind. First of all, keeping results on your desktop all but kills collaboration between users because it is likely on an high-latency low-availability network, may be a separate security-domain and thus inaccessible to many users, and could be shutdown at any time.  In addition, if an application reads and/or writes significant amounts of important data, it is best to keep it in the data-center on reliable and, more-importantly, regularly backed-up storage.  Of course, the application could write across the typical high-latency low-availability desktop network into the datacenter, but that is fraught with problems.  Personally I believe that perhaps the most significant source of user frustration is "network drives" – but I digress.  If an application’s calculations take any significant resources, the user’s desktop quickly becomes a bottleneck.  Even if the user’s machine is beefy enough to handle running a job while still allowing access to email, they are still hardware limited.  In particular, if the application can be submitted in batch to the grid, the user could literally submit dozens if not hundreds of individual calculations and get the results in a fraction of the time it would take on their desktop.  Lastly, running jobs at the datacenter frees users from using a single desktop.  Rather, they can manage their computing from any location, which provides them significantly more freedom.

All of this brings me to GT's second key assertion: that the term Grid has been "bandied about so much that no one knows what it means or what business benefits they might derive from it." This is indeed the core challenge. My experience is that very few business proponents specify software architectures. Generally they couldn't care less whether a salesperson is pushing SOA, Grid, Cloud, SaaS, or whatnot. These are the concerns of the people who support business lines: CTOs, IT support managers, etc.

Chances are you are not dealing with these sorts of technical folks when you are drafting a proposal.  Rather, you are likely speaking with a business-analyst. The ones I know are not easily charmed by buzzwords (even if their bosses or peers are).  They are more than aware that terms mean different things to different vendors and their staff.

Frankly they don’t care about your pet technology.  Instead, they have a set of goals and a given budget.  They are measured on how well the project met the user’s needs, how under-budget it came in, and how much time it took.  If any one proposal that they have happens to align with other business initiatives of which they are aware, then they will consider the advantages as well as the costs of implementing it.  We all know that individual business groups tend to go their own ways, particularly in large companies.  We are not going to corral them with the "Grid".

Yet there is plenty of hope for us. We Grid proponents should focus on providing small, group-level systems that are quickly set up, scale easily, and meet the customer's defined business goals. These implementations do not need to fall under the traditional association that Grid has with high-performance computing (HPC); HPC is not often amongst the business goals. However, if the group Grid is built using open standards, has a resource manager, and allows for the provisioning of global management systems (e.g. authentication domains), it is easy for the technical types to incorporate this small Grid into an enterprise-wide effort. This is how we can sell the Grid.

Source: http://gridgurus.typepad.com

 

OpenNEbula and VWS


A few days ago the authors of the GridWay Metascheduler released a Technology Preview of their OpenNEbula Virtual Infrastructure Engine (ONE), which enables deployment and management of virtual machines on a pool of physical resources. The software is very similar to the Globus Virtual Workspace Service (VWS), both in architecture and functionality. Both systems provide a new service layer on top of existing virtualization platforms (currently both support only the Xen hypervisor). This layer extends the functionality of the underlying Virtual Machine Monitors (VMMs) from a single machine to a VM provisioning cluster. Both the ONE Engine and VWS utilize passwordless SSH access to manage the pool of nodes running VMMs, and they allow system administrators to deploy new VMs, to start/shutdown and suspend/resume already deployed VMs, and to migrate VMs from one physical host to another. The most notable difference between ONE and VWS is that VWS is built on top of the GT infrastructure and runs within the GT Java container. This allows, for example, RFT stage-in/stage-out requests to be sent along with the workspace creation requests. The ONE Engine, on the other hand, is a standalone service, and its installation requirements include only a few software packages that are already present in most Linux distributions.

Source: http://gridgurus.typepad.com

 

Ten More Reasons to go to Oakland


Rich Wellner came up with four reasons to attend the Open Source Grid and Cluster Conference, to be held in Oakland May 12-16. I outdid him and came up with 10:

1) Globus program is fantastic, including tutorials, advanced technical presentations, contributed talks, and community events on every aspect of Globus.

2) Gobs of other material on Sun Grid Engine, Rocks, and other open source grid and cluster software.

3) Gathering: a great opportunity to meet colleagues, peers, and collaborators from the grid and cluster community. The only grid meeting in the US for the rest of this year--the next two OGFs are in Spain (June) and Singapore (September).

4) GT4.2: You'll get to learn about the exciting new features in Globus Toolkit 4.2. New execution, data, security, information, virtualization, and core services.

5) Gratification (immediate) as you get to provide your input on future directions for Globus, Sun Grid Engine, Rocks, and other open source systems--and maybe sign up to contribute to those developments.

6) Grid solutions: You'll get to meet the people using Globus to build enterprise grid solutions in projects like caBIG, TeraGrid, Earth System Grid, MEDICUS, and LIGO, and learn about solution tools like Introduce, MPI-G, Swift, Taverna, and UniCluster.

7) Gurus: You get to grill the Globus gurus--or, if you prefer, show off your own Globus guru status.

8) Great price: $490 registration is substantially cheaper than OGF or HPDC, for example, and the hotel rate is reasonable ($149).

9) Gorgeous location: Oakland is easy to get to -- SFO (with an easy BART train ride), Oakland, and San Jose airports are all nearby. Just a 10-minute train ride to downtown San Francisco. A lovely time to be in the Bay Area.

10) Gorilla and guerrilla free: none of the corporate marketing talks that diluted the last GridWorld conference--apart from two sponsor talks, this is pure tech, and highly useful tech at that!

Source: http://gridgurus.typepad.com

A Grid OS?

I have recently been working on a test plan for a framework designed to deliver applications to grid users. The framework is useful for the specific environment in which the customer operates; however, it has led me to imagine something more generic that anybody who manages a grid intended for use by a diverse community would find useful.

You need to have a solid software infrastructure consisting of compilers, libraries, middleware, languages, and services. Your customers want to be able to run the applications that suit their goals best with as little fuss as possible. These include off-the-shelf, commercial customizations, open-source, freeware, supported in-house, and individually built software packages.

While there may be few interoperability issues within a small group or company, you can bet that not all programs will play well with others. Some applications will require very specific libraries and middleware while others will prove to be quite flexible. Some applications require supporting software for 64-bit architectures while others need 32-bit. Other software has different feature-sets on different hardware (e.g. SPARC versus x86) as well as software (e.g. Linux versus IRIX) systems. Still other applications, particularly those that are on long development cycles, tend to use older feature sets whose behavior may have changed or been eliminated from subsequent package releases. Meanwhile your in-house developers might be working on the bleeding-edge and therefore use software that is too unstable for the general user community. Face it: very few software developers expect their products to co-exist with others.

This is a big challenge for anybody who is expected to create a shared computing environment for a big user community. Typically system administrators will create an operating-system image based upon anticipated usage patterns, security, stability, feature sets, and availability. They will have specific builds for their web farm, mail servers, storage nodes, and (most importantly) their grid computation nodes. They would also like to be proactive and keep their systems up to the latest security and bug-fix patch levels. In addition, they are going to try to provide the best product they can; therefore they would like to provide the most feature-rich infrastructure with which they feel comfortable. Most importantly, however, they will use a package manager to maintain software releases on their machines. Why would any system manager want to reinvent the wheel when it comes to building software when the vendors will do it for them?

This last practice has a significant impact on the software you will find on the Grid. If the hardware vendor has a build for the software you use, chances are that is what you will get. These package managers tend to keep only one version of a particular software package on a system at a time. Consequently if a newer version of a package is desired, the older one is removed. Even if they tried to make multiple packages coexist, files would be overwritten. There are a few "compat" versions but these are exceptions.

Clearly, when your mandate is to provide a shared computing environment that has a significant number of processing nodes as well as users, you will have to provide a more substantive infrastructure. At this point you could either build specialized virtual machines for each operating environment or you can create a shared infrastructure that any image can use. Utility-computing players like Amazon have you create your own machine image (AMI) but I think it is unreasonable to expect application users to have the skills to create a proper operating environment.

The second option, creating a shared infrastructure that any image can use, could be thought of as a grid operating system from scratch, vis-à-vis Linux From Scratch. This type of framework would force us to place our software into a categorized structure capable of differentiating operating systems, hardware architectures, and application versions. This infrastructure should not replace the standard installs of the operating system, in order to avoid conflicts; providing application support for a grid is orthogonal to managing a compute node.

All of this needs to work without overtaxing your customers (i.e. application users). The typical user doesn’t care which operating environment they are provided as long as their software runs. Rather they would prefer to be able to call their application as if it were the only version using the only installed system libraries and middleware on the only supported compute node configuration. Basically if a user wishes to use an application, they simply want to call it by name: for example python and perhaps python-2.3.7 or python-2.4.5 should they require a particular version.

A big component of your effort in creating the proposed framework is providing the correct versions of libraries and middleware to your customers' frontline applications. This is a task that demands specialized configuration scripts whose job is to set up the operating environment to match both the user's request and the node's operating environment. There are a few tools out there that are quite capable of accomplishing something like this, but there is nothing I am aware of whose specific goal is to deliver applications on a grid; instead, this class of tools provides far more flexibility than what is necessary, let alone wanted.

Ultimately I think that the best thing for the industry would be to establish a standard grid directory structure for placing software in shared environments (e.g., a path scheme that encodes operating system, architecture, package name, and version). A standard method for exposing applications should be decided upon as well; this could be anything from link farms, to wrapper scripts, or even environment set-up scripts. If this were to happen, software developers and grid administrators could create standardized packages, including configuration scripts, that would install into this framework. Setting up Python would then be as easy as installing the standard packages for each desired operating environment and then calling "python".

Source: http://gridgurus.typepad.com