XGRID Technology

Implementing Distributed Systems
 


Using DRMAA with Unicluster Express

Distributed Resource Management Application API (DRMAA) is a high-level API that allows Grid applications to submit jobs to one or more DRM systems, and to monitor and control them. Grid Engine comes with support for C/C++ and Java, and one can also download bindings for Ruby and Python. There is also a nice collection of HowTos that should provide a great start for anyone looking to write DRMAA applications. The latest version of Unicluster Express (UCE) bundles Grid Engine 6.1u3, which is installed under $GLOBUS_LOCATION/sge. Here $GLOBUS_LOCATION refers to the UCE installation directory (/usr/local/unicluster by default), and all of the DRMAA libraries and Java files are located in the $GLOBUS_LOCATION/sge/lib directory. In order to run DRMAA applications, one has to set $LD_LIBRARY_PATH to point to the appropriate (architecture-dependent) directory. For my development cluster (64-bit Linux) with a default UCE installation I used the following setup:

$ source /usr/local/unicluster/unicluster-user-env.sh
$ export LD_LIBRARY_PATH=/usr/local/unicluster/sge/lib/lx24-amd64
$ export JAVA_HOME=/opt/jdk
$ export PATH=$JAVA_HOME/bin:$PATH

A very simple example of a Java DRMAA application that submits a job to Grid Engine is shown below:

$ cat SimpleJob.java
import org.ggf.drmaa.DrmaaException;
import org.ggf.drmaa.JobTemplate;
import org.ggf.drmaa.Session;
import org.ggf.drmaa.SessionFactory;

public class SimpleJob {
  public static void main(String[] args) {
    SessionFactory factory = SessionFactory.getFactory();
    Session session = factory.getSession();
    try {
      // An empty contact string selects the default DRM (Grid Engine here).
      session.init("");
      JobTemplate jt = session.createJobTemplate();
      jt.setRemoteCommand("/home/veseli/simple_job.sh");
      String id = session.runJob(jt);
      System.out.println("Your job has been submitted with id " + id);
      // Release the template and close the session once the job is queued.
      session.deleteJobTemplate(jt);
      session.exit();
    }
    catch (DrmaaException e) {
      System.out.println("Error: " + e.getMessage());
    }
  }
}

One can compile and run the above example using something like the following:

$ javac -classpath /usr/local/unicluster/sge/lib/drmaa.jar SimpleJob.java 
$ java -classpath .:/usr/local/unicluster/sge/lib/drmaa.jar SimpleJob
Your job has been submitted with id 14
$ qstat -f 
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@horatio.psvm.univa.com   BP    1/1       0.36     lx24-amd64  
14 0.55500 simple_job veseli       r     06/20/2008 12:24:59     	1          
----------------------------------------------------------------------------
all.q@romeo.psvm.univa.com     BP    0/1       0.39     lx24-amd64    
----------------------------------------------------------------------------
all.q@yorick.psvm.univa.com    BP    0/1       0.45     lx24-amd64    
----------------------------------------------------------------------------
headnodes.q@petruchio.psvm.uni IP    0/1       0.15     lx24-amd64    
----------------------------------------------------------------------------
special.q@horatio.psvm.univa.c BIP   0/1       0.36     lx24-amd64    

I should point out that DRMAA is designed to be independent of any particular DRM. Users who need job submission features or flags specific to Grid Engine can either use the “native specification” attribute, or use the “job category” attribute together with “qtask” files. To set the native specification attribute in Java, one would call the setNativeSpecification() method of the JobTemplate class (before the job submission line in the code):

jt.setNativeSpecification("-q special.q");

This method, however, makes your application dependent on the specific DRM you are working with at the moment. The above line will be interpreted correctly by Grid Engine, but may not be understood by other DRMs. In most cases a better solution is to use the job category attribute instead, and to specify the DRM-dependent flags in the qtask file. For example, in order to submit your job to a particular Grid Engine queue, the Java code would contain something like

jt.setJobCategory("special");

and use the qtask file to translate the “special” job category into appropriate Grid Engine flags:

$ cat ~/.qtask
special -q special.q

The cluster-wide global qtask file, which defines defaults for the whole cluster, resides in UCE at $GLOBUS_LOCATION/sge/default/common/qtask. As shown above, user-specific qtask files that override and extend the cluster-wide definitions are found at ~/.qtask.
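The lookup described above (a job category translated into flags, with the user file taking precedence over the cluster-wide file) can be sketched in plain Java. This is only an illustration of the idea, not Grid Engine's actual implementation: the class QtaskLookup, its methods, and the sample file contents are hypothetical, and the parser assumes nothing beyond the simple "category flags" line format shown in the ~/.qtask example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: mimics how a job category could be resolved to flags.
final class QtaskLookup {
    // Parses lines of the form "<category> <flags...>"; blank lines are skipped.
    static Map<String, String> parse(final List<String> lines) {
        final Map<String, String> table = new LinkedHashMap<>();
        for (final String raw : lines) {
            final String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            final String[] parts = line.split("\\s+", 2);
            table.put(parts[0], parts.length > 1 ? parts[1] : "");
        }
        return table;
    }

    // A user entry wins over the cluster-wide entry for the same category.
    static String resolve(final String category,
                          final Map<String, String> global,
                          final Map<String, String> user) {
        if (user.containsKey(category)) {
            return user.get(category);
        }
        return global.getOrDefault(category, "");
    }

    public static void main(String[] args) {
        // Illustrative file contents, not taken from a real cluster.
        final Map<String, String> global =
            parse(Arrays.asList("special -q all.q", "batch -q all.q"));
        final Map<String, String> user =
            parse(Arrays.asList("special -q special.q"));
        System.out.println(resolve("special", global, user));
        System.out.println(resolve("batch", global, user));
    }
}
```

With these sample tables, the "special" category resolves to the user's flags and "batch" falls back to the cluster-wide entry, which is the behavior that keeps DRM-specific flags out of the application code.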

Source: http://gridgurus.typepad.com

Posted on Saturday, 22 Tir 1387 at 6:26 PM by Hamed Salimipour

Aromatic Clouds?


If you weren’t at OSGC you missed a number of interesting presentations. From my perspective, one of the most intriguing technologies was EUCALYPTUS: Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems.

Before I go on, I would like you to notice that anybody who is able to make an acronym out of eucalyptus has some time on their hands. Fortunately, they used this time to implement an open-source infrastructure for Elastic Computing. In particular, the goal of the project is to "foster community research and development of Elastic/Utility/Cloud service implementation technologies, resource allocation strategies, service level agreement (SLA) mechanisms and policies, and usage models."

In my opinion, the most interesting facets of this project are:

  • It is compatible with the Amazon EC2 tools out of the box yet it is agnostic and thus is capable of supporting any number of client interfaces;
  • Any team can assemble a development environment for tools that they wish to deploy to the EC2 Cloud;
  • A group could create their own Cloud system which could use EC2 for Utility computing resources;
  • It is the first step towards creating an open-standard for Cloud computing.

My hope is that this project will not only get us all thinking about what we really need from a Cloud but also what we could improve... I plan to start working with this software as soon as it is available later this month.

Source: http://gridgurus.typepad.com

Posted on Saturday, 22 Tir 1387 at 6:24 PM by Hamed Salimipour

About Grid Engine Advanced Reservations

Advanced reservation (AR) capability is one of the most important new features of the upcoming Grid Engine 6.2 release. New command line utilities allow users and administrators to submit resource reservations (qrsub), view granted reservations (qrstat), or delete reservations (qrdel). Some of the existing commands are also getting new switches. For example, the “-ar” option for qsub indicates that the submitted job is part of an existing advanced reservation. Since AR is new functionality, I thought it might be useful to describe how it works with a simple example (using the 6.2 Beta software). Advanced resource reservations can be submitted to Grid Engine by queue operators and managers, and also by a designated set of privileged users. Those users are defined in the ACL “arusers”, which by default looks as follows:

$ qconf -sul
arusers
deadlineusers
defaultdepartment
$ qconf -su arusers
name    arusers
type    ACL
fshare  0
oticket 0
entries NONE

The “arusers” ACL can be modified via the “qconf -mu” command:

$ qconf -mu arusers
veseli@tolkien.ps.uud.com modified "arusers" in userset list
$ qconf -su arusers
name    arusers
type    ACL
fshare  0
oticket 0
entries veseli

Once designated as a member of this list, the user is allowed to submit ARs to Grid Engine:

[veseli@tolkien]$ qrsub -e 0805141450.33 -pe mpi 2
Your advance reservation 3 has been granted
[veseli@tolkien]$ qrstat
ar-id   name       owner        state start at             end at               duration
-----------------------------------------------------------------------------------------
      3            veseli       r     05/14/2008 14:33:08  05/14/2008 14:50:33  00:17:25
[veseli@tolkien]$ qstat -f 
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@tolkien.ps.uud.com       BIP   2/0/4          0.04     lx24-x86      

For the sake of simplicity, the above example uses a single queue (all.q) that has 4 job slots and a parallel environment (PE) mpi assigned to it. After reserving 2 slots for the mpi PE, only 2 slots are left for running regular jobs until the AR shown above expires. Note that the "-e" switch for qrsub designates the requested reservation end time in the format YYMMDDhhmm.ss. It is also worth pointing out that the qstat output has changed slightly with respect to previous software releases in order to display existing reservations. If we now submit several regular jobs, only 2 of them will be able to run:
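For anyone generating reservations from a program, the YYMMDDhhmm.ss end-time format can be produced with the standard java.time classes. The following is a small sketch rather than part of any Grid Engine API: the class ReservationTime and its method names are made up for illustration, and the timestamps are the ones from the qrstat output above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for building the argument of "qrsub -e".
final class ReservationTime {
    // Two-digit year, month, day, hour, minute, then ".seconds".
    private static final DateTimeFormatter QRSUB_TIME =
        DateTimeFormatter.ofPattern("yyMMddHHmm.ss");

    static String endTimeArgument(final LocalDateTime end) {
        return end.format(QRSUB_TIME);
    }

    // Formats the reservation length as hh:mm:ss, as qrstat displays it.
    static String duration(final LocalDateTime start, final LocalDateTime end) {
        final long s = Duration.between(start, end).getSeconds();
        return String.format("%02d:%02d:%02d", s / 3600, (s % 3600) / 60, s % 60);
    }

    public static void main(String[] args) {
        final LocalDateTime start = LocalDateTime.of(2008, 5, 14, 14, 33, 8);
        final LocalDateTime end = LocalDateTime.of(2008, 5, 14, 14, 50, 33);
        System.out.println("qrsub -e " + endTimeArgument(end) + " -pe mpi 2");
        System.out.println("duration " + duration(start, end));
    }
}
```

For the reservation shown above this prints the same end-time string that was passed to qrsub (0805141450.33) and the same 00:17:25 duration that qrstat reports.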

[veseli@tolkien]$ qsub regular_job.sh 
Your job 15 ("regular_job.sh") has been submitted
...
[veseli@tolkien]$ qsub regular_job.sh 
Your job 19 ("regular_job.sh") has been submitted
[veseli@tolkien]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@tolkien.ps.uud.com       BIP   2/2/4          0.03     lx24-x86      
     15 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
     16 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     17 0.55500 regular_jo veseli       qw    05/14/2008 14:34:22     1        
     18 0.55500 regular_jo veseli       qw    05/14/2008 14:34:23     1        
     19 0.55500 regular_jo veseli       qw    05/14/2008 14:34:24     1        

However, if we submit jobs that are part of the existing AR, those are allowed to run, while jobs submitted earlier are still pending:

[veseli@tolkien]$ qsub -ar 3 reserved_job.sh 
Your job 20 ("reserved_job.sh") has been submitted
[veseli@tolkien]$ qsub -ar 3 reserved_job.sh 
Your job 21 ("reserved_job.sh") has been submitted
[veseli@tolkien]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@tolkien.ps.uud.com       BIP   2/4/4          0.02     lx24-x86      
     15 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
     16 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
     20 0.55500 reserved_j veseli       r     05/14/2008 14:35:02     1        
     21 0.55500 reserved_j veseli       r     05/14/2008 14:35:02     1        
############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     17 0.55500 regular_jo veseli       qw    05/14/2008 14:34:22     1        
     18 0.55500 regular_jo veseli       qw    05/14/2008 14:34:23     1        
     19 0.55500 regular_jo veseli       qw    05/14/2008 14:34:24     1        

The above example illustrates how ARs work. As long as a particular reservation is valid, only jobs that are designated as part of it can use the resources that have been reserved. I think that AR will prove to be an extremely valuable tool for planning grid resource usage, and I’m very pleased to see it in the new Grid Engine release.
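The slot accounting seen in the qstat output (resv/used/tot.) can be mimicked with a toy model. This is a deliberately simplified sketch and not the Grid Engine scheduler: the class ReservationSlots is hypothetical, it ignores time, priorities and parallel environments, and it only captures the rule that regular jobs may use the unreserved slots while AR jobs may use the reserved ones.

```java
// Toy model of one queue with an active reservation (not the real scheduler).
final class ReservationSlots {
    private final int total;
    private final int reserved;
    private int regularRunning;
    private int reservedRunning;

    ReservationSlots(final int total, final int reserved) {
        this.total = total;
        this.reserved = reserved;
    }

    // A regular job starts only if an unreserved slot is free.
    boolean startRegular() {
        if (regularRunning < total - reserved) {
            regularRunning++;
            return true;
        }
        return false;
    }

    // A job submitted with "-ar" starts only if a reserved slot is free.
    boolean startReserved() {
        if (reservedRunning < reserved) {
            reservedRunning++;
            return true;
        }
        return false;
    }

    int used() {
        return regularRunning + reservedRunning;
    }

    public static void main(String[] args) {
        final ReservationSlots queue = new ReservationSlots(4, 2);
        int pending = 0;
        for (int job = 15; job <= 19; job++) {
            if (!queue.startRegular()) {
                pending++;
            }
        }
        queue.startReserved();
        queue.startReserved();
        System.out.println("used=" + queue.used() + " pending=" + pending);
    }
}
```

Submitting five regular jobs and then two AR jobs to this model leaves four slots used and three regular jobs pending, which matches the 2/4/4 state in the last qstat listing.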

Source: http://gridgurus.typepad.com

Posted on Wednesday, 12 Tir 1387 at 11:07 AM by Hamed Salimipour

Steaming Java


When Rich asked us to walk through a software development process, I immediately thought back to a conversation that I had with my friend Leif Wickland about building high-performance Java applications, so I emailed him to ask for his best practices. We have both produced code that is as fast as, if not faster than, C compiled with optimization (for me it was using a 64-bit JRE on an x86_64 architecture with multiple cores).

That is not to say that the equivalent C code could not be made to go faster if you spent time optimizing it. Rather, the main point is that Java is a viable HPC language. On a related note, Brian Goetz of Sun has a very interesting discussion on IBM's developerWorks, "Urban performance legends, revisited", on how garbage collection allows faster raw allocation performance.

However, I digress… Here is a summary of what we both came up with (in no particular order):

           

  1. It is vitally important to "measure, measure, measure" everything you do. We can offer any set of helpful hints, but the likelihood that all of them should be applied is extremely low.
  2. It is equally important to optimize only the areas of the program that are bottlenecks. Optimizing anything else wastes development time for no real gain.
  3. One of the simplest and most overlooked things that can help your application is to explicitly mark method parameters that are read-only with the final modifier. Not only can it help the compiler with optimization, it is also a good way of communicating your intentions to your teammates. Furthermore, if you can make your method parameters final, this will help even more. One thing to be aware of is that not all things that are declared final behave as expected (see "Is that your final answer?" for more detail).
  4. If you have state shared between threads, make whatever you can final so that the VM takes no steps to ensure consistency. This is not something that we would have expected to make a difference, but it seems to help.
  5. An equally ignored practice is using the finally clause. It is very important to clean up after the code in a try block. Otherwise you could leave open streams, SQL queries, or other objects lying around taking up space.
  6. Create your data structures and declare your variables early. A core goal is to avoid allocating short-lived variables. While it is true that the garbage collector may reserve memory for variables that are declared often, why make it guess your intentions? For example, if a loop is called repeatedly, there is no need to say for (int i = 0; … when you could have declared i earlier. Of course, you have to be careful not to reset counters from inside of loops.
  7. Use static for values that are constants. This may seem obvious, but not everybody does it.
  8. For loops embedded within other loops:

                              

    • Replace your outer loop with a fixed pool of threads. In the next release of Java, this will be even easier using the fork-join framework. This has become increasingly important with processors that have many cores.
    • Make sure that your innermost loop is the longest, even if it doesn't necessarily map directly to the business goals. You shouldn't force the program to create a new loop too often, as it wastes cycles.
    • Unroll your inner loops. This can save an enormous amount of time even if it isn't pretty. The quick test I just ran was 300% faster. If you haven't unrolled a loop before, it is pretty simple:
              unrollRemainder = count%LOOP_UNROLL_COUNT;
             
              for( n = 0; n < unrollRemainder; n++ ) {
                  // do some stuff here.
              }
             
              for( n = unrollRemainder; n < count; n+=LOOP_UNROLL_COUNT ) {
                  // do stuff for n here
                  // do stuff for n+1 here
                  // do stuff for n+2 here
                  …
                  // do stuff for n+LOOP_UNROLL_COUNT - 1 here
              }
              Notice that both n and unrollRemainder were declared earlier as recommended previously.
           
  9. Preload all of your input data and then operate on it later. There is absolutely no reason that you should be loading data of any kind inside of your main calculation code. If the data doesn't fit or belong on one machine, use a Map-Reduce approach to distribute it across the Grid.        
  10. Use the factory pattern to create objects.                

                              

    • Data structures can be created ahead of time and only the necessary pieces are passed to the new object.                         
    • Any preloaded data can also be segmented so that only the necessary parts are passed to the new object.                         
    • You can avoid the allocation of short-lived variables by using constructors with the final keyword on their parameters.
    • The factory can perform some heuristic calculations to see if a particular object should even be created for future processing.
           
  11. When doing calculations on a large number of floating-point values, use a byte array to store the data and a ByteBuffer wrapper (java.nio) to convert it to floats. This should primarily be used for read-only (input) data. If you are writing floating-point values, do this with caution, as it may take more time than using a float array. One major advantage that Java has when you use this approach is that you can switch between big-endian and little-endian data rather easily.
  12. Pass fewer parameters to methods. This results in less overhead. If a value can be made static instead, that is one fewer parameter to pass.
  13. Use static methods if possible. For example, a method such as FahrenheitToCelsius(float fahrenheit) could easily be made static. The main advantage here is that the compiler will likely inline the function.
  14. There is some debate about whether you should make particular methods final if they are called often. There is a strong argument against doing this because the enhancement is small or nonexistent (see "Urban performance legends, revisited" or, once again, "Is that your final answer?"). However, my experience is that a small enhancement on a calculation that is run thousands of times can make a significant difference. Both Leif and I have seen measurable differences here. The key is to benchmark your code to be certain.
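The loop advice in item 8 can be put together in one small, complete program. This is only a sketch under a few assumptions: the class UnrolledRowSums, the row data, and the unroll factor of 4 are made up for illustration, and a plain ExecutorService from java.util.concurrent stands in for the outer loop. Whether the unrolling pays off on a given JVM is exactly the kind of thing item 1 says to measure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class UnrolledRowSums {
    // The body of the main loop below is written out this many times.
    private static final int LOOP_UNROLL_COUNT = 4;

    // Sums one row with the inner loop unrolled four times.
    static long sumRow(final int[] row) {
        final int count = row.length;
        final int unrollRemainder = count % LOOP_UNROLL_COUNT;
        long total = 0;
        int n;
        // Handle the leftover elements first so the main loop divides evenly.
        for (n = 0; n < unrollRemainder; n++) {
            total += row[n];
        }
        for (n = unrollRemainder; n < count; n += LOOP_UNROLL_COUNT) {
            total += row[n];
            total += row[n + 1];
            total += row[n + 2];
            total += row[n + 3];
        }
        return total;
    }

    // Replaces the outer loop over rows with tasks on a fixed pool of threads.
    static long[] sumRows(final int[][] rows, final int threads) throws Exception {
        final ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            final List<Future<Long>> pending = new ArrayList<>();
            for (final int[] row : rows) {
                final Callable<Long> task = () -> sumRow(row);
                pending.add(pool.submit(task));
            }
            final long[] sums = new long[rows.length];
            for (int i = 0; i < sums.length; i++) {
                sums[i] = pending.get(i).get();
            }
            return sums;
        } finally {
            // Always release the worker threads, even if a task failed.
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        final int[][] rows = { {1, 2, 3, 4, 5, 6, 7}, {10, 20, 30}, {} };
        for (final long sum : sumRows(rows, 2)) {
            System.out.println(sum);
        }
    }
}
```

The remainder loop runs before the unrolled loop, as in the fragment in item 8, so rows whose length is not a multiple of four (including the empty row) are still summed correctly.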

Source: http://gridgurus.typepad.com

Posted on Wednesday, 12 Tir 1387 at 11:06 AM by Hamed Salimipour

Grid Interoperability and Interoperation


The high expectations raised by grid computing have favored the development and deployment of a growing number of grid infrastructures and middleware. However, the interaction between these grids is still limited, which restricts the potential large-scale application of grid technology despite the efforts made by the grid community. In this sense, the Open Grid Forum (OGF) is developing open standards for grid software interoperability, while the OGF's Grid Interoperation Now Community Group (GIN-CG) is coordinating a set of interoperation efforts among production grids. According to OGF (as Laurence Field explains in his article entitled "Getting Grids to work together: interoperation is key to sharing"), there is therefore a big difference between these two terms:

  • Interoperability is the native ability of grids and grid technologies to interact directly via common open standards.
  • Interoperation is a set of techniques to get production grid infrastructures to work together in the short term.

Since most of the common open standards for grid interoperability are still being defined and only a few have been consolidated, grid interoperation techniques, like adapters and gateways, are needed. An adapter is, according to different dictionaries of computer terms, “a device that allows one system to connect to and work with another”. A gateway, on the other hand, is conceptually similar to an adapter, but it is implemented as an independent service, acting as a bridge between two systems. The main drawback of adapters is that grid middleware or tools must be modified to insert them. Gateways can be accessed without changes to grid middleware or tools, but they can become a single point of failure or a scalability bottleneck.
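The difference between the two techniques can be sketched in a few lines of Java. Everything here is hypothetical and exists only to illustrate the distinction: the names InteroperationSketch, ExecutionDriver, Scheduler and Gateway are invented, and the strings returned by the adapters stand in for real middleware calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: adapters live inside the tool, a gateway sits in front of it.
final class InteroperationSketch {
    // Common interface that the scheduler core programs against.
    interface ExecutionDriver {
        String submit(String executable);
    }

    // Adapters: one per middleware, inserted into the scheduler itself.
    static final class PreWsGramAdapter implements ExecutionDriver {
        public String submit(final String executable) {
            return "pre-WS GRAM <- " + executable;
        }
    }

    static final class WsGramAdapter implements ExecutionDriver {
        public String submit(final String executable) {
            return "WS GRAM <- " + executable;
        }
    }

    // The tool had to be modified (register) to accept the adapters.
    static final class Scheduler {
        private final Map<String, ExecutionDriver> drivers = new LinkedHashMap<>();

        void register(final String name, final ExecutionDriver driver) {
            drivers.put(name, driver);
        }

        String submit(final String name, final String executable) {
            final ExecutionDriver driver = drivers.get(name);
            if (driver == null) {
                throw new IllegalArgumentException("no driver for " + name);
            }
            return driver.submit(executable);
        }
    }

    // Gateway: an independent service wrapping a whole scheduler. Clients stay
    // unchanged, but every request now passes through this single object.
    static final class Gateway implements ExecutionDriver {
        private final Scheduler scheduler;
        private final String target;

        Gateway(final Scheduler scheduler, final String target) {
            this.scheduler = scheduler;
            this.target = target;
        }

        public String submit(final String executable) {
            return "gateway -> " + scheduler.submit(target, executable);
        }
    }

    public static void main(String[] args) {
        final Scheduler scheduler = new Scheduler();
        scheduler.register("prews", new PreWsGramAdapter());
        scheduler.register("ws", new WsGramAdapter());
        System.out.println(scheduler.submit("prews", "/bin/hostname"));
        System.out.println(new Gateway(scheduler, "ws").submit("/bin/hostname"));
    }
}
```

The sketch makes the trade-off visible: adding an adapter means touching the scheduler, while the gateway needs no client changes but concentrates all traffic in one place.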

GridWay supports some of the few established standards, like DRMAA, JSDL or WSRF, to achieve interoperability. In the meantime, it also provides components that allow interoperation, like Middleware Access Drivers (MADs), which act as adapters for different grid services, and the GridGateWay, which is a WSRF GRAM service encapsulating an instance of GridWay, thus providing a gateway for resource management services.

GridWay 4.0.2, coinciding with the release of Globus Toolkit 4 and its new WS GRAM service, introduced an architecture for the execution manager module based on a MAD (Middleware Access Driver) to interface several grid execution services, like pre-WS GRAM and WS GRAM, even simultaneously. That architecture was presented in the paper entitled "A modular meta-scheduling architecture for interfacing with pre-WS and WS Grid resource management services" (E. Huedo, R. S. Montero and I. M. Llorente). GridWay 5.0 took advantage of this modular architecture to implement an information manager module with a MAD to interface several grid information services, and a transfer manager module with a MAD to interface several grid data services. Moreover, the scheduling process was decoupled from the dispatch manager through the use of an external and selectable scheduler module.

[Figure: GridWay components]

The resulting architecture, which is shown above, provides direct interoperation between different middleware stacks. In fact, at OGF22 we demonstrated the interoperation of three important grid infrastructures, namely EGEE (gLite-based), TeraGrid and OSG (both Globus-based), used in a coordinated way through a single GridWay instance by means of the appropriate adapters. To set an example, the application was written using the DRMAA OGF standard. The GridWay documentation provides a lot of information on how to integrate GridWay with the main middleware stacks, like gLite, pre-WS and WS Globus, or ARC, and on how to develop new drivers for other middleware.

[Figure: OGF22 interoperation demo]

Regarding the GridGateWay, it is being used for provisioning resources from several infrastructures. For example, the German Astronomy Community Grid (GACG or AstroGrid-D) uses a GridGateWay as a central resource broker, providing metascheduling functionality to Globus-based submission tools (e.g. for workflow execution) without modification. GridAustralia also uses a GridGateWay as a WSRF interface for its central GridWay Metascheduler instance, allowing reliable, remote job submission.

[Figure: AstroGrid-D metascheduling architecture. Picture by AstroGrid-D]

More information about the GridGateWay component is provided on its web page, as well as in this blog entry, which shows how to build Utility Computing infrastructures with this Globus-based gateway technology.


Eduardo Huedo

Reprinted from blog.dsa-research.org

Source: http://gridgurus.typepad.com

Posted on Wednesday, 12 Tir 1387 at 11:04 AM by Hamed Salimipour

Grid Engine 6.2 Beta Release

Grid Engine 6.2 will come with some interesting new features. In addition to advance resource reservations and array job interdependencies, this release will also contain a new Service Domain Manager (SDM) module, which will allow computational resources to be distributed between different services, such as different Grid Engine clusters or application servers. For example, SDM will be able to withdraw unneeded machines from one cluster (or application server) and either assign them to a different one or keep them in its “spare resource pool”. It is also worth mentioning that the Grid Engine (and SDM) documentation is moving to Sun’s wiki. The 6.2 beta release is available for download here.

Source: http://gridgurus.typepad.com

Posted on Wednesday, 12 Tir 1387 at 11:02 AM by Hamed Salimipour

About Parallel Environments in Grid Engine

Support for parallel jobs in distributed resource management software is probably one of those features that most people do not use, but those who do use it appreciate it a lot. Grid Engine supports parallel jobs via parallel environments (PEs) that can be associated with cluster queues. A new parallel environment is created with the qconf -ap command, followed by editing the configuration file that pops up. Here is an example of a PE slightly modified from the default configuration:

$ qconf -sp simple_pe
pe_name           simple_pe
slots             4
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $round_robin
control_slaves    FALSE
job_is_first_task FALSE
urgency_slots     min

In the above example, “slots” defines the number of parallel tasks that can be run concurrently. The “user_lists” (“xuser_lists”) parameter should be a comma-separated list of user access lists whose members are allowed (denied) use of the given PE. If “user_lists” is set to NONE, any user who is not explicitly disallowed via the “xuser_lists” parameter may use the PE.

The “start_proc_args” and “stop_proc_args” parameters hold the command lines of the startup and shutdown procedures for the parallel environment. These commands are usually scripts customized for the specific parallel library that a given PE is intended for. They get executed for each parallel job, and are used, for example, to start any necessary daemons that enable parallel job execution. The standard output (error) of these commands is redirected into <job_name>.po<job_id> (<job_name>.pe<job_id>) files in the job’s working directory, which is usually the user’s home directory. It is worth noting that the customized PE startup and shutdown scripts can make use of several internal variables, such as $pe_hostfile and $job_id, that are relevant for the parallel job. The $pe_hostfile variable in particular points to a temporary file that contains the list of machines and parallel slots allocated for the given job. For example, setting “start_proc_args” to “/bin/cp $pe_hostfile /tmp/machines.$job_id” would copy $pe_hostfile to the /tmp directory. Some of those internal variables are also available to job scripts as environment variables. In particular, the $PE_HOSTFILE and $JOB_ID environment variables will be set and will correspond to $pe_hostfile and $job_id, respectively.

The “allocation_rule” parameter helps the scheduler decide how to distribute parallel processes among the available machines. It can take an integer that fixes the number of processes per host, or special rules like $pe_slots (all processes have to be allocated on a single host), $fill_up (start filling up slots on the best suitable host, and continue until all slots are allocated), and $round_robin (allocate slots one by one on each allocated host in a round-robin fashion until all slots are filled).

The “control_slaves” parameter is slightly confusing. It indicates whether or not the Grid Engine execution daemon creates parallel tasks for a given application. In most cases (e.g., for MPI or PVM) this parameter should be set to FALSE, as custom Grid Engine PE interfaces are required for getting control of parallel tasks to work. Similarly, the “job_is_first_task” parameter is only relevant if “control_slaves” is set to TRUE. It indicates whether or not the originally submitted job script is itself part of the parallel program.

The “urgency_slots” parameter is used for jobs that request a range of parallel slots. If an integer value is specified, that number is used as the prospective slot amount. If “min”, “max”, or “avg” is specified, the prospective slot amount will be determined as the minimum, maximum or average of the slot range, respectively.

After a parallel environment is configured and added to the system, it can be associated with any existing queue by setting the “pe_list” parameter in the queue configuration, and at this point users should be able to submit parallel jobs. On the GE project site one can find a number of nice How-To documents related to integrating various parallel libraries. If you do not have the patience to build and configure one of those, but you would still like to see how things work, you can try adding a simple PE (like the one shown above) to one of your queues, and use a simple ssh-based master script to spawn and wait on the slave tasks:

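#!/bin/sh
#$ -S /bin/sh
# Each $PE_HOSTFILE line holds: host, slot count, queue, processor range.
slaveCnt=0
while read host slots q procs; do
  slotCnt=0
  # Start one background ssh task per slot granted on this host.
  while [ "$slotCnt" -lt "$slots" ]; do
    slotCnt=`expr $slotCnt + 1`
    slaveCnt=`expr $slaveCnt + 1`
    # -n keeps ssh from reading the hostfile lines meant for the loop.
    ssh -n "$host" "/bin/hostname; sleep 10" > /tmp/slave.$slaveCnt.out 2>&1 &
  done
done < "$PE_HOSTFILE"
# A bare wait blocks until every background task has exited.
wait
echo "All done!"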

After saving this script as "master.sh" and submitting your job using something like "qsub -pe simple_pe 3 master.sh" (where 3 is the number of parallel slots requested), you should be able to see your "slave" tasks running on the allocated machines. Note, however, that you must have password-less ssh access to the designated parallel compute hosts in order for the above script to work.
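The same bookkeeping the master script performs on $PE_HOSTFILE can be written in Java, for example when the job itself is a Java program. The sketch below only parses the file contents and assumes the four-column line layout that the script reads (host, slots, queue, processor range); the class PeHostfile is hypothetical and the sample lines are invented, reusing host names from the qstat listing earlier in this blog.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: expands PE hostfile lines into one entry per slot.
final class PeHostfile {
    static List<String> taskHosts(final List<String> lines) {
        final List<String> hosts = new ArrayList<>();
        for (final String raw : lines) {
            final String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            // Columns: host, slot count, queue, processor range.
            final String[] fields = line.split("\\s+");
            final int slots = Integer.parseInt(fields[1]);
            for (int i = 0; i < slots; i++) {
                hosts.add(fields[0]);
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Invented sample contents for a job that requested 3 slots.
        final List<String> sample = Arrays.asList(
            "romeo.psvm.univa.com 2 all.q@romeo.psvm.univa.com UNDEFINED",
            "yorick.psvm.univa.com 1 all.q@yorick.psvm.univa.com UNDEFINED");
        for (final String host : taskHosts(sample)) {
            System.out.println("would start a task on " + host);
        }
    }
}
```

In a real job the lines would come from the file named by the PE_HOSTFILE environment variable, for instance via Files.readAllLines(Paths.get(System.getenv("PE_HOSTFILE"))).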

Source: http://gridgurus.typepad.com

Posted on Wednesday, 12 Tir 1387 at 11:01 AM by Hamed Salimipour

The Role of Open Source in Grid Computing

Grid Guru Ian Foster has a great piece in International Science Grid This Week. He talks about the significance of choosing open source licenses in the history of Globus, leading to a field dominated by open source software.

Source: http://gridgurus.typepad.com

Posted on Wednesday, 12 Tir 1387 at 11:00 AM by Hamed Salimipour

The MapReduce Panacea Myth?


Everywhere I go I read about how the MapReduce algorithm has changed and will continue to change the world with its pure simplicity… Parallel programming is hard but MapReduce makes it easy... MapReduce: ridiculously easy distributed programming… Perhaps one day programming tools and languages will catch up with our processing capability, but until then MapReduce will allow us all to process very large datasets on massively parallel systems without having to bother with complicated interprocess communication using MPI.

I am a skeptic, which is not to say I have anything against a generalized framework for distributing data to a large number of processors. Nor does it imply that I enjoy MPI and its coherence arising from cacophonous chatter (if all goes well). I just don’t think MapReduce is particularly "simple". The key promoters of this algorithm, such as Yahoo and Google, have serious experts MapReducing their particular problem sets, and thus they make it look easy. You and your colleagues need to understand your data in some detail as well. I can think of a number of examples of why this is so.

First, let’s say that you are tasked with processing thousands of channels of continuously recorded broadband data from a VLBI based radio-telescope (or any other processing using beam-forming techniques for that matter). You cannot simply chop the data into nice time-based sections and send it off to be processed. Any signal processing that must be done to the data will produce terrible edge effects at each of the abrupt boundaries. Your file-splits must do something to avoid this behavior such as padding additional data on either side of the cut. This in turn will complicate the append phase after the processing is done. Thus you need to properly remove the padded data – if the samples do not align in a coherent way, then you will introduce a spike filled with energy into your result.
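The padding idea can be demonstrated on a toy signal. The sketch below is an illustration under simple assumptions, not real VLBI processing: the class PaddedSplits is hypothetical, a centered moving average stands in for "any signal processing", and the splits are processed in a plain loop where a MapReduce framework would run them as separate map tasks.

```java
import java.util.Arrays;

final class PaddedSplits {
    // Centered moving average; windows are clipped at the ends of the input.
    static double[] smooth(final double[] x, final int half) {
        final double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            final int lo = Math.max(0, i - half);
            final int hi = Math.min(x.length - 1, i + half);
            double sum = 0.0;
            for (int j = lo; j <= hi; j++) {
                sum += x[j];
            }
            out[i] = sum / (hi - lo + 1);
        }
        return out;
    }

    // Processes each split on its own; pad = extra samples read on each side.
    static double[] smoothInSplits(final double[] x, final int half,
                                   final int splitLen, final int pad) {
        final double[] out = new double[x.length];
        for (int start = 0; start < x.length; start += splitLen) {
            final int end = Math.min(x.length, start + splitLen);
            final int padStart = Math.max(0, start - pad);
            final int padEnd = Math.min(x.length, end + pad);
            final double[] piece =
                smooth(Arrays.copyOfRange(x, padStart, padEnd), half);
            // Append phase: drop the padding, keep only this split's samples.
            System.arraycopy(piece, start - padStart, out, start, end - start);
        }
        return out;
    }

    public static void main(String[] args) {
        final double[] signal = {0, 1, 4, 9, 16, 25, 36, 49, 64, 81};
        final double[] whole = smooth(signal, 2);
        System.out.println("padded splits match:   "
            + Arrays.equals(whole, smoothInSplits(signal, 2, 4, 2)));
        System.out.println("unpadded splits match: "
            + Arrays.equals(whole, smoothInSplits(signal, 2, 4, 0)));
    }
}
```

With padding equal to the filter half-width, the stitched result is identical to filtering the whole signal; without padding the samples near each cut differ, which is the edge effect described above. The offset arithmetic in the append phase is where misaligned samples would creep in.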

Alternatively, you might have been tasked with solving a large system of linear equations. For example, say you are asked to produce a regional seismic tomography map with a resolution down to a few hundred meters using thousands of earthquakes, each with tens of observations. You could easily produce a sparse system of equations that creates a matrix with something on the order of one million columns and several tens if not hundreds of thousands of rows. Distributed algorithms for solving such a system are well known but require our cranky friend MPI. However, we can map this problem to several independent calculations as long as we are careful not to bias the input data, as in the previous example. I will not bore you with the possibilities, but suffice it to say that researchers have been producing tomographic maps for many years by carefully selecting the data and model calculated at any one time.

I know what many of you are thinking – I’ve read it before: MapReduce is meant for "non-scientific" problems. But is a sophisticated search engine any different? What makes it any less "scientific" than the examples I provided? Consider a search engine that maintains several (n) different document indexes distributed throughout the cloud. A user then issues a query, which is mapped to n servers. Let’s assume that, for the sake of time, each node returns only its top m results to the reduce phase. The combined results are then sorted and returned to the user. The assumption here is that there is no bias in the distribution of indexed documents relevant to a user’s query. Perhaps one or more documents beyond the first m found in one particular index are far more relevant than the (n-1) * m results from the other indexes. But the user will never know. Should the search engine return every single result to the reduce phase at the expense of response time? Is there a way to distribute documents to the individual indexes to avoid well-known (but not all) biases? I suggest that these questions are the sorts of things that give one search engine an edge over another. Approaches to these sorts of issues might well be publishable in refereed journals. In other words, it sounds scientific to me.
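A toy version of this search makes the bias concrete. The sketch below is an assumption-laden illustration, not how any real search engine works: the two indexes, the document ids, and the relevance scores are all made up, and `sharded_search` and `exhaustive_search` are hypothetical names.

```python
import heapq


def by_score(doc):
    return doc[1]


def sharded_search(shards, m):
    """Map: every index returns only its m best hits.
    Reduce: merge the n * m partial hits and sort them by score."""
    hits = []
    for shard in shards:
        hits.extend(heapq.nlargest(m, shard, key=by_score))
    return sorted(hits, key=by_score, reverse=True)


def exhaustive_search(shards, k):
    """Baseline: every index returns everything; keep the k best overall."""
    everything = [doc for shard in shards for doc in shard]
    return heapq.nlargest(k, everything, key=by_score)


# Hypothetical (document id, relevance score) pairs. Index A happens to hold
# most of the documents that are relevant to this query.
index_a = [("a1", 0.9), ("a2", 0.8), ("a3", 0.7), ("a4", 0.6)]
index_b = [("b1", 0.3), ("b2", 0.2), ("b3", 0.1)]

fast = sharded_search([index_a, index_b], m=2)
full = exhaustive_search([index_a, index_b], k=len(fast))
missed = [doc for doc in full if doc not in fast]

print([doc[0] for doc in fast])    # what the user sees
print([doc[0] for doc in missed])  # more relevant documents the user never sees
```

Here the user is shown `b1` and `b2` although `a3` and `a4` score far higher, simply because the relevant documents were not spread evenly across the indexes.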

I hope that by now you can see why I say that using MapReduce is only simple if you know how to work with (map) your data (especially if it is wonderfully wacky). There is an inherent risk of bias in any MapReduce algorithm. Sadly, this implies that processing data in parallel is still hard, no matter how good a programmer you are or how sophisticated your programming language is.

Source: http://gridgurus.typepad.com

 |+| Posted on Wednesday, 12 Tir 1387 at 10:59 AM by Hamed Salimipour

Open Evolution

Proprietary standards can bring success at first but cannot last. At least that is the conclusion we are forced to draw from two interesting articles in the 22 March issue of the Economist: "Break down these walls" and "Everywhere but nowhere". I highly recommend that you read them, particularly if you think that Ian’s Grid definition requiring open standards is debatable.

The core lesson comes from the original big players in the nascent internet such as AOL, CompuServe, and Prodigy. These companies provided their users with electronic mail (not necessarily what we consider email today), chat rooms, discussion boards, and access to a wide-range of information. However these services were restricted to users of each particular service. You simply could not access information from one provider if you subscribed to another.

However, it was not long before products based upon open standards that provided these same services (and more) became more attractive to users simply because they allowed people to venture outside of the closed communities to which they subscribed. Once these users got out, they never turned back. The original content-providers became nothing more than access points to the web. Consequently these service providers quickly lost their luster and thus their valuation. Only AOL was able to (and still struggles to) survive, having redefined itself as a web-portal with paid advertising – just like the services that nearly killed it.

Today, the hottest products in the digital world are the social-networking sites like Facebook and MySpace as well as virtual worlds such as Second Life. Their popularity and usefulness to individuals have given them significant momentum in the marketplace as the “next big thing”. Consequently these companies have been given enormous valuations despite having no business model beyond the fact that they have hordes of captive users. While these products typically come with an API so that users can add useful and interesting features, it is no substitute for true operational freedom. People want to interact with others without having to switch systems or maintain two distinct profiles.

How long will it be before social-networking products appear that are not only based upon open standards but also offer better features and more accessibility? You can bet that it will be soon, given the amount of potential money involved. Then the reckoning will come and these companies, once flying high, will be forced to either adapt or perish.

What does this teach us about computing beyond the desktop, howsoever you wish to define it, be that a Grid, Cloud, or whatnot?  Personally, I think it is clear: we must develop to open-standards or perish. I cannot see how the Grid market is immune to pressures of interoperability and freedom of choice.  To paraphrase the Economist, why stay within a closed community when you can roam outside its walled garden, into the wilds of open computing!!!

I hope to see you all at the Open Source Grid and Cluster Conference.

Source: http://gridgurus.typepad.com

 |+| Posted on Wednesday, 12 Tir 1387 at 10:57 AM by Hamed Salimipour

There's an Analyst Lurking in that Business

I recently read an editorial from Grid Today (GT) based upon conversations with Forrester’s Frank Gillett suggesting that interest in Grid computing is waning. I will not dispute the veracity of this claim; rather I will leave that to the people such as the HPC Today editorial staff who have access to the Forrester report.  Irrespective of the actual level of interest that buyers have in the Grid, I was rather baffled by the reasons that Grid Today provided for the general "malaise".

The first reason that GT offers is that, "grid computing is, in general, beneficial to vertically specific applications." More specifically, they indicate that there are limited sets of applications that could benefit from grid computing. I am assuming that the set of applications that they are referring to are those which require high-performance parallel calculations as well as any algorithm that can use the Map-Reduce pattern to distribute the computational load across many servers.

So which classes of applications do not work well on the grid? Clearly Service Oriented Architectures (SOA) work well on the Grid. In fact the Globus Toolkit, a popular software toolkit for building grids, uses SOA at its core.

Yet I believe that any n-tier application run on a Grid has many advantages. For example, imagine a web-based application with a supporting relational database that is required to scale under significant user loads, in terms of not only the number of connections but also the complexity of the requested services. Also imagine that clusters of users in different regions will use this application.

First of all, it would be nice for us to provide the data-services of this application using a SOA.  Doing so allows us to expose the data through a single access-layer.  Thus any program can access the data using the same business rules without tying it to a single-application interface.  Secondly, if users require any complex reports or other heavy-duty calculations, a single web-server might easily be overwhelmed and thus forced out of the rotation until the process completes.  A better solution would be for the web-server to farm these sorts of operations out to the Grid – maybe even using a Map-Reduce pattern.  Furthermore adding Grid capacity is an easy way to handle high-peak loads of the application.  These resources could be used by other projects during the off-peak periods.  Lastly, the grid could coordinate resources that are proximate to the regional user-clusters and thus reduce communication latency for any data that needs to be exchanged without having to keep copies of the web or data-infrastructure throughout the enterprise.

If there are advantages to running your n-tier applications on the grid, it is not much of a stretch architecturally to extend that to other classes of application.  I could not imagine implementing a SaaS (Software as a Service) application on anything but a grid.  Having said that, I don’t believe that an application needs to be complicated to run better on a Grid.  Rather, I think any application that users rely on is a good candidate.

Many "desktop" applications not only can be run on the grid but also are better suited to it. Data-centric applications are the prime candidates that come to mind. First of all, keeping results on your desktop all but kills collaboration between users because the desktop is likely on a high-latency, low-availability network, may be in a separate security domain and thus inaccessible to many users, and could be shut down at any time. In addition, if an application reads and/or writes significant amounts of important data, it is best to keep it in the data center on reliable and, more importantly, regularly backed-up storage. Of course, the application could write across the typical high-latency, low-availability desktop network into the data center, but that is fraught with problems. Personally I believe that perhaps the most significant source of user frustration is "network drives", but I digress. If an application’s calculations take any significant resources, the user’s desktop quickly becomes a bottleneck. Even if the user’s machine is beefy enough to handle running a job while still allowing access to email, they are still hardware limited. In particular, if the application can be submitted in batch to the grid, the user could literally submit dozens if not hundreds of individual calculations and get the results in a fraction of the time it would take on their desktop. Lastly, running jobs at the data center frees users from using a single desktop. Rather, they can manage their computing from any location, which gives them significantly more freedom.

All of this brings me to GT’s second key assertion: that the term Grid has been, "bandied about so much that no one knows what it means or what business benefits they might derive from it."  This is indeed the core challenge. My experience is that very few business proponents specify software architectures. Generally they could not care less whether a salesperson is pushing SOA, Grid, Cloud, SaaS, or whatnot.  These are the concerns of people who support business lines: CTOs, IT support managers, etc.

Chances are you are not dealing with these sorts of technical folks when you are drafting a proposal.  Rather, you are likely speaking with a business-analyst. The ones I know are not easily charmed by buzzwords (even if their bosses or peers are).  They are more than aware that terms mean different things to different vendors and their staff.

Frankly they don’t care about your pet technology.  Instead, they have a set of goals and a given budget.  They are measured on how well the project met the user’s needs, how under-budget it came in, and how much time it took.  If any one proposal that they have happens to align with other business initiatives of which they are aware, then they will consider the advantages as well as the costs of implementing it.  We all know that individual business groups tend to go their own ways, particularly in large companies.  We are not going to corral them with the "Grid".

Yet there is plenty of hope for us. We Grid proponents should focus on providing small group-level systems that are quick to set up, scale easily, and meet the customer’s defined business goals. These implementations do not need to fall under the traditional association that Grid has with high-performance computing (HPC): HPC is not often amongst the business goals. However, if the group Grid is built using open standards, has a resource manager, and allows for the provisioning of global management systems (e.g. authentication domains), it is easy for the technical types to incorporate this small Grid into an enterprise-wide effort. This is how we can sell the Grid.

Source: http://gridgurus.typepad.com

 |+| Posted on Wednesday, 12 Tir 1387 at 10:56 AM by Hamed Salimipour

OpenNEbula and VWS

A few days ago the authors of the GridWay Metascheduler released a Technology Preview of their OpenNEbula Virtual Infrastructure Engine (ONE), which enables deployment and management of virtual machines on a pool of physical resources. The software is very similar to the Globus Virtual Workspace Service (VWS), both in architecture and functionality. Both systems provide a new service layer on top of the existing virtualization platforms (currently they support only the Xen hypervisor). This layer extends the functionality of the underlying Virtual Machine Monitors (VMMs) from a single machine to a VM provisioning cluster. Both the ONE Engine and VWS use passwordless SSH access to manage a pool of nodes running VMMs, and allow system administrators to deploy new VMs, to start/shut down and suspend/resume already deployed VMs, and to migrate VMs from one physical host to another. The most notable difference between ONE and VWS is that VWS is built on top of the GT infrastructure and runs within the GT Java container. This allows, for example, RFT stage-in/stage-out requests to be sent along with the workspace creation requests. On the other hand, the ONE Engine is a standalone service, and its installation requirements include only a few software packages that are already present in most Linux distributions.

Source: http://gridgurus.typepad.com

 |+| Posted on Wednesday, 12 Tir 1387 at 10:55 AM by Hamed Salimipour

Ten More Reasons to go to Oakland

Rich Wellner came up with four reasons to attend the Open Source Grid and Cluster Conference, to be held in Oakland May 12-16. I outdid him and came up with 10:

1) Globus program is fantastic, including tutorials, advanced technical presentations, contributed talks, and community events on every aspect of Globus.

2) Gobs of other material on Sun Grid Engine and Rocks, and other open source grid and cluster software.

3) Gathering: A great opportunity to meet colleagues, peers, collaborators from the grid and cluster community. The only grid meeting in the US the rest of this year--the next two OGFs are in Spain (June) and Singapore (September).

4) GT4.2: You'll get to learn about the exciting new features in Globus Toolkit 4.2. New execution, data, security, information, virtualization, and core services.

5) Gratification (immediate) as you get to provide your input on future directions for Globus, Sun Grid Engine, Rocks, and other open source systems--and maybe sign up to contribute to those developments.

6) Grid solutions: You'll get to meet the people using Globus to build enterprise grid solutions in projects like caBIG, TeraGrid, Earth System Grid, MEDICUS, and LIGO, and learn about solution tools like Introduce, MPI-G, Swift, Taverna, and UniCluster.

7) Gurus: You get to grill the Globus gurus--or, if you prefer, show off your own Globus guru status.

8) Great price: $490 registration is substantially cheaper than OGF or HPDC, for example, and the hotel rate is reasonable ($149).

9) Gorgeous location: Oakland is easy to get to, with SFO (an easy BART train ride away), Oakland, and San Jose airports all nearby. Just a 10 minute train ride to downtown San Francisco. A lovely time to be in the Bay Area.

10) Gorilla and guerilla free: None of the corporate marketing talks that diluted the last GridWorld conference--apart from two sponsor talks, this is pure tech, and highly useful tech at that!

Source: http://gridgurus.typepad.com

 |+| Posted on Wednesday, 12 Tir 1387 at 10:54 AM by Hamed Salimipour

A Grid OS?

I have recently been working on a test plan for a framework designed to deliver applications to grid users. The framework is useful for the specific environment in which the customer operates. However, it has led me to imagine something more generic that anybody who manages a Grid intended for use by a diverse community would find useful.

You need to have a solid software infrastructure consisting of compilers, libraries, middleware, languages, and services. Your customers want to be able to run the applications that suit their goals best with as little fuss as possible. These include off-the-shelf, commercial customizations, open-source, freeware, supported in-house, and individually built software packages.

While there may be few interoperability issues within a small group or company, you can bet that not all programs will play well with others. Some applications will require very specific libraries and middleware while others will prove to be quite flexible. Some applications require supporting software for 64-bit architectures while others need 32-bit. Other software has different feature-sets on different hardware (e.g. SPARC versus x86) as well as software (e.g. Linux versus IRIX) systems. Still other applications, particularly those that are on long development cycles, tend to use older feature sets whose behavior may have changed or been eliminated from subsequent package releases. Meanwhile your in-house developers might be working on the bleeding-edge and therefore use software that is too unstable for the general user community. Face it: very few software developers expect their products to co-exist with others.

This is a big challenge for anybody who is expected to create a shared-computing environment for a big user community. Typically system administrators will create an operating-system image based upon anticipated usage patterns, security, stability, feature sets, and availability. They will have specific builds for their web farm, mail servers, storage nodes, and (most importantly) for our Grid computation nodes. They would also like to be proactive and keep their systems up to the latest security and bug-fix patch levels. In addition, they are going to try to provide the best product they can; therefore they would like to provide the most feature-rich infrastructure with which they feel comfortable. However, and most importantly, they will use a package manager to maintain software releases on their machines. Why would any system manager want to reinvent the wheel when it comes to building software when the vendors will do it for them?

This last practice has a significant impact on the software you will find on the Grid. If the hardware vendor has a build for the software you use, chances are that is what you will get. These package managers tend to keep only one version of a particular software package on a system at a time. Consequently if a newer version of a package is desired, the older one is removed. Even if they tried to make multiple packages coexist, files would be overwritten. There are a few "compat" versions but these are exceptions.

Clearly, when your mandate is to provide a shared computing environment that has a significant number of processing nodes as well as users, you will have to provide a more substantive infrastructure. At this point you could either build specialized virtual machines for each operating environment or you can create a shared infrastructure that any image can use. Utility-computing players like Amazon have you create your own machine image (AMI) but I think it is unreasonable to expect application users to have the skills to create a proper operating environment.

The second option, creating a shared infrastructure that any image can use, could be considered a "grid operating system from scratch", by analogy with Linux From Scratch. This type of framework would force us to place our software into a categorized structure capable of differentiating operating systems, hardware architectures, and application versions. To avoid conflicts, this infrastructure should not replace the standard installs for the operating system; providing application support for a grid is orthogonal to managing a compute node.

All of this needs to work without overtaxing your customers (i.e. application users). The typical user doesn’t care which operating environment they are provided as long as their software runs. Rather they would prefer to be able to call their application as if it were the only version using the only installed system libraries and middleware on the only supported compute node configuration. Basically if a user wishes to use an application, they simply want to call it by name: for example python and perhaps python-2.3.7 or python-2.4.5 should they require a particular version.

A big component of your effort in creating the proposed framework is providing the correct versions of libraries and middleware to your customers’ frontline applications; this is a task that demands specialized configuration scripts whose job is to set-up the operating environment to match the user request and the operating environment. There are a few tools out there that are quite capable of accomplishing something like this. However there is nothing that I am aware of whose goal it is to specifically deliver applications on a grid. Instead this class of tools provides far more flexibility than what is necessary, let alone wanted.

Ultimately I think that the best thing for the industry would be to establish a standard Grid directory structure for placing software in shared environments (e.g. //bin///-). A standard method for exposing applications should be decided upon as well. This could be anything from link-farms, to wrapper-scripts, or even environment set-up scripts. If this were to happen software developers and Grid administrators could create standardized packages including configuration scripts that would install into this framework. Setting up python would then be as easy as installing the standard packages for each desired operating environment and then calling "python".
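To make the wrapper-script option concrete, here is a minimal Python sketch of how a request such as `python` or `python-2.3.7` could be mapped to an executable. Everything about the layout is an assumption made for illustration: the directory scheme `<root>/<os>/<arch>/<app>/<version>/bin/<app>`, the `resolve` function, and the sample OS and architecture names are hypothetical and not part of any existing standard.

```python
import tempfile
from pathlib import Path


def version_key(version):
    """Turn '2.4.5' into (2, 4, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(root, os_name, arch, request):
    """Map 'python' or 'python-2.3.7' to the path of an executable.

    Assumed (hypothetical) layout: <root>/<os>/<arch>/<app>/<version>/bin/<app>
    Assumes application names contain no hyphen of their own.
    """
    app, _, wanted = request.partition("-")
    app_dir = Path(root) / os_name / arch / app
    if not app_dir.is_dir():
        raise LookupError(f"{app} is not installed for {os_name}/{arch}")
    versions = sorted((p.name for p in app_dir.iterdir() if p.is_dir()),
                      key=version_key)
    if wanted and wanted not in versions:
        raise LookupError(f"{app} {wanted} is not installed for {os_name}/{arch}")
    chosen = wanted or versions[-1]  # default to the newest installed version
    return app_dir / chosen / "bin" / app


# Build a throwaway tree with two versions installed side by side.
with tempfile.TemporaryDirectory() as root:
    for version in ("2.3.7", "2.4.5"):
        bin_dir = Path(root) / "linux" / "x86_64" / "python" / version / "bin"
        bin_dir.mkdir(parents=True)
        (bin_dir / "python").touch()
    newest = resolve(root, "linux", "x86_64", "python").relative_to(root).as_posix()
    pinned = resolve(root, "linux", "x86_64", "python-2.3.7").relative_to(root).as_posix()

print(newest)
print(pinned)
```

A real framework would also have to set up library paths and other environment settings for the chosen version; the sketch covers only the name-to-path lookup.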

Source: http://gridgurus.typepad.com

 |+| Posted on Wednesday, 12 Tir 1387 at 10:51 AM by Hamed Salimipour

Old World, Young Grid - Grid Computing and the Largest Scientific Machines Ever Built

Authors: Fabrizio Gagliardi and Francois Grey; Translation: Seyed Mostafa Nategholeslam
Shabakeh Monthly - Shahrivar 1386, Issue 79

Introduction:

If everything goes according to plan, next year the largest scientific machine ever built will go into operation in a labyrinthine underground complex in Switzerland, near Geneva. The Large Hadron Collider (LHC), which lies more than one hundred meters below ground, will accelerate two proton beams in opposite directions around a 27-kilometer circular tunnel. The two beams, having reached nearly the speed of light, will collide head-on and produce a shower of subatomic debris in which scientists expect to find mysterious particles that have never been observed before. This could change our fundamental understanding of the universe. At least, that is the hope. Researchers at the European Organization for Nuclear Research (CERN), where the LHC will operate, know that finding the elusive particles they are looking for will be very difficult. To find them, the researchers will have to sift through enormous piles of collision data: the LHC's data output is expected to average fifteen million gigabytes per year, which is more than the amount of data needed to fill six standard DVDs per minute. Sorting and analyzing this mountain of data is a task beyond the capacity of any supercomputer in the world. So while the LHC team races to complete the giant underground machine, another group of physicists and computer scientists above ground is solving a separate problem: providing a computing infrastructure that can cope with the LHC's flood of data. The solution they have found is a vast collection of powerful computers spread across roughly two hundred research centers around the world, linked and configured so that they work like a single parallel processing system. This type of infrastructure is called a computing grid.

 

 |+| Posted on Monday, 20 Khordad 1387 at 12:36 PM by Hamed Salimipour

A Virtual Infrastructure Layer for Cluster and Grid Computing

Cluster administrators, and therefore Grid site administrators, have to deal with the following requirements when configuring and scaling their infrastructure:

  • Heterogeneous configuration demands. Users often require specific versions of different software components (e.g. operating system, libraries or post-processing utilities). The cost of the installation, configuration and maintenance of user-specific or VO-specific worker nodes limits the flexibility of the infrastructure.
  • Performance partitioning. Most of the computing infrastructures do not allow administrators to isolate and partition the performance of the physical resources  they devote to different computing clusters or Grid infrastructures. This limits the quality of service and reliability of actual computing platforms, preventing a wide adoption of the Grid paradigm.

In order to overcome these challenges, we propose a new virtualization layer between the service and the physical infrastructure layers, which integrates seamlessly with existing Grid and cluster middleware stacks. The new virtualization layer extends the benefits of VMMs (Virtual Machine Monitors) from a single physical resource to a cluster of resources, decoupling a server not only from the physical infrastructure but also from the physical location. In the particular case of computing clusters, this new layer supports the dynamic execution of computing services (working nodes) from different computing clusters on a single physical cluster.


layers.png

OpenNebula is the name of a new open-source technology that transforms a physical infrastructure into a virtual infrastructure by dynamically overlaying VMs over physical resources. Computing services, such as working nodes managed by existing LRMs (Local Resource Managers) like SGE, Condor, or OpenPBS, can then be executed on top of the virtual infrastructure, allowing a physical cluster to dynamically execute multiple virtual clusters.

The separation of resource provisioning, managed by OpenNebula, from job execution management, managed by existing LRMs, provides the following benefits:

  • Cluster consolidation, because multiple virtual working nodes can run on a single physical resource, reducing the number of physical systems and therefore the space, administration, power and cooling requirements. The allocation of physical resources to virtual nodes can be dynamic, depending on their computing demands, by leveraging the migration functionality provided by existing VMMs.
  • Cluster partitioning, because the physical resources of a cluster can be used to execute virtual working nodes bound to different virtual clusters.
  • Support for heterogeneous workloads with multiple (even conflicting) software requirements, allowing the execution of software with strict requirements, such as jobs that will only run with a specific version of a library, or legacy applications.

Consequently, this approach provides the flexibility required to allow Grid sites to execute on-demand VO-specific working nodes and to isolate and partition the physical resources. Additionally, the architecture offers other benefits to the administrator of the cluster, such as high availability, support for planned maintenance and changing capacity availability, performance partitioning, protection against malicious use of resources...
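The consolidation benefit can be illustrated with a small placement sketch. Note that this is not OpenNebula's scheduler: it uses a generic first-fit-decreasing heuristic, swapped in purely for illustration, and the node names, core counts, and the `consolidate` function are hypothetical.

```python
def consolidate(vm_demands, host_capacity):
    """First-fit decreasing: place each VM on the first host with room.

    `vm_demands` maps a virtual working node to the CPU cores it needs;
    every physical host offers `host_capacity` cores.
    Returns one {vm: cores} dict per physical host in use.
    """
    hosts = []
    ordered = sorted(vm_demands.items(), key=lambda item: item[1], reverse=True)
    for vm, cores in ordered:
        if cores > host_capacity:
            raise ValueError(f"{vm} needs more cores than any host provides")
        for host in hosts:
            if sum(host.values()) + cores <= host_capacity:
                host[vm] = cores
                break
        else:
            # No existing host has room, so bring another one into use.
            hosts.append({vm: cores})
    return hosts


# Virtual working nodes from three virtual clusters (made-up numbers).
demands = {
    "sge-node-1": 4, "sge-node-2": 4,
    "condor-node-1": 2, "condor-node-2": 2,
    "pbs-node-1": 3, "pbs-node-2": 1,
}
placement = consolidate(demands, host_capacity=8)

for number, host in enumerate(placement, start=1):
    print(number, sorted(host))
```

In this example six virtual working nodes belonging to three virtual clusters fit on two eight-core hosts instead of six separate machines.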

The idea of a virtual infrastructure which dynamically manages the execution of VMs on physical resources is not new. There exist several VM Management proprietary solutions to simplify the use of virtualization, so providing the enterprise with the potential benefits this technology may offer. Examples of products for the centralized management of the life-cycle of a VM workload on a pool of physical resources are: Platform VM Orchestrator, IBM Virtualization Manager, Novell ZENworks, VMware Virtual Center, and HP VMManager.

The OpenNebula Virtual Infrastructure Engine differs from those VM management systems in its highly modular and open architecture, designed to meet the requirements of cluster administrators. The OpenNebula Engine provides a command line interface for monitoring and controlling VMs and physical resources that is quite similar to the one provided by well-known LRMs. Such an interface allows its integration with third-party tools, such as LRMs, service adapters, and VM image managers, to provide a complete solution for the deployment of flexible and efficient computing clusters. Decoupling the service layer from the infrastructure layer allows a straightforward extension of the previous idea to any kind of service. In this way any physical infrastructure can be transformed into a very effective provisioning platform.

A Technology Preview of OpenNebula is available for download under the terms of the Apache License, version 2.0.

Ignacio Martín Llorente
Reprinted from blog.dsa-research.org

Source: http://gridgurus.typepad.com

 |+| Posted on Thursday, 29 Farvardin 1387 at 11:14 AM by Hamed Salimipour

My favorite grid application

My esteemed Grid Gurus moderator, Rich Wellner, asked "what is the most creative use for grid technology that you've ever seen?" This is a difficult question to answer, but I will attempt to do so anyway.

I choose the work of George Karniadakis, Suchuan Dong, Nick Karonis, and their colleagues on modeling blood flow in the human body. What I like about it is the wacky (sorry, wonderful) way in which they mapped this apparently tightly coupled problem onto the distributed sites of the NSF TeraGrid. Quoting one of their papers:

Motivated by a grand-challenge problem in biomechanics, we are striving to simulate blood flow in the entire human arterial tree. The problem originates from the widely accepted causal relationship between blood flow and the formation of arterial disease such as atherosclerotic plaques. These disease conditions preferentially develop in separated and recirculating flow regions such as arterial branches and bifurcations. Modeling these types of interactions requires significant compute resources to calculate the three-dimensional unsteady fluid dynamics in the sites of interest. Waveform coupling between the bifurcations, however, can be reasonably modeled by a reduced set of one-dimensional equations that capture the cross-sectional area and sectional velocity properties. One can therefore simulate the entire arterial tree using a hybrid approach based on a reduced set of one-dimensional equations for the overall system and detailed 3D Navier-Stokes equations at arterial branches and bifurcations.

In other words, they mapped different parts of the human body (chest, legs, arms, head, and their arterial branches) to different TeraGrid sites, linking them by a simple, non-communication intensive 1-D problem.

The tools used to make this happen were MPICH-G2 (recently renamed as MPIG) and of course Globus.

Source: http://gridgurus.typepad.com

 |+| Posted on Thursday, 29 Farvardin 1387, at 11:13 AM by Hamed Salimipour

The Canon of Clouds

The emergence of cloud computing as a resource on the grid has led to a huge resurgence of interest in utility computing. Looking at the history of utility computing allows us to identify three canonical interaction models that also apply to cloud computing.

  • Metascheduling
  • Virtual machines
  • Application virtualization

Metascheduling

Initial cloud offerings like Amazon Elastic Compute Cloud created the nomenclature around clouds. Going back before the term "cloud" was coined, we see a similar utility computing offering from Sun. In both cases users submit work to the service and eventually get results returned. How the request gets prioritized, provisioned and executed is at the discretion of the service provider. In many ways this is similar to how a typical cluster works. A user selects a cluster, submits a job and waits for a response. Which node is used to execute the request is largely out of the user's control. While there are substantial differences between a cluster and a cloud, another similarity reveals itself when thinking about how users interact with compute resources in companies that operate multiple clusters.

As companies began adding clusters, users quickly demanded a facility to submit their jobs to a high-level service that would manage the interactions with all the clusters that were available. Most users didn't want to use multiple monitoring tools to access multiple clusters themselves and then use the information gathered to decide where to submit their job. What they wanted was a single interface to submit jobs to and a service that would make policy-based decisions about which cluster should ultimately receive the request.

The situation today is similar. Multiple cloud and utility computing vendors exist and users don't want to spend their time gathering information about the state of each in order to decide where to submit their jobs. Further, administrators and managers need to be able to enforce policy. There are several reasons for requiring this behavior, but probably the easiest to explain is that there are costs associated with resource usage at the cloud vendors and organizations require control over how that money is spent.

The answer to all these needs is to place a metascheduler between the users and the various resources. Users can then use a single interface for all their jobs regardless of where they are ultimately going to be executed.

[A metascheduler] enables large-scale, reliable and efficient sharing of computing resources (clusters, computing farms, servers, supercomputers…), managed by different LRM (Local Resource Management) systems, such as PBS, SGE, LSF, Condor…, within a single organization (enterprise grid) or scattered across several administrative domains (partner or supply-chain grid). -- GridWay
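The policy decision a metascheduler makes can be sketched as follows. This is only an illustration under stated assumptions, not how GridWay or any other product works: the `Resource` fields, the `choose_resource` function, and the "cheapest resource that fits the job and the budget" rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """A cluster or cloud as seen by the metascheduler (hypothetical model)."""
    name: str
    free_slots: int        # idle job slots reported by the resource
    cost_per_hour: float   # price per slot-hour; 0.0 for in-house clusters


def choose_resource(resources: List[Resource], slots_needed: int,
                    budget_per_hour: float) -> Optional[Resource]:
    """Return the cheapest resource that fits the job and the budget policy."""
    candidates = [
        r for r in resources
        if r.free_slots >= slots_needed
        and r.cost_per_hour * slots_needed <= budget_per_hour
    ]
    if not candidates:
        return None  # nothing satisfies the policy; the caller can queue the job
    # Prefer low cost, then the resource with the most free capacity.
    return min(candidates, key=lambda r: (r.cost_per_hour, -r.free_slots))
```

Users call one function regardless of where the job ends up, while administrators control spending through the budget parameter.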

Virtual machines

Clouds are only as useful as the software running in them. Therefore, the next important interaction model is that between users and virtual machines.

Users often need very specific software stacks. This includes the application they are running, support libraries and, in some instances, specific versions of operating systems. Analysts are saying that there are now at least 35 companies addressing the needs of users in managing these interactions. This includes software to implement the enactment layer, manage images, policy engines, user portals and analytics functions.

One of the questions yet to be answered in the cloud community is how to allow users to make use of several clouds on a day to day basis. As this market continues to mature, look for many of the same challenges (e.g. security, common APIs, WAN latencies) that the grid community has been tackling for over a decade to become increasingly important to cloud users.

Application virtualization

In the context of clouds, application virtualization gains significant power by being able to add or remove instances of applications on demand. This is currently being done in the context of data center management using proprietary tools. Clouds present a cool new opportunity to do the balancing act on a regional basis. As more clouds are built and standard interfaces made available, users will be able to load balance to multiple clouds operating in different countries or cities as demand grows and shrinks.

These three models represent established, powerful interaction modes that are being used in production in a variety of settings today. It will be interesting over the next year to see which cloud operators adopt which models and how many lessons they take from existing non-cloud implementations rather than trying to reinvent the wheel.

Source: http://gridgurus.typepad.com

 |+| Posted on Thursday, 29 Farvardin 1387, at 11:12 AM by Hamed Salimipour

UniCluster 3.2 Released

The grid.org team has just declared UniCluster 3.2 to be stable.

  • Users can now install UniCluster Express over an existing Grid Engine installation.
  • Expanded platform and operating system support, including native 64-bit system support.
  • The installation directory and service user are now created at installation, if they do not exist already.
  • Removed several installation prerequisites, making installation easier and faster.
  • Maintenance in the form of defect repairs. Refer to bugzilla for specific details.

In particular, being able to install over an existing Grid Engine installation is super cool. This is a feature that I've been excited about for a long time, as it brings Globus, Ganglia and the UniCluster monitoring application to the existing 10,000 or so Grid Engine clusters.

Download

Source: http://gridgurus.typepad.com

 |+| Posted on Thursday, 29 Farvardin 1387, at 11:06 AM by Hamed Salimipour
Red Hat Enterprise MRG, parallel computing, and grid computing

Red Hat has introduced a new version of its operating systems in which parallel computing, grid computing, and similar topics of interest to computational scientists have become much more attainable.

The beta version of this operating system is now available for download.

Here is part of the description:

Red Hat Enterprise MRG supports the full spectrum of distributed tasks, including:

* High-speed, reliable, or large file messaging
* Parallel & cycle-stealing scheduling
* High Performance Computing (HPC) and High Throughput Computing (HTC)
* Distributed workload management

Red Hat Enterprise MRG can run across multiple platforms but also takes deep advantage of Red Hat Enterprise Linux capabilities like clustering, IO, and virtualization for optimal performance and qualities of service.

References:
http://www.redhat.com/mrg/?intcmp=70160000000HEmC
http://www.redhat.com/mrg/grid

Source: http://condmatt.blogspot.com

 |+| Posted on Wednesday, 14 Farvardin 1387, at 11:32 AM by Hamed Salimipour

The information leap of the computer world

This is a scientific report on Japan's national project NAREGI, short for National Research Grid Initiative, which was presented for evaluation by scientists and experts from around the world at an international symposium. The author attended this symposium and offers this report for the information of fellow researchers at home and the relevant officials. The main goal of the NAREGI project is to reach a computing power of one peta (10^15) flop per second. This much computing power is equivalent to about one million Pentium 4 processors. The project was defined to run from 2003 to 2007 with a budget of about 20 million dollars per year, that is, 100 million dollars for the whole five-year project. Perhaps something like one day of our oil income!

What are they going to do with this petaflop-per-second computing power? According to Professor H. Nakamura, head of the Institute for Molecular Science in Nara, Japan, the goals are as follows: 1- To create and establish a new science called nanoscience (not nanotechnology) as one of the basic sciences. In the author's opinion, the modest and unassuming Japanese spirit can hardly make a claim of this kind, so if a Japanese scientist makes such a claim it should be taken very seriously! 2- To make the GRID concept the natural tool of this science. GRID is a heterogeneous computing environment that will resolve the problem of porting computer codes between different operating systems. In fact, GRID is itself an operating system that, by means of tools called middleware, enables communication between codes developed under different operating systems. According to Japan's deputy minister of science, about 300 computer engineers are participating in this project to develop the GRID operating system and the related middleware. So far the Japanese have successfully solved several problems on the GRID system as examples of the practicality of this computing environment. A fully quantum-mechanical simulation of large molecules on the scale of proteins in vacuum is not possible with current computing power, and the presence of a solvent (that is, 10^23 water molecules!) makes the problem even more complicated. Yet using the current computing power of about 17 tera (10^12) flops per second, the Japanese have been able to simulate some chemical reactions in solutions, including vital processes related to proteins. This computing power is provided by about three thousand computers throughout Japan, connected to one another by a very high-speed network.

The ultimate goal is to be able to simulate nanometer-scale electronic devices (comprising about one million electrons) on the GRID system without building the devices in the laboratory. Of course, the use of the GRID system is not limited to quantum simulations and chemical processes or their applications in biological matter. Notably, about 40 major Japanese companies such as Hitachi and Toyota are also participating in this project. For example, one potential application of this computing environment could be the simulation of car crashes in greater and more precise detail.

I also relate some points from discussions with European and American scientists who were invited to this symposium. Professor Sandro Sorella of SISSA in Italy believes that the more computers from different centers can be connected under GRID technology, the more the demand for using the network and running programs on it will grow as well, so that in practice there is no difference between using GRID and not using it. I asked Professor Takami Tohyama of the Institute for Materials Research at Tohoku University, Japan: "You have developed the cleanest exact diagonalization code in the world over the past 15 years; would you be willing to let someone on the other side of the world compile and use your code?" He replied that this is a dream! Realizing it seems hard. A professor of Russian origin from Canada, a specialist in large-scale computing, held that even if they reach the petaflop goal, their power will be only 1000 times the computing power of typical 512-node clusters, which means a tenfold increase in the spatial dimensions of a three-dimensional system, and this will whet our appetite for larger systems, because this amount is still not enough.

Professor G. Baskaran of the Institute of Mathematical Sciences in Madras, India, believes that the solution to complex problems in condensed matter physics or biological matter is not big computers! When a new problem with a new level of complexity confronts us, we need to invent a new "concept" to solve it. In the author's view, too, this distinguished professor's remark is sound in general. In any case, however, our country needs computing power of the order of a few tens of teraflops per second (equivalent to a few thousand Pentium 4 processors) for some national projects.

If this much computing power can be had for something like a few tens of millions of dollars, it is not a hard task for our country. It is only necessary to entrust the work to those who know it!! In chemistry, our country currently has scientists who have been recognized as highly cited scientists by international standards. In physics too, as far as the author, a physics researcher himself, is aware, there are capable scientists in this country. In computer science, although it is not my field, the young people who can come first in the world at the level of toys (the robot soccer competition) can clearly keep pace shoulder to shoulder with our Japanese friends at more serious and more applied levels. In other fields as well, there are surely people who, if trusted, are capable of important work.

An instructive lesson from this Japanese symposium is that the Japanese scientists' spirit of accountability requires that, in return for the 20 million dollars per year they have spent so far, they invite some of the most eminent scientists in chemistry (for example, the world's most cited chemist attended this symposium), physics, and large-scale computing from America and Europe, and practice the good tradition of "accountability" in their presence! Nothing is secret either! Everyone has been openly invited to give their opinion. If anyone thinks of a flaw in the system or approach of the Japanese scholars and points it out to them, they will be met with a gentle Japanese smile and much courtesy and thanks.

 

Source: http://www.bashgah.net

 |+| Posted on Wednesday, 14 Farvardin 1387, at 11:23 AM by Hamed Salimipour

HTC and Cloud and Grid Computing

The HyperText Computing (HTC) paradigm is not a “complete solution” to the challenges and opportunities afforded by Cloud and Grid computing; however, this post argues that HTC is part of the solution. My angle into this question is via a recent blog post.

This is how Tim Foster, in a recent post at Grid Gurus, concludes his discussion of current and future trends of Cloud and Grid computing (emphasis mine):

In building this distributed “cloud” or “grid” (“groud”?), we will need to support on-demand provisioning and configuration of integrated “virtual systems” providing the precise capabilities needed by an end-user. We will need to define protocols that allow users and service providers to discover and hand off demands to other providers, to monitor and manage their reservations, and arrange payment. We will need tools for managing both the underlying resources and the resulting distributed computations. We will need the centralized scale of today’s cloud utilities, and the distribution and interoperability of today’s grid facilities.

The concepts that Tim highlights: “on-demand provisioning”, “configuring integrated virtual systems”, providing “precise capabilities” and a focus on the needs of the “end-user” are all addressed by the HyperText Computing (HTC) paradigm. HTC also addresses the need to view central resources through the same lens as localised ones.

HyperText Computing (or Request Based Distributed Computing, RBDC) is a small extension of HTTP and of our conceptions of server, proxy and client. It creates a distributed computing platform that is built from an end-user perspective outwards, just as HTTP does for information. It is built on a recognition of the equivalence between HTTP resources and the code that, when executed, will return the resource. RBDC unifies programming models by applying browser-based sandboxed Virtual Machines (VM) to our conception of proxies and servers.

Key benefits of RBDC are ultra-lightweight distributed computing, run-time code mobility, and backwards compatibility with http.

A fuller description of RBDC may be found here.

HTTP offers location transparency for retrieving data; a small HTTP extension can also provide location transparency for code execution.

Source: http://www.davidpratten.com

 |+| Posted on Saturday, 3 Farvardin 1387, at 10:28 AM by Hamed Salimipour

Globus Selected as a Google Summer of Code 2008 Mentoring Organization

The Globus Alliance has been selected as a Google Summer of Code 2008 mentoring organization. Google Summer of Code (GSoC) is a program that offers student developers stipends to write code for various open source projects. Google works with several open source, free software, and technology-related groups to identify and fund several projects over a three month period. Historically, the program has brought together over 1,500 students with over 130 open source projects to create millions of lines of code. The program, which kicked off in 2005, is now in its fourth year.

If you are a student and would be interested in participating in GSoC with Globus as your mentoring organization, please take a look at our GSoC Ideas page. This page lists projects that Globus has proposed for GSoC, but it is not a closed list. If you have an idea for a cool project that uses or extends Globus technologies, please take a look at our list of Globus GSoC mentors and contact the one which most closely matches your interests. Take into account that student proposals must be submitted by March 31st and that you must meet Google's student eligibility criteria.

If you have any questions about our participation in GSoC, please contact the Globus GSoC administrators.

Source: http://gridgurus.typepad.com

 |+| Posted on Friday, 2 Farvardin 1387, at 3:44 PM by Hamed Salimipour

All Jobs Are Not Created Equal

Choosing distributed resource management (DRM) software may not be a simple task. There are a number of open source and commercial software packages available, and companies usually go through a product evaluation phase in which they consider factors like software license and support costs, maintenance issues, their own use cases, and existing or planned infrastructure.

After following this (possibly lengthy) procedure, making the decision, and purchasing and installing the product, you should also make sure that the DRM software configuration fits your cluster usage and needs. In particular, designing the appropriate queue structure and configuring resources, resource management, and scheduling policies are some of the most important aspects of your cluster configuration.

At first glance, devoting your company's resources to something like queue design might seem unnecessary. After all, how can one go wrong with the usual "short", "medium" and "long" queues? However, the bigger your organization is and the more diverse the computing needs of your users are, the more likely it is that you would benefit from investing some time in designing and implementing queues more efficiently.

My favorite example here involves high priority jobs that must be completed in a relatively short period of time, regardless of how busy the cluster is. Such jobs must be allowed to preempt computing resources from lower priority jobs that are already running. Better DRMs usually allow for such a use case (e.g., by configuring "preemptive scheduling" in LSF, or using "subordinated queues" in Grid Engine), but this is clearly something that has to be well thought through before it can be implemented. In any case, when configuring DRM software, it is important to keep in mind that not all jobs (or not all users for that matter) are created equal...
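The preemption use case described above can be illustrated with a toy model in Python. This is not how LSF or Grid Engine implement it: the `Job` and `Cluster` classes, the integer priorities, and the rule of suspending the lowest-priority running job are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    name: str
    priority: int  # larger number = more important (assumed convention)


@dataclass
class Cluster:
    slots: int
    running: List[Job] = field(default_factory=list)
    suspended: List[Job] = field(default_factory=list)
    waiting: List[Job] = field(default_factory=list)

    def submit(self, job: Job) -> str:
        """Place a job; return 'running', 'preempted <victim>' or 'waiting'."""
        if len(self.running) < self.slots:
            self.running.append(job)
            return "running"
        victim = min(self.running, key=lambda j: j.priority)
        if victim.priority < job.priority:
            # Suspend the least important running job and take its slot.
            self.running.remove(victim)
            self.suspended.append(victim)
            self.running.append(job)
            return f"preempted {victim.name}"
        self.waiting.append(job)
        return "waiting"
```

Even in this small sketch the policy questions are visible: which job becomes the victim, and when a suspended job gets its slot back, both have to be decided before the queues are configured.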
 
Source: http://gridgurus.typepad.com

 |+| Posted on Friday, 2 Farvardin 1387, at 3:43 PM by Hamed Salimipour
From the weblog of the Computer Group students of Salman Institute of Higher Education

Summarized from the weblog below

Address: http://salmancg.blogfa.com/cat-6.aspx

Hello, I finally managed to upload the two grid-related projects and post them for you. I had already shared the first file with friends; it was for the Presentation Methods course. The second file examines grid architecture in more detail.

The first file is an introduction to grid technology.

Introduction to the Grid [ 3 MB ]

The second file is a detailed study of grid architecture. It is a translation of Part 2 of the book Introduction to Grid from IBM, translated (in alphabetical order) by me, Ms. Esmailzadeh, Mr. Banaei, Ms. Tayarani, and Mr. Fatehi.

Grid Architecture Study [ 8 MB ]

Here I would like to thank dear Jamal for his follow-ups and dear Bijan for his work in spreading the word.

PPT file of the Presentation Methods project (Grid Technology)
Dear friends, you can download the PowerPoint file of my presentation on grid technology from the link below. I hope you find it useful.

Download the PowerPoint file

From the link below you can also download a very reputable book on Grid. This book is published by IBM.

Download the Red Book

 

 |+| Posted on Thursday, 23 Esfand 1386, at 2:17 PM by Hamed Salimipour

Are You Looking for a Grid Job?

My team is looking for some folks to join up and help us bring grid technology to our customers. Drop me a line!

Source: http://gridgurus.typepad.com

 |+| Posted on Thursday, 23 Esfand 1386, at 9:52 AM by Hamed Salimipour

All of Your Data in One Basket

I once worked with a person who wrote a program that wrote all of its output to a single file. Once this program was put into the grid environment, it would routinely create files that were hundreds of gigabytes in size. Nobody considered this to be a problem because the space was available and the SAN not only supported files of that size, but also performed amazingly well considering the expectations. While this simplifies the code and data management, there are a number of reasons why this is not a good practice.

  • You don’t always need all of the output data at once. Moving a piece from the grid to your desktop for testing would not even be a consideration.
  • The amount of computation-time needed to recreate a huge file is significant.
  • There is no easy way to use multiple threads for writing and/or reading data.
  • Moving files across the network takes a lot more time.
  • A file can only be opened in read-write mode by one process at a time. One large file is going to block a lot more modification operations than several smaller files.
  • Backing the file up is remarkably more difficult.  You cannot just burn it to a DVD so it has to be sent to disk or to tape.  If you need to restore a file it can take a significant amount of time.
  • Your file is going to be severely fragmented on the physical drives and therefore will cause increased seek times.
  • You can no longer use memory-mapped files.
  • Performing a checksum on a large file takes forever.
  • Finally, if you had properly distributed the job across the Grid, you should not have such large files!!!
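One way to avoid the single huge file is to roll output over into numbered part files and checksum each part on its own. The sketch below is a minimal illustration, not the program discussed above: the function names, the `part-NNNNN.dat` naming scheme, and the size cap are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_parts(records: Iterable[bytes], out_dir: Path,
                max_part_bytes: int) -> List[Path]:
    """Write records to numbered part files, starting a new part when full."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: List[Path] = []
    handle = None
    written = 0
    try:
        for record in records:
            if handle is None or written + len(record) > max_part_bytes:
                if handle is not None:
                    handle.close()
                path = out_dir / f"part-{len(parts):05d}.dat"
                handle = path.open("wb")
                parts.append(path)
                written = 0
            handle.write(record)
            written += len(record)
    finally:
        if handle is not None:
            handle.close()
    return parts


def sha256_of(path: Path) -> str:
    """Checksum one part; parts can be verified independently of each other."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: three 4-byte records with an 8-byte cap end up in two part files.
with tempfile.TemporaryDirectory() as tmp:
    parts = write_parts([b"aaaa", b"bbbb", b"cccc"], Path(tmp) / "out", 8)
    part_names = [p.name for p in parts]
    last_digest = sha256_of(parts[-1])
```

With parts of bounded size, a single piece can be copied to a desktop, backed up, or re-created without touching the rest.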

Why would anybody do such a thing?  All your data are belong to us?

Source: http://gridgurus.typepad.com

 |+| Posted on Thursday, 23 Esfand 1386, at 9:13 AM by Hamed Salimipour

Recommended reading list for grid developers

By: Ovais Khan

Edna Nerona, in an IBM developerWorks article, recommends a list of reading material for grid developers and researchers. Some of the important links are provided here; for the rest, see the actual article.

Source: http://www.gridblog.com/comments.php?id=242_0_1_0_C

 |+| Posted on Sunday, 19 Esfand 1386, at 9:28 AM by Hamed Salimipour

Grid Computing as a Managed Service

By: Ahmar Abbas

"Grid computing is the latest to join the bandwagon of managed services. It's a good way of avoiding an expensive infrastructure investment", writes Bob Violino in his article Grid Computing Comes Around in this edition of Global Services magazine.

This article focuses on grid computing as a managed service. “What differentiates grid managed services from straight hosting is that the entire technology substrate that enables grid computing [software, hardware, storage] has already been deployed by the service provider,” says Ahmar Abbas, MD, Grid Technology Partners, a consulting firm in Falls Church, Va. “The client needs to just focus on the application enablement so that it can utilize the grid infrastructure.” Also different is the concept of paying for CPU utilization rather than a monthly fee for hosting infrastructure.

Source: http://www.gridblog.com/comments.php?id=240_0_1_0_C

 |+| Posted on Sunday, 19 Esfand 1386, at 9:26 AM by Hamed Salimipour

Elastic Compute Cloud (Amazon EC2)

By: Ahmar Abbas

A few weeks back I blogged about Amazon's Simple Storage Service (S3) and how it was gaining traction in the market.

Well, it turns out that Amazon has even greater ambitions than just providing loads of hosted and managed storage! Today they announced their Elastic Compute Cloud (Amazon EC2).
The key to the technology seems to be the Amazon Machine Image (AMI). Users can create an AMI based on their particular application or system profile. These are uploaded to the S3 service and are brought online when required.

I can see some immediate business continuity / disaster recovery applications, though I am not quite sure how load balancing occurs across multiple AMI instances that are brought live as application servers.

Another great step by Amazon to turn its technology platform into a revenue generating engine!

Source: http://www.gridblog.com/comments.php?id=239_0_1_0_C

 |+| Posted on Sunday, 19 Esfand 1386, at 9:25 AM by Hamed Salimipour

Software shops get SaaS-y

By: Ahmar Abbas

Independent Software Vendors (ISVs) that venture into the SaaS world have taken on two distinct sets of responsibilities. First, like traditional software companies, SaaS vendors are responsible for continually delivering innovative and relevant software products. Second, SaaS vendors must also develop, manage and support the infrastructure that is used to provide the software to the end user, under a regime of demanding service level agreements and associated penalties. Here's a look at the challenges (and rewards) ahead.

Read full article in ITWorld

Source: http://www.gridblog.com/comments.php?id=238_0_1_0_C

 |+| Posted on Sunday, 19 Esfand 1386, at 9:23 AM by Hamed Salimipour

Back to Grid Blogging - OGF, Airbus deal

By: Ovais Khan

Recently, I have been busy at work, so I stopped blogging for a while. During the period of inactivity, there have been numerous news items and grid-related activities that I have been starring in my reader and mail. The following are the highlights of this somewhat longer post.

  • EGA - GGF Merger
  • Airbus into Grid Computing
  • A good Computational Grid Intro



  • EGA - GGF Merger: EGA and GGF, the two mainstream standards groups that have been discussing a merger for some time now, have finally merged to form a new entity, the Open Grid Forum (OGF). The OGF website is still under construction, and the Board of Directors and leadership teams are yet to be finalized.
    The Sci-Tech Today has the following comments:

    Mark Linesch, who will lead the group, said the OGF would "open new doors to scientific discovery, business value and commercial adoption worldwide."
    Experts welcomed the end of the groups' prolonged sparring over definitions and semantics.



    Instead of quoting more, I am providing a link to a related resource for interested readers:



     
  • Airbus into Grid Computing: Grid Today reports in a news story that Fujitsu Systems has received an order from Airbus for SynfiniWay HPC Grid middleware.

    SynfiniWay proved to have the most complete and integrated Grid computing solution for aerodynamics analyses at Airbus, combining service-oriented applications with open workflow capabilities for efficient support of complex dynamic processes.

    Fujitsu Systems Europe has also been contracted to develop the services around the aerodynamic applications, and to integrate SynfiniWay within the existing user desktop tools for transparent grid access.


     
  • A good Computational Grid Intro:
    I came across this well-written overview of the state of affairs of computational grids by Tim Bray in his ongoing weblog.

Source: http://www.gridblog.com/comments.php?id=236_0_1_0_C

 |+| Posted on Sunday, 19 Esfand 1386, at 9:22 AM by Hamed Salimipour

Amazon Simple Storage Service - Gaining Momentum

By: Ahmar Abbas

What do Microsoft and SmugMug have in common? Both rely on the Amazon Simple Storage Service (S3) for cheap and reliable web-scale storage. With Amazon S3, growing companies now have the resources to look and feel like a Fortune 500 enterprise.

Today, Amazon announced a variety of customers that together are storing more than 800 million data objects using Amazon S3. On one end of the spectrum there is Microsoft, which is utilizing S3 to dramatically reduce its storage costs without compromising scale or reliability. On the other end are small businesses such as SmugMug that are depending on the S3 benefits of scale and cost-efficiency previously available only to large companies.

Source: http://www.gridblog.com/comments.php?id=235_0_1_0_C

 |+| Posted on Sunday, 19 Esfand 1386, at 9:20 AM by Hamed Salimipour

Four Reasons to Attend the Open Source Grid and Cluster Conference

We're combining the best of GlobusWorld, Grid Engine Workshop and Rocks-a-Palooza into one killer event in Oakland this May. Here's why you should come to the Open Source Grid and Cluster Conference:

  • Great Speakers: We're going to have the rock stars of the grid world speaking and teaching.

  • Great Topics: Dedicated tracks to each of the communities being hosted.

  • Community Interaction: The grid community is spread all over the world; this will be a meeting place to get face time with the people you know by name only.

  • You Can Speak: We're currently accepting agenda submissions for 90-minute panels and sessions.

This should be a fantastic conference. I look forward to meeting you there.

Source: http://gridgurus.typepad.com

 |+| Posted on Sunday, 19 Esfand 1386, at 9:08 AM by Hamed Salimipour

Grid vs Clouds? Who can tell the difference?

The term "cloud computing" seems to be attracting lots of attention these days. If you google it, you'll find more than half a million results, starting with Wikipedia definitions and news involving companies like Google, IBM, and Amazon. There is definitely no shortage of blogs and articles on the subject. While reading some of those, I've stumbled upon an excellent post by John Willis, in which he shares what he learned while researching the "clouds".

One interesting point from John's article that caught my eye was his regard of virtualization as the main distinguishing feature of "clouds" with respect to the "old Grid Computing" paradigm ("Virtualization is the secret sauce of a cloud."). While I do not disagree that virtualization software like Xen or VMware is an important part of today's commercial "cloud" providers, I also cannot help noticing that various aspects of virtualization were part of grid projects from their beginnings. For example, SAMGrid, one of the first data grid projects, which has served (and still serves!) several of Fermilab's High Energy Physics experiments since the late 1990s, allowed users to process data stored at multiple sites around the world without requiring them to know where the data would be coming from or how it would be delivered to their jobs. In a sense, from a physicist's perspective, experiment data was coming out of the "data cloud". As another example, the "Virtual Workspaces Service" has been part of the Globus Toolkit (as an incubator project) for some time now. It allows an authorized grid client to deploy an environment described by the workspace metadata on a specified resource. Types of environments that can be deployed using this service range from an atomic workspace to a cluster.

Although I disagree with John's view on the differences between the "old grid" and "new cloud" computing, I still highly recommend the above-mentioned article, as well as his other posts on the same subject.

Source: http://gridgurus.typepad.com

 |+| Posted on Sunday, 19 Esfand 1386, at 9:06 AM by Hamed Salimipour

Vishwa: A Reconfigurable Scalable Middleware for Grid Computing

DOSLab Page

Advancements in networking and cheaper computing technology have enabled the Internet to be used for resource sharing, instead of just document sharing. The resources can include computing, storage and network resources. The dynamic nature of the Internet in terms of node/network failures poses challenges for large scale resource sharing. Further, the resources are autonomic, implying that they may join or leave the system dynamically. Thus, solutions for Internet scale resource sharing must enable dynamic application dependability and middleware reconfigurability. This means that the underlying resource sharing middleware must ensure dependability of applications in spite of resource/network dynamics. Further, the middleware components themselves must adapt to these dynamics (middleware reconfigurability).

 

Peer-to-Peer (P2P) systems such as Gnutella, Freenet, Pastry etc. provide reconfigurability and scalability. However, these file sharing P2P systems may not be directly usable for sharing compute resources on the Internet. This is because they do not consider the proximity of resources or their capabilities. We explore the use of P2P system concepts to build a reconfigurable and scalable middleware for Internet scale resource sharing. In unstructured P2P systems such as Gnutella and Freenet, the overlay is built in an uncontrolled fashion, possibly with self-organizing behaviour. They provide flexibility for finding resources by supporting arbitrary search queries. However, they may be inefficient, as they use flooding for the search, and they cannot provide guarantees about finding the data. In contrast, structured P2P systems assign static identifiers to peers and impose an overlay structure based on the node identifier. A routing structure based on distributed data structures (Distributed Hash Table, as in Chord and Pastry) is also imposed. Structured P2P systems can provide data location guarantees and are efficient for searching (O(log(n)) time, for n nodes). However, they support only limited, exact-match queries.
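
To make the structured case concrete, here is a minimal sketch of the identifier-ring idea behind DHTs such as Chord: nodes and keys are hashed onto the same ring, and each key belongs to its successor node. It is an illustration only, not Chord or Pastry themselves: the node names, the key and the `HashRing` class are made up, and the lookup is a local binary search rather than real multi-hop routing.

```python
import bisect
import hashlib


def ring_id(name: str, bits: int = 32) -> int:
    """Map a name onto the identifier ring by hashing it (SHA-1, truncated)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class HashRing:
    """Toy identifier ring: each key is owned by its successor node."""

    def __init__(self, node_names):
        # Sort nodes by ring identifier so lookups can use binary search.
        self._ring = sorted((ring_id(n), n) for n in node_names)
        self._ids = [node_id for node_id, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Return the node responsible for an exact key."""
        idx = bisect.bisect_left(self._ids, ring_id(key))
        # Wrap around to the first node when the key hashes past the last one.
        return self._ring[idx % len(self._ring)][1]


# Hypothetical node names and key, for illustration only.
NODES = ["node-a", "node-b", "node-c", "node-d"]
ring = HashRing(NODES)
owner = ring.lookup("dataset-42")
```

Note that the lookup only works for an exact key: a query such as "all datasets larger than 1 GB" has no place on the ring, which is the limitation of structured overlays mentioned above.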

 

We propose Vishwa, a two layered P2P middleware for resource sharing in the Internet. It is a scalable and dynamically reconfigurable middleware. It provides a dependable execution environment for grid applications. The task management layer of the middleware is responsible for initial task deployment on the best available under-utilized nodes as well as the runtime migration of tasks to handle load dynamics. The task management layer is realized as an unstructured P2P layer and allows logical resource clustering based on proximity. The unstructured overlay allows neighbour lists to be constructed based on application specific criteria, whereas in structured overlay, the neighbour lists are only based on node identifiers. Thus, the task management layer of Vishwa constructs the neighbour list based on resource capabilities. If you want to know more about Vishwa, please follow the link below for the presentation or try the Vishwa technical report on the publications page.
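
As a rough illustration of a capability-based neighbour list, the sketch below ranks candidate peers by proximity first and spare capacity second. It is not Vishwa's actual algorithm: the `Peer` fields, the zone labels and the scoring formula are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    name: str
    zone: str           # coarse proximity label, e.g. a site or subnet
    cpu_ghz: float
    free_mem_gb: float
    load: float         # 0.0 (idle) to 1.0 (saturated)


def capability_score(peer: Peer) -> float:
    """Hypothetical score: more spare CPU and memory ranks higher."""
    return peer.cpu_ghz * (1.0 - peer.load) + 0.5 * peer.free_mem_gb


def build_neighbour_list(me: Peer, candidates, size: int = 3):
    """Prefer peers in the same zone, then rank by spare capability."""
    others = [p for p in candidates if p.name != me.name]
    ranked = sorted(
        others,
        key=lambda p: (p.zone != me.zone, -capability_score(p)),
    )
    return ranked[:size]


# Made-up peers for illustration.
me = Peer("n0", "zone-1", 2.0, 4.0, 0.9)
peers = [
    me,
    Peer("n1", "zone-1", 2.4, 8.0, 0.2),
    Peer("n2", "zone-1", 3.0, 2.0, 0.8),
    Peer("n3", "zone-2", 3.2, 16.0, 0.1),
    Peer("n4", "zone-1", 2.0, 4.0, 0.5),
]
neighbours = build_neighbour_list(me, peers)
```

A structured overlay could not express this ranking, because there the neighbour set is fixed by node identifiers alone.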

 

Slides:

Vishwa    Vishwa Compared with Globus Toolkit

 

Data Management in Grids using the Two Layered P2P Architecture

DOSLab Page

We have extended Vishwa into a data management platform named Virat. Large amounts of scientific data are being produced; see, for instance, the Grid Physics Project or the Compact Muon Solenoid (CMS). Distributed computations on this data must be scheduled. Hundreds to thousands of geographically distributed users need access to the data to perform computations. So, there is a need to replicate the data at appropriate locations to handle node/network failures and to minimize computation time and/or bandwidth. There must be ways of describing the data in the form of meta-data to allow geographically distributed access to the data. The meta-data must also be replicated for fault-tolerance. Thus, replica management of data as well as meta-data is important. There must also be an efficient mechanism to search/query the data. Another important requirement in a data grid is the discovery of data/compute resources based on proximity and node capabilities. We have designed and developed Virat to address the above issues and the orthogonal non-functional properties of scalability and fault-tolerance.

 

A platform that can be used for building such generic services must address key issues such as scalability, middleware reconfigurability, dependability, replication and resource/data discovery mechanisms. Existing shared object spaces cannot be used directly as such a platform because they do not scale up. Inefficient mechanisms for handling failures and object lookups and the use of centralized components inhibit their scalability. Virat focuses on the integration of shared object spaces with Peer-to-Peer systems. Virat provides a shared object space abstraction over a wide area distributed system. It is built using a unique two-layered P2P architecture that combines the advantages of structured and unstructured P2P systems. The unstructured layer facilitates capability based neighbourhood formation and allows cluster-level replication of data (and meta-data) to handle failures and to maintain consistency. The structured layer allows data to be recovered after failures, even across zones/clusters, in O(log(N)). Performance studies (over Intranet and WAN testbeds) using a prototype implementation suggest that Virat can scale to millions of objects. For more details on Virat please check the following slides or follow the publications link for Virat papers.

 Slides:

Virat

 

Distributed and Object Systems Lab, IIT Madras

http://dos.cs.iitm.ernet.in/index.shtml

 

Posted on Friday, 10 Esfand 1386, at 7:27 PM by Hamed Salimipour

Breaking Out of the Core


I think that one of the most exciting consequences of the rise of multicore is the possibility of overcoming the limitations of the WAN by processing where you collect your data.    It is exceptionally difficult and/or expensive to move large amounts of data from one distant site to another regardless of the processing capability you might gain.  Paul Wallis has an excellent discussion about the economics and other key issues that the business community faces with computing on "The Cloud" in his blog Keystones and Rivets.

So how do cores help us get past the relatively high costs of the WAN?  The first signs of this trend will be wherever significant amounts of data are collected out in the field.  Currently you have a number of options, none of them great, for retrieving your data for processing.  These include:

  • Provision the bandwidth required to move the data, typically at significant cost.
  • Significantly reduce the size or quality of the data and transmit it more affordably.
  • Write the data to media and collect it on a regular basis

There never really was much consideration given to processing the data in situ because the computational power just was not there.  Multicore processors have allowed us to rethink this. 

For example, consider one of the most sought after goals in a hot industry: near-real time monitoring of a reservoir for oil-production and/or for CO2 sequestration. (see the Intelligent Oilfield, IPCC Special Report on Carbon dioxide Capture and Storage)  The areas where this is most desired tend to be fairly remote such as offshore or in the middle of inhospitable deserts.  There is no network connectivity to speak of to these areas let alone enough to move data from a large multi-component ocean-bottom seismic array like those found in the North Sea.

Consequently, a colleague of mine and I were tasked with working out how we might implement the company’s processing pipelines in the field.  Instead of processing the data using hundreds of processors and an equivalent number of terabytes of storage, everything needed to fit on ***maybe*** as much as a single computer rack.  Our proposal had to include power conditioning and backup, storage, processing nodes, management nodes (e.g. resource managers), as well as nodes for user interaction.  Electrical circuit size limitations also limited our choices.  Needless to say, 30-60 processors were simply not enough capacity to seamlessly transition the algorithms from our primary data center.  The only way it could be done was by developing highly specialized processing techniques: a task which could take years.

Now that we are looking at 8 cores per processor with 16 just around the corner everything has changed.  Soon, it will be possible to provision anywhere from 160-320 processors under the same constraints as before.  It is easy to imagine another doubling of this shortly thereafter.  Throw in some virtualization for a more nimble environment and we will be able to do sophisticated processing of data in the field.  In fact, high-quality and timely results could alleviate much of the demand for more intensive processing after the fact.

Who needs the WAN and all of its inherent costs and risks? Why pay for expensive connectivity when you could have small clusters with hundreds of processors available in every LAN? If remote processing becomes commonplace because of multicore, we might see the business community gravitate towards the original vision of the Grid.

Source: http://gridgurus.typepad.com

 

Posted on Friday, 10 Esfand 1386, at 7:23 PM by Hamed Salimipour

How Will Users Interact With the Cloud?

This is a repost of a reply I wrote to a LinkedIn question.

Mark Mathson gave a great answer and blog link in his reply, but it's worth going down one additional level of detail.

A cloud is operated by something. That something is software and people need to be able to interoperate with that software. So the question is twofold.

1) What does that software do?
2) What does the interaction model look like?

Part one is mostly undefined. The term cloud computing is only a few months old at this point and there is no definition that I've seen that describes in detail what the services are and how they work. Since cloud computing is a subset of grid computing we can make some educated guesses as to how this will turn out.

o There will have to be a security model. This model will be complex enough that I'm calling out additional specifics. Currently there is no model specified in any definition of cloud computing.

o That model includes delegation. In the early development of the grid we had a security model without delegation and it was a non-starter. Anytime you need to request something of a service you need to delegate authority to that service.

o That model will have to be multi-institutional. By this I mean that the model must allow people from different communities to be able to access the resources within the cloud without having to join a common security domain. The owner of the resources will have to be able to make local decisions about who is allowed to use his resources.

o Monitoring will be complex, but must run on a common backplane. In the grid community we have hierarchical, distributed monitoring that allows canonical services and a variety of applications to push monitoring information upstream to consumers. No definition of cloud computing currently has any monitoring specification.

o Data handling will be a challenge. In the grid community we discovered early on that moving data between facilities was a bottleneck due to some decisions made in developing TCP decades ago. We worked around these to develop protocols that move data at near theoretical maximum rates even in WAN environments. We also found that people who want to move a lot of data find it cumbersome to manage the processes to do that themselves. We developed 'fire and forget' mechanisms for moving data. A user can make a request, walk away and check the results the next day. As a side note, this behavior requires delegation to work in a secure fashion.

All of the above have to be dealt with before one even begins to contemplate the VM issues that seem to dominate the cloud computing discussions.

The second part is about how the user will interact. That one is much more trivial to answer. Our users already interact in a variety of ways. Some examples include browsers, native applications, java applications, remote desktops and display technologies like x-windows.

All of those will continue to be in play in a cloud based architecture because each has significant structural, administrative and performance advantages that have led to their survival for a long time.

The cloud won't be about what window a user interacts with, it will be about the plumbing that makes that window useful.

Source: http://gridgurus.typepad.com

 

Posted on Friday, 10 Esfand 1386, at 7:21 PM by Hamed Salimipour

Why Should You Use Open Source?

The open source justification is no longer the new path that few organizations have walked. I remember in the mid-90s, when I switched from Solaris x86 to BSD and then to Linux, trying to explain what I was doing to co-workers. At that point I wasn't even trying to justify a decision to migrate some production machines; I was just exploring alternatives on my workstations. Still, I got far more confusion and skepticism than nods of understanding.

Today the world is different. People use open source for a wide variety of things. Most folks understand the landscape and regularly use total cost of ownership and risk mitigation as important parts of their final decision. What's still missing, in some cases, is the ability to take advantage of a unique opportunity that open source gives you at an infrastructure layer.

Grid software is fundamentally concerned with managing very complex business needs in a manner that allows humans to understand what is going on with their systems. As such one of the most important aspects is the ability to integrate that infrastructure with applications in a manner that allows developers and system integrators to present simpler interfaces to their users.

With proprietary systems there are often APIs that allow this to be done. However, in no instance that I've seen are these APIs on the 'critical path' for the company making the software. They are always offered essentially as a patch that some powerful customer needed and now is slowly leaking out to the rest of the customer base. These systems also tend to be highly unstable and each version carries changes in the API. These changes are frequently radical and nearly always undocumented until a customer comes across something that has stopped working and raises a stink with the vendor.

Open source software tends to work differently, especially at an infrastructure layer. The components are built by folks who are 'eating their own home cooking' and understand the implications of a change in interface. As such, interface changes tend to be infrequent and, when they do occur, highly justifiable. The reduction in the number of changes is helpful, and because there is no vendor forcing an upgrade, you can adopt a new version when the timing is right for your organization, which is also a big plus.

The world has changed. And it's changed for the better for data center managers globally.

Source: http://gridgurus.typepad.com

 

Posted on Friday, 10 Esfand 1386, at 7:14 PM by Hamed Salimipour

All GridLab technologies fit into the GridLab architecture, which defines a cleanly layered environment. On the highest layer (called User Space) there are GAT (an application-oriented high-level API to complex and dynamic Grid environments) and GridSphere (a Grid portal development framework). The Middleware layer (called Capability Space) covers the whole range of Grid capabilities required by applications, users and administrators, such as: GRMS (Grid Resource Management and Brokering Service), Data Access and Management (Grid Services for data management and access), GAS (Grid Authorization Service), iGrid (GridLab Information Services), Delphoi (Grid Network Monitoring & Performance Prediction Service), Mercury (Grid Monitoring infrastructure), Visualization (Grid Data and Visualization Services), and Mobile Services (Grid Services supporting wireless technologies). GridLab technologies help real end-users to develop and run their grid-enabled applications. We have been testing our solutions with Cactus (a framework for scientific numerical simulations) and Triana (a visual workflow-oriented data analysis environment).

The GridLab Testbed is a pan-European distributed infrastructure that consists of heterogeneous machines from various academic and research institutions. It was established through the collaboration of all GridLab participants and partners in order to provide a real, robust grid environment. All our middleware services are deployed and tested on the GridLab Testbed every day. Statistics from daily software builds on our machines are also available, together with a set of GridLab-specific tests, such as matrix tests and tests of the functionality and usability of services.

 


Source: http://www.gridlab.org

 

Posted on Friday, 10 Esfand 1386, at 6:59 PM by Hamed Salimipour

Access for Mobile Users


 

Motivation

Small and flexible mobile devices are increasingly used for web access to various remote resources. This work package aims to provide grid access mechanisms for such devices. This requires adapting existing access technologies, such as portals, to low-bandwidth connectivity and low-end user hardware. The mobile nature of such devices also requires flexible session management and data synchronization. This work package will extend the scope of present grid environments to the emerging mobile domain. By utilizing the new higher-bandwidth mobile interconnects, very useful and previously impossible scenarios of distributed and collaborative computing can be realized.

Place in the GridLab architecture

[Figure: PlaceInArchitecture.jpg]

Access to Grid(Lab) mobile services

The main goal of our efforts is to give Grid users the possibility of accessing their applications and resources from any place using mobile devices. In our approach the devices are incorporated only as clients of Grid services (not as peers). Moreover, because of the limitations of mobile devices, this approach assumes a gateway between the client and the Grid. These limitations also forced us to pay special attention to building flexible user interfaces.

We also developed (or co-developed) several specialized mobile-oriented Grid services. In some cases we provided only a mobile wrapper around heavyweight Grid services, placed in the gateway. Others were built from scratch as specialised mobile services.

Our mobile client is tightly coupled with the gateway. This "connection" means that features in the mobile client are mapped to corresponding plugins in the gateway. Those plugins are responsible for interacting with Grid services on behalf of the mobile client.
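
The feature-to-plugin mapping can be pictured with a small sketch like the one below. It is only an illustration of the pattern: the `Gateway` class, the `job_status` feature and the plugin function are hypothetical names, not part of the GridLab software, and the plugin returns a canned string where a real one would call a Grid service.

```python
class Gateway:
    """Toy gateway that maps mobile-client feature names to plugins."""

    def __init__(self):
        self._plugins = {}

    def register(self, feature, plugin):
        """Associate a client feature with the plugin that serves it."""
        self._plugins[feature] = plugin

    def handle(self, user, feature, **params):
        """Dispatch a client request to the matching plugin, if any."""
        plugin = self._plugins.get(feature)
        if plugin is None:
            return {"status": "error", "reason": f"unknown feature: {feature}"}
        # The plugin acts on behalf of the mobile user.
        return {"status": "ok", "result": plugin(user, **params)}


def job_status_plugin(user, job_id):
    # Placeholder: a real plugin would contact a Grid service here.
    return f"job {job_id} of {user} is RUNNING"


gateway = Gateway()
gateway.register("job_status", job_status_plugin)
reply = gateway.handle("alice", "job_status", job_id=7)
```

Keeping the heavy lifting in gateway plugins means the client only has to send a feature name and a few parameters, which suits low-bandwidth links and limited hardware.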

Minimal Grid Interface

The schema below presents our approach: we give mobile users access to Grid services via a gateway. The mobile client running on a mobile device, together with the gateway, makes up the Minimal Grid Interface.

GridLab model of mobile access to the grid

 

Posted on Friday, 10 Esfand 1386, at 6:56 PM by Hamed Salimipour

German Radio: Fastest Supercomputer for Civilian Use Goes Online

The website of Deutsche Welle radio reported that the fastest supercomputer for civilian purposes, worth 15 million euros, has officially gone online in Germany.

According to the website, Thomas Lippert, head of the Jülich computing centre, said that in the list of top supercomputers, JUGENE is the fastest supercomputer used for civilian purposes.

The world's fastest civilian supercomputer, named "JUGENE", officially began operation last Thursday at the Jülich research centre near the German city of Cologne.

The source reported: "The supercomputer of the United States nuclear research centre in Florida, which has military applications, is the only supercomputer faster than JUGENE.

JUGENE's case is too big to fit under a desk. This supercomputer, whose dimensions befit its name, consists of 65 thousand processors housed in 16 cabinets, each the size of a public telephone booth.

JUGENE sits in a large hall together with JUMP and JUBL, two other supercomputers of the Jülich research centre.

Visitors to this hall must wear ear protection, because a large number of ventilation units and blowers circulate air through the racks.

Each cabinet produces about 30 kilowatts of heat. The cooling units run constantly to keep JUGENE at 16 degrees Celsius, the temperature at which this supercomputer performs best.

JUGENE can perform 223 trillion calculations per second, an unimaginable figure equivalent to the computing speed of 20 thousand computers.

One can picture it this way: each of the seven billion people on Earth simultaneously performing 30 thousand mathematical calculations in one second.

Of course, these calculations would certainly not be 1+1, but more complex floating-point calculations or something similar. Such performance is undoubtedly beyond the capability of a single processor.

In JUGENE each processor does one part of the work, which is why the most important point in a supercomputer is that a coherent network can deliver the results of the processors' calculations as a single output.

The communication network between the processors must be stable and very fast so that it can continuously transfer data from one processor to another.

From now on, about 200 German and European research groups can count on JUGENE's help in their projects.

An independent oversight board decides which project has priority.

Computer simulations have long formed the third pillar of science, alongside theory and experiment.

One of the specialists, who is writing his doctoral thesis at the Jülich computing centre, explains: when a drug is produced, countless tests must be performed on it before it reaches the market.

Performing these tests in the real world would be enormously expensive, so computer simulations at least make clear in which direction the work is heading.

In another example he continues: when an astronomer needs an experiment, bringing a star onto the bench of a physics laboratory would be extremely expensive, and besides, it would take billions of years.

For this reason computer experiments are of particular importance.

Another specialist is conducting an experiment in quantum physics. He and his colleagues are investigating the strongest force in nature.

This is the strong nuclear force, which binds quarks into neutrons or protons and holds them in the atomic nucleus.

Experiments in this field cannot be performed in the laboratory with existing methods. Where physics has been left without an answer, the computer seems to be the only available solution.

The director of the Jülich institute sees the main advantage of this supercomputer, compared with its peers, in its considerable energy savings relative to its computing capability. In this respect he considers JUGENE a green supercomputer.

Whether JUGENE will keep its privileged position among other supercomputers for long is open to question.

At present JUGENE's most important rival is "Ranger", the supercomputer of the University of Texas, which will begin operation soon.

According to one of the directors of the Jülich institute, the list of the world's top 500 supercomputers is published twice a year, and in recent years 250 to 300 new computers have entered the list each time.

So one should not be surprised by the arrival of a faster computer."

 

Posted on Tuesday, 7 Esfand 1386, at 11:57 AM by Hamed Salimipour


The SDSC Storage Resource Broker (SRB) supports shared collections that can be distributed across multiple organizations and heterogeneous storage systems. The SRB can be used as a Data Grid Management System (DGMS) that provides a hierarchical logical namespace to manage the organization of data (usually files).

The SRB software infrastructure can be used to enable Distributed Logical File Systems, Distributed Digital Libraries, Distributed Persistent Archives, and Virtual Object Ring Buffers. The most common usage of SRB is as a Distributed Logical File System (a synergy of database system concepts and file systems concepts) that provides a powerful solution to manage multi-organizational file system namespaces.

SRB presents the user with a single file hierarchy for data distributed across multiple storage systems. It has features to support the management, collaboration, controlled sharing, publication, replication, transfer, and preservation of distributed data. The SRB system is middleware in the sense that it is built on top of other major software packages (file systems, archives, real-time data sources, relational database management systems, etc). The SRB has callable library functions that can be utilized by higher level software. However, it is more complete than many middleware software systems as it implements a comprehensive distributed data management environment, including end-user client applications ranging from Web browsers to Java class libraries to Perl and Python load libraries.
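
The idea of a single logical hierarchy over several storage systems can be sketched as a catalogue that maps each logical path to one or more physical replicas. This is a conceptual illustration only, not the SRB API: the `LogicalNamespace` class, the resource names and the paths are all invented for the example.

```python
class LogicalNamespace:
    """Toy catalogue mapping logical paths to physical replicas."""

    def __init__(self):
        # logical path -> list of (storage resource, physical path)
        self._replicas = {}

    def register(self, logical_path, resource, physical_path):
        """Record that a copy of the logical file lives on a resource."""
        self._replicas.setdefault(logical_path, []).append(
            (resource, physical_path)
        )

    def replicas(self, logical_path):
        """Return every known physical copy of a logical file."""
        return list(self._replicas.get(logical_path, []))

    def list_collection(self, collection):
        """List logical paths under a collection, wherever they are stored."""
        prefix = collection.rstrip("/") + "/"
        return sorted(p for p in self._replicas if p.startswith(prefix))


# Hypothetical resources and paths.
ns = LogicalNamespace()
ns.register("/demo/run1/events.dat", "disk-site-a", "/export/d3/0001.bin")
ns.register("/demo/run1/events.dat", "tape-site-b", "/archive/x/77.bin")
ns.register("/demo/run1/notes.txt", "disk-site-a", "/export/d1/0002.txt")
listing = ns.list_collection("/demo/run1")
```

The user works only with the logical paths in `listing`; which resource holds each copy is a detail the catalogue resolves.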

Source: http://www.sdsc.edu/srb/index.php/Main_Page

 

Posted on Sunday, 5 Esfand 1386, at 10:41 AM by Hamed Salimipour

What is OGSA-DQP?

OGSA-DAI components are either data access components or data integration components. A Distributed Query Processing (DQP) system is an example of a data integration component and can potentially provide effective declarative support for service orchestration as well as data integration. The service-based DQP framework described in [1],[2] provides an approach that:

  • supports queries over OGSA-DAI data services and over other services available on the Grid, thereby combining data access with analysis;
  • adapts techniques from parallel databases to provide implicit parallelism for complex data-intensive requests; and
  • uses the emerging standard for Grid data services to provide consistent access to database metadata and to interact with databases on the Grid.

The service-based DQP framework consists of the following two services:

  • Grid Distributed Query Service (Coordinator). The Grid Distributed Query Service (GDQS), or coordinator, is the main interaction point for clients. When a coordinator is set up, it obtains the metadata and computational resource information that it needs to compile, optimise, partition and schedule distributed query execution plans over multiple execution nodes in the Grid. The implementation of the coordinator builds on previous work on the Polar* distributed query processor for the Grid [3],[4] by encapsulating its compilation and optimisation functionality. The coordinator is currently implemented as a set of OGSA-DAI data service resources and activities.

  • Query Evaluation Service (Evaluator). The Query Evaluation Service (QES), or evaluator, is used by the coordinator to execute query plans generated by the query compiler, optimiser and scheduler. Each evaluator evaluates a partition of the query execution plan assigned to it by a coordinator. A set of evaluators participating in a query form a tree through which the data flows from leaf evaluators which interact with Grid data services, up the tree to reach its destination.
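
The evaluator tree can be pictured with the sketch below, in which leaf evaluators read rows from a data source and inner evaluators combine the results of their children. It illustrates the data flow only and assumes nothing about the real OGSA-DQP interfaces: the `Evaluator` class, the `scan` and `hash_join` helpers and the sample rows are invented, and everything runs in one process rather than across Grid nodes.

```python
class Evaluator:
    """Toy evaluator: a leaf reads a source, an inner node runs an operator."""

    def __init__(self, name, operator=None, children=(), source=None):
        self.name = name
        self.operator = operator
        self.children = list(children)
        self.source = source

    def evaluate(self):
        if not self.children:
            # Leaf evaluator: fetch rows from its data service stand-in.
            return list(self.source())
        # Inner evaluator: data flows up from the children.
        inputs = [child.evaluate() for child in self.children]
        return self.operator(*inputs)


def scan(rows):
    """Stand-in for a data service that returns stored rows."""
    return lambda: rows


def hash_join(left, right):
    """Join two lists of (key, value) rows on the key."""
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]


# Made-up rows for illustration.
left_leaf = Evaluator("leaf-1", source=scan([(1, "P1"), (2, "P2")]))
right_leaf = Evaluator(
    "leaf-2", source=scan([(1, "termA"), (1, "termB"), (3, "termC")])
)
root = Evaluator("root", operator=hash_join, children=[left_leaf, right_leaf])
result = root.evaluate()
```

In the real system each node of this tree would be a separate evaluator service holding one partition of the plan, with the coordinator receiving what the root produces.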

As well as using the services provided by OGSA-DAI data services, the coordinator is itself implemented as an OGSA-DAI data service, and thus can be discovered and invoked in the same way as other OGSA-DAI data services. Consequently, the Grid stands to benefit from OGSA-DQP, through the provision of facilities for declarative request formulation that complement existing approaches to service orchestration, via uniform interfaces and interaction semantics.

Figure 1 provides an overview of the interactions during the instantiation and set-up of an OGSA-DQP coordinator, as well as those that take place when a query is received and processed via a set of evaluators. The components in this figure and the numbered interactions between them are now described. The 3-dot sequence in this figure can, as usual, be read as `and so on, up to'. This description of OGSA-DQP is intended to give a high level overview of the system.

Setting up and executing queries using OGSA-DQP


Figure 1: Setting up and executing queries using OGSA-DQP

1: An OGSA-DQP coordinator consists of two types of OGSA-DAI data service resources: GDQS factory data service resources and GDQS data service resources. Initially, an installed coordinator service will expose only a GDQS factory data service resource. This data service resource is then used to create GDQS data service resources which can be used by a client to execute queries.

In this first step in the interaction between a client and OGSA-DQP, the client uses a deployed GDQS factory data service resource to create a configured GDQS data service resource. The client interacts with the GDQS factory data service resource by sending an OGSA-DAI perform document which specifies that a DQPFactory activity should be executed. The DQPFactory activity is able to interact with a GDQS factory data service resource in order to dynamically deploy a GDQS data service resource. The DQPFactory activity is parameterised by an XML document which specifies exactly how the deployed GDQS data service resource should be configured. Configuration parameters include the databases and evaluators which can be utilised by the data service resource which is to be created. The result of this interaction is that a GDQS data service resource is created and initialised. The coordinator service now exposes this dynamically deployed GDQS data service resource and it is automatically assigned a resource ID by OGSA-DAI.

2: During the initialisation of the GDQS data service resource, the schemas of the databases it will use are imported by contacting the OGSA-DAI data services which wrap these databases.

3: The client receives the result of the perform document submitted in step 1. This result contains the resource ID needed by the client to identify the created GDQS data service resource in subsequent interactions with this data service resource.

[Note] Steps 1-3 need not take place if a GDQS data service resource already exists which imports the databases and analysis services required by a client (in that case, the client should contact the existing GDQS data service resource directly). Each GDQS data service resource is able to process multiple concurrent queries, and the GDQS data service resource is not terminated by a client following a query session. Steps 1-3 represent a setup process which is necessary to configure a GDQS data service resource for use by one or more clients.

4: The client submits a perform document containing a query. Queries are written in OQL and are executed by the OQLQueryStatement activity. The GDQS data service resource uses the Polar* query compiler to parse, optimise and schedule the query. A query plan is created, consisting of a number of partitions. Each partition specifies an individual evaluator's role in the query plan.

5: Query partitions are sent to the relevant evaluator services.

6: Some evaluators interact directly with OGSA-DAI data services to obtain data.

7: Other evaluators may interact with other evaluators to implement their role in the execution of the query.

8 - 9: Results propagate back from the evaluators to the coordinator and eventually back to the client.

[Note] OGSA-DQP is also able to invoke Web services from within queries. This is not illustrated in Figure 1 in order to preserve the clarity of the figure and its associated description. Also omitted from the figure are the resource properties made available by the GDQS data service resource. Following initialisation, the GDQS data service resource provides a resource property enabling the client to obtain a description of the database schemas imported by OGSA-DQP.

References

[1] M. N. Alpdemir, A. Mukherjee, N. W. Paton, P. Watson, A. A. Fernandes, A. Gounaris, and J. Smith. Service-based distributed querying on the grid. In Proceedings of the First International Conference on Service Oriented Computing, pages 467-482. Springer, 15-18 December 2003.

[2] M. Nedim Alpdemir, Arijit Mukherjee, Norman W. Paton, Paul Watson, Alvaro A. A. Fernandes, Anastasios Gounaris, and Jim Smith. OGSA-DQP: A service-based distributed query processor for the Grid. In Simon J. Cox, editor, Proceedings of UK e-Science All Hands Meeting Nottingham. EPSRC, 24 September 2003.

[3] J. Smith, A. Gounaris, P. Watson, N. W. Paton, A. A. A. Fernandes, and R. Sakellariou. Distributed Query Processing on the Grid. In Proc. Grid Computing 2002, pages 279-290. Springer, LNCS 2536, 2002.

[4] J. Smith, A. Gounaris, P. Watson, N. W. Paton, A. A. A. Fernandes, R. Sakellariou, Distributed Query Processing on the Grid, Intl. J. High Performance Computing Applications, Vol 17, No 4, 353-368, 2003 (Extended Version of Grid 2002 paper selected for publication in special issue).

Source: http://www.ogsadai.org.uk/about/ogsa-dqp

 

 |+| Posted on Sunday, 5 Esfand 1386, at 10:39 AM by Hamed Salimipour

Sun Expands Grid Application Offerings
February 19, 2008
By Paul Shread

Sun Microsystems has added 14 new applications to its Network.com Application Catalog of online grid-enabled applications available from the Sun Grid compute utility service on a pay-per-use basis.

Sun also launched a new partner program, Sun Network.com Connection, for independent software vendors (ISVs) to create on-demand service offerings at lower risk and cost, with access to new sales channels. Sun also added the Netherlands to the list of 25 countries where the services can be utilized.

The latest additions bring the grid service's total number of "Click and Run" applications to 39.

Mark Herring, Sun's senior director of software marketing, said Network.com "is evolving into a virtual on-demand data center that allows businesses of any size to leverage compute infrastructure without the cost of ownership and with the flexibility of scaling up or down compute resources in real time as business demands change."

The latest applications include Blender, open source tools for modeling, rendering, animation, post-production, creation and playback of interactive 3D content. Sun is sponsoring Blender Foundation's open movie "Peach," a short 3D animation by artists and developers in the Blender community, with grants of CPU hours in Network.com.

Other new open source applications include Zeus (a life sciences application), GAP (a computational mathematics application) and OOFEM (a computer aided engineering application).

In addition to the Solaris 10-based grid platform, Network.com provides developers and open source communities with tools, resources and an active grid developer community that helps them build and test on-demand applications.

Sun's Network.com provides access to compute infrastructure on a pay-per-use basis via its Sun Grid compute utility at $1 per CPU hour.

Source: http://www.gridcomputingplanet.com

 

 |+| Posted on Sunday, 5 Esfand 1386, at 9:58 AM by Hamed Salimipour

How to Rank High Throughput Computing Environments


TOP500 lists computers ranked by their performance on the LINPACK Benchmark. It is clear that no single number can reflect the performance of a computer. LINPACK is, however, a representative benchmark for evaluating computing platforms as High Performance Computing (HPC) environments, that is, in the dedicated execution of a single tightly coupled parallel application. On the other hand, an HTC application comprises the execution of a set of independent tasks, each of which usually performs the same calculation over a subset of parameter values. Although the HTC model is widely used in science, engineering and business, there is no representative benchmark and model to evaluate the performance of computing platforms as HTC environments. At first sight, it could be argued that there is no need for such a performance model. We agree on this for static and homogeneous systems. However, how can we evaluate a system consisting of heterogeneous and/or dynamic components?

Benchmarking of Grid infrastructures has always been a highly polemic area. The heterogeneity of the components and the high number of layers in the middleware stack make it difficult even to define the aim and scope of the benchmark. A couple of years ago we wrote a paper entitled "Benchmarking of High Throughput Computing Applications on Grids" (R. S. Montero, E. Huedo and I. M. Llorente) for the Parallel Computing journal, presenting a pragmatic approach to evaluate the performance of a Grid infrastructure when running High Throughput Computing (HTC) applications. We demonstrated that the complexity of a whole Grid infrastructure can be represented by only two performance parameters, which can be used to compare infrastructures. The proposed performance model is independent of the middleware stack and valid for any computing infrastructure, so it is also applicable to the evaluation of clusters and HPC servers.

The Performance Model

Our proposal is to follow an approach similar to that used by Hockney and Jesshope to characterize the performance of homogeneous array architectures on vector computations. A first-order description of a Grid can be made by using the following formula for the number of tasks completed as a function of time:

n(t)=R*t-N

Note that given the heterogeneous nature of a Grid, the execution time of each task can differ greatly. So the following analysis is valid for general HTC applications, where each task may require distinct instruction streams. The coefficients of the line are called:

  • Asymptotic performance (R): the maximum rate of performance in tasks executed per second. In the case of a homogeneous array of P processors with an execution time per task T, we have R = P/T.
  • Half-performance length (N): the number of tasks required to obtain half of the asymptotic performance. This parameter is also a measure of the amount of parallelism in the system as seen by the application. In the homogeneous case, for an embarrassingly distributed application we obtain N = P/2.

The above linear relation can be used to define the performance of the system (tasks completed per second) on actual applications with a finite number of tasks:

r(n)=R/(1+N/n)

graph.png
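The step from the linear relation to r(n) can be made explicit: solving n = R*t - N for t gives t = (n + N)/R, and the average throughput is r(n) = n/t = R/(1 + N/n). The following is only a minimal sketch of that arithmetic. The class name, method names and the sample values of R and N are assumptions chosen for illustration and do not come from the paper.

```java
// Illustrative sketch of the first-order HTC model; names and values are hypothetical.
class HtcModel {
    // Tasks completed after t seconds: n(t) = R*t - N.
    // Clamping at zero during start-up is an assumption of this sketch.
    static double tasksCompleted(double r, double halfLen, double t) {
        return Math.max(0.0, r * t - halfLen);
    }

    // Time needed to complete a given number of tasks, from inverting n(t): t = (n + N) / R.
    static double timeFor(double r, double halfLen, double tasks) {
        return (tasks + halfLen) / r;
    }

    // Average throughput on a finite workload: r(n) = n / t(n) = R / (1 + N/n).
    static double throughput(double r, double halfLen, double tasks) {
        return r / (1.0 + halfLen / tasks);
    }

    public static void main(String[] args) {
        double R = 0.5;   // assumed asymptotic performance, tasks per second
        double N = 20.0;  // assumed half-performance length, tasks
        for (int n : new int[] {5, 20, 80, 320}) {
            System.out.printf("n=%d  t=%.1f s  r(n)=%.4f tasks/s%n",
                    n, timeFor(R, N, n), throughput(R, N, n));
        }
    }
}
```

With n equal to N the throughput is exactly R/2, which is where the name "half-performance length" comes from.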

Interpretation of the Parameters of the Model

This linear model can be interpreted as an idealized representation of a heterogeneous Grid, equivalent to a homogeneous array of 2N processors with an execution time per task of 2N/R.

equivalencia-grid-homogeneo.jpg

The half-performance length (N), on the other hand, provides a quantitative measure of the heterogeneity in a Grid. This result can be understood as follows: faster processors contribute to a higher degree to the performance obtained by the system. Therefore the apparent number of processors (2N), from the application's point of view, will in general be lower than the total number of processors in the Grid (P). We can define the degree of heterogeneity (m) as 2N/P. This parameter varies from m = 1 in the homogeneous case to m = 0 when the actual number of processors in the Grid is much greater than the apparent number of processors (highly heterogeneous).
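As a rough check of these definitions, the sketch below computes the equivalent homogeneous array (2N processors, 2N/R seconds per task) and the degree of heterogeneity m = 2N/P. The class name, method names and sample numbers are hypothetical. The homogeneous case uses R = P/T and N = P/2 from the definitions above and recovers P processors, T seconds per task and m = 1.

```java
// Illustrative sketch; names and sample values are assumptions, not from the paper.
class GridEquivalence {
    // Apparent number of processors seen by the application: 2N.
    static double apparentProcessors(double halfLen) {
        return 2.0 * halfLen;
    }

    // Execution time per task on the equivalent homogeneous array: 2N/R.
    static double equivalentTaskTime(double r, double halfLen) {
        return 2.0 * halfLen / r;
    }

    // Degree of heterogeneity: m = 2N/P, where P is the total number of processors.
    static double heterogeneity(double halfLen, int totalProcessors) {
        return 2.0 * halfLen / totalProcessors;
    }

    public static void main(String[] args) {
        // Homogeneous case: P processors, T seconds per task, so R = P/T and N = P/2.
        int p = 64;
        double taskTime = 320.0;
        double r = p / taskTime;
        double halfLen = p / 2.0;
        System.out.println("apparent processors: " + apparentProcessors(halfLen));
        System.out.println("time per task (s):   " + equivalentTaskTime(r, halfLen));
        System.out.println("heterogeneity m:     " + heterogeneity(halfLen, p));

        // Hypothetical heterogeneous Grid: the same 64 processors but a measured N of 8.
        System.out.println("heterogeneous m:     " + heterogeneity(8.0, p));
    }
}
```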

N is a useful characterization parameter for Grid infrastructures in the execution of HTC applications. For example, let us consider two different Grids with a similar asymptotic performance. In this case, by analogy with the homogeneous array, a lower N parameter reflects a better performance (in terms of wall time) per Grid resource, since the same performance (in terms of throughput) is delivered by a smaller "number of processors".

comparacion-de-infraestructuras.jpg

The Benchmark

We propose the OGF DRMAA implementation of the ED benchmark in the NAS Grid Benchmark suite, with an appropriate scaling to stress the computational capabilities of the infrastructure, as the benchmark to apply the performance model. The ED benchmark comprises the execution of several independent tasks. Each one consists of the execution of the SP flow solver with a different initialization parameter for the flow field. This kind of HTC application can be directly expressed with the DRMAA interface as bulk jobs.

DRMAA represents a suitable and portable API to express distributed communicating jobs, like the NGB. In this sense, the use of standard interfaces allows the comparison between different Grid implementations, since neither NGB nor DRMAA is tied to any specific Grid infrastructure, middleware or tool. DRMAA implementations are available for the following Resource Manager systems: Condor, LSF, Globus GridWay, Grid Engine and PBS.

sp-drmaa.png

In the paper we present both intrusive and non-intrusive methods to obtain the performance parameters. The lightweight non-intrusive probes provide continual information on the health of the Grid environment, and so a way to measure the dynamic capacity of the Grid, which could eventually be used to generate global meta-scheduler strategies.

An Invitation to Action

We have demonstrated in several publications how the first-order model reflects the performance of complex infrastructures running HTC applications. So, why don't we create a TOP500-like ranking of infrastructures? The ranking could be dynamic, obtaining the parameters with the non-intrusive probes. We have all the ingredients:

  • A model representing the achieved performance by using only two parameters: asymptotic performance (R) and half-performance length (N)
  • A benchmark representative of HTC applications: the embarrassingly distributed (ED) test included in the NAS Grid Benchmark suite
  • A standard to express the benchmark: OGF DRMAA

Ignacio Martín Llorente
Reprinted from blog.dsa-research.org

Source: http://gridgurus.typepad.com

 

 |+| Posted on Saturday, 4 Esfand 1386, at 7:45 PM by Hamed Salimipour

How to Monitor Grid Engine

You have built and installed your shiny new cluster, installed the Grid Engine software, configured the queues, and announced to the world that your new system is ready to be used. What next? Well, think about your monitoring options…

As users start submitting jobs and hammering the system in every possible way, things will inevitably break on occasion. When something goes wrong in the system, you will want to know about the problem before you start receiving help desk calls and user emails.

The first step in developing an effective strategy for monitoring Grid Engine is learning how to use the available command line tools and how to look for possible issues in the system. Some of the things that you should always pay attention to include:
• queues in the unknown state; a queue instance in an unknown state usually means that the execution daemon is down on that particular host
• queues and jobs in the error state
• configuration inconsistencies
• load alarms
All of the above information can be easily obtained using the qstat command (e.g., try something like “qstat -f -qs uaAcE -explain aAcE”). It is also not difficult to script basic GE monitoring tasks and come up with a simple infrastructure that is able to alert system administrators to any new or outstanding problems in the system.
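As one possible starting point for such a script, the sketch below scans text in the style of "qstat -f" output and reports queue instances whose state column contains any of the letters selected by "-qs uaAcE". The class and method names and the sample text are hypothetical, the column layout is a simplified assumption that may differ between Grid Engine versions, and a real script would read the output of the qstat command instead of a hard-coded string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch; assumes queue lines of the form
// "queue@host qtype used/tot. load_avg arch [states]".
class QueueStateCheck {
    // Meaning of the state letters this sketch watches for (u, a, A, c, E).
    static String describe(char c) {
        switch (c) {
            case 'u': return "unknown";
            case 'a': return "load alarm";
            case 'A': return "suspend alarm";
            case 'c': return "configuration ambiguous";
            case 'E': return "error";
            default:  return null;
        }
    }

    // Returns queue instance name -> list of problem descriptions.
    static Map<String, List<String>> findProblems(String qstatOutput) {
        Map<String, List<String>> problems = new LinkedHashMap<>();
        for (String line : qstatOutput.split("\\R")) {
            String[] f = line.trim().split("\\s+");
            // Queue instance lines start with queue@host and carry a slot counter;
            // lines without a sixth field have an empty state column.
            if (f.length < 6 || !f[0].contains("@") || !f[2].matches("\\d+(/\\d+)+")) {
                continue;
            }
            List<String> found = new ArrayList<>();
            for (char c : f[5].toCharArray()) {
                String d = describe(c);
                if (d != null) {
                    found.add(d);
                }
            }
            if (!found.isEmpty()) {
                problems.put(f[0], found);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical sample resembling "qstat -f" output.
        String sample = String.join("\n",
            "queuename            qtype used/tot. load_avg arch       states",
            "----------------------------------------------------------------",
            "all.q@node01         BIP   1/4       0.52     lx24-amd64",
            "    231 0.55500 sleep.sh veseli r 02/23/2008 19:44:12 1",
            "----------------------------------------------------------------",
            "all.q@node02         BIP   0/4       -NA-     lx24-amd64 au",
            "----------------------------------------------------------------",
            "all.q@node03         BIP   0/4       0.03     lx24-amd64 E");
        findProblems(sample).forEach((queue, states) ->
            System.out.println(queue + ": " + String.join(", ", states)));
    }
}
```

The resulting map could then be fed into whatever alerting mechanism the site already uses, such as email or a status page.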

As your user base grows, so will your monitoring needs, and you will likely want to extend your monitoring tools. You should consider looking into existing software packages like xml-qstat, which uses XSLT transformations to render Grid Engine command line XML output into different output formats. Alternatively, you can also develop a set of your own XSL stylesheets that are customized to your needs, and use widely available command line tools such as xsltproc to generate monitoring web pages from the “qstat -xml” output.
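To give a feel for the stylesheet approach, here is a small sketch that applies an XSL stylesheet to a fragment shaped like "qstat -xml" output and produces an HTML table. It uses the XSLT processor built into the JDK rather than xsltproc. The class name, the stylesheet and the sample XML are hypothetical, and the element names (job_list, JB_job_number, JB_owner, state) are an assumption about the XML layout that should be checked against the output of your own Grid Engine version.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative sketch; stylesheet and element names are assumptions.
class QstatXslt {
    // Minimal stylesheet: one table row per job_list element.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<table><tr><th>Job</th><th>Owner</th><th>State</th></tr>"
      + "<xsl:for-each select='//job_list'>"
      + "<tr><td><xsl:value-of select='JB_job_number'/></td>"
      + "<td><xsl:value-of select='JB_owner'/></td>"
      + "<td><xsl:value-of select='state'/></td></tr>"
      + "</xsl:for-each></table>"
      + "</xsl:template></xsl:stylesheet>";

    // Applies the stylesheet to the given XML text and returns the rendered table.
    static String render(String qstatXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(qstatXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, simplified fragment in the style of "qstat -xml" output.
        String sample =
            "<job_info><queue_info>"
          + "<job_list state='running'><JB_job_number>231</JB_job_number>"
          + "<JB_owner>veseli</JB_owner><state>r</state></job_list>"
          + "</queue_info></job_info>";
        System.out.println(render(sample));
    }
}
```

Keeping the stylesheet in a separate file would allow the same file to be used unchanged with xsltproc from a cron job.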

Another interesting Grid Engine monitoring option is the Monitoring Console that comes with Cluster Express (CE). Its main advantage is that it integrates monitoring data from several different sources: Ganglia (system data), Grid Engine Qmaster and ARCo database (job data). However, even though Cluster Express by itself is easy to install, at the moment integrating the CE Monitoring Console with an existing Grid Engine installation requires a little bit of work. I am told that this will be much simplified in the upcoming CE release. In the meantime, if you are really anxious to try the CE Monitoring GUI on your Grid Engine cluster, do not hesitate to send me an email…

Source: http://gridgurus.typepad.com

 

 |+| Posted on Saturday, 4 Esfand 1386, at 7:44 PM by Hamed Salimipour

My Head in the Clouds


Up until I read Ian Foster’s Cloud Computing post, I had paid little attention to what the term meant to people. Personally, I had already chalked up the idea as a rebranding of Grid computing. So I asked a number of friends what they thought the differences between the two were. Of course many people not actively involved in the community are not familiar with either concept. (I find it peculiar that computer professionals know the latest buzz around SaaS and SOA but do not seem to consider how and where they might land these systems.) In any event, here is a summary of the answers I received:

  • Grid is characterized by more formalized computing arrangements between user and vendor whereas a Cloud is more for ad hoc resource utilization;
  • The types of computation on a Grid are typically parallel in nature whereas Clouds are for more simple calculations;
  • Grid was usurped by vendors to indicate that their services were distributed for better performance and reliability while Cloud had become the term for a generalized set of distributed resources;
  • Clouds are ethereal – anybody who watches them as they cross the sky knows that…

I found it rather interesting that, while there was some overlap of technical perspectives, no two answers were identical. I believe that the last description explains this situation nicely while also offering the most interesting take on the topic. A Cloud is a nuanced term that evokes the idea of something beautiful which also evolves rapidly, contains a lot of power, and then is gone. Meanwhile the Grid, like the utilities it was conceived from, is known for its reliability, its ability to tap into reserve power at a moment’s notice, and its accommodating levels of service. Notice how the first two answers both adhere to this concept? I don’t think this is an accident, nor do I think that this escaped the attention of the marketing departments of industry leaders like Amazon and Google, both of whom operate in what they term Cloud space. While it is distinctly possible that the term evolved organically, it is interesting that they chose to stick with it.

Once more, I found it particularly noteworthy that not one person I queried mentioned the amount of data to be processed. Foster and the Business Week article he references, as well as many others, suggest that we need to think in terms of a great deal more data than we have before.  For example, Google wants their people to think in terms of a thousand times more data than that to which they are accustomed.

Heck, I was thinking about writing about the so-called “Data Tsunami” myself – but not in terms of thinking about significantly larger datasets. The datasets we were working with a decade ago were suitably massive for what we were trying to accomplish.  Like today, it was not economically feasible to keep it all online at once.  The fact is that the incredible leaps in computational capacity have led us to build more complicated problems that demand still more data.  As such, a thousand times more data is probably still not enough.  If only the networking and storage companies had kept up with the leaps in processing capacity (I was on a gigabit network five years ago and I am using a gigabit network today >sigh<).

Consequently we still have to use the tried and true standard operating procedure of:

  • First carefully selecting a fraction of the data available for storage (i.e. triggering);
  • Next this rough dataset is pruned further by pre-processing to find the most interesting records;
  • Finally detailed analysis is performed on what is computationally feasible.

For example, the Large Hadron Collider will be examining billions of collisions per second but will only store a few hundred per second for later processing (see recent article on Dr. Heuer). We have always needed to think in terms of a thousand times more data than we can possibly process or are accustomed to. Basically the scope of what is economically feasible has changed dramatically over the last few decades, while we continue to be quite resource constrained. Which brings us back to the concept of capacity computing, whether in the form of a transitory Cloud, a steadfast Grid, or even the comfortable @home project. The key here is that people are continuing to push past the boundaries of what is feasible.

Source: http://gridgurus.typepad.com

 

 |+| Posted on Saturday, 4 Esfand 1386, at 7:42 PM by Hamed Salimipour

The Hot Trend for 2008

Fortune 400 businesses, and many smaller ones, run clusters in many different locations around the world.
As facility, management and other costs continue to become larger and larger shares of corporate IT budgets, networking costs continue to fall. The result is that data center consolidation becomes a more reasonable goal.

I've seen this in a few of my customers. Beginning last year, people started looking more toward grid technology to help them manage this. As the economy has tightened, more people have considered it, particularly as part of a plan to reduce costs by moving to open source tools.

The general pattern is that the IT group decides they need to find ways to more effectively manage large and disconnected sets of resources. They turn to grid computing to help them manage that cloud and in the process realize that they have a lot of special purpose machines that are being quite underutilized and that they have enormous duplication of effort in the management of those data centers.

As we've entered into a bear market, many companies are taking a second look at their IT costs and looking for ways to tighten their belts. The combination of open source and grid/cloud computing models offers the ability to do that: open source offers a lower-cost software acquisition model, and grid computing allows a reduction in IT staff through centralization.

I've also been working with folks on the lost art of environment management. But more on that in a future blog...

Source: http://gridgurus.typepad.com

 

 |+| Posted on Saturday, 4 Esfand 1386, at 7:39 PM by Hamed Salimipour

Gridbus/GRIDS Lab Annual Report

The GRIDS Lab and the Gridbus Project are pleased to release the Annual Report of their key activities and outcomes during the academic year 2007.
Please browse:

http://www.gridbus.org/reports/GRIDS-Lab-AnnualReport2007.pdf

Source: http://www.gridbus.org

 

 |+| Posted on Thursday, 2 Esfand 1386, at 9:28 PM by Hamed Salimipour

TeraGrid '08

TeraGrid '08 Welcomes Papers, Posters, Demos, and Visualizations!

The 3rd annual conference will showcase TeraGrid's impact in research and education through presented papers, demos, posters, and visualizations. TG08 will foster collaborations among leading researchers, developers, and educators that build on the growing TeraGrid infrastructure. TG08 will also provide information and training to enable current and future users to achieve maximum impact using TeraGrid resources and services. All interested individuals and organizations are invited to participate. Visit the Call for Participation.

  • Registration Open: March 1, 2008
  • Applications for student volunteers due: March 14, 2008
  • Abstracts of 500 Words or Less Due: March 18, 2008
  • Full Papers of 7-10 Pages Due: April 1, 2008
  • Demonstration, Poster, and Visualization Abstracts Due: April 18, 2008
  • Final Accepted Papers for Publication Due: May 5, 2008
  • Notifications of Posters and Visualizations: May 5, 2008
  • Challenge Competition Updates Due: May 30, 2008

For complete conference details, visit http://www.tacc.utexas.edu/tg08/index.php

Source: http://www.teragrid.org

 

 |+| Posted on Thursday, 2 Esfand 1386, at 9:17 PM by Hamed Salimipour

About TeraGrid

TeraGrid is an open scientific discovery infrastructure combining leadership class resources at eleven partner sites to create an integrated, persistent computational resource.

Using high-performance network connections, the TeraGrid integrates high-performance computers, data resources and tools, and high-end experimental facilities around the country. Currently, TeraGrid resources include more than 750 teraflops of computing capability and more than 30 petabytes of online and archival data storage, with rapid access and retrieval over high-performance networks. Researchers can also access more than 100 discipline-specific databases. With this combination of resources, the TeraGrid is the world's largest, most comprehensive distributed cyberinfrastructure for open scientific research.

TeraGrid is coordinated through the Grid Infrastructure Group (GIG) at the University of Chicago, working in partnership with the Resource Provider sites: Indiana University, Oak Ridge National Laboratory, National Center for Supercomputing Applications, Pittsburgh Supercomputing Center, Purdue University, San Diego Supercomputer Center, Texas Advanced Computing Center, University of Chicago/Argonne National Laboratory, the National Institute for Computational Sciences, the Louisiana Optical Network Initiative, and the National Center for Atmospheric Research.

Source: http://teragrid.org/about

 

 |+| Posted on Thursday, 2 Esfand 1386, at 9:15 PM by Hamed Salimipour

 

 

 

3Tera's Grid University provides live online training on various aspects of grid and utility computing. If you are a new AppLogic user, a 3Tera partner, or just want to get a better feel of what grid computing is all about, Grid University is the easiest way to learn. Grid University classes are taught by experienced 3Tera application engineers and developers who are building and troubleshooting grids and grid applications for a living. All you need to attend classes is a browser and a broadband Internet connection. Best of all, Grid University is free!


Signing up for Grid University is easy. First, pick a class you'd like to attend from the table below. Click on the class name to read the synopsis. Click on the calendar icon to register. You will receive an email with the URL and WebEx meeting info required to login to the class. We recommend that the first time you login to a Grid University class, you allow 15 min for setting up WebEx on your computer. Enjoy!



Classes by level (the table columns are General Knowledge, Grid Usage and Grid Maintenance; "recorded" marks classes with a recorded session):

Introduction
  • What's New in AppLogic 2.1: overview of new features and functionality

Foundations
  • Introduction to Grid and Utility Computing (recorded)
  • Overview of AppLogic (recorded)
  • Data Center Architecture (recorded)

Application Operation
  • AppLogic Applications (recorded)
  • Application Provisioning (recorded)
  • Custom Application Deployment
  • Application Migration
  • Grid Provisioning (recorded)

Appliances
  • AppLogic Appliances (recorded)
  • Custom Appliances (recorded)
  • Hands-on Custom Appliances (new)
  • Creating Custom Appliance Catalog
  • New Linux Distro Appliances
  • Catalog Upgrades

Application Development
  • Sample Applications (recorded)
  • Application architecture and development
  • Building Applications for Scalability
  • Creating Assemblies
  • Installing cPanel on AppLogic
  • Adding / Removing Resources (recorded)

Grid Maintenance
  • Volume Maintenance
  • Failure Handling and Recovery
  • Automation
  • Troubleshooting common issues
  • Upgrades / Downgrades (recorded)

Advanced Topics
  • Theory of Operation
  • High Availability
  • Scalable cPanel Application Overview (by request)
  • Backup and disaster recovery strategies

http://www.3tera.com/grid_university.html

 

 

 |+| Posted on Monday, 29 Bahman 1386, at 10:45 PM by Hamed Salimipour

Grid, more Grid

I’m a bit behind some of the other early movers…

3tera.  Taking grid and virtualization in a different direction.  They provide services for entire virtual clusters, virtual data centers, and more.

If implementing massive super computers and data centers becomes little more than filling in a sales web form, watch out hardware, hosting, and desktop sellers.

Perhaps Google will get some competition now that massive CPU resources are being made available to anyone with an idea.

 

 |+| Posted on Monday, 29 Bahman 1386, at 10:42 PM by Hamed Salimipour

Britain's fastest supercomputer unveiled

'Hector' offers the power of 12,000 desktop PCs
 

January 14, 2008 (Reuters) -- A supercomputer that could help answer some of science's biggest questions is being unveiled today.

With the power of 12,000 desktop PCs, the mammoth machine called Hector is the U.K.'s fastest computer and one of the most powerful in Europe. It can make 63 trillion calculations each second, allowing scientists to conduct research into everything from climate change to new medicines.

The machine is housed in 60 wardrobe-sized cabinets in the University of Edinburgh's advanced computing center near the Scottish capital. After years of development, Chancellor Alistair Darling is due to attend the official launch ceremony for the machine, which cost £113 million.

Hector, which stands for High-End Computing Terascale Resource, was made by U.S. manufacturer Cray Inc.

"Hector will enable us to do research that we simply could not do in any other way," said Jane Nicholson, a researcher at the Engineering and Physical Sciences Research Council, the public body that acts as the project's managing agent. "We want to push forward the boundaries of knowledge."

Researchers plan to tap into the computer's power to study ocean currents, build tiny parts for advanced computers and make warplanes less visible to radar. Other projects include research into superconductors, combustion engines and new materials. Scientists working in fields ranging from cosmology and atomic physics to disaster simulation and health care will also use the computer.

Despite its vast power, Hector falls short of the power produced by the world's biggest computer: Blue Gene/L. Housed at the Lawrence Livermore National Laboratory in California, Blue Gene is used to study nuclear weapons without the need for underground testing.

Editing by Steve Addison.



This article is reprinted by permission from Reuters.com, Copyright (c) 2006 Reuters. Reuters content is the intellectual property of Reuters or its third-party content providers. Any copying, republication or redistribution of Reuters content, including by caching, framing or similar means, is expressly prohibited without the prior written consent of Reuters. For additional information on other Reuters Services, visit the Reuters public Web site.

 |+| Posted on Friday, 26 Bahman 1386, at 4:18 PM by Hamed Salimipour

OLPC CTO Mary Lou Jepsen quits nonprofit effort

She's off to commercialize tech she invented in the OLPC development process
 

 (IDG News Service) -- The One Laptop Per Child project suffered a blow this week, with Chief Technology Officer Mary Lou Jepsen quitting the nonprofit to start a for-profit company to commercialize technology she invented with OLPC.

Jepsen, who joined OLPC as its first employee in 2005 after Nicholas Negroponte started the effort, will pursue an opportunity to chase after "her next miracle in display technology," OLPC said in an e-mail sent on Sunday.

Jepsen was responsible for hardware and display development for the rugged and power-saving XO laptop, designed for use by children in developing countries. Though the laptop has struggled to find buyers, it has been praised for its innovative hardware features and environmentally friendly design.

Her last day with the organization is Dec. 31, though she will continue consulting with OLPC, according to the e-mail. Dec. 31 is also the end of OLPC's Give One Get One program, in which two XO laptops can be purchased for about US$400, with a user getting one laptop and the other being donated.

Satisfied that XO laptops were shipping in volume, Jepsen noted in an e-mail that she was starting a for-profit company to commercialize some of the technologies she invented at OLPC.

"I will continue to give OLPC product at cost, while providing commercial entities products they would like at a profit," Jepsen wrote in an e-mail.

"I believe that the work I led in the design of the XO laptop is just the first step in changing computing," she wrote.

Powered by solar power, foot pedal or pull-string, the laptop doesn't rely on an electrical outlet to run, making it useful for situations where power is unreliable or unavailable. The laptop consumes between 2 and 8 watts of electricity from a specially designed lithium-ferro phosphate battery, depending on usage, compared to 40 watts on commercial laptops.

The laptop's battery lasts up to 21 hours because of custom-designed, efficient power-saving features implemented at the hardware and software level. Batteries in commercial laptops may explode at high temperatures, while XO's batteries can run and recharge in temperatures around 100 degrees Fahrenheit (38 degrees Celsius), Jepsen said in an earlier interview.

OLPC is also designing a cow-powered generator that works by hooking cattle up to a system of belts and pulleys.

For connectivity, the laptop has mesh-networking features for Internet access.

An earlier version of this article incorrectly stated the OLPC's power consumption features, and the eighth paragraph was updated on 1/2/08 to reflect the accurate information.


Reprinted with permission from


Story copyright 2006 International Data Group. All rights reserved.

 |+| Posted on Friday, 26 Bahman 1386, at 4:16 PM by Hamed Salimipour

Insider charged with hacking California canal system

Ex-supervisor installed unauthorized software on SCADA system, indictment says
 

November 29, 2007 (IDG News Service) -- SAN FRANCISCO -- A former employee of a small California canal system has been charged with installing unauthorized software and damaging the computer used to divert water from the Sacramento River.

Michael Keehn, 61, former electrical supervisor at the Tehama Colusa Canal Authority (TCAA) in Willows, Calif., faces 10 years in prison on charges that he "intentionally caused damage without authorization to a protected computer," according to Keehn's Nov. 15 indictment. He did this by installing unauthorized software on the TCAA's Supervisory Control and Data Acquisition (SCADA) system, the indictment states.

Keehn accessed the system on or about Aug. 15, according to the indictment. He is set to appear in federal court on Dec. 4 to face charges of computer fraud.

As an electrical supervisor with the authority, he was responsible for computer systems and is still listed as the contact for the organization's Web site.

With a staff of 16, the TCAA operates two canals, the Tehama Colusa Canal and the Corning Canal, that provide water for agriculture in central California, near the city of Chico. Both systems are owned by the federal government.

The security of SCADA systems, which are used to control heavy machinery in industry, has become a hot-button topic in recent years. In September, video of an Idaho National Laboratory demonstration of a SCADA attack was aired on CNN, showing how a software bug could be exploited to destroy a power generator.

In the video, the turbine was gradually worn out and left shuddering and smoking. Sources familiar with the hack say this was done by turning the generator off and on while it was out of phase with the power grid, putting excessive stress on the turbine and causing its components to wear out.

It's not clear how much damage the attack on the authority's SCADA system could have caused, but in 2000 a disgruntled former employee was able to access the SCADA system at Maroochy Water Services in Nambour, Australia, and spill raw sewage into waterways, hotel grounds and canals in the area. That man, Vitek Boden, was eventually sentenced to two years in prison.

Even if an attack were to knock the TCAA's SCADA system offline, the canals could continue to operate, said Robin Taylor, assistant U.S. attorney with the U.S. Department of Justice, which is prosecuting the Keehn case. "When the computer doesn't work, they have to go to manual operation," she said.

The intrusion cost the TCAA more than $5,000 in damages, Taylor said.



Posted on Friday, 26 Bahman 1386, at 4:15 PM by Hamed Salimipour

Cancer research gets boost from World Community Grid project

Researchers to accomplish 162 years of research in one to two years
 

November 06, 2007 (Computerworld) -- Harnessing the power of more than 795,000 computers around the world, a new research project that will analyze human proteins in the fight against cancer begins today using the World Community Grid, which was built and is maintained by IBM.

By using the combined computing power of the grid, the Help Conquer Cancer project will allow cancer researchers to drastically shorten the time needed to analyze 90 million images of crystallized proteins, from 162 years using existing computing systems to between one and two years using the harnessed power of the grid.

"Even with the largest computers we have, it would not be possible to finish this task," said Igor Jurisica, who leads the research team at the Ontario Cancer Institute in Canada, where the work is being done. Also participating in the work are scientists at Princess Margaret Hospital and the University Health Network.

The researchers will analyze the results of experiments on proteins using data collected by other scientists at the Hauptman-Woodward Medical Research Institute in Buffalo, N.Y.

The World Community Grid was created by IBM about three years ago as a way to harness unused global computing power to help solve a variety of health and scientific issues. The project calls on home and corporate PC users to register with the grid, then download and install a small software program that allows their unused computer cycles to work on critical scientific research.

Robin Willner, vice president of global community initiatives at IBM, said the total number of grid participants so far is about 795,000 around the world and grows daily. The combined computer power so far would create a supercomputer that would be the fifth most powerful in the world if it were in one place, she said. The grid uses participants' computers when the systems are idle.

The results of the research will go into the public domain and will be used by cancer researchers around the world, she said.

Three levels of security are part of the grid system and security audits are done constantly, Willner said.

By using the grid to better understand the structure of human proteins, researchers are trying to understand disease-related proteins and how they function, Jurisica said.

Once the 90 million images of some 9,400 different proteins are analyzed, data mining techniques will be used to go through the results, he said. Previous experiments have looked at smaller groups of samples because the means didn't exist to analyze them all, he said.

"This will be important for future research," Jurisica said. "Hopefully, it will shed light on the principles or mechanisms of the proteins."

"We know that most cancers are caused by defective proteins in our bodies, but we need to better understand the specific function of those proteins and how they interact in the body," he said. "We also have to find proteins that will enable us to diagnose cancer earlier, before symptoms appear, to have the best chance of treating the disease -- or potentially stopping it completely."

Eight other projects have been run so far on the World Community Grid, including protein folding and FightAIDS@Home, which completed five years of HIV/AIDS research in six months. Additional projects are also being scheduled.


 
Posted on Friday, 26 Bahman 1386, at 4:14 PM by Hamed Salimipour

Sun releases T2, its multicore processor sequel

Vendor targets virtualization apps for three new systems based on the chip
 

October 09, 2007 (Computerworld) -- Sun Microsystems Inc. today released its next generation of multicore chip technology, the Niagara 2 processor, which it says more than doubles the performance of its predecessor chip. Sun also disclosed that the next version of the chip, the 16-core Rock processor, will ship next year.
 
The UltraSparc T2, which is shipping in rack-mounted and blade server models, doubles the threads on an eight-core chip to 64, Sun said.

John Fowler, Sun's executive vice president of systems, said the latest release is part of a Sun effort to "move to systems that are designed for very high core and thread count." He also noted that the new system is ideally suited for virtualization, a direction that will envelop "basically our entire product line over time."

Sun described the T2 as an attractive virtualization platform, with logical partitions, or LDoms as Sun calls them, that can support up to 64 copies of Solaris.

Fowler also noted that the development team has placed cryptographic security technology directly on the chip instead of on a separate card, which helps boost performance.

The new chip also offers improved floating-point capability, and it consumes 15% to 20% more power than the predecessor T1 processor, he said.

The initial product release will include a blade server, the T6320, which is priced from $9,995, and two rack systems, the T5120 and T5220, which start at $13,995.

Nathan Brookwood, an analyst at research firm Insight 64 in Saratoga, Calif., said the virtualization capabilities included with the system, as well as its performance per watt, will appeal to users. He believes the new systems "will be a compelling story" to Solaris users, and to companies running Linux with applications that have a Solaris equivalent.

Posted on Friday, 26 Bahman 1386, at 4:11 PM by Hamed Salimipour

Acquisition to provide Sun with Lustre file system

Set to purchase assets of Cluster File Systems for undisclosed sum

 (Computerworld) -- Sun Microsystems Inc. Wednesday agreed to purchase most of Cluster File Systems Inc.'s business assets and intellectual property, including the open-source Lustre distributed file system.

Terms of the deal, expected to close on Oct. 1, were not disclosed.

In a statement, Sun said that it plans to port the Lustre file system to Solaris and to step up efforts to augment Lustre on the Linux-based systems of multiple vendors. When contacted, Sun officials refused to elaborate on their plans for the technology.

Sun and Cluster File Systems in July had agreed to jointly integrate Lustre and the OpenSolaris ZFS file system.

The Lustre file system is typically used to power large-scale server applications running in high-performance computing environments, because of its ability to support massive amounts of storage capacity and server clusters without severe performance impact.

The acquisition comes amid questions surrounding Sun's legal ownership of ZFS, which emerged last week when Network Appliance Inc. contended in a lawsuit that the technology infringes on patents it owns. The lawsuit was filed last week in federal court in Lufkin, Texas.

Earlier this year, Sun donated its ZFS code to the open-source community. That effort prompted analysts to fear that the Network Appliance lawsuit could have a far-reaching effect -- potentially adverse -- on the future of open-source technology.

Posted on Friday, 26 Bahman 1386, at 4:10 PM by Hamed Salimipour

Power management middleware bows for server grids

 (Network World) -- Grid middleware vendor Appistry Inc. Monday launched a software module that automatically powers down servers when they are not needed by applications, thus saving on energy consumption.

The company's Enterprise Application Fabric (EAF) virtualizes applications enabled with Appistry middleware across x86 servers. The new EnergySaver module lets administrators define policies that establish acceptable workload levels and turn off computers when application use is low. When additional capacity is required, EnergySaver policies reactivate the servers.

EAF is used best in power-hungry, transaction-intensive environments. Because applications are decoupled from the grid of servers on which they run, energy can be saved by powering off servers when they are not needed. Additionally, EAF contains load-balancing and workload management features. The software provides high availability by replicating the state of a request to multiple places, so if a machine goes down, the request can be executed on another machine in the grid.

One customer, GeoEye Inc. in St. Louis, is getting ready to deploy EnergySaver. GeoEye collects satellite imagery for the Department of Defense and other customers. Ray Helmering, vice president of product engineering at GeoEye, said that with EnergySaver, he can set policies to shut down servers when the output of the satellites varies because of geographical position or weather conditions.

"We have variations in our processing schedule depending on the operations of our satellites," he said. "As imagery comes in, we need the processing power, but as there are slower times, we'll be able to save on energy. We don't know the actual impact yet of energy savings, but initial review says that this feature could be very important to us."

GeoEye develops its imaging application in-house and grid-enables it with an Appistry wrapper that allows its operations to be parallelized across the grid. This application requires a huge amount of computation and a large number of processors to run. Helmering's Appistry implementation, for instance, requires 50 dual-core x86 servers.

Analysts are encouraged by Appistry's efforts to consume less power in the data center. "The principle that Appistry is addressing is going to be really important," said Simon Mingay, an analyst at Gartner Inc. in Egham, England. "Most data centers have the opportunity to alter the power status of the storage and servers in their infrastructure when that capacity is not required. In data centers, you run everything 24/7 and everyone is incented to keep things that way, which in a world where energy costs are not important, is perfectly fine. In a more energy-conscious world, that becomes more questionable."

Mingay said that many organizations have approached the idea of energy consumption by using job-scheduling software, such as Sun Microsystems Inc.'s N1 Grid Engine or CA Inc.'s Unicenter Autosys Job Manager, which allows applications to run when conditions are optimal for them.

The downside of EnergySaver, according to Mingay, is that it has to be deployed on Appistry-enabled applications. "We are going to see more of this technology, but right now applications need to be modified to work in the Appistry environment. That renders it generally unapplicable."

Appistry was founded in 2001 and is focused on data-intensive intelligence agencies, oil and gas companies, and logistics organizations.


Reprinted with permission from

For more information about enterprise networking, go to NetworkWorld.com
Story copyright 2006 Network World, Inc. All rights reserved.

Posted on Friday, 26 Bahman 1386, at 4:09 PM by Hamed Salimipour

IBM targets health care market with grid computing


Posted on Friday, 26 Bahman 1386, at 4:08 PM by Hamed Salimipour

Sun's grid computing service goes global

(IDG News Service) -- Sun Microsystems Inc. is expanding its Network.com utility computing service from the U.S. to 23 countries in Europe and Asia, the company said Thursday.

The utility computing service, in which customers pay an hourly rate for access to a Sun data center, began as a U.S.-only pilot in March but is now ready for a large geographic expansion, said Rohit Valia, group product manager for the Sun Grid Compute Utility.

Sun charges $1 per CPU per hour to access a network of Sun x64 hardware running the Solaris 10 operating system. End users can now access the utility from Australia, Austria, Belgium, Canada, China, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, India, Ireland, Italy, Japan, New Zealand, Poland, Portugal, Singapore, Spain, Sweden and the U.K.

IBM, Hewlett-Packard Co. and other computer vendors provide similar services. Utility computing, also called on-demand computing or, more informally, computing "in the cloud," is for organizations that have a short-term need for extra computing capacity but don't want to incur the expense of adding onto their own data centers. By taking advantage of utility computing services, they only have to build out their own IT infrastructures to handle an average level of usage, not the occasional peak usage, said Valia.

"Our business model is around charging for CPU cycles, not idle CPUs. We only charge when your CPU is actually processing data," he said.

Sun is also adding a feature called Network.com Internet Access that enables customers to interact, through Sun's utility data center and the Internet, with other companies that have resources the customer might want to use for a particular project. The company will also offer a limited beta program for developers called Job Management Application Programming Interfaces. This offering allows users to perform production-scale tests when they're building software applications using Network.com.



Posted on Friday, 26 Bahman 1386, at 4:06 PM by Hamed Salimipour

The Gridbus Middleware


The Gridbus Project is engaged in the design and development of grid middleware technologies to support eScience and eBusiness applications. These include visual Grid application development tools for rapid creation of distributed applications, a competitive economy-based Grid scheduler, a cooperative economy-based cluster scheduler, a Web-services-based Grid market directory (GMD), Grid accounting services, Gridscape for creation of dynamic and interactive testbed portals, the G-monitor portal for web-based management of Grid application execution, and the widely used GridSim toolkit for performance evaluation. Recently, the Gridbus Project has developed Windows/.NET-based desktop clustering software and Grid job web services to support the integration of both Windows and Unix-class resources for Grid computing. A layered architecture for realisation of low-level and high-level Grid technologies is shown in the figure below. Some of the Gridbus technologies discussed below have been developed by making use of Web Services technologies and services provided by low-level Grid middleware, particularly Globus Toolkit and Alchemi. A summary and the status of the various Gridbus technologies are listed below.


The Gridbus Project

Source: http://www.gridbus.org

 

Posted on Thursday, 25 Bahman 1386, at 8:40 AM by Hamed Salimipour

The Hot Trend for 2008

Fortune 400 businesses, and many smaller ones, run clusters in many different locations around the world.
As facility, management and other costs continue to become larger and larger shares of corporate IT budgets, networking costs continue to fall. The result is that data center consolidation becomes a more reasonable goal.

I've seen this in a few of my customers. Beginning last year, people started looking more toward grid technology to help them manage this. As the economy has tightened, more people have considered it, particularly as part of a plan to reduce costs by moving to open source tools.

The general pattern is that the IT group decides they need to find ways to more effectively manage large and disconnected sets of resources. They turn to grid computing to help them manage that cloud and in the process realize that they have a lot of special purpose machines that are being quite underutilized and that they have enormous duplication of effort in the management of those data centers.

As we've entered a bear market, many companies are taking a second look at their IT costs and looking for ways to tighten their belts. The combination of open source and grid/cloud computing models offers the ability to do that, with open source offering a lower-cost software acquisition model and grid computing allowing a reduction in IT staff through centralization.

I've also been working with folks on the lost art of environment management. But more on that in a future blog...

From the blog: http://gridgurus.typepad.com

 

Posted on Thursday, 25 Bahman 1386, at 8:36 AM by Hamed Salimipour

How to Build Utility Computing Infrastructures with Globus

This is a guest post by Ignacio Martín Llorente, Professor of Distributed Systems Architectures at Universidad Complutense de Madrid.

While research institutions are interested in Partner Grids, which provide access to higher computing performance to satisfy peak demands and to support collaborative projects, enterprises understand grid computing as a way to address the changing service needs of an organization. They are interested in in-house resource sharing, to achieve a better return on their information technology investment, supplemented by outsourced resources, to satisfy peak or unusual demands. An Outsourced/Utility Grid would provide pay-per-use computational power when Enterprise Grid resources are overloaded. Such a hierarchical grid organization may be extended recursively to federate a higher number of Partner or Outsourced Grid infrastructures with consumer/provider relationships. This would allow resources to be supplied on demand, making resource provision more agile and adaptive. It would therefore offer access to a potentially unlimited computational capacity, turning IT costs from fixed into variable.



In the context of the GridWay project we have developed a Grid Gateway that exposes a WSRF interface to a metascheduling instance, so enabling the creation of hierarchical grid structures. GridGateWay consists of a set of Globus services hosting a GridWay Metascheduler, thus providing a uniform, standard interface for the secure and reliable submission, monitoring and control of jobs. Most functionality is provided through GRAM (Grid Resource Allocation and Management), while scheduling information is provided through MDS (Monitoring and Discovery Service). The security requirement at the user level is addressed by GSI (Globus Security Infrastructure).

The new technology allows different layers of metaschedulers to be arranged in a hierarchical structure. In this arrangement, each target grid is handled as another resource, that is, the underlying grid is characterized as a single resource in the source grid, by means of grid gateways. This strategy encourages companies to federate their grids in order to have a better return of IT investment, and also satisfy peak demands of computation. Furthermore, this solution allows for gradual deployment (from fully in-house to fully outsourced), in order to deal with the obstacles for grid technology adoption, such as enterprise scepticism and IT staff resistance.

This approach also provides the components required for interoperability between existing Grid infrastructures. It is clear that we can’t wait for a single global grid to arise or to become predominant. Instead, we should work to build a seamless integration of the existing grids, which may eventually constitute the ultimate, capital-letter Grid, Grid of grids, or InterGrid, in the same way that the Internet was born. Grid interoperability can be achieved by means of common, ideally standard, grid interfaces, whose existence is an important (if not essential) characteristic of grid systems. Unfortunately, common interfaces (and even less standard ones) are not always available for given services. Then, the use of grid adapters and gateways becomes necessary. In particular, an interoperability solution based on grid gateways provides the infrastructures with significant benefits in terms of autonomy, scalability, deployment and security.

Well, what are you waiting for? The components are open source, the license is Apache v2.0, and we are willing to collaborate with you.

From the blog: http://gridgurus.typepad.com

 

Posted on Thursday, 25 Bahman 1386, at 8:33 AM by Hamed Salimipour

How to Install Globus (Part 1)

Before installing Globus, keep the following points in mind.

First install the Linux distribution that matches the Globus package you downloaded.

Then check the versions of Java and gcc on that Linux system, and install supported versions if the existing ones are not supported. The Java version must be 1.6.

To check the Java version, type the following command in a terminal:

java -version
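
A minimal sketch of how this check could be scripted. `java -version` writes its banner to stderr, so the output is redirected before it is parsed. The helper names `java_version_of` and `is_java_16` are made up for this example, and the banner is assumed to carry the version as its first quoted token (for example `java version "1.6.0_05"`).

```sh
#!/bin/sh
# Extract the version string from a `java -version` banner.
# java_version_of is a hypothetical helper, not part of Java or Globus.
java_version_of() {
  # Assumption: the first quoted token in the banner is the version.
  printf '%s\n' "$1" | sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p' | head -n 1
}

# Succeeds when the version belongs to the 1.6 series the guide asks for.
is_java_16() {
  case "$1" in
    1.6|1.6.*) return 0 ;;
    *)         return 1 ;;
  esac
}

if command -v java >/dev/null 2>&1; then
  java_banner="$(java -version 2>&1)"   # the banner goes to stderr
  java_ver="$(java_version_of "$java_banner")"
  if is_java_16 "$java_ver"; then
    echo "Java $java_ver: matches the 1.6 series"
  else
    echo "Java $java_ver: the guide expects 1.6"
  fi
else
  echo "java not found on PATH"
fi
```

Keeping the parsing in a function that takes the banner as an argument makes it possible to try the check against a saved banner without Java installed.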

To update Java, follow these steps.

This command unpacks the Java 1.6 archive:

tar xzvf (Java file name)

Tip: To type the file name faster, type its first letter and press Tab; the rest of the name is completed automatically.

Then copy the contents of Java 1.6 into the HOME\USER directory you are in, overwriting the existing files.

This updates the Java version.

 

Next, check the gcc version with the following command:

gcc -v

The gcc version must not be 4.1, because it has a bug. Versions 3.2, 3.2.1, and 2.95.x can be used.

gcc cannot be updated in place, so install a Linux release that ships one of these versions.
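
The gcc check can be sketched the same way. This example only flags the 4.1 series that the guide warns about. `is_gcc_41` is a hypothetical helper name, and `gcc -dumpversion` is used instead of `gcc -v` because it prints only the version number.

```sh
#!/bin/sh
# Succeeds when the given gcc version is in the 4.1 series, which the
# guide says to avoid. is_gcc_41 is a hypothetical helper.
is_gcc_41() {
  case "$1" in
    4.1|4.1.*) return 0 ;;
    *)         return 1 ;;
  esac
}

if command -v gcc >/dev/null 2>&1; then
  gcc_ver="$(gcc -dumpversion)"   # prints only the version number
  if is_gcc_41 "$gcc_ver"; then
    echo "gcc $gcc_ver: avoid this series for the Globus build"
  else
    echo "gcc $gcc_ver: not in the 4.1 series"
  fi
else
  echo "gcc not found on PATH"
fi
```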

 

Tomcat must also be installed; it is not needed at compile time, only at runtime.

If you use SUSE Linux, no additional software is needed.

Create a user named globus with the following command:

root# useradd globus

The user can also be created from the system \group and user.. menu.

 

Copy the Globus files you downloaded into the usr/local/globus directory, press F4 in that directory to open a terminal, and type the commands below.

The name of the last folder depends on the folder that exists under usr/local. If the folder in that path is usr/local/globus-4.0.1, type the following:

 

# mkdir /usr/local/globus-4.0.1
# chown globus:globus /usr/local/globus-4.0.1

Now leave the root user, switch to the globus user, and type the following commands in the console.

The globus$ prefix is the prompt; it means the command is run from the directory you are in. For example, to run ./configure, do not type globus$ itself, only what follows it. Here, globus$ means that you are in usr/local/globus-4.0.1 and type ./configure --prefix=$GLOBUS_LOCATION.

 

globus$ export GLOBUS_LOCATION=/usr/local/globus-4.0.1
globus$ ./configure --prefix=$GLOBUS_LOCATION

 

Running ./configure as root would produce an error.

Run as the globus user, it should print the following:

Optional Features:
  --enable-prewsmds       Build pre-webservices mds. Default is disabled.
  --enable-wsgram-condor  Build GRAM Condor scheduler interface. Default is disabled.
  --enable-wsgram-lsf     Build GRAM LSF scheduler interface. Default is disabled.
  --enable-wsgram-pbs     Build GRAM PBS scheduler interface. Default is disabled.
  --enable-i18n           Enable internationalization. Default is disabled.
  --enable-drs            Enable Data Replication Service. Default is disabled.
  [...]
Optional Packages:
  [...]
  --with-iodbc=dir        Use the iodbc library in dir/lib/libiodbc.so.
                          Required for RLS builds.
  --with-gsiopensshargs="args"
                          Arguments to pass to the build of GSI-OpenSSH, like
                          --with-tcp-wrappers

 

 

 

In the fourth step:

 

     globus$ make

 

Tip: If you want to keep a log file, type:

 

globus$ make 2>&1 | tee build.log

 

In the fifth and final step:

 

globus$ make install
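
One caveat with the logging tip: a pipeline such as `make 2>&1 | tee build.log` reports the exit status of `tee`, so a failed build can go unnoticed in a script. The sketch below is one way to keep both the log and the command's own status. `run_logged` and the log file names other than build.log are hypothetical and are not part of Globus.

```sh
#!/bin/sh
# Run a command, copy its stdout and stderr to a log file while still
# showing them, and return the command's own exit status, not tee's.
# run_logged is a hypothetical helper, not part of Globus.
run_logged() {
  rl_log="$1"; shift
  # The brace group records the command's status in a side file,
  # because the pipeline itself reports the status of tee.
  {
    rl_rc=0
    "$@" 2>&1 || rl_rc=$?
    echo "$rl_rc" > "$rl_log.status"
  } | tee "$rl_log"
  rl_status="$(cat "$rl_log.status")"
  rm -f "$rl_log.status"
  return "$rl_status"
}

# Usage for the steps in this guide (commented out so that the sketch
# can be sourced without starting a build):
# export GLOBUS_LOCATION=/usr/local/globus-4.0.1
# run_logged configure.log ./configure --prefix="$GLOBUS_LOCATION" &&
#   run_logged build.log make &&
#   run_logged install.log make install
```

Chaining the steps with `&&` stops the sequence at the first failure, and the matching log file shows what went wrong.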

 

At this point the installation is complete, and you now need to configure the components described below.

We recommend installing all security-related items.

Next, work through the security setup, which covers obtaining host certificates and user certificates and creating the grid-mapfile, described on the following pages.

With security set up, you can start a GridFTP server, configure a database for RFT, and configure WS-GRAM.

You can also start a GSI-OpenSSH daemon, set up a MyProxy server, run RLS, and use CAS.

 

Posted on Wednesday, 24 Bahman 1386, at 6:51 PM by Hamed Salimipour

IT uses Linux more every day, and grid options are seeing wider use as well

Author: Carol Sliwa
Computer World
Translator: Zohreh Chekani

FRAMINGHAM -- Corporate users who have so far been hesitant about using open source will next week have ready-made options in front of them from the world's leading vendors, which are trying to make Linux-based systems easier to buy, use and manage.

Dell, Hewlett-Packard and IBM are among the many vendors using the Linux World Conference & Expo in San Francisco to introduce products and services designed to make it easier for users to choose Linux and other open-source software.

For example, Dell plans to introduce dual-core Intel processors in its PowerEdge 850 and 830 servers and to give customers the option of getting a bundle of open-source software and hardware in one place.

Users can get Red Hat or SuSE together with the MySQL database and the JBoss application server. In addition, they can buy support subscriptions for MySQL Network and JBoss Network directly from Dell.

Judy Charis, director of business development and global alliances for Linux and open source at Dell, said the goal is to help users get up and running quickly with a tested and supported system that works out of the box, like a Windows server.

 

Easier adoption

The availability of bundled products was not especially important to many early Linux users, because they had the in-house skills needed to configure and install the systems themselves.

Joseph Foran, IT director at FSW Inc. in Bridgeport, Kentucky, said that for the nonprofit services company, installing Linux and the rest of the so-called LAMP stack, which comprises the Apache web server, MySQL, and the Perl, PHP or Python programming languages, has never been a serious problem. An enhanced LAMP stack with an application server preconfigured with business applications could be very useful, but only if you have the necessary skills; otherwise it is of no use.

In any case, as Linux gains momentum in mainstream IT, more companies are eventually turning to vendors that make the technology easier to use.

Dan Kusnetzky, an analyst in Framingham, Mass., said the lack of needed application software and the lack of sufficient skills at customer sites have been the main obstacles to Linux adoption.

HP has promoted the use of open-source software by opening four Linux Expertise centers in the U.S. for software vendors, developers and systems integrators, which helps ensure that their products work well with its hardware.

HP plans to offer more than 200 open-source software packages for its Integrity NonStop servers.

IBM is trying to encourage more users to adopt grid computing with its "Grid and Grow" bundle, which includes a choice of BladeCenter servers with an expansion-ready chassis, an operating system, grid middleware and services. The package is priced at $49,000.

Al Bunshaft, vice president of grid computing at IBM, said more than two-thirds of the grid trials the company has been involved in were based on Linux. He said grid has an air of complexity about it and that IBM wants to take that complexity away.

One sign that software vendors are trying to draw more attention to their Linux support can be seen at SAP AG's hall at the Linux World show, where the company is working to familiarize users with the applications that run on the operating system, said Torsten Geers, a vice president at SAP. The percentage of SAP users on Linux is small but growing fast.

 

|+| Posted on Friday, 19 Bahman 1386, at 7:07 PM by Hamed Salimi Pour

Why You Should Never Rip and Replace

At Univa UD we deal with a variety of different customers. These folks are trying to solve business problems in semiconductor, life science, financial services, big science and lots of other sectors. What they have in common is that they have existing infrastructure and are hyper-concerned about business disruption while moving in a new direction. They should be.

There are a lot of ways to approach grid computing that require you to replace what you have with something new. This is particularly the case for vendors of proprietary tools. These tools are built on proprietary protocols that make it difficult to integrate other services or applications. Combine these two issues and it can be tough to get anything bigger than a cluster up and running. If you already have a cluster, or more, up and running, this disruption will have a real impact on your ability to accomplish your goals.

To borrow an old saying, you want your approach to be evolutionary rather than revolutionary. This means moving in a new direction using a phased delivery that allows existing work or research to continue without interruption.

With Globus, this is achieved by creating an additional layer atop existing resources. A common security platform is built on local security layers. A common job submission mechanism replaces product specific ones. A monitoring system that can aggregate information from multiple sources replaces those that only report data from their specific resource.

With these steps in place, new users, applications and clusters can be provisioned in ways that allow flexible cluster usage, better aggregate throughput and higher cluster utilization rates. Then, as time permits, existing applications -- and particularly scripts and workflows -- can be ported from their existing platform to interfaces that will allow them to utilize all the bandwidth available in the organization. Dig?

From the blog: http://gridgurus.typepad.com

|+| Posted on Friday, 19 Bahman 1386, at 3:44 PM by Hamed Salimi Pour

Why the grid is still important

Grid computing is celebrating 11 years next month, and is poised to become increasingly mainstream in the coming years.  There are a number of reasons that this is true, and most of them are the time tested ideas that have been proving themselves in your research institutions and businesses for years.  The grid is about allowing your organization to run more efficiently and more effectively than can be done with more conventional technology solutions.  It's about bringing many machines together in coordination around a task.  It's about bringing data storage and movement to bear in a coordinated fashion with your application.  It's about allowing people from different parts of your organization to work together more easily.

From the blog: http://gridgurus.typepad.com

|+| Posted on Friday, 19 Bahman 1386, at 3:44 PM by Hamed Salimi Pour

Why Are 92% Of Users Waiting for I/O?

IDC recently published a finding that 92% of users have applications that are I/O constrained. That's a shockingly large number given the options that exist for reducing this pain. Let's break this down into three major categories:

  • Working storage
  • Near-line storage
  • Long-term storage

The nature of each of these domains is very different and the options available to reduce the problems are similarly different.

Working Storage

Administrators unfamiliar with the demands of certain classes of applications will sometimes mount their scratch disk as (oh, the horror!) NFS. Over the past five years the average knowledge level has crept up as the community has grown and gained experience, but I have seen in the last couple of months that there are still clusters out there that are doing significant I/O to slow network scratch.

Near-line Storage

Near-line storage typically consists of disk, NAS or SAN devices. People have a tendency either to not buy enough bandwidth (in the form of network I/O or aggregate disk I/O) or to view the purchase of these facilities as a one-time expense, failing to keep up with their users' expanding demand as time progresses.

Long-term Storage

Migrating data to tape is the only game in town for long-term, high-capacity storage (so far the decade-old promise of using disk for long-term storage still seems to be a decade away). The problem is that drives and automated libraries cost enormous amounts of money and this type of storage has inherent latencies, so applications are left sitting, tapping their feet, waiting for data to stream in.

The Solution

The services in the grid must be programmed to be aware of the data they require. An early example of this is the DDM system in incubation in dev.globus. This system knows what data is located in different resources on the grid and can thus be integrated with workflow and scheduling systems to pre-stage data to working storage before the application is started. This completely eliminates two of the three I/O constraints, and the last one is the easiest and cheapest. Just stop mounting your scratch on NFS...

From the blog: http://gridgurus.typepad.com

|+| Posted on Friday, 19 Bahman 1386, at 3:42 PM by Hamed Salimi Pour

Stop Wasting Time

Seth wrote:

The traffic engineers in New York think nothing of wasting two minutes of each person's time as they approach a gated toll booth. Multiply that two minutes times 12,000 people and it's a lot of hours every day, isn't it?

The truth in this is obvious, and it applies to the grid also. I've talked with folks that have hundreds of engineers each spending a third of their time managing jobs and data on their clusters. That's a lot of wasted time that could be spent advancing their business. Even if just on an expense basis, that's $3M in labor costs. And that's on the low end.

There were significant wins in moving from SMP boxes to clusters. There were significant wins in moving from clusters to grids. Now it's time to realize the next win by managing your grid effectively.

From the blog: http://gridgurus.typepad.com

|+| Posted on Friday, 19 Bahman 1386, at 3:41 PM by Hamed Salimi Pour

Warning: Don't Patronize Your Users

One of my favorite quotes is from E.B. White:

No one can write decently who is distrustful of the reader's intelligence, or whose attitude is patronizing

Pawel Plaszczak and I certainly took this sort of goal seriously when we wrote our Savvy Manager's Guide. You should take it seriously when you design your grid.

The single biggest mistake people make is to not trust their users to provide reasonable requirements. Designers and architects go out and talk to users, then write off the feedback they get as general guidance rather than hard requirements.

Google, as an example, took their users seriously from day one. They could have created yet another site so littered with ads that it was unreadable, but instead created a user experience that is now the subject of design classes. You can do the same. Talk to your users. Spend a day understanding how they interact with their system. Get a bit deeper into the business issues that justify the IT expenses that feed your children and pay your mortgage.

Take your users seriously, feel their pain and be their hero.

From the blog: http://gridgurus.typepad.com

|+| Posted on Friday, 19 Bahman 1386, at 3:40 PM by Hamed Salimi Pour

Three Reasons Why High Utilization Rates Are the Wrong Thing to Measure

Attending the working sessions at various conferences, I hear a theme over and over again: "How can grid computing help us meet our goal of 80% utilization?" People post graphs showing how they went from 20% utilization to 50% and finally 80%. People celebrate achievement of this number as an axiom. The 80% utilized cluster is the well managed cluster. This is the wrong goal.

The way to illustrate this is to ask how 80% utilization brings a new drug to market more quickly? How does 80% create a new chip? How does 80% get financial results or insurance calculations done more quickly?

Of course, it does none of those things. 80% isn't even a measure of IT efficiency, though most people use it as such. It's only a statistic that deals with a cluster itself. It is, however, measurable, so it's easy to stand up as an objective that the organization can meet. The question to ask is, does an 80% target actually hurt the business of the company?

That target has three problems:


     

  • It takes the focus off the business problem the clusters are solving
  • Most people choose the wrong target (80%, rather than 50%)
  • We would fire a CFO who only measured costs, so why are we willing to only measure them here?

If your clusters are running at 80% that means that you have a lot of periods when work is being queued up and waiting. Think about the utilization pattern of your cluster. Almost every cluster out there is in one of two patterns. They are busy starting at nine in the morning when people start running work and the queue empties overnight. Or, they are busy starting at three in the afternoon when people have finished thinking about what they need to run overnight and the queue empties the next morning.

During the times when the queues are backed up, you are losing time. These waiting jobs represent people who are waiting: scientists who aren't making progress, portfolio analysts who are trailing the competition and semiconductor designers who are spending time managing workflow instead of designing new hardware.

For most businesses it's queue time and latency that matters more than utilization rates. Latency is the time that your most expensive resources, your scientists, designers, engineers, economists and other researchers are waiting for results from the system. Data centers are expensive. Don't get me wrong, I'm not arguing that it's time to start throwing money at clusters without consideration. It's just that understanding the way the business operates is critical to determining what the budget should be. Is the incremental cost of having another 100 or 1000 nodes really more than the cost of delaying the results that your business needs to remain viable?
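One way to see why an 80% target can hurt is a back-of-the-envelope queueing estimate. The sketch below assumes the simplest textbook model (a single queue with random arrivals and service times, usually called M/M/1). That model is an assumption of this example and not something a real cluster follows exactly, and the function name `mean_wait_factor` is made up for illustration. In that model the average time a job waits in the queue, expressed as a multiple of the average job runtime, is utilization / (1 - utilization).

```shell
#!/bin/sh
# Rough M/M/1 estimate (an illustrative assumption, not a cluster model):
# mean queue wait expressed as a multiple of the mean job runtime.
mean_wait_factor() {
  # $1 = utilization as a fraction strictly between 0 and 1
  awk -v u="$1" 'BEGIN {
    if (u <= 0 || u >= 1) {
      print "utilization must be between 0 and 1" > "/dev/stderr"
      exit 1
    }
    printf "%.2f\n", u / (1 - u)
  }'
}

# Sample utilization levels chosen for this example.
for u in 0.20 0.50 0.80; do
  echo "utilization $u: wait is $(mean_wait_factor "$u") times the mean job runtime"
done
```

Under this simplified model, going from 50% to 80% utilization quadruples the average wait, from one job runtime to four. Real clusters have many nodes and bursty submissions, so treat the numbers as a direction rather than a prediction.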

Don't be willing to be the manager that measures what is convenient rather than what is valuable to the future of your business. Be 'savvy' in your approach. Find ways to understand the behavior of your drug discovery processes on your clusters, even if you are an IT guy instead of a computational chemist. Find ways to demonstrate how your approach of reducing cluster latency is turning up the heat on the next chip design. Find ways to measure what keeps your business around so that you can be part of the process of creating value instead of viewed by that CFO as nothing more than a cost center to be optimized away.

The message is that cost is only one part of the equation. Likely, it's even a minor part of the equation. Don't get yourself lost measuring the price of your stationery when it's the invoices you're putting in the envelopes that matter.

From the blog: http://gridgurus.typepad.com

|+| Posted on Friday, 19 Bahman 1386, at 3:39 PM by Hamed Salimi Pour

The Grid and Hosting

Jeremy Sherwood from opus:interactive has a good write up of HostingCon 2007.

My experience with Grid Computing goes back to the late 1990s with distributed.net in helping making encryption that much secure. With the technology originally designed to harness unused CPU cycles to solve complex problems, to now being used to hosting an infinite number of hosting environments. It is amazing the level of reliability and scalability options that are available with the system. The ability to grow in resources at an unlimited rate -on the fly- with little to no exposure to change, is outstanding. The other great aspect of this system of technology is the ability to contribute to a sustainable mindset. If done properly, you can reuse old servers and hardware that in a normal life cycle would be recycled, now can be reprovisioned back into a production environment with little concern of impact of hardware failure. This rejuvenation of hardware opens up a great opportunity to get that-much-more out of your initial investment as well as being able to pass those saving onto the customer.

From the blog: http://gridgurus.typepad.com

|+| Posted on Friday, 19 Bahman 1386, at 3:38 PM by Hamed Salimi Pour

Optimizing Your GridFTP, part 1: Transfers Without I/O

I am not a GridFTP developer but I use GridFTP. A lot.

Often I find myself helping others optimize GridFTP transfers across networks and between machines about which I know little. When I sit down with an engineer or scientist trying to move files fast from one location to another, the system administrators (fighting more important fires) and the network engineers (hiding from users) are often unavailable.

So it happens a lot that, as I begin trying to optimize a GridFTP transfer, I have no idea whether the disk I/O for the machines will even support faster transfers. Moving data at near wire speeds doesn't help if the disks can only read and write at half the wire speed.

One can find lots of good tools for measuring disk I/O but before I grab for those I like to try some GridFTP transfers without disk I/O on either end to get a feel for what role disk I/O might (or might not) be playing.

The globus-url-copy command-line client in the latest versions of the Globus Toolkit makes it easy to transfer some bits using GridFTP without any disk I/O. You simply have to "pull" or "read" data from the "file" /dev/zero and "write" data to the "file" /dev/null. The syntax is straightforward:

globus-url-copy -vb -p 4 gsiftp://one.machine.com/dev/zero file:/dev/null

Try using that syntax the next time you sit down to optimize a GridFTP transfer and you want to get a feel for the network infrastructure without being hindered by disk I/O on either end of the transfer.
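When running such memory-to-memory tests it can help to repeat them with a few different numbers of parallel streams. The following is a minimal sketch that only prints the commands to run. It uses the same -vb and -p options shown above, and the helper name `mem_test_cmd`, the host name and the list of stream counts are placeholders chosen for this example. Because /dev/zero never runs out of data, each transfer should keep going until you interrupt it, so watch the rate reported by -vb and stop it by hand.

```shell
#!/bin/sh
# Print (not run) a memory-to-memory GridFTP test command.
# mem_test_cmd is a hypothetical helper name used for this sketch.
mem_test_cmd() {
  # $1 = remote GridFTP host, $2 = number of parallel streams
  echo "globus-url-copy -vb -p $2 gsiftp://$1/dev/zero file:/dev/null"
}

# Placeholder host and stream counts; replace them with your own.
host=one.machine.com
for streams in 1 2 4 8; do
  mem_test_cmd "$host" "$streams"
done
```

Running the printed commands one at a time and comparing the reported rates gives a rough picture of how the network path responds to parallelism before any disk is involved.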

From the blog: http://gridgurus.typepad.com

|+| Posted on Friday, 19 Bahman 1386, at 3:37 PM by Hamed Salimi Pour

Scripting Grid Engine Administrative Tasks Made Simple

Grid Engine (GE) is becoming increasingly popular software for distributed resource management. Although it comes with a GUI that can be used for various administrative and configuration tasks, the fact that all of those tasks can be scripted is very appealing. The GE Scripting HOWTO document already contains a few examples to get one started, but I wanted to further illustrate the usefulness of this GE feature with a simple example of a utility that modifies shell start mode for all queues in the system:

#!/bin/sh

# Utility to modify shell start mode for all GE queues.
# Usage: modify_shell_start_mode.sh <mode>
# <mode> can be one of unix_behavior, posix_compliant or script_from_stdin

# Temporary config file.
tmpFile=/tmp/sge_q.$$

# Get new mode.
newMode=$1
if [ -z "$newMode" ]; then
  echo "Usage: $0 <mode>"
  exit 1
fi

# Modify all known queues.
for q in `qconf -sql`; do
  # Prepare queue modification: dump the current queue configuration
  # and rewrite its shell_start_mode line.
  echo "Modifying queue: $q"
  qconf -sq "$q" | sed "s?shell_start_mode.*?shell_start_mode $newMode?" > "$tmpFile"

  # Modify queue.
  qconf -Mq "$tmpFile"

  # Cleanup.
  rm -f "$tmpFile"
done

Using the above script one can quickly modify the variable for all queues without having to go through the manual configuration steps.
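It may be worth checking the argument before touching any queue. A minimal sketch, assuming the three values listed in the script's header comment are the only ones you want to accept; `is_valid_mode` is a name invented for this example and is not part of Grid Engine:

```shell
#!/bin/sh
# Return success only for the modes named in the script's usage comment.
# is_valid_mode is a hypothetical helper, not a Grid Engine command.
is_valid_mode() {
  case "$1" in
    unix_behavior|posix_compliant|script_from_stdin) return 0 ;;
    *) return 1 ;;
  esac
}

# Example check with a fixed value.
mode=posix_compliant
if is_valid_mode "$mode"; then
  echo "OK: $mode"
else
  echo "Unknown shell start mode: $mode"
fi
```

In the utility above, the same test could sit right after the mode is read from the command line, so that the script exits before the loop over the queues starts.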

The basic approach of 1) preparing a new configuration file by modifying the current object configuration, and 2) reconfiguring GE using the prepared file, works for a wide variety of tasks. There are cases, however, in which the desired object does not exist and has to be added. Those cases can be handled by modifying the EDITOR environment variable and invoking the appropriate qconf command. For example, here is a simple script that creates a set of new queues from the command line:

#!/bin/sh

# Utility to add new queues automatically.
# Usage: add_queue.sh <queue> [<queue> ...]

# Force non-interactive mode.
EDITOR=/bin/cat; export EDITOR

# Get new queue names.
newQueues=$@

# Add new queues.
for q in $newQueues; do
  echo "Adding queue: $q"
  qconf -aq $q
done

Utilities like the ones shown here get written once and usually quickly become indispensable tools for experienced GE administrators.
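A small refinement, sketched below, is to skip names that already exist instead of asking qconf to add them again. It assumes `qconf -sql` prints the existing queue names separated by whitespace, which is how the first script already consumes it. The helper name `missing_queues` is made up for this example, and the existing list is passed in as a string so the function can be tried without a Grid Engine installation.

```shell
#!/bin/sh
# Print the requested queue names that are not in the existing list.
# missing_queues is a hypothetical helper name used for this sketch.
missing_queues() {
  # $1 = existing queue names separated by whitespace
  # remaining arguments = requested queue names
  existing=$1
  shift
  for q in "$@"; do
    found=no
    for e in $existing; do
      if [ "$e" = "$q" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      echo "$q"
    fi
  done
}

# Example with a fixed list; on a real system one might pass "`qconf -sql`" instead.
missing_queues "all.q short.q" short.q long.q
```

The names it prints could then be fed to the loop in add_queue.sh in place of the raw command-line arguments.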

From the blog: http://gridgurus.typepad.com

|+| Posted on Friday, 19 Bahman 1386, at 3:35 PM by Hamed Salimi Pour

Better Know a VM: Part 1 of 435

Every day we wake up to a new barrage of virtualization articles.  I can't even read them all anymore, instead scanning headlines guided by statistical sampling (or is that stochastic?).

The hype is thick in the air, but it's not entirely unfounded. Somewhere in there we can see grid computing's going to be affected long term by OS virtualization in one way or another.

In this series we'll look at what's happening with various grid-VM efforts, often through a Globus lens (I work on the Globus Virtual Workspaces project so it's almost going to be impossible to avoid that).

There's a tradeoff between application performance improvements and developer time.  Developers are expensive, development is time consuming.  Perhaps it's worth waiting an extra few hours for results if it means you can start right now and stop paying those fine people.  Obviously any particular calculation is going to be more nuanced than this, but I just wanted to set up an analogy.

In a similar vein, with virtualization you can take your prepared application+environment and get going on a new platform in minutes, not months.  Cycles can be acquired and the exact compute environments can be provisioned out to the provider site's nodes.  Resource consumption can be quantified well by the site (and even enforced at a fine grain).  Less of the client's and site's administrators' time (someone's money) needs to be spent on setup, environment conflicts, etc.

For all this you may take a small performance hit, but sometimes that's just worth it.

It sounds perfect, maybe.  It's not quite, and we will look at a few problems, many of which only look temporary.  A lot of progress is being made to get rid of the complexity, encapsulate it better, or factor it in such a way that the person/role who should be handling that complexity actually does (instead of it being unnecessarily multiplied or divided across many people/roles).

Part 2?  I'd like to talk about coordinating many VMs to work together, something being called contextualization.  The fightin' Contextualization!

(Apologies to Stephen Colbert)

From the blog: http://gridgurus.typepad.com

|+| Posted on Friday, 19 Bahman 1386, at 3:34 PM by Hamed Salimi Pour

Globus in Seattle Next Week

Next week is OGF21, where grid gurus from around the world assemble to discuss technologies, applications, standards, and how gray the weather is in Seattle.

We have organized a full day of Globus material on Wednesday October 17. We'll have overviews of old favorites such as GridFTP, RLS, OGSA-DAI and the GT4 distribution, as well as introductions to some of our many new Incubator projects: Shannon Hastings, OSU, discussing the service authoring tool Introduce, Steve Tuecke of UnivaUD discussing Data Catalyst, their open source higher level data solution, and Stephan Erberich who will overview the Internet2 IDEA Award-winning MEDICUS medical data tool, among others. Come hear about the latest updates and where Globus is going to next, and/or to talk to Globus architects and developers about things like:

  • Your applications and how you can apply Globus technologies

  • Problems or questions with Globus technologies

  • Your wish list for future Globus features

  • How to contribute your software to the dev.globus community

If you'd like to meet with someone from the Globus team in Seattle, please email us: we'll see you there!

From the blog: http://gridgurus.typepad.com

|+| Posted on Friday, 19 Bahman 1386, at 3:33 PM by Hamed Salimi Pour

Virtual Grid Nodes: The Tension

Lately I have been putting a lot of thought into the challenges that grid managers face in building an enterprise grid.  Primarily they must support the various stakeholders throughout the enterprise, each of whom has their own sets of application workflows used to meet their business needs. 

The software packages that each interested group uses may have a significant overlap with one another, but the similarity stops there.  Because each group ostensibly has a different goal, the usage patterns are almost guaranteed to be unique.  This implies that the community as a whole will demand any of the following:

  • A wide range of operating systems including Linux, Microsoft Windows, or any of the varied flavors of Unix;

  • Support for multiple versions of the same software package; and

  • A wide range of operating environments particularly with respect to memory, CPU performance, network usage, and storage.

When you consider users’ needs in more detail, you will recognize that a number of implications further complicate things:

  • The set of applications that users wish to run will likely run under two or more different major OS revisions (e.g. Linux kernel 2.4 versus 2.6 or Windows XP versus Vista);

  • Similarly, there are applications that steadfastly refuse to run under a specific patch level.  For example, a minor revision of the Linux kernel that is lacking a specific security patch might be required.  You might be able to force the software to install but then the software is likely to no longer be supported;

  • Off-the-shelf installations which seek to upgrade rather than coexist with a previous version;

  • Custom software that expects a very specific behavior from a package that has changed in its most recent update;

  • Software which requires particular kernel tuning which is not appropriate for general operation; and

  • Software packages which have 32/64-bit library compatibility issues.

Meanwhile, grid managers will most likely be focused on providing a stable, secure, and easy to maintain infrastructure that is both cost-effective and capable of meeting the users’ core requirements.  Clearly the priorities between the individual groups and the support team will be at odds much of the time.

The most elegant solution to these issues is to build a grid whose execution environments are all virtualized.  In this situation, each usage pattern would have its own environment tailored to its own unique needs while the core OS would be under the complete control of the infrastructure staff.  Clearly there would be a stakeholder driven set of virtual servers available for use on each node in the grid. 

It seems simple enough: rather than creating a complicated infrastructure that will not accommodate all of the situations your users will require, you simply will give them their own isolated operating environments.  As you might expect, nothing is that straightforward.  The standard tools that you use for grid and virtualization management do not work well in this architecture.

In future posts, we will explore the challenges and possible solutions in detail. In particular we will focus on:

-    Networking
-    Virtual Server Management
-    Job Scheduling
-    Performance Monitoring
-    Security
-    Data Lifecycle

From the blog: http://gridgurus.typepad.com

|+| Posted on Friday, 19 Bahman 1386, at 3:32 PM by Hamed Salimi Pour

Does your grid make Fords or Volvos?

Ask a user why they use a grid, a cluster, or any other type of distributed system and you’ll hear, “Why, to get my work done faster, of course.” But that’s an ambiguous statement at best, since it can mean two things: faster runtimes or higher throughput. And although they might seem similar, they’re really not.

Runtime is defined as the wallclock time it takes to complete one task. If you parallelize a task, for instance with MPI, or by taking advantage of the data splitting capabilities of Grid MP, you can get your job back in less time. If you can parallelize your job into 10 parallel sub-jobs and run it on 10 nodes, you can expect that job to complete on average in 1/10th of the time. Plus a bit of overhead of course, but let’s keep it simple for now.  In Volvo’s innovative Uddevalla plant, groups of workers assemble entire automobiles in less time than it takes for one worker to complete a whole car. So with 10 workers in a group, you could potentially make a car in 1/10th of the time.

However, sometimes your task cannot be parallelized any further, but you might have lots of them pending. Grids can still help since they can increase the throughput of your jobs. With 10 nodes and 10 jobs, you can still expect a job to complete, on average, every 1/10th of the runtime of a single job, without using any parallelism, even though each individual job takes just as long as before. In a traditional American automotive plant, the car advances on the assembly line and at no point is more than one operator working on one car, so there’s no parallelism involved. It might take up to a day before one car is completed from start to finish, but a new car rolls off the end of the line every few minutes.
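The difference is easy to check with a little arithmetic. The sketch below is an idealized illustration that ignores overhead and assumes the number of jobs is a multiple of the number of nodes. The function names and the sample numbers (10 jobs of 600 minutes each on 10 nodes) are invented for this example. It prints the average time until a job's result is available in each style.

```shell
#!/bin/sh
# Idealized comparison (no overhead, job count a multiple of node count).
# Arguments for both functions: number_of_jobs number_of_nodes runtime_per_job

# "Volvo" style: every job is split across all nodes; jobs run one after another.
avg_completion_parallel() {
  echo $(( $3 * ($1 + 1) / (2 * $2) ))
}

# "Ford" style: each job runs unsplit on one node, as many at once as there are nodes.
avg_completion_throughput() {
  echo $(( $3 * ($1 / $2 + 1) / 2 ))
}

# Sample numbers chosen for this example (runtime in minutes).
njobs=10; nodes=10; runtime=600
echo "parallel:   average completion after $(avg_completion_parallel $njobs $nodes $runtime) minutes"
echo "throughput: average completion after $(avg_completion_throughput $njobs $nodes $runtime) minutes"
```

With these numbers both styles finish all ten jobs after 600 minutes, but the parallel style delivers a result after 330 minutes on average, against 600 minutes for the throughput style. The total work done is the same; what changes is how long each user waits.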

So next time when a user brags about his fancy new cluster, ask him whether he’s producing Fords or Volvos.

From the blog: http://gridgurus.typepad.com

|+| Posted on Friday, 19 Bahman 1386, at 3:31 PM by Hamed Salimi Pour

Building Software Against Binary Globus Toolkit Releases

Today I read about GridWay winning the “Best Demo Prize” at the EGEE 2007 Conference in Budapest (Congratulations to the GridWay Team!), and this reminded me about the problem of building applications against the binary Globus Toolkit (GT) releases. Namely, building software like GridWay against the binary GT install usually fails with link errors. The problem is that the .la files in the $GLOBUS_LOCATION/lib directory have hardcoded the original build path for the dependency libraries. This issue has been known for some time (see, e.g. GT bug #174), and it persists in the 4.0.x releases of the Toolkit. The easiest solution is to build and install your GT from sources. However, if this is not an option, one can use a script that modifies the hardcoded paths in the binary GT install (do not worry, the script does not modify binary files :-)):

#!/bin/sh

# fix_paths.sh
# Script for modifying hardcoded library dependency paths in the binary
# Globus Toolkit installation.

# Usage.
usage() {
  echo "Usage: $0 [oldPath] [newPath]"
}

oldPath=$1
newPath=$2

if [ $# -ne 2 ]; then
  usage
  exit 1
fi

if [ "$GLOBUS_LOCATION" = "" ]; then
  echo "\$GLOBUS_LOCATION is not defined."
  exit 1
fi

echo "Replacing $oldPath by $newPath in various ASCII files."
cd $GLOBUS_LOCATION

# Try to avoid header files, *.gar and *.jar files, config xml files, etc.
fileList=`find . -type f ! -name '*.h' -a ! -name '*.gar' -a ! -name '*.xml' -.

cnt=0
for f in $fileList; do
  isAscii=`file $f | grep ASCII`
  if [ "$isAscii" != "" ]; then
    cmd="cat $f | sed 's?$oldPath?$newPath?g' > $f.tmp"
    eval $cmd
    diffPath=`diff $f.tmp $f`
    if [ "$diffPath" != "" ]; then
      echo "Fixing: $f"
      mv $f.tmp $f
      cnt=`expr $cnt + 1`
    else
      rm -f $f.tmp
    fi
  fi
done

echo "Fixed $cnt files."
exit 0

In order to use the above script, one has to determine the hardcoded paths by looking into one of the .la files in the $GLOBUS_LOCATION/lib directory. For example:

$ export GLOBUS_LOCATION=/scratch/veseli/devel/lib/globus-4.0.5/
$ cd $GLOBUS_LOCATION/lib
$ pwd
/scratch/veseli/devel/lib/globus-4.0.5/lib
$ grep dependency_libs libxmlsec1_openssl_gcc32.la
dependency_libs=' -L/home/condor/execute/dir_22100/userdir/install/lib'
$ ~/fix_paths.sh /home/condor/execute/dir_22100/userdir/install/lib /scratch/veseli/devel/lib/globus-4.0.5/lib
Replacing /home/condor/execute/dir_22100/userdir/install/lib by /scratch/veseli/devel/lib/globus-4.0.5/lib in various ASCII files.
…
Fixed 330 files.
$ grep dependency_libs libxmlsec1_openssl_gcc32.la
dependency_libs=' -L/scratch/veseli/devel/lib/globus-4.0.5/lib'

 

Once you correct the library dependency paths using this script, you should be able to compile and link external software packages against your binary GT installation.
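Before running the script it can be useful to see every distinct path that is hardcoded in the .la files, in case there is more than one. A minimal sketch, assuming the paths appear as -L entries on the dependency_libs line as in the example above; `list_la_paths` is a name invented here:

```shell
#!/bin/sh
# List the distinct -L paths found on dependency_libs lines of *.la files.
# list_la_paths is a hypothetical helper name used for this sketch.
list_la_paths() {
  # $1 = directory containing .la files, e.g. $GLOBUS_LOCATION/lib
  # Steps: pick the dependency_libs lines, put every word on its own line
  # (turning spaces and single quotes into newlines), keep only the -L
  # entries without their -L prefix, then sort and remove duplicates.
  grep -h '^dependency_libs=' "$1"/*.la |
    tr " '" '\n\n' |
    sed -n 's/^-L//p' |
    sort -u
}

# Example: run against the current installation if GLOBUS_LOCATION is set.
if [ -n "${GLOBUS_LOCATION:-}" ]; then
  list_la_paths "$GLOBUS_LOCATION/lib"
fi
```

Each path it prints is a candidate for the first argument of fix_paths.sh.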

From the blog: http://gridgurus.typepad.com

|+| Posted on Friday, 19 Bahman 1386, at 3:21 PM by Hamed Salimi Pour

No CPU Left Behind

For some time now, I've been really interested in the potential applications of grid computing in higher education and, possibly, in secondary education. So, I was really intrigued when I read about Google and IBM's computing cloud for students. Just looking at the headline, my first impression was that students anywhere would be able to have their own computing cloud to use as a playground for learning and experimentation. As it turns out, Google and IBM's computing cloud will be initially used by only five universities, with the goal of giving students a platform in which to learn about parallel programming and Internet-scale applications. Although still a very cool project, I thought this would be a good opportunity to share some ideas of how grid computing could end up benefiting education. Like fellow gridguru Tim Freeman, I'm a part of the Globus Virtual Workspaces project, so my ideas are biased towards how grid computing and workspaces could benefit education.

I have talked with many Computer Science and Engineering lecturers and professors at small colleges and universities who cannot teach certain courses for lack of computing resources. For example, while teaching an introductory programming course requires minimal computing resources (such as a computer lab), teaching a course on parallel programming or distributed systems may require more expensive resources. To get students to practice parallel programming in a somewhat realistic setting, you would like them to have access to a properly configured and maintained cluster. If, furthermore, you wanted to teach students how to set up a cluster, you would need a couple of clusters (ideally, one cluster per student) that the students could have unfettered access to.

There are two main issues with the above scenario. First of all, clusters aren't generally cheap, and some institutions can't afford one. Of course, you can easily build a cluster out of commodity hardware, but you also need someone to actually set it up and jiggle the handle whenever something goes awry. In one specific case, a department built a cluster with off-the-shelf PCs, and used it successfully... until the grad student charged with keeping the cluster running graduating. Apparently, that cluster has been sitting idly in a room for years now. Second, even if the institution can afford a cluster and a sysadmin, no sysadmin in his right mind is going to give root access to that cluster to undergrads, specially if that cluster is also used by researchers.

Enter virtual workspaces. In a nutshell, a virtual workspace is an execution environment that you can dynamically and securely deploy on the grid with exactly the hardware and software you need. You need a 32-node dual CPU Linux cluster for a couple of hours to teach a parallel programming lab, with a very specific version of libfoobar installed on it? Just request a workspace for it, and that hardware will be allocated somewhere on the grid for you, and the software will be set up thanks to software contextualization, which Tim will discuss in his posts. There's no need for the institution to keep a cluster running 24/7, or even spend any time configuring a cluster (requiring a sysadmin, or burdening the lecturer or a grad student with this task). From a repository of ready-made workspaces, simply choose the one you want (or pay a one-time fee to have someone configure a workspace exactly the way you want it), deploy it on the grid every Monday from 2pm to 4pm, and start teaching.

Unfortunately, we're not quite there yet, but virtual workspaces are being actively researched (yes, right now, even as you read this blog post!). Currently, virtual machines are the most promising vehicle to automagically stand up these custom execution environments on a grid. The Globus Virtual Workspaces Service, which uses the Xen VMM to instantiate workspaces, is still in a Technology Preview phase so, although you can still do a number of very cool things with it, you can't deploy arbitrary workspaces on arbitrary grids... yet. However, we're getting much closer, and in future blog posts I'll explain what progress we're making towards that goal.

When we do get there, I believe that workspaces stand to make really exciting contributions to Computer Science and Engineering education. Not only can they facilitate access to computational resources by underprivileged institutions, they can also enhance existing curriculums by enabling students to gain more practical experience than before (e.g., by giving each student their own cluster). In fact, workspaces will enable the creation of more complex "playgrounds", from virtual clusters to virtual grids, that students can use to learn and experiment.

From the blog: http://gridgurus.typepad.com

Posted on Friday, 19 Bahman 1386, at 3:17 PM by Hamed Salimipour

Why Model Scheduling Policies?

Modeling is a very effective means of accurately measuring the advantages of one scheduling policy over another in a specific environment. High-level abstraction models can be developed rapidly in order to observe efficiency benefits. In this type of environment, the most meaningful measurements are the queue wait times of the jobs submitted to the system and the expansion factors that are partially derived from those wait times. Utilization is another measurement to observe, but in a fully loaded system high utilization is already a given; squeezing efficiency out of the system matters more, and that is done by reducing queue wait times.

A prerequisite to accurate modeling is retrieving accurate job accounting data for the past year or more. This data is valuable for a number of reasons, but the following two are the most important. First, the modeler does not have to develop a synthetic distribution of what is thought to be a realistic job flow. Second, the data is accurate as to job submission and run times, priority, and resources utilized. Expansion factor data can also be derived from part of this accounting data. All jobs are bounded in this environment, which eliminates any reservation slipping. In this modeling environment, you are attempting to improve on numbers that have already been produced in order to implement more efficient policies for the future.
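To make the two measurements concrete, here is a minimal sketch of deriving queue wait time and expansion factor from accounting records. The record fields and the sample values are hypothetical, and the definition of expansion factor used here, (wait time + run time) / run time, is an assumption for illustration; the post does not give a formula, and real accounting formats differ from scheduler to scheduler.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class JobRecord:
    """One accounting entry; the field names are hypothetical."""
    job_id: str
    submit: float  # submission time, in seconds
    start: float   # time the job began running, in seconds
    end: float     # completion time, in seconds

    @property
    def wait(self) -> float:
        # Queue wait time: how long the job sat before it started.
        return self.start - self.submit

    @property
    def run(self) -> float:
        return self.end - self.start

    @property
    def expansion_factor(self) -> float:
        # Assumed definition: (wait + run) / run, so 1.0 means no waiting.
        return (self.wait + self.run) / self.run


def summarize(jobs):
    """Average the two measurements over a set of accounting records."""
    return {
        "mean_wait": mean(j.wait for j in jobs),
        "mean_expansion_factor": mean(j.expansion_factor for j in jobs),
    }


# Made-up sample records, only to exercise the arithmetic.
jobs = [
    JobRecord("a", submit=0, start=100, end=200),
    JobRecord("b", submit=50, start=50, end=150),
]
summary = summarize(jobs)
print(summary)
```

Running the same summary over the historical data and over the output of a modeled policy gives a like-for-like comparison of the two.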

Future segment: Developing an architecture for modeling a scheduling process that utilizes a priority queue policy with normal backfill algorithms.

From the blog: http://gridgurus.typepad.com

Posted on Friday, 19 Bahman 1386, at 3:16 PM by Hamed Salimipour

Dream Big, Dream Grid

Last time we talked about two similar yet different benefits of using grids. Today we will expand on that list with other benefits you might not have thought about yet. Just to be clear, we’re purely talking about technical benefits here; the business benefits are left for a whole other column.

Let’s first review what we found last time. The obvious benefits revolve around speedup of your parallel applications and higher throughput of your batch jobs. A typical example of the former is a crash simulation with PAM-CRASH and MPI; a typical example of the latter is virtual high-throughput screening with applications such as LigandFit from Accelrys, where many potential drug targets are screened against a single protein target. But there are other, less obvious use-cases for grid that can benefit you.

Imagine running a simulation that has many tweakable parameters that you’ve always set to a pre-set value. When you now move your computations to a grid, you might not need to get your results back any faster, so you could instead opt to increase the accuracy of your computation by running the same simulation with different parameter sweeps on different nodes. Further expansion of your grid will then increase the validity and accuracy of your results, rather than decrease runtime. An example of such a computation can be found in the Oil and Gas industry, where a more refined and accurate computational model of an oil field can prevent costly dry holes.
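As a rough illustration of a parameter sweep, the sketch below enumerates every combination of a few parameter values and spreads the resulting runs over the available nodes. The parameter names, values, and node names are invented for the example, and the round-robin assignment simply stands in for whatever your scheduler would actually do.

```python
from itertools import product


def sweep(parameters):
    """Yield one dict per combination of parameter values."""
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))


def assign_round_robin(runs, nodes):
    """Spread the runs over the nodes in turn (a stand-in for a real scheduler)."""
    plan = {node: [] for node in nodes}
    for i, run in enumerate(runs):
        plan[nodes[i % len(nodes)]].append(run)
    return plan


# Hypothetical reservoir-model parameters, for illustration only.
parameters = {
    "porosity": [0.10, 0.15, 0.20],
    "permeability_md": [50, 100],
}
runs = list(sweep(parameters))
plan = assign_round_robin(runs, ["node01", "node02", "node03"])
for node, assigned in plan.items():
    print(node, len(assigned), "runs")
```

Adding nodes to the plan lets you add values to the sweep without lengthening the wall-clock time, which is the accuracy benefit described above.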

One could assert that Monte Carlo simulations are in fact also "accuracy-increasing" applications of grid, but there are two subtle differences. First, Monte Carlo simulations usually run on a much more massive scale, with thousands of very short simulations, whereas parameter sweep modeling typically uses larger models over a limited number (fewer than a hundred) of iterations. Second, typical Monte Carlo simulations only end once a certain pre-set resolution has been achieved, regardless of the number of grid nodes at your disposal. As such, it is better to place Monte Carlo simulations in the "throughput" category.

Once you understand these three basic benefits (speed-up, throughput and accuracy), there’s really no limit to what your imagination can come up with in terms of new applications of grid. Take the LigandFit example that I mentioned earlier. United Devices' recently retired grid.org looked at the throughput use-case and took it to the extreme by simply taking a protein crucial to the internal workings of cancer cells and running every single potential drug target in the library against that protein. It took a leap of imagination to dream up six years of running billions of drug targets against multiple proteins.

The most rewarding moment during a consulting engagement is when I see that users "get" the basic use-cases and start dreaming big. Can you dream big?  What can the grid do for you?

From the blog: http://gridgurus.typepad.com

Posted on Friday, 19 Bahman 1386, at 3:15 PM by Hamed Salimipour

Avoiding YAGMF (Yet Another Grid-Mapfile)

The grid infrastructure I work with daily is deploying more and more services based on the Globus Toolkit, and in particular on the Globus Toolkit Java Web services. Each of these services requires users to be authorized to invoke the service operations.

Most often the authorization is managed using our old Globus friend the static grid-mapfile. These grid-mapfiles work fine during development but as we scale out during production we hear the moans from the site administrators of "not another grid-mapfile!"

You can easily Google and find an entire zoo of projects aimed at helping production grids manage authorization for services. Each community seems to have its own effort and we can only hope at some point for a clear winner (I didn't say standard...and yes interoperability is nice but I still would like just a few "best of breed" tools that interoperate. I am naive in that way.)

What if, however, you are a grid architect or developer and you need to tie authorization to grid services into an existing authorization infrastructure? Does the solution necessarily have to involve pulling out authorization details from the legacy infrastructure, creating grid-mapfiles, and then having to manage all those grid-mapfiles?

No. A better approach might be to write your own authorization plugin for your Globus Toolkit Java Web services. It is surprisingly simple to do. Your approach might be as simple as writing one or two Java classes representing a Policy Decision Point (PDP) and/or a Policy Information Point (PIP).

Tim Freeman and Rachana Ananthakrishnan have written a great tutorial on how to do just that. If you are wondering how you can tie together Globus grid services and a legacy authorization infrastructure, do give it a read before you add one more grid-mapfile to your grid fabric.
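To give a feel for the division of labor between the two pieces, here is a conceptual sketch in Python. It is not the Globus Toolkit API (the real plugins are Java classes, and the tutorial covers their actual interfaces); every class, method, and name below is hypothetical. The PIP looks up attributes about the caller in a stand-in for the legacy system, and the PDP turns those attributes into a permit or deny decision, so no grid-mapfile is involved.

```python
from typing import Dict, Set


class LegacyGroupPIP:
    """Policy Information Point: gathers attributes about the caller.

    The in-memory dict stands in for a legacy system such as a directory
    or database; a real plugin would query that system instead.
    """

    def __init__(self, groups_by_subject: Dict[str, Set[str]]):
        self._groups_by_subject = groups_by_subject

    def groups_for(self, subject_dn: str) -> Set[str]:
        return self._groups_by_subject.get(subject_dn, set())


class GroupPDP:
    """Policy Decision Point: permits an operation for members of allowed groups."""

    def __init__(self, pip: LegacyGroupPIP, allowed_groups: Dict[str, Set[str]]):
        self._pip = pip
        self._allowed_groups = allowed_groups

    def is_permitted(self, subject_dn: str, operation: str) -> bool:
        allowed = self._allowed_groups.get(operation, set())
        # Permit only if the caller shares at least one group with the policy.
        return bool(allowed & self._pip.groups_for(subject_dn))


# Hypothetical subjects, groups, and operation name.
pip = LegacyGroupPIP({"/O=Example/CN=Alice": {"physics"}})
pdp = GroupPDP(pip, {"submitJob": {"physics", "chemistry"}})
print(pdp.is_permitted("/O=Example/CN=Alice", "submitJob"))
print(pdp.is_permitted("/O=Example/CN=Mallory", "submitJob"))
```

The point of the split is that membership changes happen only in the legacy system; the service never holds its own copy of who is allowed.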

From the blog: http://gridgurus.typepad.com

Posted on Friday, 19 Bahman 1386, at 3:12 PM by Hamed Salimipour

How to Decipher Grid Engine Statuses – Part II

In Part I of this article I discussed the meanings of the various queue states that one might see after invoking the Grid Engine qstat command. The list of possible job states is just as long as the list of queue states:

• d (deletion) — Indicates that a job has been deleted using qdel.

• r (running) — Indicates that a job is already executing.

• R (restarted) — Indicates that the job was restarted. This state can be caused by a job migration or because of one of the reasons described in the -r section of the qsub man page.

• s (suspended) — Shows that an already running job has been suspended using qmod.

• S (suspended) — Shows that an already running job has been suspended because the queue that it belongs to has been suspended.

• t (transferring) — Indicates that a job is being transferred to its execution host and is about to be executed.

• T (threshold) — Shows that an already running job has been suspended because at least one suspend threshold of the corresponding queue was exceeded.

• w (waiting) — Indicates that the job is suspended pending the availability of a critical resource or specified condition.

• q (queued) — Indicates that the job has been queued.

• E (error) — Indicates that the job is in the error state. You can find the reason for this state using the qstat command with “-explain E” option.

• h (hold) — Indicates that the job is not eligible for execution due to a hold state assigned to it via qhold, qalter, or qsub -h command. 

Just like with queue states, one also frequently encounters various combinations of the above job states.
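Since the letters are usually seen in combination, a small lookup table is a handy way to read them. The sketch below is only a convenience built from the list above; the function name is made up, and the sample strings are typical-looking combinations rather than captured qstat output.

```python
# Job state letters as described in the list above.
JOB_STATES = {
    "d": "deletion",
    "r": "running",
    "R": "restarted",
    "s": "suspended",
    "S": "suspended (queue suspended)",
    "t": "transferring",
    "T": "threshold",
    "w": "waiting",
    "q": "queued",
    "E": "error",
    "h": "hold",
}


def decode_job_state(state):
    """Expand a combined state string into one description per letter."""
    return [JOB_STATES.get(letter, f"unknown ({letter})") for letter in state]


print(decode_job_state("hqw"))
print(decode_job_state("Eqw"))
```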

From the blog: http://gridgurus.typepad.com

Posted on Friday, 19 Bahman 1386, at 3:11 PM by Hamed Salimipour

How to Decipher Grid Engine Statuses – Part I

In all likelihood, most Grid Engine (GE) end users and administrators have at some point invoked the qstat command and found themselves wondering what some of the resulting queue and job status letters mean. While some of those letters are pretty intuitive (e.g., ‘E’ stands for error), some are not entirely trivial to decipher. Unfortunately, it does not seem to be very easy to find explanations for these statuses. One usually has to resort to digging through the qstat man pages or through the various GE software manuals that one can find on the web. So, I’ve compiled below information about the possible queue statuses:

• a (alarm) – At least one of the load thresholds defined in the load_thresholds list of the queue configuration is currently exceeded. This state prevents GE from scheduling further jobs to that queue. You can find the reason for the alarm state using the qstat command with “-explain a” option.

• A (Alarm) – At least one of the suspend thresholds of the queue is currently exceeded. This state causes jobs running in that queue to be successively suspended until no threshold is violated. You can see the reason for this state using the qstat command with “-explain A” option.

• c (configuration ambiguous) – The queue instance configuration (specified in GE configuration files) is ambiguous. The state resolves when the configuration becomes unambiguous again. This state prevents you from scheduling further jobs to that queue instance. You can find detailed reasons why a queue instance entered this state in the sge_qmaster messages file, or by using the qstat command with “-explain c” option. For queue instances in this state, the cluster queue's default settings are used for the ambiguous attribute.

• C (Calendar suspended) – The queue has been suspended automatically using the GE calendar facility.

• d (disabled) – Queues are disabled and re-enabled using the qmod command. Disabling a queue will prevent new jobs from being scheduled for execution in that queue, but it will not affect jobs that are already running there.

• D (Disabled) – The queue has been disabled automatically using the GE calendar facility.

• E (Error) – The queue is in the error state. You can find the reason for this state using the qstat command with “-explain E” option. Check the error log of the execution daemon (sge_execd) on that host for information on how to resolve the problem, and clear the queue state afterwards using the qmod command with the -cq option.

• o (orphaned) – The current cluster queue's configuration and host group configuration no longer need this queue instance. The queue instance is kept because unfinished jobs are still associated with it. The orphaned state prevents you from scheduling further jobs to that queue instance. It disappears from qstat output when these jobs finish. To help resolve an orphaned queue instance that still has a job associated with it, you can use the qdel command. You can revive an orphaned queue instance by changing the cluster queue configuration so that the configuration covers that queue instance.

• s (suspended) – Queues are suspended and un-suspended using the qmod command. Suspending a queue suspends all jobs executing in that queue.

• S (Subordinate) – The queue has been suspended due to subordination to another queue. When a queue is suspended, regardless of the cause, all jobs executing in that queue are suspended too.

• u (unknown) – The corresponding GE execution daemon (sge_execd) cannot be contacted.

I hope that those who are new to Grid Engine find the above descriptions useful. In Part II of this article I will cover possible job statuses.
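For quick reference, the list above can be folded into a small lookup helper. This is only a convenience sketch: the function name is made up, the descriptions are abbreviated from the list, and the "-explain" hint is attached only to the four states (a, A, c, E) for which the list mentions that option.

```python
# Queue state letters as described in the list above (abbreviated).
QUEUE_STATES = {
    "a": "alarm (load threshold exceeded)",
    "A": "Alarm (suspend threshold exceeded)",
    "c": "configuration ambiguous",
    "C": "Calendar suspended",
    "d": "disabled",
    "D": "Disabled (by calendar)",
    "E": "Error",
    "o": "orphaned",
    "s": "suspended",
    "S": "Subordinate",
    "u": "unknown",
}

# States for which the list above mentions the qstat -explain option.
EXPLAINABLE = "aAcE"


def describe_queue_state(state):
    """Return one description line per letter of a combined queue state."""
    lines = []
    for letter in state:
        text = QUEUE_STATES.get(letter, "not in the list above")
        if letter in EXPLAINABLE:
            text += f"; try: qstat -explain {letter}"
        lines.append(f"{letter}: {text}")
    return lines


for line in describe_queue_state("au"):
    print(line)
```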

From the blog: http://gridgurus.typepad.com

Posted on Friday, 19 Bahman 1386, at 3:10 PM by Hamed Salimipour

Grids, grids, grids: Which side of the pond wins?

Dan Ciruli at West Coast Grid writes:

Europe is years ahead of the US in terms of large grids...

Is Europe years ahead of the US?

Open questions that come to mind include:

  • What is a "large" grid?

  • What makes one region "ahead" of another?

  • What makes one region "years" ahead?

  • If one region is years ahead, what are the reasons for it?

  • What of other regions outside of Europe and the US?

Certainly the US and Europe both have some very large grids, so the question is: what was Dan taking into account when making his claim?

From the blog: http://gridgurus.typepad.com

Posted on Friday, 19 Bahman 1386, at 3:07 PM by Hamed Salimipour

Hookin' Up is Hard to Do

Previously we discussed the tension that grid managers face when supporting various stakeholders on an enterprise grid.  In particular we concluded that providing isolated virtual operating environments to each of the business units operating in your environment would be the easiest way to meet their competing and divergent needs.  In this post we will explore the networking challenges that a grid of virtualized systems poses.

The primary challenge you face in this architecture is how to connect it all together.  At first glance it seems simple enough: take your current grid, install a hypervisor on each of its nodes, and then start implementing your users’ specific environments.  Sadly, this will probably not work.

In a typical grid you already have to consider the challenges of connecting several hundred compute nodes to one another and a storage network while keeping network latency low. 

In order to illustrate the networking problems you would have in a virtualized grid, consider a system with a significant number of nodes used by several operational units.  For example, imagine a large financial services company that provides banking, brokerage services, insurance, mortgage, and financing.  Each of these business lines, while related, has its own distinct set of business application workflows.  While there may be some overlap in the specific applications used by each of the units, there is little guarantee that each group will use those applications in the same way, let alone use the same versions. Worse yet, a business unit may have multiple operational workflows which do not operate in similar environments (e.g. Windows-specific versus Linux-specific application suites).  Finally, we grid managers would like to have development, test, and production instances segregated but running on the same hardware.

It is easy to project having to support at least ten times more virtual than physical operating environments.  The actual number should be proportional to the number of unique operating environments required by the users. In a standard grid you have a fixed set of computational resources that are reasonably static; in other words, systems do not appear and disappear on a regular basis.  In the virtualized grid, however, operating environments are going to appear and disappear as a function of the business workflows scheduled by your users.  You can imagine how quickly this can become complicated.

What is the best way to deliver these operating environments to the physical hardware?  If we keep all of the images on local disk then we need to guarantee that there is sufficient disk space on each node; a practice which not only can be costly but does not scale well.  If we choose to keep no more than the maximum number of nodes supported by any application in each operating environment, we can reduce the number of virtual machines we require.  Of course this implies that these images are either stored on a SAN or are transported to the individual physical nodes before booting the virtualized environment.  Sadly, both of these approaches significantly increase network loads.  We will discuss scheduling and managing individual virtual machines in subsequent posts.

How do we connect these virtual environments? If these systems were on segregated physical hardware (think Microsoft Windows versus Linux) we would likely keep them on their own network and/or VLANs.  After all, these environments generally should not interact with one another.  Consequently, shouldn’t we also do this for the virtualized grid?  If we chose not to and instead used DHCP based upon physical topology to provide addresses to the virtualized environments, we could quickly run into trouble.  Specifically, a single job executed on n nodes could conceivably land on n distinct networks and/or VLANs.  This would significantly increase the size of the broadcast domain as well as require more work from your network switches.  Therefore it would add significant latency to all communications between the nodes. Clearly this is a poor choice unless you are always using most of your nodes for each job.

Thus my preferred solution is to segregate operational environments, so that every physical node bridges traffic for several distinct networks over the same interface.  Addresses would be assigned by virtual MAC address rather than by physical location because, as in the counter-example, we cannot guarantee where on the physical network topology a particular job will be scheduled.  In fact, we probably want to use VLAN tags on our packets so that our switches can operate more efficiently.  Additionally, if your grid nodes have secondary interfaces, all communication with the hypervisor should be segregated onto its own management network.
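One way to picture addressing that follows the environment rather than the physical node is sketched below. Everything in it is an assumption made for illustration: the environment names, the VLAN numbers, and the idea of deriving each virtual machine's MAC address from a hash of its environment and instance number. A DHCP server keyed on these MAC addresses could then hand out addresses no matter which physical node hosts the virtual machine.

```python
import hashlib

# Hypothetical mapping of operating environments to VLAN IDs.
VLAN_BY_ENVIRONMENT = {
    "banking-prod": 110,
    "banking-test": 111,
    "brokerage-prod": 120,
}


def virtual_mac(environment, instance):
    """Derive a stable, locally administered unicast MAC for a virtual machine."""
    digest = hashlib.sha256(f"{environment}/{instance}".encode()).digest()
    # A first octet of 0x02 sets the locally administered bit and keeps
    # the address unicast; the remaining five octets come from the hash.
    octets = [0x02] + list(digest[:5])
    return ":".join(f"{octet:02x}" for octet in octets)


def network_for(environment, instance):
    """Network identity depends only on the environment and instance number."""
    return {
        "vlan": VLAN_BY_ENVIRONMENT[environment],
        "mac": virtual_mac(environment, instance),
    }


# The same instance gets the same identity wherever it is scheduled.
print(network_for("banking-prod", 0))
print(network_for("brokerage-prod", 0))
```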

If this has not scared you away from the concept of the virtualized grid (I hope it hasn’t), we will continue to explore other hurdles inherent in this architecture in future posts.

From the blog: http://gridgurus.typepad.com

Posted on Friday, 19 Bahman 1386, at 3:05 PM by Hamed Salimipour

Grid.org Relaunches

For a number of years United Devices operated grid.org as a philanthropic site for cancer research. That mission was completed earlier this year, and Univa UD has relaunched the domain to expand the scope of the project to open source cluster and grid management. This will allow many people who want to do large-scale computing, but haven't been able to use existing tools, to download an easy-to-use cluster management suite that will let them run a variety of applications.

The press release associated with the launch follows:

RENO, Nev. (Nov. 13, 2007) – An online community for open source grid and cluster users, administrators and developers debuted today on the Internet at http://grid.org

Grid.org provides the single aggregation point for information and interaction by the community of users, administrators, and developers interested in a complete open source grid and cluster stack.

The site sponsor, Univa UD, unveiled Grid.org during the Supercomputing ’07 conference.

“The site has been built to support the needs of users of the open source Cluster Express release from Univa UD, which includes many open source components including Grid Engine, Globus and Ganglia,” said Steve Tuecke, co-founder and chief technology officer at Univa UD and a primary architect of the Grid.org community.  “By aggregating information from many distinct open source grid and cluster efforts and facilitating interaction between users who have historically been left on their own to struggle with the integration of these components, Grid.org should be a valuable additional resource not just for new grid and cluster users but also for members of the current open source communities.”

Grid.org is designed as a destination for community members who want to connect easily and productively with those who have similar interests and who want to engage in a vibrant, functioning community of active participants.  At Grid.org, community members can engage with others to discuss issues as well as give and receive help and contribute to the Cluster Express open source software project.

It also will be a resource for professionals who want to learn more about open source grid and cluster computing in general.

Besides providing links to other open source grid and cluster sites, Grid.org will include areas for participants to build their personal professional networks, participate in forums and blogs, access white papers and case studies, explore upcoming events and download free Univa UD Cluster Express open source software for integrated cluster management.

“We are hosting Grid.org to promote the broad adoption of open source grid and cluster technologies,” said Dr. Ian Foster, co-founder and chief open source strategist at Univa UD.  “There are many very good resources relating to open source today, and we want to provide a single site that lets the community navigate this wealth of information and build on it.”

Grid and cluster pioneers will recognize Grid.org as the Web site where Univa UD precursor United Devices operated a public interest Internet research grid with connections to more than 3.6 million devices worldwide, with its primary mission being to demonstrate the power of early grid technology. In this capacity, Grid.org processed data related to cancer, smallpox, and human genome research among other projects.

About Grid.org
Established in 2001, Grid.org is an online community for open source grid and cluster software users, administrators and developers. The site’s current mission is to work with community members to broaden the reach of the site and encourage use of open source technologies for grid and cluster computing at large.  The site provides a single location where open source grid and cluster information can be aggregated so that people with a similar range of interests can easily exchange information, experiences and ideas related to Univa UD’s complete open source grid and cluster software stack.

About Univa UD
Univa UD is the leading provider of open source products for grid and cluster computing environments.  The company’s industrial-strength offerings range from departmental and HPC cluster management to enterprise-wide grids, and represent the proven and cost-effective alternative to traditional proprietary products that customers have been waiting for.  Based on a combination of open source and proprietary components, Univa UD offerings include a downloadable open source cluster management product, a proprietary cluster product with rich functionality, and a comprehensive enterprise grid product based on award-winning technology.  All Univa UD products are run by Fortune 1000 companies in large-scale, production environments.  Univa UD is headquartered in Lisle, Ill. with offices in Austin, Texas. For more information, contact Univa UD at 1-800-370-5320 or visit us at www.univaud.com.

From the blog: http://gridgurus.typepad.com

Posted on Friday, 19 Bahman 1386, at 3:04 PM by Hamed Salimipour

HealthGrid Comes to Chicago, June 2-4 2008

The opportunities to apply grid computing methods in health care are, simply put, enormous. (Irving Wladawsky-Berger refers to it as the "ASCI of Grid" to imply that the challenges are comparable in their extreme scale to those tackled by the DOE ASCI program in simulation. That is an understatement.) There is an urgent need for community, best practices, standards, and the like.

These considerations motivated the formation of the HealthGrid.US Alliance (HG.US), a partnership of scientific, medical and technology professionals from academia, industry and government, whose shared mission is to promote the application of advanced information technology to solve cutting-edge problems in Biomedical Science and Healthcare. HG.US is an affiliate of the international HealthGrid Association.


As a first action, HG.US is sponsoring the first HealthGrid Annual Meeting to be held outside of Europe, in Chicago, Illinois, USA, June 2-4 2008. See the announcement (pdf). The previous five meetings (2003-2007), all held in Europe, have formal published proceedings that are also available from the website.

Many biomedical and health related problems are characterized by diverse collaborators needing access to great quantities of complex heterogeneous data, which is distributed across multiple computing systems, maintained by loosely connected institutions, often across international boundaries. Example projects addressing these challenges include sharing datasets to enable a cure for cancer (caBIG, ACGT) and science portals that enable neuroscientists to better visualize the morphology of the brain (BIRN). These and other projects have begun to demonstrate the power and potential of the Grid approach in biomedicine.


Initially, Grid technology development was driven by computing needs of the particle physics research community and enabled by the availability of high-performance networks. The term "grid" rapidly evolved toward a concept of ubiquitous and transparent computing to support a wide variety of applications, and builds on the well-known metaphor of the pervasive "electricity grid". Today, the HealthGrid space represents some of the most interesting drivers for progress in knowledge-based ubiquitous and transparent computing.

The international HealthGrid Association, based in Europe, provides a firm conceptual foundation for efforts in the US and is fully supportive of the HealthGrid.US Alliance. A HealthGrid white paper articulates the broad scope of the concept. US government agencies have begun to develop complementary strategies. These have been captured in TATRC's Integrated Research Team strategic report on HealthGrid: Grid Technologies for Biomedicine and by the US Government interagency HealthGrid Core Strategic Planning Group.

From the blog: http://gridgurus.typepad.com

Posted on Friday, 19 Bahman 1386, at 3:03 PM by Hamed Salimipour

How to Improve qconf Productivity