Double T Parallel Computing
     Math 5345/CS 5379


Parallel Computing (Math 5345/CS5379) February
  Announcements
    02/20/01
    02/08/01
    02/06/01
    02/01/01
  Matalloc
    Loop Nest Parallelism
      Loop 1
      Loop 2
      Loop 3  ( Laplacian 5 point star)
      Loop 4 (Forward recurrence)
      Loop 5 (Backward recurrence)
      Loop 6
      Loop 7
      Loop 8
      Loop 9
  Laplace's Equation
      Five Point Approximation (2D)
      Seven point approximation (3D)
      Output for n = 100
  OpenMP
      Scheduling options
  MPI
      Submitting a Job to Pleione
      Greetings
      Trapezoid Rule
  Articles
      Future chips headed for heat problem
      JUNO ANNOUNCES VIRTUAL SUPERCOMPUTER PROJECT
      CHIP DESIGNERS TAKING THE HEAT
      The Ideas Machine (Genetic Programming)
      CANARIE TO BUILD WORLD'S LARGEST DISK DRIVE
      HIGH PERFORMANCE COMPUTING FOR GENETICS RESEARCH
  Assignments
  Mid Term
  Parallel Computing Links
  Contact Information








Announcements


02/20/01

Assignments due Thursday: pages 90-92.  Code up the examples and see how they compile and run.


Exercise
Team
   
1 a, b
Mia Hwang         hwangmia@yahoo.com
Sunghyuck Hong    hongsh@hotmail.com  
Ran Zhang         rzhang@ttacs.ttu.edu
1 c, e
Ian Martines imartine@math.ttu.edu
Wayne McGee wmcgee@math.ttu.edu

2
baker@aladdin.chem.ttu.edu
chang@aladdin.chem.ttu.edu
3
Kandle Kulish
Shelly Davenport
4
Jiyuan Wang
5
Marcello Balduccini
3
Channa Navaratna
Menaka Navaratna
Parambee
  Param vir singh
  M S Anam

02/08/01


For the midterm assignments I want you to package the computation in a subroutine or function.  Use good software design practice and minimize the number of parameters that are to be passed.  For example:  a univariate quadrature routine might have the argument list:

double quad(double (*ptf)(double), double left,
            double right, double error);
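
As a rough illustration of that argument list, here is one way such a routine could be written. This is only a sketch under stated assumptions, not the required design for the midterm: the method (composite trapezoid rule with successive halving), the stopping rule, the cap max_doublings, and the sample integrand square are all choices made for this example.

```c
/* Hypothetical sample integrand, used only to exercise quad. */
double square(double x)
{
    return x * x;
}

/* Composite trapezoid rule with successive halving.
   Stops when two successive estimates differ by less than error,
   or after max_doublings refinements (a cap assumed for this sketch). */
double quad(double (*ptf)(double), double left, double right, double error)
{
    const int max_doublings = 20;
    int n = 1;                     /* number of panels behind prev */
    double h = right - left;       /* panel width for n panels     */
    double prev = 0.5 * h * (ptf(left) + ptf(right));
    double curr = prev;
    int k, i;

    for (k = 0; k < max_doublings; k++) {
        double mid_sum = 0.0;
        double diff;

        /* The new points are the midpoints of the current panels. */
        for (i = 0; i < n; i++)
            mid_sum += ptf(left + (i + 0.5) * h);

        /* Estimate with 2n panels, reusing the n-panel estimate. */
        curr = 0.5 * prev + 0.5 * h * mid_sum;
        diff = curr - prev;
        if (diff < 0.0)
            diff = -diff;

        n *= 2;
        h *= 0.5;
        if (diff < error)
            break;
        prev = curr;
    }
    return curr;
}
```

A call such as quad(square, 0.0, 1.0, 1.0e-6) passes the integrand by pointer, so the caller needs only four arguments and no global state.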


After class today I want to meet with the two "solving linear systems" groups.

02/06/01


One week from Thursday (Feb 15) I want a presentation from each group (10 minutes max) to describe their project and their progress to date.  

Does anyone feel the need to use MPI on the project?  My current thought is to initially write the project in OpenMP and then re-implement in MPI.

I want to have a short discussion with each of the groups.  After class today I want to meet with the two quadrature groups.

02/01/01


See assignments.


Matalloc



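#include <stdlib.h>

/* Allocate an n1-by-n2 matrix of doubles, initialized to zero, as one
   contiguous block plus an array of row pointers.
   Returns NULL if either allocation fails. */
double **matalloc(int n1, int n2)
{
    int i;
    double **mat, *temp_ptr;

    /* Allocate space for the array */
    temp_ptr = (double *) calloc((size_t) n1 * n2, sizeof(double));
    if (temp_ptr == NULL) {
        /* *inform = 4; */
        return NULL;
    }

    /* Allocate space for the row pointers */
    mat = (double **) calloc(n1, sizeof(double *));
    if (mat == NULL) {
        /* *inform = 4; */
        free(temp_ptr);   /* do not leak the data block */
        return NULL;
    }

    /* Point row i at the start of its n2 entries */
    for (i = 0; i < n1; i++)
        mat[i] = &(temp_ptr[i*n2]);

    return mat;
}

/* Release a matrix obtained from matalloc */
void free_mat(double **mat)
{
    free(*mat);   /* the contiguous data block */
    free(mat);    /* the row pointers */
}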

Loop Nest Parallelism



Loop 1


DO J = 1, N
   A(0,J) = 0
   DO I = 1, M
      A(0,J) = A(0,J) + A(I,J)
   ENDDO
ENDDO

Loop 2


DO J=1, N
   DO I = 1, M
      A(I,J) = (A(I-1,J-1) + A(I,J-1) + A(I+1,J-1))/3.0
   ENDDO
ENDDO
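
One way to read the dependences in Loop 2: column J is computed only from column J-1, so the J loop must run in order, while the I iterations within a column do not depend on each other. The C sketch below expresses that reading. The function name loop2, the array bounds A(0:M+1, 0:N), and the column-by-column storage are assumptions made for this example.

```c
/* a holds A(0:m+1, 0:n) column by column, as Fortran would store it:
   a[i + j*(m+2)] plays the role of A(I,J).  (Layout assumed for this sketch.) */
void loop2(int m, int n, double *a)
{
    int i, j;
    int ld = m + 2;                 /* number of rows in each column */

    for (j = 1; j <= n; j++) {      /* serial: column j needs column j-1 */
        /* The i iterations read only column j-1 and write only column j. */
        #pragma omp parallel for
        for (i = 1; i <= m; i++) {
            a[i + j*ld] = (a[(i-1) + (j-1)*ld]
                         + a[i     + (j-1)*ld]
                         + a[(i+1) + (j-1)*ld]) / 3.0;
        }
    }
}
```

Without OpenMP enabled the pragma is ignored and the function runs serially with the same result.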

Loop 3  ( Laplacian 5 point star)


DO J=1, N
   DO I = 1, M
      A(I,J) = (A(I-1,J) + A(I,J-1) + A(I+1,J) &
                       + A(I,J+1))/4.0
   ENDDO
ENDDO

Loop 4 (Forward recurrence)


DO I = 1, N
   A(I) = 2.*A(I-1)
ENDDO
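
In Loop 4 each iteration needs the value written by the previous one, so the loop cannot be split among threads as written. Unrolling the recurrence gives A(I) = 2**I * A(0), which removes that dependence. The sketch below compares the two forms in C; the function names and the restriction n <= 62 are assumptions made for this example.

```c
/* Serial form, as in Loop 4: iteration i needs the result of iteration i-1. */
void recurrence_serial(int n, double *a)
{
    int i;
    for (i = 1; i <= n; i++)
        a[i] = 2.0 * a[i-1];
}

/* Closed form a[i] = 2^i * a[0]: every iteration reads only a[0], so the
   iterations are independent.  Assumes n <= 62 so that 2^i fits in an
   unsigned 64-bit integer (a limit chosen for this sketch only). */
void recurrence_closed_form(int n, double *a)
{
    int i;
    double a0 = a[0];

    #pragma omp parallel for
    for (i = 1; i <= n; i++)
        a[i] = a0 * (double)(1ULL << i);
}
```

Multiplying by a power of two is exact in binary floating point (barring overflow), so both versions produce the same values.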

Loop 5 (Backward recurrence)


DO I = 1, N
   A(I) = 2.*A(I+1)
ENDDO
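
In Loop 5 iteration I reads A(I+1) before iteration I+1 overwrites it, so the serial loop always sees the old value. If the iterations run in parallel that ordering is lost, but reading from a saved copy of the old values restores it. The C sketch below shows this; the function name loop5_parallel and the storage convention are assumptions made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* a[1..n+1] plays the role of A(1:N+1); a[0] is unused, so a must hold
   n+2 doubles.  Returns 0 on success, -1 if the copy cannot be allocated. */
int loop5_parallel(int n, double *a)
{
    int i;
    size_t bytes = (size_t)(n + 2) * sizeof(double);
    double *old = (double *) malloc(bytes);

    if (old == NULL)
        return -1;
    memcpy(old, a, bytes);          /* snapshot of the old values */

    /* Each iteration reads only the snapshot and writes only a[i]. */
    #pragma omp parallel for
    for (i = 1; i <= n; i++)
        a[i] = 2.0 * old[i+1];

    free(old);
    return 0;
}
```

The price of the parallel form is one extra array and one extra pass over the data.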


Loop 6


DO I = 2, N, 2
   A(I) = A(I) + A(I-1)
ENDDO

Loop 7


DO I = 1, N/2
   A(I) = A(I) + A(I+N/2)
ENDDO

Loop 8


DO I = 1, 1 + N/2
   A(I) = A(I) + A(I+N/2)
ENDDO

Loop 9


DO I = 1, N
   A(IDX(I)) = A(IDX(I)) + B(IDX(I))
ENDDO


Laplace's Equation


Five Point Approximation (2D)



      implicit none
      real(kind=8), dimension(:,:), allocatable :: a
      real(kind=8) rtime(3)
      real(kind=8) error
      integer n, i, j

      n=200
      allocate(a(n,n))
      do i=1,n
         do j=1,n
            a(i,j)= 0.d0
         end do
      end do
      a(n/2, n/2) = 1.d0

!Initialize the matrix by setting the boundary values
      do i=1,n, n-1
         do j=1,n
            a(i,j)= 100.d0*(j-1)*(n-j)/((dble(n/2))**2)
         end do
      end do

      do i=1,n, n-1
         do j=2,n-1
            a(j,i)= 100.d0*(j-1)*(j-n)/((dble(n/2))**2)
         end do
      end do

      call cpu_time(rtime(1))
      call five_pt_it(a, n, error)
      call cpu_time(rtime(2))
      rtime(3) = rtime(2) - rtime(1)

      print*, "time elapsed = " , rtime(3)
      print*, "value in a(n/2,  n/2) = ", a(n/2,n/2)
      print*, "The last iteration error = " , error
      deallocate(a)

      stop
      end




      subroutine five_pt_it(x, n, error)
      implicit none
! n is declared before it is used as an array bound
      integer n, i, j, k
      real(kind=8), dimension(n,n) :: x
      real(kind=8) error
      real(kind=8), dimension(:,:), allocatable :: y

      allocate(y(n,n))
! Initialize y and error
      do j=1,n
         do i=1,n
            y(i,j)= x(i,j)
         end do
      end do

      error = -1.d0
!End Initialization

! Main iteration
! Notice that the number of iterations depends on the size of the system
      do k = 1, 10*n

         do j=2,n-1
            do i=2,n-1
               y(i,j) = (x(i-1,j)+x(i+1,j)+x(i,j-1)+x(i,j+1))/4.d0
            end do
         end do

         do j=2,n-1
            do i=2,n-1
               x(i,j) = (y(i-1,j)+y(i+1,j)+y(i,j-1)+y(i,j+1))/4.d0
            end do
         end do

      end do
!End Main Iteration

!Compute maximum error in the last iteration
      do j=1,n
         do i=1,n
            error = max(error, abs(y(i,j) - x(i,j)))
         end do
      end do

      deallocate(y)
      return
      end
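
Each half of the main iteration above reads one array and writes the other, so the updates of different interior points within a sweep are independent of each other. The C sketch below shows a single such sweep with an OpenMP annotation. The function name five_pt_sweep, the row-by-row storage, and the choice to parallelize the outer loop are assumptions made for this example, not a translation of the course code.

```c
/* One sweep on an n-by-n grid stored row by row: x[i*n + j].
   Interior points of y receive the average of the four neighbours in x;
   boundary entries of y are left untouched. */
void five_pt_sweep(int n, const double *x, double *y)
{
    int i, j;

    /* x is only read and each y entry is written once, so the
       iterations can be divided among threads; j must be private. */
    #pragma omp parallel for private(j)
    for (i = 1; i < n - 1; i++) {
        for (j = 1; j < n - 1; j++) {
            y[i*n + j] = (x[(i-1)*n + j] + x[(i+1)*n + j]
                        + x[i*n + (j-1)] + x[i*n + (j+1)]) / 4.0;
        }
    }
}
```

Calling the function twice with the roles of x and y exchanged reproduces the two half-steps of the main iteration.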

Seven point approximation (3D)



      implicit none
      real(kind=8), dimension(:,:,:), allocatable :: a
      real(kind=8) rtime(3)
      real(kind=8) error
      integer n, i, j, k

      n=10
      allocate(a(n,n,n))
      do k=1,n
         do j=1,n
            do i=1,n
               a(i,j,k)= 0.d0
            end do
         end do
      end do
      a(n/2, n/2, n/2) = 0.d0

!Initialize the matrix by setting the boundary values
      do i=1,n
         do j=1,n
            a(i,j,1)= 100.d0*(j-1)*(n-j)/((dble(n/2))**4)
     *                *(i-1)*(n-i)
         end do
      end do

      do i = 1,3
         call cpu_time(rtime(1))
         call seven_pt_it(a, n, error)
         call cpu_time(rtime(2))
         rtime(3) = rtime(2) - rtime(1)

         print*, "time elapsed = " , rtime(3)
         print*, "value in a(n/2,n/2, n/2) = ",a(n/2,n/2,n/2)
         print*, "The last iteration error = " , error
         print*, " "
      end do
      deallocate(a)

      stop
      end




      subroutine seven_pt_it(x, n, error)
      implicit none
! n is declared before it is used as an array bound
      integer n, i, j, k, kk
      real(kind=8), dimension(n,n,n) :: x
      real(kind=8) error
      real(kind=8), dimension(:,:,:), allocatable :: y

      allocate(y(n,n,n))
! Initialize y and error
      do k=1,n
         do j=1,n
            do i=1,n
               y(i,j,k)= x(i,j,k)
            end do
         end do
      end do

      error = -1.d0
!End Initialization

! Main iteration
! Notice that the number of iterations depends on the size of the system
      do kk = 1, 10*n

         do k=2,n-1
            do j=2,n-1
               do i=2,n-1
                  y(i,j,k) = ( x(i-1,j,k)+x(i+1,j,k)+x(i,j-1,k)
     *                 +x(i,j+1,k)+x(i,j,k+1)+x(i,j,k-1) )/6.d0
               end do
            end do
         end do

         do k=2,n-1
            do j=2,n-1
               do i=2,n-1
                  x(i,j,k) = ( y(i-1,j,k)+y(i+1,j,k)+y(i,j-1,k)
     *                 +y(i,j+1,k)+y(i,j,k+1)+y(i,j,k-1) )/6.d0
               end do
            end do
         end do

      end do
!End Main Iteration

!Compute maximum error in the last iteration
      do k=1,n
         do j=1,n
            do i=1,n
               error = max(error, abs(y(i,j,k) - x(i,j,k)))
            end do
         end do
      end do

      deallocate(y)
      return
      end

Output for n = 100


time elapsed =  219.88183000000001
value in a(n/2,  n/2, n/2) =  3.6006768920467627
The last iteration error =  6.29176966166156149E-3
  
time elapsed =  201.24601800000002
value in a(n/2,  n/2, n/2) =  8.2712635898124276
The last iteration error =  1.61423126757753721E-3
  
time elapsed =  167.28506600000003
value in a(n/2,  n/2, n/2) =  10.108651521649408
The last iteration error =  5.41940407988050765E-4

Compile line: f90 -O3 threedheat.f



OpenMP



Scheduling options



The schedule clause modifies the parallel do construct and lets one select the scheduling type and the chunk size.

The syntax is
schedule(type[, chunk])

Name             Type      Chunk size
Simple static    static    N/P
Interleaved      static    C (required)
Simple dynamic   dynamic   C (optional)
Guided           guided    Decreasing from N/P (C optional)
Runtime          runtime   Varies

Here N is the number of iterations, P the number of threads, and C the chunk argument.


See page 89 in PPIO for more details.
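
To make the syntax concrete, here is a small C sketch (in C the construct is spelled parallel for rather than parallel do). The workload, the function names, and the chunk size 4 are made up for this example; the point is only where the schedule clause goes. Iteration i does about i units of work, which is the kind of imbalance that a dynamic schedule is meant to even out.

```c
/* Hypothetical workload: iteration i performs i+1 additions, so equal
   static blocks would give the thread with the last block the most work. */
static double row_work(int i)
{
    double s = 0.0;
    int j;
    for (j = 0; j <= i; j++)
        s += 1.0;
    return s;
}

/* Sum the work of n iterations.  Threads pick up chunks of 4 iterations
   as they become free: schedule(type[, chunk]) with type = dynamic and
   chunk = 4.  The reduction clause combines the per-thread partial sums. */
double triangular_sum(int n)
{
    double total = 0.0;
    int i;

    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (i = 0; i < n; i++)
        total += row_work(i);

    return total;
}
```

Replacing schedule(dynamic, 4) with schedule(static) or schedule(guided) changes only how the iterations are handed out, not the result.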




MPI


Submitting a Job to Pleione


Here is an example of a batch file, named batchfile, for an MPI submission:

#BSUB -n #processors
rm outputfile.txt
cpuset -q batch -A mpirun -miser #processors ./a.out > outputfile.txt

Now to submit the job:
bsub -q short < batchfile

In the above example one might replace #processors by 4 to run on four processors:

#BSUB -n 4
rm outputfile.txt
cpuset -q batch -A mpirun -miser 4 ./a.out > outputfile.txt





Greetings
/* greetings.c -- greetings program
*
* Send a message from all processes with rank != 0 to process 0.
*    Process 0 prints the messages received.
*
* Input: none.
* Output: contents of messages received by process 0.
*
* See Chapter 3, pp. 41 & ff in PPMPI.
*/
#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char* argv[]) {
    int         my_rank;       /* rank of process      */
    int         p;             /* number of processes  */
    int         source;        /* rank of sender       */
    int         dest;          /* rank of receiver     */
    int         tag = 0;       /* tag for messages     */
    char        message[100];  /* storage for message  */
    MPI_Status  status;        /* return status for    */
                               /* receive              */


    /* Start up MPI */
    MPI_Init(&argc, &argv);

    /* Find out process rank  */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Find out number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (my_rank != 0) {
        /* Create message */
        sprintf(message, "Greetings from process %d!",
            my_rank);
        dest = 0;
        /* Use strlen+1 so that '\0' gets transmitted */
        MPI_Send(message, strlen(message)+1, MPI_CHAR,
            dest, tag, MPI_COMM_WORLD);
    } else { /* my_rank == 0 */
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag,
                MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }

    /* Shut down MPI */
    MPI_Finalize();

    return 0;
} /* main */

Trapezoid Rule



/* trap.c -- Parallel Trapezoidal Rule, first version
*
* Input: None.
* Output:  Estimate of the integral from a to b of f(x)
*    using the trapezoidal rule and n trapezoids.
*
* Algorithm:
*    1.  Each process calculates "its" interval of
*        integration.
*    2.  Each process estimates the integral of f(x)
*        over its interval using the trapezoidal rule.
*    3a. Each process != 0 sends its integral to 0.
*    3b. Process 0 sums the calculations received from
*        the individual processes and prints the result.
*
* Notes:  
*    1.  f(x), a, b, and n are all hardwired.
*    2.  The number of processes (p) should evenly divide
*        the number of trapezoids (n = 1024)
*
* See Chap. 4, pp. 56 & ff. in PPMPI.
*/
#include <stdio.h>

/* We'll be using MPI routines, definitions, etc. */
#include "mpi.h"


int main(int argc, char** argv) {
    int         my_rank;   /* My process rank           */
    int         p;         /* The number of processes   */
    float       a = 0.0;   /* Left endpoint             */
    float       b = 1.0;   /* Right endpoint            */
    int         n = 1024;  /* Number of trapezoids      */
    float       h;         /* Trapezoid base length     */
    float       local_a;   /* Left endpoint my process  */
    float       local_b;   /* Right endpoint my process */
    int         local_n;   /* Number of trapezoids for  */
                           /* my calculation            */
    float       integral;  /* Integral over my interval */
    float       total;     /* Total integral            */
    int         source;    /* Process sending integral  */
    int         dest = 0;  /* All messages go to 0      */
    int         tag = 0;
    MPI_Status  status;

    float Trap(float local_a, float local_b, int local_n,
              float h);    /* Calculate local integral  */

    /* Let the system do what it needs to start up MPI */
    MPI_Init(&argc, &argv);

    /* Get my process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Find out how many processes are being used */
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    h = (b-a)/n;    /* h is the same for all processes */
    local_n = n/p;  /* So is the number of trapezoids */

    /* Length of each process' interval of
     * integration = local_n*h.  So my interval
     * starts at: */
    local_a = a + my_rank*local_n*h;
    local_b = local_a + local_n*h;
    integral = Trap(local_a, local_b, local_n, h);

    /* Add up the integrals calculated by each process */
    if (my_rank == 0) {
        total = integral;
        for (source = 1; source < p; source++) {
            MPI_Recv(&integral, 1, MPI_FLOAT, source, tag,
                MPI_COMM_WORLD, &status);
            total = total + integral;
        }
    } else {  
        MPI_Send(&integral, 1, MPI_FLOAT, dest,
            tag, MPI_COMM_WORLD);
    }

    /* Print the result */
    if (my_rank == 0) {
        printf("With n = %d trapezoids, our estimate\n",
            n);
        printf("of the integral from %f to %f = %f\n",
            a, b, total);
    }

    /* Shut down MPI */
    MPI_Finalize();

    return 0;
} /*  main  */


float Trap(
          float  local_a   /* in */,
          float  local_b   /* in */,
          int    local_n   /* in */,
          float  h         /* in */) {

    float integral;   /* Store result in integral  */
    float x;
    int i;

    float f(float x); /* function we're integrating */

    integral = (f(local_a) + f(local_b))/2.0;
    x = local_a;
    for (i = 1; i <= local_n-1; i++) {
        x = x + h;
        integral = integral + f(x);
    }
    integral = integral*h;
    return integral;
} /*  Trap  */


float f(float x) {
    float return_val;
    /* Calculate f(x). */
    /* Store calculation in return_val. */
    return_val = x*x;
    return return_val;
} /* f */
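
The interval arithmetic in trap.c can be checked without MPI by looping over the ranks in a single process. The sketch below does that. It is an illustration only: simulated_total is a hypothetical helper, and double is used in place of float to keep rounding noise out of the comparison.

```c
/* Integrand from trap.c. */
double f(double x)
{
    return x * x;
}

/* Serial trapezoid rule on [local_a, local_b] with local_n panels of width h. */
double trap(double local_a, double local_b, int local_n, double h)
{
    double integral = (f(local_a) + f(local_b)) / 2.0;
    double x = local_a;
    int i;

    for (i = 1; i <= local_n - 1; i++) {
        x = x + h;
        integral = integral + f(x);
    }
    return integral * h;
}

/* Hypothetical helper: loop over ranks 0..p-1 in one process and add up
   what each MPI process would have computed.  No MPI calls are made. */
double simulated_total(double a, double b, int n, int p)
{
    double h = (b - a) / n;
    int local_n = n / p;            /* integer division, as in trap.c */
    double total = 0.0;
    int rank;

    for (rank = 0; rank < p; rank++) {
        double local_a = a + rank * local_n * h;
        double local_b = local_a + local_n * h;
        total += trap(local_a, local_b, local_n, h);
    }
    return total;
}
```

With n = 1024 and p = 4 every trapezoid is assigned to some rank. With p = 3, n/p truncates to 341, only 1023 trapezoids are covered, and the total comes out too small, which is why the notes in trap.c ask that p evenly divide n.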

Articles


Future chips headed for heat problem


If processors continue to be built using current methods, a CPU's power requirements will cancel out its usability, warned Intel's Patrick Gelsinger at the International Solid-State Circuits Conference. Gelsinger said a way to boost computing speed without producing additional power must be developed, and he detailed several options that Intel is researching, including multihead CPUs, L2 caches, multiple CPUs on a single die, and low-power transistors.
Heat sinks and fans used to cool down chips will not work in the future, Gelsinger said. On a more positive note, Gelsinger predicted that future processors would be able to support such functions as network communications, speech to text, encryption, 3D gesture recognition, and natural language processing. According to Gelsinger's forecast, processor performance will be measured in tera instructions per second, or TIPS, instead of millions of instructions per second, or MIPS. Gelsinger added that future desktop PCs will have the processing power of current ultraexpensive models.
http://www.pcworld.com/resource/article/0,aid,40466,00.asp

JUNO ANNOUNCES VIRTUAL SUPERCOMPUTER PROJECT


02/09/01

New York, NY -- Juno Online Services, Inc. today announced the establishment of the Juno Virtual Supercomputer Project, a distributed computing effort of unprecedented scope that aims to harness unused processing power associated with the free portion of its subscriber base in order to execute computationally intensive biomedical and other applications on behalf of commercial clients and research institutions.

Juno is one of the nation's largest Internet access providers, with 14.2 million total registered subscribers and 4.0 million active subscribers in December 2000. While the personal computers owned by different Juno subscribers have different performance characteristics, preliminary studies recently completed by the company suggest that if the computers of all of Juno's active free subscriber base were simultaneously working on a single computational problem, they would together represent the world's fastest supercomputer (measured in terms of aggregate instructions per second), and might approach or break the "petahertz barrier" (with a hypothetical effective processor speed on the order of a billion megahertz). Although the achievable effective computing power (and the level of any associated revenues) are likely to be significantly lower in practice for a variety of reasons, Juno's management believes that the unused computing power of its free subscriber base represents a potentially valuable asset from the viewpoint of both potential revenue generation and potential contribution to society.

The company expects to focus particular attention on prospective clients involved in bioinformatics research who are beginning to confront such computationally demanding applications as the determination of the structure of proteins encoded by gene sequences discovered through recently completed efforts to sequence the human genome, and searching through millions of "virtual molecules" to find promising candidates for new pharmaceutical products. Juno's management believes that many such problems can be effectively divided into a large number of smaller computational tasks in such a way as to capture substantial potential gains in both speed and cost-effectiveness by comparison with traditional supercomputing approaches. Additionally, Juno's service is designed to make it possible for customers to access very large amounts of processing power over relatively short periods of time without having to bear the high overhead costs associated with the in-house acquisition or rental of conventional supercomputers or "computer farms."

The Juno Virtual Supercomputer Project will make use of patented technology Juno currently employs in connection with its display of advertising to download computational tasks to subscribers' computers for processing offline during times when such subscribers are not using their computers. The results of such offline computations will then be uploaded to Juno's central computers during a subsequent connection, in much the same way that Juno currently collects responses to the advertisements it shows offline. Applications will run as "screen savers" on the computers of participating subscribers when their machines would otherwise be idle, performing calculations when the computer is on but not in use. Neither the download nor the operation of these applications is expected to have any significant impact on the user experience or on the connection speeds subscribers experience while using Juno.

Initially, Juno expects to recruit volunteers from among its millions of subscribers to provide the computing power required for its early virtual supercomputing activities. Subscribers to Juno's free basic service may ultimately be required to make their unused computing power available to the project as a condition for using that service. While the company's billable subscribers may be offered the opportunity to participate on a strictly optional basis in order to advance biomedical research and/or other forms of scientific and technical progress, the company does not currently expect to require the participation of such subscribers.

Juno also announced today that it has hired Yury Rozenman, formerly of Applied Biosystems, to spearhead the Juno Virtual Supercomputer Project. Mr. Rozenman, who joined Juno earlier this month as a vice president, comes to the company with more than 13 years of scientific and bioinformatics experience, predominantly in protein and DNA analysis and its application to understanding biological processes.

"Pharmaceutical firms spend many years, and an estimated $400 million dollars, for every drug that is eventually brought to market," commented Rozenman. "To the extent that groundbreaking new applications software running on Juno's Virtual Supercomputer is able to bring lifesaving new drugs to market in a fraction of the time and at a fraction of the cost, this could be enormously valuable from both an economic and a human perspective."

"We are excited about the prospect of teaming up with our subscribers to offer researchers a supercomputing tool of extraordinary power," said Charles Ardai, Juno's president and chief executive officer. "We believe this project has the potential to further diversify our revenue streams, and in particular, to enable us to derive additional revenue from our free subscribers. If we can do so while making a significant contribution to biomedical research, this would be even more gratifying."

About Juno

Juno Online Services, Inc. is one of the nation's leading Internet access providers, with 14.2 million total registered subscribers as of December 31, 2000, and 4.0 million active subscribers during that month.

Founded in 1995, Juno provides multiple levels of service, including free basic Internet access, billable premium dial-up service, and (in certain markets) high-speed broadband access. The company's revenues are derived primarily from the subscription fees charged for its billable premium services, from the sale of advertising, and from various forms of electronic commerce.

Web site: http://www.juno.com/corp

CHIP DESIGNERS TAKING THE HEAT


02/09/01

Michael Kanellos reported for ZDNet: The technology industry is headed for a meltdown, warns Intel's chief technology officer. Heat is becoming one of the most critical issues in computer and semiconductor design, according to Intel CTO Pat Gelsinger, who will discuss the issue in a keynote Monday at the International Solid-State Circuits Conference. The five-day convention in San Francisco is dedicated to semiconductor research.

Ten years from now, microprocessors will run at 10GHz to 30GHz and be capable of processing 1 trillion operations per second--about the same number of calculations that the world's fastest supercomputer can perform now.

Unfortunately, Gelsinger said, if nothing changes these chips will produce as much heat, for their proportional size, as a nuclear reactor. Not only that, but with more than a billion transistors, they will start to look like rodeo belt buckles. From an engineering standpoint, as well as a financial one, that is untenable.

"We believe power and power density becomes a fundamental issue," he said. "We have a huge problem to cool these devices, given normal cooling technologies...We need to put much more emphasis on transistor design."

Gelsinger's speech will outline some of the methods Intel is experimenting with to reduce power consumption. One technique, for example, focuses on creating more special-purpose sub-sections inside the larger microprocessor. These sections would perform only certain tasks and be activated only when necessary.

Also at the conference, Intel will present papers describing McKinley, a 64-bit server chip coming out in samples later this year. The chip, which will be the second version of the often-delayed and yet-to-be-released Itanium chip, will run faster than 1.2 GHz and contain 214 million transistors, according to the conference materials.

Meanwhile, IBM will discuss a 1.1GHz Power4 server processor that contains two processor cores while Sun will present a paper on a dual processor chip from its MAJC family for devices. Compaq will also discuss a 1.2GHz Alpha chip.

Intel is also looking at integrating multiple microprocessors onto a single piece of silicon, a technique IBM is already working on for its server chips. Power consumption goes down, essentially, because electrons don't have to travel as far. Intel is also tinkering with insulating techniques that prevent transistors from "leaking" electricity.

In addition, software will be tweaked to reduce redundant requests to the processor.

The looming heat issue is an unintended consequence of Moore's Law, Gelsinger said. The famous maxim, coined by Intel co-founder Gordon Moore, dictates that the number of transistors on a microprocessor doubles every two years.

Increasing transistor count has served as the bedrock of computing advances over the past 30 years. However, it has also meant an increase in the amount of power required to run a processor, which in turn leads to heat. In addition, microprocessors are growing about 14 percent in size every two years.

"We see a landscape where, for the next 10 years, we can keep Moore's Law running," Gelsinger said. But unless insulating techniques are created and adopted, the pace of development will likely slow.

The emphasis on heat, he added, is relatively new.

Powering up

"We haven't limited our designs by power in the past. We have limited them by cost and manufacturability," Gelsinger said. "Some of the things we did in the past are no longer applicable."

Heat and power consumption emerged as an issue in the computer world in January 2000 when Transmeta introduced a line of notebook chips that the company asserted consume less energy than competing products from Intel or Advanced Micro Devices.

The appeal of Transmeta's Crusoe processors initially came from the fact that the chips would let notebooks run longer on a single battery charge. However, a group of start-ups is adopting the chips for servers that should hit the market in the first half of the year.

With ASPs (application service providers) packing hundreds of servers into small rooms, heat, not to mention the cost of electricity, has become a huge problem. Servers, for instance, have been known to melt after air-conditioning failures.

"Heat is the primary killer of electronic hardware," said Chris Hipp, chief technology officer at RLX Technologies. "Servers are getting more dense and the processors are getting hotter and hotter and consuming more power."

Although Transmeta has come to prominence through the heat issue, Intel has been incorporating heat-management technologies into its chips for some time, said Glen Hinton, an Intel research fellow. Some of these techniques were incorporated into the Pentium 4.

With the Pentium 4, he said, "We've been able to reduce power consumption compared to what historical trends have been." The chip also has an automatic switch-off feature that prevents meltdowns.

"On the Pentium 4, power was an important factor but not the most important factor," Hinton said. In the future, "it will be as important or more important than performance...It will change the way we design the microarchitecture."



The Ideas Machine (Genetic Programming)



New Scientist (01/20/01) Vol. 169, No. 2274, P. 26; Matthews, Robert

"Genetic algorithms" are widely used today in commercial applications such as finding the most efficient airline schedules and designing circuitry. John Koza of Stanford University has been developing the next generation of that idea, genetic programming (GP). GP begins with arranging mathematical formulas and goal parameters given by human-engineered software. GP goes a step further by not dealing with lists of data or testing every single combination of possibilities. Instead GP chooses a limited number of plausible formulas, and then breeds them thousands of times to produce the best possible design or method. In the first-ever issue of Genetic Programming and Evolvable Machines, Koza and his team describe a number of designs its GP computer arrived at on its own. Its "inventions" ranged from electronic thermometers to the Christmas Tree-type antennas atop homes across the world. The results of Koza's experiments have yielded both old answers and new innovations. For example, when GP was asked to find a method for predicting planetary movement with basic formulas describing the operation of the Solar system, it eventually answered with Kepler's Third Law of Planetary Motion, discovered in 1618. GP has also come out with some real benefits. The Stanford team found the process especially adept at designing circuits that regulate non-linear feedback, a huge problem for human engineers because of the complex mathematical problems involved. Koza says GP does not know about those complexities when it randomly chooses components to use in its evolving designs. Human engineers, however, would probably be hesitant to use those components because their effects are hard to calculate. Koza, who has been working on the project for years, says his team is nearing effective commercial application but currently lacks the processing power to create 3D designs noting, "it would take years." 
He hopes that, if processing speeds continue to keep in step with Moore's law, they should be able to do so soon.

CANARIE TO BUILD WORLD'S LARGEST DISK DRIVE



02/16/01

Ottawa, CANADA -- CANARIE, Canada's advanced Internet development organization, today announced a plan to construct the world's largest "disk drive" called a Wavelength Disk Drive which will be constructed around wavelengths of light on CANARIE's national optical research network, CA*net 3. The wavelength disk drive will be more than 8000 km in diameter.

"For several years, researchers have recognized that harnessing the computing power of thousands of personal computers connected to the Internet would provide more computing power than even the largest super computers," said Andrew Bjerring, President and CEO of CANARIE. "This innovative project is intended to address one of the challenges inherent in realizing this dream: the difficulty of sharing large amounts of data efficiently among thousands of computers, each trying to communicate with the others."

Instead of having the computers send and receive data from each other -- which slows the collaborative effort down to the slowest computer or slowest network link -- the computers will simply read and write the data to the optical network as if it were one large, shared disk drive. Because the intrinsic carrying capacity of a multi-wavelength optical network like CANARIE's CA*net 3 is so large, it acts as a gigantic, nation-wide optical storage device. Since all the participating computers will have ready access to all the data circulating on the network, their collective ability to solve problems quickly will be greatly enhanced.

This revolutionary use of optical networks has immediate applications in such collaborative research fields as environmental modelling, genomic and pharmaceutical simulations and astrophysics, to name a few.

"This is an exciting concept," says Andy Woodsworth, Director General, Institute for Information Technology, National Research Council of Canada. "We operate Canada's largest distributed network of servers as part of The Canadian Bioinformatics Resource, a national service dedicated to providing researchers with convenient, effective access to high-quality genomics and bioinformatics tools and databases, interconnected to our new 6912 processor Paracel GeneMatcher in Halifax. This technology will allow more efficient distribution of data between these servers."

The implications of this concept to telecommunications carriers could be equally significant. Purchasing a super computer in the future might simply involve buying optical wavelengths from a favorite carrier just as silicon chip based machines are purchased from computer manufacturers today. This revolutionary use of optical network technology could also become an important use for unused bandwidth. No longer would networks be simply a means of computers exchanging data directly with one another; the network will be the computer.

CANARIE will be deploying a proof-of-concept wavelength optical disk drive on its CA*net 3 network over the next few months. The capacity of this trial disk drive is expected to be several gigabytes.

About CANARIE Inc.

World-renowned CANARIE Inc. is Canada's advanced Internet development organization. CANARIE is a non-profit corporation dedicated to accelerating the development of Canada's Internet and the creation of innovative applications that exploit the power of that infrastructure to benefit Canadians. CANARIE plays the critical role of facilitator, bringing together experts from private industry, government, and the research and education community to form project partnerships.

CANARIE Inc. was established in 1993 and is supported by Industry Canada and 120 members from the private and public sectors.

CANARIE is a catalyst for advancements in areas such as "fibre optic networks", "fibre to the home networks", "DWDM", "bandwidth revolution", "customer-owned fibre networks", "affordable bandwidth", "dark-fibre networks", E-Learning and telemedicine. One of CANARIE's key successes is the deployment of the world's first national optical R&D Internet network -- CA*net 3. This achievement affirmed Canada's position as the global leader in advanced networking.

Since its inception, CANARIE Inc. has funded more than 200 advanced Internet applications projects involving more than 500 companies and creating thousands of high technology jobs.

For further technical information on the Wavelength Disk Drive please see http://www.canet3.net/library/papers.html

HIGH PERFORMANCE COMPUTING FOR GENETICS RESEARCH



02/16/01

There is no doubt that the mapping of the human genome, completed in June 2000, is one of the greatest scientific advancements in history. There is also no question that this breakthrough in biological research was made possible by advancements in high performance computing technology. High-speed computers are necessary to analyze the hundreds of terabytes of raw sequence data and correctly order the 3 billion base pairs of DNA that comprise the human genome.

Compaq Computer Corporation is at the forefront of this technological breakthrough, and it is Compaq hardware and software technology that is providing the powerful computing capabilities required as companies, organizations and agencies open new and exciting chapters in life science applications.

Today it is increasingly difficult to separate the advances in biotechnology from advances in high performance computing. Many leading scientists believe that high-end computing is the future of biology and medicine. Recent developments in genetics research, computer science and information technology have given rise to the new, interdisciplinary science of bioinformatics which involves the use of computers to retrieve, process and analyze biological information - primarily genetic data.

The staggering quantity and complexity of data generated by the Human Genome Project has created an insatiable demand for high performance computing. It is estimated that the doubling rate for genetic databases is six-to-eight months. Genomic research organizations such as Celera Genomics and the Sanger Centre are already managing multiple terabytes of data, larger than the Library of Congress in size. That data will grow exponentially over the next several years. The computing resources required for the Human Genome Project are millions of times greater than those used to land a man on the moon.

And mapping the genome is only the beginning. Now that the human genome is decoded, researchers begin the quest to figure out exactly how each gene functions -- and, more importantly, how each gene malfunctions to trigger deadly illnesses from heart disease to cancer. It will take increasingly powerful computers and software to gather, store, analyze, model and distribute information. The availability of the biological data, coupled with powerful software analysis tools and powerful computer systems, will allow scientists to develop new diagnostics, drug therapies and new strategies and methods for identifying disease genes. In addition, agricultural/chemical companies (ag/chem) are also investing in bioinformatics to develop new strains of seeds or herbicide resistant crops.

Bioinformatics is the largest R&D investment that ag/chem, pharmaceutical, and biotech companies are making today as they strive to develop and bring products to market faster. In addition, industry analysts estimate up to 100 percent annual revenue growth for genomics companies, and information technology spending is expected to match or exceed that revenue growth. Computing companies, like Compaq, are responding with powerful, faster, more reliable and cost-effective computing solutions.

Compaq Market Position:

Compaq stands as the leading supplier of computing systems for bioinformatics, working with many major research centers worldwide including the world's largest genomic sequencing facility at Celera Genomics. With Compaq working as a partner, the process of mapping the full human genome was dramatically compressed through the more aggressive and large scale application of sequencing and computing technology.

Most of the leading commercial and public genomics centers, such as Celera Genomics, the Sanger Centre, the Whitehead Institute/MIT Center for Genome Research, and Genoscope use Compaq Alpha systems either exclusively or predominantly. Major pharmaceuticals such as Genentech are also prominent Compaq customers.

- Celera Genomics, a pioneer in human genomics, chose Compaq as its IT partner in developing the world's largest genomic sequencing and computing facility. Compaq designed and equipped Celera's data center, which includes over 600 interconnected Alpha processors and Compaq StorageWorks systems managing a 70 terabyte database.

- The Sanger Centre is one of the world's major genome centers. Sanger's vast bioinformatics computing resources include more than 400 Compaq AlphaServer systems and workstations running Tru64 UNIX software, as well as Compaq PCs running Linux and over 20 TB of Compaq StorageWorks storage.

- The Whitehead Institute/MIT Center for Genome Research operates the largest public genomic sequencing center in the U.S. Whitehead selected Compaq to supply the IT infrastructure. They rely on AlphaServer ES40 systems and Compaq StorageWorks to manage and analyze their genomic data.

- Genoscope (Centre National de Sequencage) is a non-profit organization located in Evry, France and owns the second largest sequencing facility in Europe. Genoscope chose Compaq AlphaServer systems and Compaq StorageWorks as the basis for its IT architecture.

- Geneva Proteomics, Inc. announced in October 2000 that it has selected Compaq as its exclusive IT partner for the company's two major research proteomics factories in Geneva, Switzerland, and in the United States. As part of the agreement, Geneva Proteomics will use Compaq Global Services, StorageWorks storage systems, and industry-leading AlphaServer systems running Tru64 UNIX to power and manage critical aspects of the company's pioneering proteomic research facilities. The contract value is expected to exceed $70 million (US) over a period of four years.

- Genentech, a biotechnology pioneer, relies on Compaq AlphaServer systems as a key element in their IT infrastructure - running everything from email to high performance bioinformatics, protein and molecular biology applications on Tru64 UNIX, Alpha systems.

- The Institute for Genomic Research (TIGR) is a leading non-profit genomics research institute that focuses on analyzing and describing entire genomes and making this valuable information available to the scientific community. TIGR moved to the 64-bit Compaq Alpha platform to alleviate memory limitations experienced with its 32-bit systems. TIGR has reduced response time by more than 90% with its Alpha systems.

Computing Power for the Post-Genomic Era

The challenges of the post-genomic sequencing era promise to be even more intensive than the initial genome mapping phase. As the Human Genome Project moves into the annotation and analysis work, Compaq has launched several programs to further its commitment.

- Compaq BioCluster. One of Compaq's most recent contributions to the Human Genome Project is a cluster of 27 AlphaServer ES40 systems (over 100 processors) and over one terabyte of storage located at Compaq's Enterprise Systems Lab in Littleton, MA. The system is being made available to public sector research institutions to complete the annotation of the human genome -- identifying where the genes are located in the human chromosome, and also doing an initial analysis. Since it was set up in May 2000, more than one million jobs have been run on the BioCluster. Various users, located in Boston, St. Louis, California, and France access the cluster through the web.

The BioCluster is comprised of 27 AlphaServer ES40 4-processor SMP nodes, each with 54 GigaBytes of local storage. The BioCluster is networked together using dual switched 10/100 Ethernet. Twenty-five of the nodes have 4 GB of RAM and one has 16 GB of RAM. In addition, the system has a central file server with 1 terabyte of secondary storage. A standalone AlphaServer ES40 system is also available for testing scripts and any new user code before running on the main cluster.

Unlike commercial genomic labs, the public sector has limited funds. The public Human Genome Project lacked the computing capacity and funds to complete the annotation work on schedule and requested Compaq's assistance. Compaq agreed to help by providing the computing tools and storage to complete the annotation.

MIT's Whitehead Institute has been the biggest user, employing the BioCluster to run RepeatMasker and BLAST queries on successive releases of draft human sequence to catalog the repetitive DNA in the genome. There are approximately 100,000 genes in the human genome. The rest is comprised of repetitive, although conceivably just as important, code whose purpose has yet to be determined. In another MIT lab, the cluster has been used to perform genome annotation.

The University of California at Santa Cruz has used the BioCluster for the layout and arranging of DNA pieces into the best version of the sequence. Researchers at Washington University also use the system for various gene expression and annotation work.

Genoscope in France ran the first analysis of the complete draft of the human genome on the BioCluster. The results will provide highly accurate prediction of the total number of genes in the human genome. The analysis used the LASSAP (LArge Scale Sequence compArison Package) sequence comparison software package. The complete analysis of the whole draft dataset took only 38 hours on the 100-cpu BioCluster. According to the authors of LASSAP, this was 25% less time to complete a run 2.5 times larger than all previous runs made on any system available from any vendor.

Compaq's Cambridge Research Lab (CRL) uses the BioCluster to find the exact location of the genes. CRL is developing algorithms and systems to support large-scale genomic analysis. The lab is currently working on an efficient and high accuracy gene discovery software pipeline that will allow them to gain an insight into the efficiency of Compaq platforms to support genomic analysis, as well as develop new computational ideas to improve the quality and speed of this analysis. The computational tools that are under development include gene detection, functional genomics and comparative genome analysis. CRL's objective is to follow standard academic methodology and release procedures for the internally developed software to the general public.

- Cooperative Research and Development Agreement. On January 19, 2001 the U.S. Department of Energy (DOE) announced that Sandia National Laboratories and Celera Genomics signed a Cooperative Research & Development Agreement (CRADA) to develop the next generation software and computer hardware solutions specifically designed for the demands of computational biology and a full range of life sciences applications. Compaq Computer Corporation will provide the project technology to increase computing capability with the goal of achieving a system that will provide 100 trillion operations per second (100 TeraOPS). By sharing computing technology, Sandia, Celera and Compaq may ultimately reach the petacruncher level of 1,000 TeraOPS.

- $100 Million Life Sciences Investment Program. On September 26, 2000, Compaq Computer Corporation announced that it would invest an initial $100 million in early-stage life sciences companies in the areas of genomics, bioinformatics, and related market disciplines. This will involve a mix of direct investment in such companies, and investment in venture capital funds targeting these areas. Compaq's goal is to spur the growth of discovery in life sciences companies through a combination of financial support and early access to Compaq's industry-leading AlphaServer systems running the Tru64 UNIX operating system, Compaq StorageWorks systems, solutions and services.

Initial investments include an equity investment in Geneva Proteomics and an investment in the Cambridge Massachusetts-based Applied Genomic Technology Capital Fund, L.P., a venture capital fund focusing on investments in genomics and bioinformatics.

Compaq's Bioinformatics Solutions

Compaq delivers competitive bioinformatics solutions that include:

- Full range of high-performance computing systems. Compaq offers a complete choice of HPTC systems from the low end to the high end, Tru64 UNIX or Linux. With its Alpha systems, Compaq is a leader in scalable 64-bit systems supporting very large memory database applications. Today's AlphaServer DS, ES, GS, and SC systems provide the performance of the latest Alpha microprocessors in scalable configurations from the low-cost, single processor AlphaServer DS10 to high-end, switched SMP systems such as the AlphaServer SC series supercomputer that supports up to 512 processors.

- Best absolute performance with Alpha systems. AlphaServer systems deliver not only the best absolute performance, but also the best-sustained application performance available with Tru64 UNIX or Linux. According to D.H. Brown Associates in a recent report, Compaq Alpha technology continues to hold the performance pole position. For sheer computing power, the 667 MHz Alpha 21264 outpaces its nearest rival by 68% (average) on both integer and floating-point performance measured by SPEC2000 benchmarks.

Most bioinformatics codes and applications achieve their best performance on Compaq Alpha systems. AlphaServer systems took the lead in independent genetic sequence analysis benchmarks run by the University of Nebraska Medical Center. The Compaq AlphaServer ES40 outperformed the competition by a factor of two or more in all of the benchmark tests. In Celera Genomics' benchmark tests, the next closest computing system was literally an order of magnitude slower than the AlphaServer system. Celera took a large bioinformatics benchmark and gave it to all vendors. Only two vendors could run it. One ran it in 87 hours. The Compaq AlphaServer system ran it in seven hours. Similarly, Genoscope found that Compaq's AlphaServer BioCluster system took 25% less time to complete a LASSAP sequence comparison analysis run that was 2.5 times larger than all previous runs made on any system available from any vendor.

- Compaq StorageWorks for high reliability, flexible storage. Bioinformatics customers value Compaq StorageWorks systems for their expandability and redundant components (dual heads, controllers and power supplies) that ensure no single point of failure.

- Optimized Application Portfolio. Compaq has collaborations and joint development efforts with the leading academic and commercial software developers for bioinformatics. Compaq engineers work closely with original code developers to optimize their code for Alpha systems.

Over sixty public domain codes are available on the Alpha platform and many are optimized for Alpha (e.g. BLAST, FASTA, SWAT, PHRAP, HMMER, etc.). The most widely used commercial software from companies such as Incyte Genomics, GCG Software/Oxford Molecular Group, Advanced Visual Systems, and Southwest Parallel Software is also available on Alpha systems. Compaq continues to work with new software vendors in this market, such as LION Bioscience, InforMax and Synomics. Oracle provides the leading database performance for bioinformatics on Alpha.

- Capable of handling the largest information demands. Compaq delivers high-performance AlphaServer systems that provide the foundation for a powerful, robust, reliable computing environment to meet the data availability, reliability, and distribution challenges posed by the global, virtual teams addressing the decoding of the human genome.

- Worldwide planning, deployment, and management services. Compaq has assembled one of the largest, most advanced service organizations in the world. As part of its Professional Services portfolio, Compaq provides expertise specifically for high-performance computing environments.
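
The benchmark figures quoted in this article (87 hours versus seven hours on Celera's benchmark, and 25% less time on a LASSAP run 2.5 times larger) can be turned into the kind of ratios used in this course. The sketch below is only an illustration: the function names are made up for this page, and the second ratio assumes that "2.5 times larger" describes the amount of work, which the article does not state.

```c
#include <assert.h>

/* Ratio of two wall-clock times for the same workload.
   Hypothetical helper, not part of any vendor benchmark suite. */
double speedup(double slow_hours, double fast_hours)
{
    assert(slow_hours > 0.0 && fast_hours > 0.0);
    return slow_hours / fast_hours;
}

/* Relative throughput when a new run does size_ratio times the work
   in time_ratio times the wall-clock time of the old run.
   Assumes work grows in proportion to the quoted size. */
double throughput_gain(double size_ratio, double time_ratio)
{
    assert(size_ratio > 0.0 && time_ratio > 0.0);
    return size_ratio / time_ratio;
}
```

Under these assumptions, speedup(87.0, 7.0) is a little over 12, which matches the "order of magnitude" wording, and throughput_gain(2.5, 0.75) is about 3.3.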



Assignments


The purpose of these assignments is to test your parallel computing ability.  Thus to the extent possible please emphasize that portion of the project.  It is better to have a naïve implementation working in parallel than a complicated version with bugs.  You are encouraged to have several versions of the software that you develop.  Please submit code and a written report (electronically) on the due date.
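
As a rough illustration of a naïve version that works in parallel, here is a sketch of a quadrature routine that uses the argument list from the 02/08/01 announcement. It is not a required design. The helper trapezoid, the sample integrand square, the limit MAX_PANELS and the panel-doubling stopping test are all assumptions made for this example, and the stopping test is a heuristic rather than a guaranteed error bound.

```c
#include <assert.h>

#define MAX_PANELS (1L << 24)   /* assumed cap on the number of panels */

/* Sample integrand used only for illustration. */
double square(double x)
{
    return x * x;
}

/* Composite trapezoid estimate on [left, right] with n equal panels.
   The interior points are independent, so the sum is written as an
   OpenMP reduction; without OpenMP the pragma is ignored and the
   loop runs serially. */
double trapezoid(double (*ptf)(double), double left, double right, long n)
{
    double h = (right - left) / (double)n;
    double sum = 0.5 * (ptf(left) + ptf(right));
    long i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 1; i < n; i++)
        sum += ptf(left + (double)i * h);

    return h * sum;
}

/* Double the number of panels until two successive estimates differ
   by no more than error, or until MAX_PANELS is reached. */
double quad(double (*ptf)(double), double left, double right, double error)
{
    long n = 1;
    double prev, curr, diff;

    assert(ptf != 0 && error > 0.0);
    curr = trapezoid(ptf, left, right, n);
    do {
        prev = curr;
        n *= 2;
        curr = trapezoid(ptf, left, right, n);
        diff = curr > prev ? curr - prev : prev - curr;
    } while (diff > error && n < MAX_PANELS);

    return curr;
}
```

For example, quad(square, 0.0, 1.0, 1e-6) should return a value close to 1/3. The integrand must be safe to call from several threads at once. With GCC the parallel loop is enabled by compiling with -fopenmp.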

Mid Term


This first major assignment is due March 7.  Let me suggest the following possible projects.  

1. Matrix Multiply
2. Optimization
3. Quadrature (numerical integration)
4. Sort
5. Iterative solve of linear system
6. Simulation of a simple system
7. TSP
8. Game Playing
9. You may suggest another project


Your group must get approval from me on the project that you choose.  Please submit a written statement of work by February 1.
This project consists of two parts: submission of a written report and submission of working code.  Both must be completed and in my e-mail box by March 7.  

Topic                      Team

Sort                       mia Hwang         hwangmia@yahoo.com
                           Sunghyuck Hong    hongsh@hotmail.com
                           Ran Zhang         rzhang@ttacs.ttu.edu
Solving a linear system    Ian Martines      imartine@math.ttu.edu
                           Wayne McGee       wmcgee@math.ttu.edu
Simulation                 baker@aladdin.chem.ttu.edu
                           chang@aladdin.chem.ttu.edu
Quadrature                 Kandle Kulish
                           Shelly Davenport
Matrix Multiply            Jiyuan Wang
Models of a logic program under the answer set semantics
                           Marcello Balduccini
Quadrature                 Channa Navaratna
                           Menaka Navaratna
Parambee
Solving a linear system    Param vir singh
Matrix Multiply            M S Anam

Parallel Computing Links


http://ei.cs.vt.edu/~history/Parallel.html
http://www.msi.umn.edu/origin/tutorials/intro/intro.html
http://scv.bu.edu/SCV/Origin2000/
http://www.arsc.edu/pubs/MPPnews.shtml
http://www.indiana.edu/~rac/hpc/links.html
http://www.ahpcc.unm.edu/
http://hpcf.nersc.gov/
http://www.msi.umn.edu/
http://www.wes.hpc.mil/msrc/training
http://www.acl.lanl.gov/nirvana/userguide/
http://www-unix.mcs.anl.gov/dbpp/text/node1.html
http://www.arl.hpc.mil/PET/training/index.html
http://www.lanl.gov/asci/bluemtn/examples/
http://www.lanl.gov/orgs/cic/computingatlanl/
http://www-unix.mcs.anl.gov/dbpp/
http://elastic.org/~fche/mirrors/www.jya.com/hpc/chap4.htm
http://www.llnl.gov/computing/tutorials/
http://science.nas.nasa.gov/Groups/SciCon/refs.html
http://hpcf.nersc.gov/training/tutorials/
http://www.npac.syr.edu/copywrite/pcw/
http://www.cs.berkeley.edu/~demmel/cs267/
http://web.msi.umn.edu/tutorials/MPI/content3_new.html
http://www-c.mcs.anl.gov/mpi/mpich/indexold.html
http://www-jics.cs.utk.edu/SP_LECT/sp_mpi/index.html
http://www.cc.ukans.edu/~acs/docs/other/intro-MPI.shtml
http://www.epcc.ed.ac.uk/epcc-tec/documents/mpi-course/mpi-course.book_1.html
http://www.netlib.org/utk/papers/mpi-book/node1.html
http://www.npac.syr.edu/users/gcf/cps615nov796/index.html#local13


Contact Information


Philip W. Smith
philip.smith@ttu.edu
Director, HPCC
Texas Tech University
806.885.0336
Fax 806.885.1847

www.hpcc.ttu.edu


Mailing address:
Dr. Philip W. Smith, Director
High Performance Computing Center
Box 41163
Lubbock TX 79409-1163




