1st International
World Wide Work Flow Grid
Workshop

1WWWFG

An APBioNet-GridAsia Project


GridAsia2007

Official Sponsors


National Grid Singapore

British High Commission Singapore

UK-Singapore Partners in eScience

EPSRC Discovery Net

APBioNet

AMBIS

1ST INTERNATIONAL WORKSHOP ON
WORLD WIDE WORKFLOW GRID (WWWFG) 2007 @ GRID ASIA 2007
SINGAPORE


Keynote Speakers

Professor Carole Goble
University of Manchester, UK

The workflow ecosystem: why plumbing is not enough
Workflows have become the fashionable, de facto mechanism for linking together scientific resources, coordinating and orchestrating services and scheduling jobs over grids. A plethora of systems are available, aimed at different layers of the software stack, covering a spectrum of capabilities and catering for a range of situations. These systems, including our own workflow workbench, Taverna, are becoming routinely used by scientists. Workflows are intended to ease the routine and repetitive burden of plumbing together components. But just enabling good plumbing is not enough. Ecosystems of tools, methods, mechanisms and components surround workflows, the scientific objects that accompany them, their workflow systems and the e-Scientists who use them. The services orchestrated could be third party, "in the wild" and out of the workflow environment's control. Tools are needed that support the whole scientific method, including design, discovery, and publication of workflows, their components and the resources they flow work through. A scientist should be able to mix and match workflows, regardless of their host system, and straightforwardly mash workflows onto their own applications. Workflows are valuable knowledge assets in their own right, to be pooled, shared and remixed as easily as citizens share photos and videos on the Web. myExperiment (http://myexperiment.org), for example, is our new initiative to create a social networking site for encouraging workflow workers to share and discuss scientific workflows and their related scientific objects, and to harness this social intelligence for the common good. Drawing on my practical experiences from the myGrid/Taverna (http://www.mygrid.org.uk) and myExperiment projects, I will explore this workflow ecosystem, and in particular the technical and social implications of releasing workflows, and their outcomes, "into the cloud".


Conference Co-Chairs

Christopher J O Baker
Institute for Infocomm Research (I2R), Singapore; Co-Chairman, Track 2
Tan Tin Wee
National University of Singapore; Co-Chairman, Track 1


Conference Speakers (tentative list, to be finalised)

Robert Stevens
University of Manchester, UK

Using Ontology to Classify Members of a Protein Family
In this talk, I will describe recent work on using ontologies to help classify members of the protein phosphatases in a genome. Classification of proteins expressed by an organism is an important step in understanding the molecular biology of that organism. Traditionally, this classification has been done by human experts and it is regarded as the gold standard method. Human knowledge can recognise the properties that are sufficient to place an individual gene product into a particular protein family group. Automation of this task usually fails to meet this gold standard because of the difficult recognition stage. The need to automate the classification process by making human knowledge accessible in computational form is motivated by the growing number of genomes, the rapid changes in knowledge and the central role of classification in the annotation process. We capture human understanding of how to recognise members of the protein phosphatase family by domain architecture as an ontology. By describing protein instances in terms of the domains they contain, it is possible to use description logic reasoners and our ontology to assign those proteins to a protein family class. We have tested our system on classifying the protein phosphatases of the human and Aspergillus fumigatus genomes and found that our knowledge-based, automatic classification matches that of the human curators and for these two species we have also found putative new phosphatase proteins. We have made the classification process fast and reproducible and, where appropriate knowledge is available, the method can potentially be generalised for use with any protein family.
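
The idea of assigning a protein to a family from the domains it contains can be illustrated with a minimal Python sketch. This is not the speaker's ontology or a description logic reasoner: it uses plain closed-world set matching, and the domain labels, family names and rules below are hypothetical placeholders chosen only for illustration.

```python
# Hypothetical family definitions: each family lists the domains a protein
# must contain and the domains it must not contain. These labels are
# illustrative assumptions, not taken from the talk's ontology.
FAMILY_RULES = {
    "receptor_tyrosine_phosphatase": {
        "requires": {"PTP_catalytic", "transmembrane"},
        "excludes": set(),
    },
    "non_receptor_tyrosine_phosphatase": {
        "requires": {"PTP_catalytic"},
        "excludes": {"transmembrane"},
    },
}


def classify(domains):
    """Return the sorted names of all families whose rules the domain set satisfies."""
    present = set(domains)
    return sorted(
        name
        for name, rule in FAMILY_RULES.items()
        if rule["requires"] <= present and not (rule["excludes"] & present)
    )


if __name__ == "__main__":
    print(classify(["PTP_catalytic", "transmembrane", "fibronectin_III"]))
    print(classify(["PTP_catalytic"]))
    print(classify(["kinase"]))
```

A real OWL-DL reasoner works under open-world semantics and needs explicit closure axioms to conclude that a domain is absent; the sketch sidesteps that by treating the listed domains as complete.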

Professor David C De Roure
School of Electronics and Computer Science
University of Southampton, UK

The Story of the Semantic Grid
At its outset in 2000, the UK e-Science programme presented a vision of new scientific outcomes enabled by an infrastructure which would provide a high degree of easy-to-use and seamless automation, with flexible collaborations and computations on a global scale. At that time, there were a number of grid applications being developed and technologies that provided fragments of the necessary functionality. A group of researchers observed the gap between these endeavours and the richness of the e-Science vision, and suggested that Semantic Web technologies would help fill the gap. Thus was born the Semantic Grid, an initiative promoting the Semantic Web for e-Science, in which information and services are each given well-defined meaning, better enabling computers and people to work in cooperation. Seven years of research and development have seen the maturing of these technologies, the emergence of best practice and new visions of the Grid in the context of the evolving Web.

Yike Guo
Professor, Department of Computing, Imperial College, UK and CEO, Inforsense Ltd

Making Workflows Work

Ingeniousness does not equal increased productivity, and the history of software is littered with the corpses of brilliant ideas that failed to fulfil their promises and change the face of computing. Today, workflows seem a perfect fit for application construction and delivery in the era of heterogeneous, service-based tools, where every aspect of software, from algorithms to processing cycles, is treated as a commodity in its own right. But what are the factors that will ultimately determine the success or failure of workflow technologies? Do they lie along the common lines of user acceptance and price levels, or are there more fundamental issues at stake, such as the capability to continually adjust to the changing face of the software industry and manage service coordination between multiple providers? The talk will address these questions and attempt to foresee the role of workflows in future developments in the field.

Tom Oinn
European Bioinformatics Institute, UK

When reality attacks! Four years of building workflow middleware for real scientists.
Over the four years since the first release of the Taverna workflow workbench we have been working intensively with a community of bioinformaticians and biologists. In this time both we as computer scientists and providers of middleware and our user communities as guides and consumers have learnt a great deal about the properties required for grid and other technologies in order for them to be truly useful. In this talk I will present the design of Taverna 1, the changing requirements and corresponding architectural evolution of that system and discuss how this evolutionary process is now informing a complete redesign of the system as Taverna 2. I will show in detail how the new workflow architecture can support scenarios such as fine grained transient virtual organization management, data streaming, collaborative workflow authoring and invocation and semantic introspection over data and process.

Akihiko Konagaya
RIKEN GSC, Japan

Automatic Generation of Drug Metabolic Pathway from ADME Ontology on OWL-DL
In order to predict individual differences in drug response and molecular interaction events, the role of in silico prediction of drug interaction events at a pathway level becomes more and more important. We developed a Dynamic Pathway Assertion System, which uses Web Ontology Language (OWL) instances for atoms of interaction based on Drug Interaction Ontology (DIO). The system dynamically generates pathways as a result of triggered molecular interaction and asserts these pathways into the ontology as OWL instances. The generated pathways can be used as the seeds for quantitative simulation for compartment model and momentum analysis in pharmacokinetics. We tested the system using known drug interactions between irinotecan (CPT-11) and ketoconazole.

Lim Teck Sin
KOOPrime Pte Ltd, Singapore

Integrative workflows for Bio and Medical Research -
KOOPlatform: BioWorldWideWorkflow Integration, Bio-eManufacturing and BioSurveillance
Infectious diseases and epidemics such as influenza and avian flu plague many parts of the world. An unpredictable pandemic could potentially kill millions. Such impact is often not felt until it is too late, as during the outbreak of Severe Acute Respiratory Syndrome (SARS), which shook the entire healthcare community. The BIO-IMSS Integrated Pipeline Against Infectious Diseases has been developed by KOOPrime in conjunction with collaborators at the National University of Singapore and Nanyang Technological University. It aims to set up a Biological Integrated Manufacturing and Services System (BIO-IMSS) conceived as a first response to epidemics like SARS. It can also be used for BioSurveillance and monitoring of environmental samples. This system is based on KOOP, the Knowledge Object Oriented software designed for bioinformatics workflow integration, arguably the earliest integrated workflow system, available since 1998. It was commercialised by the NUS spinoff company KOOPrime Pte Ltd. Entirely coded in Java, this software package, comprising a workflow integration GUI, a central master server engine and slave server software, is now interoperable with other efforts in workflow integration, e.g. Taverna-MyGrid and Goalnet.

Bertil Schmidt
University of New South Wales (UNSW Asia)

Quascade-MP2 Workflows
The exponential growth rate of biological databases has established the need for high performance computing (HPC) in bioinformatics. Typically, an HPC setup operates on a clustered computing environment consisting of multiple computers that communicate over fast switches. For popular database scanning applications such as BLAST and HMMER the benefits of clusters are immediate, and linear speedups can be easily achieved. However, the evolving challenges in life sciences research cannot all be addressed by off-the-shelf bioinformatics applications. Life scientists need to analyse their data using novel approaches that might be published in recent journals or based on their own hypotheses and assumptions. Quascade-MP2 has been developed to address this need. It is a visual prototyping tool created especially for data-driven, high performance scientific applications. It is a complete development platform for data-driven tools that also offers an easy-to-use and intuitive interface.
In this talk a workflow for the phylogenetic analysis of influenza viruses is presented using Quascade-MP2. It is highlighted how the packages ClustalW and PHYLIP were integrated in the biologist-friendly workflow system, which is Grid-enabled and High Performance Computing (HPC) compatible. As a proof of concept proteomic data of Neuraminidase is used to identify several clades that are clearly geographical in distribution. Different techniques such as the character-based Maximum Parsimony and the Maximum Likelihood algorithm as well as distance based solutions like UPGMA and Neighbor Joining have been integrated in the workflow system to simplify phylogenetic analysis.

Lane Shen
Nanyang Technological University, Singapore

Goalnet: Intelligence in Workflow orchestration
Goal orientation is one of the key features of agent systems. Goal Net, a system developed at Nanyang Technological University, proposes a new methodology for multi-agent system development. The methodology covers the whole life cycle of agent system development, from requirement analysis, architecture design and detailed design to implementation. A Multi-Agent Development Environment (MADE) that facilitates the design and implementation of agent systems is presented. Goal Net has now been successfully integrated with KOOPlatform and Taverna/MyGrid, using an OWL ontology-based approach for workflow agent interoperability.

Arun Krishnan
Institute for Advanced Biosciences, Keio University, Japan

Wildfire/GEL: an integrated solution for building and executing workflows
We observe two trends in bioinformatics: (i) analyses are increasing in complexity, often requiring several applications to be run as a workflow; and (ii) multiple-CPU clusters and Grids are available to more scientists.
The traditional solution to the problem of running workflows across multiple CPUs required programming, often in a scripting language such as Perl. The need for programming places such solutions beyond the reach of many bioinformatics consumers. We present Wildfire, a graphical user interface for constructing and running workflows. It provides an intuitive interface based on a drawing analogy and, like Jemboss, presents program options using graphical user interface elements; thus Wildfire hides the precise syntax of scripting languages and command-line options from the user. However, unlike Jemboss, which can only run one application at a time, Wildfire allows the user to compose several applications into a workflow, which we illustrate by presenting some examples. In contrast to Taverna and ICENI, it works directly with program executables, rather than Web or Grid services. For execution, it uses GEL (Grid Execution Language), which can run the workflow over the compute nodes of a cluster, similar to Biopipe. However, GEL can also run executables directly, or on the Grid. Thus, Wildfire and GEL bring supercomputing power to the bioinformatician. The talk will focus on the design considerations that prompted us to develop Wildfire/GEL and the challenges that are still to be met for the effective distribution of workflows on the grid.

Olivo Miotto
Institute of Systems Science, Singapore

Semantic Web Technologies for Biological Knowledge Aggregation
The flexible representation model and powerful reasoning capabilities of Semantic Web (SW) technologies offer great promise for the integration of heterogeneous biological data. Applying these technologies to current aggregation tasks provides useful insights on issues affecting the adoption of the SW platform. In a large-scale study of the Influenza A virus proteome, we analyzed over 40,000 annotated sequences, retrieved from public databases and encoded in RDF, using a problem-specific OWL ontology. The simple structure of RDF meant that data warehouses were not needed, while simple rules and an off-the-shelf reasoner were powerful tools for restructuring and cleaning our datasets. Such large-scale tasks can thus benefit from SW technologies today, and drive "grassroots" growth of the Life Sciences Semantic Web. Although public databases are plagued with data quality issues, SW technologies can be combined with other knowledge aggregation methods, such as structural rules, to ease these problems. Packaging these approaches into intuitive end-user tools is a major research challenge, and SW capabilities can enhance current sequence analysis software. The integration of such tools with computing grids is also desirable, as reasoning tasks can outgrow the capabilities of current desktops.

Miao Chunyan
Nanyang Technological University, Singapore

GSAF: A Grid-based Services Transfer Framework: Bio Services Transfer in a Grid Environment
In this talk, we present a new framework, B-GST (Bio-Grid Services Transfer) Framework. The core idea is to migrate and execute bio services dynamically to break the tight coupling between the bio services and the computers. In B-GST, resources are categorized into software resources, hardware resources and data resources, and are managed in corresponding repositories. The dynamic binding of different kinds of resources provides a flexible pattern to execute the bio services in a grid environment.

Ross D King
University of Wales, Aberystwyth, UK

The Expo Ontology: Describing scientific experiments
The formal description of experiments for efficient analysis, annotation, and sharing of results is a fundamental part of the practice of science. Ontologies are required to achieve this objective. A few subject-specific ontologies of experiments currently exist. However, despite the unity of science, no general ontology of experiments exists. We have proposed the ontology EXPO to meet this need. EXPO links a specified upper ontology (such as SUMO) with subject-specific ontologies of experiments by formalising the generic concepts of experimental design, methodology, and results representation. EXPO is expressed in the W3C standard ontology language OWL. We demonstrate the utility of EXPO, and its ability to describe different experimental domains, by applying it to experiments in high-energy physics, phylogenetics, and functional genetics. The use of EXPO made the goals and structure of these experiments more explicit, revealed ambiguities, and highlighted an unexpected similarity. We conclude that EXPO is of general value in describing experiments and a step towards the formalisation of science.

Simon See and Melvin Koh
Sun Microsystems Asia Pacific Science and Technology Centre, Singapore

Grid Workflow Composition with Directed Graphs
A Grid Workflow is critical to grid computing because it can create complex grid computations by logically connecting different grid jobs. Users can easily define and reuse workflows for applications that are loosely coupled. In this talk, we present our research effort on designing a non-DAG workflow specification model for workflow composition. Our model allows a user to compose a workflow using directed graphs, thereby allowing the modelling of sequence, parallel, choice and iteration patterns in the workflow. We have also provided for structural verification of workflows using Petri net-based analysis techniques to detect errors such as deadlock and lack of synchronization. We incorporated the model into a Grid Workflow Management System using Sun Grid Engine as the resource manager.
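
As a rough illustration of Petri net-based deadlock detection, the following Python sketch explores the reachable markings of a small net and reports any marking where nothing can fire and the workflow has not finished. It is a minimal sketch under stated assumptions, not the speakers' verification method: the net encoding, the function names and the two example workflows are all hypothetical.

```python
from collections import deque

# A net is a dict: transition name -> (input places, output places).
# A marking is a dict: place -> token count (only positive counts stored).


def enabled(marking, transitions):
    """Names of transitions whose input places each hold at least one token."""
    return [
        name
        for name, (inputs, _outputs) in transitions.items()
        if all(marking.get(place, 0) >= 1 for place in inputs)
    ]


def fire(marking, transition):
    """Return the marking obtained by firing one enabled transition."""
    inputs, outputs = transition
    new = dict(marking)
    for place in inputs:
        new[place] -= 1
    for place in outputs:
        new[place] = new.get(place, 0) + 1
    return {place: count for place, count in new.items() if count > 0}


def find_deadlocks(initial, final, transitions, limit=10_000):
    """Breadth-first search over reachable markings (bounded by `limit`)."""
    seen, queue, deadlocks = set(), deque([initial]), []
    while queue and len(seen) < limit:
        marking = queue.popleft()
        key = frozenset(marking.items())
        if key in seen:
            continue
        seen.add(key)
        names = enabled(marking, transitions)
        if not names and marking != final:
            deadlocks.append(marking)
        for name in names:
            queue.append(fire(marking, transitions[name]))
    return deadlocks


# Hypothetical workflow 1: parallel split followed by a synchronising join.
GOOD = {
    "split": (["start"], ["a", "b"]),
    "task_a": (["a"], ["a_done"]),
    "task_b": (["b"], ["b_done"]),
    "join": (["a_done", "b_done"], ["end"]),
}

# Hypothetical workflow 2: exclusive choice wrongly followed by a join
# that waits for both branches, so it can never fire.
BAD = {
    "choose_a": (["start"], ["a_done"]),
    "choose_b": (["start"], ["b_done"]),
    "join": (["a_done", "b_done"], ["end"]),
}

if __name__ == "__main__":
    print(find_deadlocks({"start": 1}, {"end": 1}, GOOD))
    print(find_deadlocks({"start": 1}, {"end": 1}, BAD))
```

Exhaustive exploration like this only terminates quickly for small, bounded nets; the `limit` parameter is a crude safeguard, and structural analysis techniques avoid enumerating the state space at all.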

Mark Schreiber
Head, Bioinformatics,
Novartis Institute for Tropical Diseases (NITD), Singapore

Bioinformatics workflow management: Thoughts and case studies from industry.
The purpose of bioinformatics in industry is to integrate and mine data and use that data to produce models that can be used in decision support and hypothesis generation. The productivity of this endeavour is limited by the vast heterogeneity of data and tools available to a researcher in industry. Workflow design and management tools are playing an important role in helping increase productivity by simplifying the process of bringing data and tools together in a sensible way. I will present some examples of various types of workflow approaches that have proven useful and discuss how the future development of workflow tools may improve the situation further.

Richard Kamuzinzi
Computer Engineer, Université Libre de Bruxelles (ULB)

IXodus, a knowledge discovery process based on the SIMDAT-Pharma GRID technologies
The IXodus workflow has been designed and implemented to deliver an in silico discovery process based upon the SIMDAT-Pharma Grid, which is an industry-oriented environment integrating hundreds of Grid-enabled biological data services and analysis services. The workflow is designed by combining three major SIMDAT-Pharma components: the workflow tools, the semantics-enabled service discovery framework and an industry-oriented GRID infrastructure, namely GRIA (Grid Resources for Industrial Applications). By leveraging the semantics-enabled service discovery framework, the IXodus case study is authored using "abstract" workflow tasks that are (semi-)automatically mapped to concrete services. Although the current implementation of the IXodus workflow shows numerous benefits provided by workflow platforms in terms of science automation, further developments are still required to address challenging requirements driven both by the GRID environment and the need to manage manual interventions during workflow execution. In particular, the GRID environment requires dealing with redundancy of services and failover strategies, while manual interventions require specifying, at the workflow design phase, how the process also coordinates the people who collaborate to solve a complex problem.

Robert Gill
Head of Biology Domain Architecture, MDR-IT, GlaxoSmithKline

Architecting the Virtual Organisation
Industry is moving from a historically closed, centralised environment to a far more open description of the workforce. In this new environment currently co-located disciplines may be distributed globally rather than managed by a single organisation. These forces for change have also caused industry to take a far broader view of its processes aiming for a flexible, agile, virtual and secure organisational structure to make the most of a rapidly developing world market.
These changes to the business need to be interpreted and supported by the IT and Informatics infrastructures, which must deliver the same flexibility in their applications and analytical components as is being asked of the business. It is no longer possible to treat integration as an afterthought or to handle it in an ad hoc manner. To support the needs of virtualisation, a high-level strategy is required to map out the business into functional units, understanding the interfaces and process flows. These changes have become a major driver for IT and high-priority goals for architecture. Concepts such as Service Oriented Architecture, Semantic Integration, Workflow, the Grid and other technologies all show promise in supporting this virtual goal, but now is the time of reckoning. Can these be scaled to support industrial application?

Yasumasa Shigemoto, Yoshikazu Kuwana
Center for Information Biology and DDBJ,
National Institute of Genetics, Japan
Hideaki Sugawara
Director, Center for Information Biology and DDBJ
National Institute of Genetics, Japan

Application of Web services to workflow navigation in bioinformatics for non-programming biologists

Web services form the backbone of online resources that underpin the laboratory biologist's informatics and computational needs. Since 2003, we (Sugawara and Miyazaki, 2003) have demonstrated that bioWeb services using SOAP for inter-web-process communications contribute greatly to the interoperability of the diverse biological information resources currently available on the Internet. For example, we have successfully applied this to in-house projects including The Gene Trek in the Prokaryote Space (Kosuge et al. 2006), which aimed at the prediction and evaluation of ORFs in microbial genomes by use of multiple databases and analytical tools. More importantly, this frequently used service has been supplemented with workflows to further enhance the productivity of the end user.
We now report a newly developed prototype for the automated creation of Web pages with workflow navigation functions that call our Web services. In this way, Web services that were previously accessible only to skilled programmers are now easily usable by the average biologist user in bioinformatics. We will introduce a simple procedure for the creation of the Web pages and service ontology used in a typical workflow session based on our system.
Web services and workflows: http://www.xml.nig.ac.jp/
Web pages with navigation: http://cyclamen.ddbj.nig.ac.jp/newsoap6/ (in Japanese)

Fenglian Xu
WebSphere ESB Development, IBM UK

Workflow with IBM's WebSphere Process Server
The IBM WebSphere Process Server provides Human Task Management and Process Choreography, and includes support for a variety of service interaction mechanisms. It includes the WebSphere Enterprise Service Bus, which provides functions aimed at making it easier to integrate existing services with new functions. This talk will provide an overview of the WebSphere Process Server capabilities, and discuss how integration can be achieved.

Periasamy Guhan
Professional Services Director, TIBCO Software Inc

Evolution of Workflow for modern predictive industry
TIBCO iProcess Suite delivers something that we call Business Process Management plus (BPM+). BPM+ refers to going beyond the traditional boundaries of BPM to be able to handle any type of process, and the entire process lifecycle. The fact is that managing business workflows involves many tools and potential technology solutions. This talk will cover the evolution of workflow systems that help design, execute and manage business processes end to end, from a business event to the ultimate business result, for any type of process. In the same way, the scientific enterprise itself can be considered a business process which is amenable to the BPM+ system.


Tutorial Trainers

Tom Oinn
European Bioinformatics Institute, UK

Getting Started with Taverna
Taverna provides workflow construction and invocation capabilities suited to end users, particularly bioinformaticians and similar knowledge workers in other scientific disciplines.

This tutorial will cover the use of Taverna, providing an overview of its features from a novice's perspective and assuming no prior knowledge of workflow technology. While the examples used in the tutorial are from bioinformatics, the tutorial does not require any in-depth knowledge of this domain.

By the end of the tutorial, participants will know how to install the workbench software, import and run existing workflows and build their own from components available on the public internet. They will have learned how the semantic search technologies in myGrid assist this process by enabling service discovery, how to do basic troubleshooting of workflows using Taverna's fault tolerance and debug mechanisms, and how to manage the import and export of data to and from the workflow system.

Robert Stevens
University of Manchester, UK

Modelling Biology With the Web Ontology Language
Much has been written about the facilities for ontology building and reasoning offered for ontologies expressed in the Web Ontology Language (OWL). Less has been written about how the modelling requirements of different areas of interest are met by OWL-DL's underlying model of the world. Just as little has been written about how best to exploit what it is possible to say in OWL. In this tutorial I will use the disciplines of biology and bioinformatics to reveal the requirements of a community that both needs and uses ontologies. I will use a case study of building an ontology of protein phosphatases to show how OWL-DL's model can capture a large proportion of the community's needs. I will demonstrate how ontology design patterns can work around inherent limitations of this model. I will give examples of non-binary relationships, lists and exceptions, and I will conclude by illustrating what OWL-DL, the proposed OWL 1.1 extensions and the underlying description logic cannot handle, either in theory or because of lack of implementation. Finally, I will present an ontology building methodology called normalisation that not only follows some perceived best practice, but also allows OWL's rich potential to be exploited with relative ease.


First Created: 10 Feb 2007 Tan Tin Wee
Last Updated: 22 May 2007. Previous updates: 18 May; 10 May; 28 Apr; 20 Apr; 2 Apr; 10 Feb 2007