The New York Times, December 12, 2011, by Steve Lohr  —   When I.B.M.’s Watson computer beat two human “Jeopardy!” champions earlier this year, it was a triumphant demonstration of the company’s technology. It was great for Big Blue’s image, but it was not a moneymaker on its own.

Yet that process is under way. Watson was a bundle of advanced technologies, including speech recognition, machine learning, natural-language processing, data mining and ultrafast in-memory computer hardware. They have been under development at I.B.M. for years, and were pulled into Watson.

The ingredients that went into the Watson arsenal are steadily finding their way into I.B.M. products. For example, WellPoint, the big health insurer, is trying out a system that uses Watson-style software to reduce redundant medical tests.

The latest entry is being announced on Thursday, I.B.M.’s Strategic Intellectual Property Insight Platform. Clearly, the Watson branding team did not work on this name.

But then again, this is not for television, where Watson performed; it is for major corporate customers seeking competitive advantage. The technology, sold as a cloud-based service, is the result of several years of joint development between IBM Research and four companies: AstraZeneca, Bristol-Myers Squibb, DuPont and Pfizer.

The insight platform uses data mining, natural-language processing and analytics to pore through millions of patent filings and biomedical journals to look for chemical compounds used in drug discovery. It searches for the names of compounds, related words, drawings of the compounds, the names of companies working with specific chemicals and molecules, and the names of scientists who created the patented inventions. It does its work quickly, retrieving information on patents in as little as 24 hours after a filing.

“It provides a landscape that shows who is working with what chemicals and drugs,” said Chris Moore, head of business analytics and optimization in I.B.M.’s global services unit.

The technology, Mr. Moore said, can be applied to everything from product strategy to recruiting to patent enforcement.

As a byproduct of its research, I.B.M. is also adding to a vast, searchable chemical database housed by the National Institutes of Health.

The company is contributing more than 2.4 million chemical compounds extracted from 4.7 million patents and 11 million biomedical journal abstracts from 1976 to 2000.

The information was all published, but often in costly scientific journals or buried in the mountains of patent filings. It was so difficult to access that it was, for all practical purposes, inaccessible.

“It’s a nice contribution to the field of open chemistry — and that’s a growing trend, inspired by and similar to open source software,” said Marc C. Nicklaus, head of the computer-aided drug discovery group at the National Cancer Institute, which is part of the National Institutes of Health.

I.B.M.’s data contribution, it seems, is both generous and calibrated. The chemical compound data from patents goes from 1976 to 2000. So most of the data will be on patents that have already expired, useful for scientific research but far less useful commercially. The latter, no doubt, will be of greatest interest to I.B.M.’s paying clients.




IBM Contributes Data to the National Institutes of Health to Speed Drug Discovery and Cancer Research Innovation

New York, New York – 12 Dec 2011: IBM (NYSE: IBM) today announced it is contributing a massive database of chemical data extracted from millions of patents and scientific literature to the National Institutes of Health. This contribution will allow researchers to more easily visualize important relationships among chemical compounds to aid in drug discovery and support advanced cancer research.

In collaboration with AstraZeneca, Bristol-Myers Squibb, DuPont and Pfizer, IBM is providing a database of more than 2.4 million chemical compounds extracted from about 4.7 million patents and 11 million biomedical journal abstracts from 1976 to 2000. The announcement was made at an IBM forum on U.S. economic competitiveness in the 21st century, exploring how private sector innovations and investment can be more easily shared in the public domain.

The publicly available chemical data can be used by researchers worldwide to gain new insights and enable new areas of research. It will also help researchers save time by more efficiently finding information buried in millions of pages of patent documents. Access to this data will also allow researchers to analyze far larger sets of documents than the traditional manual process, adding a whole new dimension to the ability to search intellectual property.

The data was extracted using the IBM business analytics and optimization strategic IP insight platform (SIIP), a combination of data and analytics delivered via the IBM SmartCloud, and developed by IBM Research in collaboration with several major life sciences organizations. SIIP is a new cloud-driven method for curating and analyzing massive amounts of patents, scientific content and molecular data. It uses techniques such as automated image analysis and enhanced optical recognition of chemical images and symbols to extract information from patents and literature upon publication. This is a task that otherwise takes weeks or months to complete manually, but can be done rapidly using this new technology.

“Information overload continues to be a challenge in drug discovery and other areas of scientific research,” said Steve Heller, project director for the InChI Trust, a non-profit which supports the InChI international standard to represent chemical structures. “Rich data and content is often buried in patents, drawings, figures and scholarly articles. This contribution by IBM and its collaborators will make it easier for researchers to use this data, link to other data using the InChI structure representation and derive new insight.”

Over the past six years, several major life sciences organizations have worked on this project with IBM Research gaining access to a comprehensive chemical library extracted from worldwide patents and scientific abstracts. Public structure extraction tools developed by researchers at the National Institutes of Health were also used successfully in this project.

“The scientific community will receive enormous benefit from this advancement,” said Heller. “This is an important addition to the open chemistry data sets. The comprehensiveness of the data and the new ways researchers can look at these data and cross-link to other data associated with each chemical is expected to help with drug development to fight many forms of cancers and other human diseases, as well as the development of other chemical compounds.”

The data will be contributed to the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine (NLM), and the Computer-Aided Drug Design (CADD) Group of the National Cancer Institute (NCI) at the National Institutes of Health. It will be incorporated in the NCBI’s PubChem, a public resource for the scientific community that serves as an aggregator for scientific results as well as in NCI CADD Group services such as the Chemical Structure Lookup Service and the Chemical Identifier Resolver.

The National Institutes of Health will make the content available on PubChem at

Read more: IBM Contributes Data to the National Institutes of Health to Speed Drug Discovery and Cancer Research Innovation – FierceBiotech


Back in April 2011, FierceBioTech wrote:



How can IBM’s Watson aid pharma researchers?


The technology behind I.B.M.’s Watson computer, best known for beating two human “Jeopardy!” champions earlier this year, is steadily finding its way into other I.B.M. products. (Photo: Carol Kaelson/Courtesy of Jeopardy Productions Inc., via Associated Press)



You might know about how IBM’s Watson supercomputer bested human contestants in the game show Jeopardy! in February. But there’s also been some thought about what the supercomputer could do to aid humans faced with sorting through vast amounts of biomedical information to make decisions. In a recent interview with the San Francisco Chronicle, IBM’s ($IBM) Dr. David Ferrucci talked about hypothetical uses of Watson in a clinical setting.

“I think what’s compelling about the medical use case is that, first of all, there’s a huge amount of information out there,” Ferrucci told the Chronicle. “It often doesn’t get considered and some of these diagnostic pieces can be very involved and very complicated but, moreover, you want this evidence trail. You want to know–what did I consider? Why did I consider it? Where’s the evidence for that?”

Taking a step back from this interview, it’s not hard to imagine how Watson could potentially aid drug researchers who are now faced with a dizzying amount of data in their jobs. The bioinformatics groups at major pharmaceutical companies are working on multiple fronts to help their researchers make effective decisions based on all the information available to them. And Big Pharma also has deep pockets to pay for supercomputers.

At IBM’s T.J. Watson Research Center in Hawthorne, NY, where the Watson supercomputer is being developed, the firm’s researchers are already working in computational biology and other areas that hold promise in drug discovery, according to IBM. Also, Swiss healthcare giant Roche last year inked a research deal with IBM to enlist the support of the tech giant in developing cheaper and faster gene sequencing technology. So we know that Big Blue is no stranger to Big Pharma.

We’ll see whether IBM’s Watson grows up to be a major force in the biomedical research world. That would be a great encore to the supercomputer’s stellar performance on Jeopardy!.




Scripps begins world’s largest computer-based project against malaria
November/December 2011  —  Malaria may have lost big when IBM’s ($IBM) Watson supercomputer won Jeopardy! in February. Scientists from the Scripps Research Institute in La Jolla, CA have garnered a portion of Watson’s winnings from the game show to mount the largest-ever computational project to combat drug-resistant malaria. Watson’s providing the money, but the scientists are using a supercomputer of another sort for the major undertaking.

The researchers plan to tap the World Community Grid, which consists of some 2 million PCs from more than half a million volunteers who have made their computers available for computing jobs. With computing capacity from the volunteers’ machines, the Scripps scientists plan to crunch data on millions of potential compounds to home in on proteins that the deadliest malaria-causing parasite needs to survive, according to the group’s post on IBM’s blog. The goal is to discover new drugs that can treat people with malaria who didn’t get vaccinated or whose vaccination wore off after a while.

The project is taking aim at the most deadly form of malaria, which is caused by the parasite called Plasmodium falciparum. While many cases of malaria are curable, certain forms of the infectious disease have built up resistance to the drugs. According to the World Health Organization, there were 781,000 deaths and 225 million cases of malaria in 2009. Malaria kills a child in Africa every 45 seconds, the organization said on its website.

Before starting the malaria project, Scripps scientists used the World Community Grid to find two compounds to attack multi-drug resistant HIV, according to the group. Computer-based analysis of compounds and disease-related proteins is nothing new, but recently members of the public have been empowered through organizations such as the World Community Grid and the online video game Foldit to help find answers to difficult scientific questions.



IBM cloud aids fight against superbugs



Big Blue’s cloud has helped Swiss scientists in their hunt for clues about how certain bacteria form resistance against antibiotics and cause disease. It’s perhaps the latest case where the tech giant ($IBM) has been aiding life sciences concerns in managing and analyzing biological or clinical data in the cloud.

For this latest feat, IBM–which has previously made the case that its cloud could help reduce clinical trial costs–worked with Swiss cloud computing start-up CloudBroker and researchers at the prestigious technical university ETH Zurich, according to IBM. With IBM’s cloud and CloudBroker’s queuing and data management software, the Swiss researchers analyzed a huge amount of data within two weeks. Without the technology, the analyses could have taken several months.

Indeed, IBM has made life sciences customers a key market for its cloud computing offerings for years. And as scientists compute massive amounts of data from disease proteins and genomes, clouds have proven to be useful in providing the needed computing capacity in short order. The technology also offers potential cost savings for managing and analyzing clinical data from drug trials. While developers are looking for ways to make their R&D run more efficiently, cloud computing has entered the discussion as one way to achieve this.

The scientists at ETH Zurich (from which CloudBroker spun off in 2008) used IBM’s cloud to find about 250 virulence factors and conjure 2.3 million 3D models to gain a better understanding of disease-causing bacteria. For their study of streptococcus bacteria that cause strep throat, the scientists tapped nearly 250,000 computing hours on 1,000 parallel CPUs with Big Blue’s SmartCloud Enterprise.
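As a rough sanity check, the computing figures quoted for the ETH Zurich study can be reconciled with the two-week turnaround reported earlier in the piece. The short Python sketch below is illustrative only: the function name `wall_clock_days` is hypothetical, and it assumes the work divided evenly across all 1,000 CPUs with no queuing or data-transfer overhead, which real workloads rarely achieve.

```python
# Back-of-the-envelope check of the figures quoted in the article.
CPU_HOURS = 250_000    # "nearly 250,000 computing hours" (from the article)
PARALLEL_CPUS = 1_000  # "1,000 parallel CPUs" (from the article)


def wall_clock_days(cpu_hours: float, cpus: int) -> float:
    """Elapsed days, ASSUMING the work splits evenly across all CPUs."""
    hours_per_cpu = cpu_hours / cpus
    return hours_per_cpu / 24


days = wall_clock_days(CPU_HOURS, PARALLEL_CPUS)
print(f"{days:.1f} days")  # prints "10.4 days"
```

Under that idealized assumption the job takes a little over ten days of elapsed time, which is in line with the "within two weeks" figure given by IBM.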

“For our experiments, we need very high capacity in short time frames,” Dr. Lars Malmstrom, ETH Zurich’s lead researcher, said in IBM’s release. “Cloud computing allows to reserve this computing capacity whenever researchers need it, and it is available quickly. Research teams do not need to set it up or maintain it, and thus can concentrate better on their research.”



Bloomberg: more on IBM



IBM Gives Researchers Data on 2.4 Million Chemicals


By Alex Wayne –Dec 12, 2011




U.S. researchers gained access to a database of 2.4 million chemical formulas and diagrams that International Business Machines Corp. (IBM) culled from 24 years’ worth of patent applications and medical journals.

The catalogue of compounds will be housed at the National Institutes of Health in Bethesda, Maryland. Scientists can use the data to identify new candidates for drug development or new uses for existing drugs.

“The applications are very wide reaching, from life sciences to chemicals, petroleum, to food, to health,” said Chris Moore, a partner and vice president in Armonk, New York-based IBM’s global life sciences division, in a telephone interview.

The NIH has made development of drugs for rare diseases such as sickle cell anemia a priority since Francis Collins became director of the $31 billion research agency in 2009. Collins has asked Congress to create a new institute for drug development and pressed manufacturers to open catalogs of abandoned compounds.

IBM, the world’s biggest computer-services provider, created the database using “Watson-type technology,” Moore said, referring to the supercomputer that beat human competitors on the game show “Jeopardy!” The data was extracted from about 4.7 million U.S., European and United Nations patents and 11 million biomedical journal abstracts from 1976 to 2000.

Researchers can search the database for free. IBM plans to sell a product that will produce more sophisticated analysis of the database that can tell customers who is conducting research on specific therapies, Moore said. He wouldn’t disclose the price.