The Center for Inherited Disease Research at Johns Hopkins University turned to Caringo Inc.'s CAStor content-addressed storage (CAS) software to provide data archiving and to manage its sensitive and rapidly expanding genotyping data. The Center for Inherited Disease Research (CIDR) provides genotyping and statistical genetics services for investigators looking to identify genes that contribute to human disease. It is supported by 13 institutes of the National Institutes of Health (NIH).
CIDR's work is, to put it mildly, a data hog. As part of its research, CIDR scans up to 12 DNA samples on one slide, according to Lee Watkins, Jr., the center's director of Bioinformatics. One sample can produce files ranging from 2 GB to 4 GB. CIDR uses CAStor to archive the data and delete it from the Windows file share. With data from tens of thousands of DNA samples in its system, the archive builds up fast.
Watkins said the Baltimore, MD-based CIDR often generates terabytes of data a week, sometimes hitting a terabyte in one day. The center used high-capacity Capricorn Technologies PetaBox systems to store the data, but last summer the 50-person research team realized they needed help managing it all. "We knew we needed to have an archiving strategy," Watkins said. "Keeping up with all the data became unmanageable. People wanted to recover files by project, keeping track of which files go with each slide scanned. It was getting complicated." The biggest problem was finding technology that wouldn't deplete the budget. "We're well-funded, but we can't go out and buy a system from EMC or Hitachi to do this," Watkins said. "We said, 'there has to be somebody who has written software that can keep track of this.'"
CIDR became aware of Caringo through Capricorn. Caringo gave CIDR a free trial period to test CAStor. Watkins said CAStor passed the test and CIDR became a paying customer last November. It started with a 30 TB CAStor cluster, is now up to a 99.9 TB cluster with 80 TB used, and is still growing. To keep up with its data growth, CIDR is installing a high-density Rackable Systems array for more capacity and will install CAStor clusters on that as well. The Rackable setup is scheduled to go live in August. At first, CIDR found CAStor had trouble keeping up with the data it threw at the clusters. "It wasn't 100% robust," Watkins said. "There were cases where a disk wouldn't fail but it would stop performing and act weird, give us little hiccups now and then. They wrote a fix a few months ago, and we haven't had that problem."
Caringo marketing vice president Derek Gascon said, "They wanted to have disk capacity freed up much quicker, so we put together a new version for them that includes a faster turnaround in releasing disk capacity." Gascon said that fix is now included in the general release of the product. Watkins said CAStor has also helped provide disaster recovery, surviving various mechanical failures and even a flood in the lab where the clusters were temporarily installed while CIDR was expanding its server. "We've had random disk failures, and power failures where all the nodes went down and we had to power it back up," he said. "We never had a problem with that, which is amazing to me."