Rancho Technology

The Rancho Data Crawler

The Rancho Data Crawler is a web application that, given a complex search string, crawls study-level metadata in PubMed, ClinicalTrials.gov, GEO, SRA, EGA, and ArrayExpress and outputs the results to an XLSX (Excel) file within minutes. It accelerates data crawling by putting our proprietary Rancho data crawler scripts at the fingertips of any project team that needs them, with no scripting required.


The Fuzzy Logic Tool

The Fuzzy Logic Tool supports rapid, practical data harmonization based on phonetic alignment. When building this term-mapping solution, we found that existing methods are either too restrictive (statistical methods) or too permissive (the Soundex method).

Our Fuzzy Tool algorithm is available both as an API and as an Excel plugin. The implementation takes a two-pronged approach to accelerate the mapping process: first identifying the best ontologies to map the data to, then mapping against an indexed ontology.
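To make the trade-off concrete, here is a minimal sketch that combines a phonetic gate with a string-similarity threshold. It is not Rancho's proprietary algorithm: the helper names (`soundex`, `phonetic_key`, `best_match`), the 0.8 threshold, and the sample vocabulary are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Standard Soundex consonant groups.
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}


def soundex(word):
    """Return the four-character Soundex code of a single word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":           # h and w do not separate repeated codes
            continue
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code              # vowels reset prev, so a repeat after them counts
    return (encoded + "000")[:4]


def phonetic_key(term):
    """Soundex code of every whitespace-separated token in the term."""
    return tuple(soundex(token) for token in term.split())


def best_match(dirty, vocabulary, min_ratio=0.8):
    """Hypothetical hybrid matcher: candidates must share the phonetic key,
    then the closest one by character similarity wins if it clears min_ratio."""
    key = phonetic_key(dirty)
    best = None
    for term in vocabulary:
        if phonetic_key(term) != key:
            continue
        ratio = SequenceMatcher(None, dirty.lower(), term.lower()).ratio()
        if ratio >= min_ratio and (best is None or ratio > best[1]):
            best = (term, ratio)
    return best


# Illustrative vocabulary, not a real ontology extract.
vocabulary = ["diabetes mellitus", "diabetes insipidus", "melanoma"]
print(best_match("diabetis melitus", vocabulary))
print(best_match("diabetes", vocabulary))
```

Soundex alone would accept any term with the same code, and a similarity threshold alone would reject phonetically obvious misspellings that differ in several characters. Requiring both is one simple way to sit between the two extremes.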


The Rancho QC Tool

Automatic Quality Control (QC) Tool:

To enhance QC efficiency and ensure consistent QC output standards, Rancho has developed a tool that supports tabular data QC and streamlines quality control measures for data model alignment. The tool lets users perform data transformations, generate table summaries, and build logical tests through an RShiny interface. Further details on the tool's functions and features, as well as instructions for use, are documented in the Rancho KB.

Benefits of Rancho’s QC Tool:

  • Clear and user-friendly interface (without coding requirements)

  • Faster quality assurance and issue resolution

  • Less effort to achieve compliance

  • Consistent standards for quality control documentation

  • High-quality data model and tabular deliverables

  • Streamlined and repeatable QC process

Rancho Accelerators

Rancho BioSciences’ Accelerators are powerful tools designed to significantly reduce the amount of time and resources spent on a project. These automated tools can be used to replace and/or improve existing protocols, and are also customizable to meet your project’s specific needs. Our Accelerators include:


Rancho BioSciences’ Terminology-Mapping Solution (TMS) takes a two-pronged approach to ontologies: select the best ontologies to map to, then pre-index standard ontology terms for efficient mapping. TMS’s capabilities include:

  • Ontology store: a graph database allowing users to ingest standard and custom ontologies and perform basic ontology operations, such as text annotation, common ancestor, and level alignment.
  • The Ontology Mapping Suggester Tool suggests the best ontologies to use based on users’ existing text or term list.
  • Our ETL (Extract, Transform, and Load) scripts convert standard ontology formats (e.g., OWL, OBO, TTL) into CSV format for quick indexing.
  • Fuzzy term-mapping: Several endpoints allow users to map “dirty” terms to ontologies, score the results, and then feed them to downstream applications and bulk mapping scripts.
  • A UI Front-end Application allows users to annotate terms directly from their browser, interfacing with TMS while enabling standard spreadsheet functions (for example, column mapping and range mapping).
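As an illustration of the ETL step listed above, the following sketch flattens the `[Term]` stanzas of an OBO file into CSV rows. It is a simplified assumption of what such a script might do, not Rancho's ETL code: the function names, the `id`/`name`/`synonyms` column layout, the `|` synonym separator, and the `EX:` identifiers are all hypothetical, and OWL or TTL input would need a proper RDF parser instead.

```python
import csv
import io


def obo_terms(obo_text):
    """Yield one dict per [Term] stanza with its id, name, and synonyms."""
    term = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins: emit the finished term, if any.
            if term and term["id"]:
                yield term
            # Only [Term] stanzas are collected; [Typedef] etc. are skipped.
            term = {"id": "", "name": "", "synonyms": []} if line == "[Term]" else None
        elif term is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag in ("id", "name"):
                term[tag] = value
            elif tag == "synonym" and value.count('"') >= 2:
                # Synonym values look like: "text" EXACT []
                term["synonyms"].append(value.split('"')[1])
    if term and term["id"]:
        yield term


def obo_to_csv(obo_text):
    """Return CSV text with one row per ontology term."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "name", "synonyms"])
    for t in obo_terms(obo_text):
        writer.writerow([t["id"], t["name"], "|".join(t["synonyms"])])
    return out.getvalue()


# Made-up sample with placeholder identifiers.
SAMPLE_OBO = """
format-version: 1.2

[Term]
id: EX:0000001
name: lung
synonym: "pulmo" EXACT []

[Typedef]
id: part_of
name: part of

[Term]
id: EX:0000002
name: liver
"""

print(obo_to_csv(SAMPLE_OBO))
```

A flat table like this is easy to load into a search index, which is the point of converting before mapping.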


Our Categorization Tool (CAT) is a QC Accelerator that enables data scientists to complete projects with greater efficiency and accuracy. CAT can drastically reduce time spent on QC and virtually eliminate human error. CAT’s capabilities include:

  • Suggests data organization structures
  • Can be adjusted by distance and sensitivity, with options for targeted or blind clustering
  • Categorizes data to identify possible subdomains and outliers
  • Uses OpenAI embeddings to calculate distances between terms
  • Uses the DBSCAN (density-based clustering) algorithm for real-time clustering
  • Depends on an epsilon (distance) value that can be entered interactively
  • Uses the OpenAI completion endpoint to suggest category names
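The role of epsilon in the list above can be sketched with a small pure-Python DBSCAN over cosine distances. This is an illustration under stated assumptions, not CAT's implementation: the three-dimensional vectors are hand-made stand-ins for real OpenAI embeddings, the function names and `min_samples=2` are hypothetical choices, and a production version would more likely rely on a library such as scikit-learn.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def dbscan(vectors, eps, min_samples=2):
    """Return one label per vector: a cluster index, or -1 for noise."""
    n = len(vectors)
    # Each point's eps-neighborhood (a point is its own neighbor).
    neighbors = [[j for j in range(n)
                  if cosine_distance(vectors[i], vectors[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n          # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1       # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])   # j is a core point: keep expanding
    return labels


# Toy vectors standing in for term embeddings (assumption, not real data).
terms = ["lung", "pulmonary", "liver", "hepatic", "unknown"]
vectors = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0],
           [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],
           [0.0, 0.0, 1.0]]

for eps in (0.1, 0.7):
    print(eps, dict(zip(terms, dbscan(vectors, eps))))
```

With the small epsilon the toy terms fall into two tight groups plus one outlier; with the larger epsilon the two groups merge into one. That sensitivity is why an interactively adjustable epsilon is useful when exploring categories.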

Data Crawler:

Rancho’s Data Crawler lets users crawl, manage and enrich data and metadata from PubMed, ClinicalTrials.gov, and other commonly used public resources. “Light Crawler” and “Deep Crawler” modes are customizable to your project’s needs to provide an efficient yet comprehensive search experience. Data Crawler’s capabilities include:

  • Access to millions of research papers, specimens, and datasets
  • Annotation Service: automatically annotates keywords in free-text fields at both the metadata and sample levels against public ontologies
  • Customizable to extract data from various websites; can be used on static and dynamic web pages.
  • Can extract data in different formats, including JSON, CSV, and Excel, for easy integration into various analytical tools and platforms.

Light Crawler lets users extract valuable information from their target publications. The Light Crawler can be used for:

  • Publication identification, coverage, and metadata
  • Data parsing and cleaning
  • Easy export
  • Scalability

Deep Crawler is designed to gather sample-level metadata from publications. The Deep Crawler can be used for:

  • Searching for a wide variety of metadata, including (but not limited to) sample identification numbers, sample dates, sample processing methods, and experimental conditions
  • Comparing results across different studies and datasets

The Rancho Knowledge Base

The Rancho Knowledge Base is an internal portal with a searchable interface for a curated inventory of existing resources. It gives Rancho scientists the ability to discover results, code, and datasets delivered as part of previous projects. We built this catalog from over 1,500 successfully delivered projects to support project acceleration.

  • Create efficiencies and quicker turnaround times for clients
  • Ensure best practices and deliverable quality
  • Promote consistency and professionalism

Our Data Technology and Tool Kits

The Rancho Tool Kit is a collection of processing and analysis code that accelerates analysis, streamlines the QC process, and encourages the development of professional, consistent, and accessible scientific products.

Rancho has been developing content and tools since 2012. Over the years, we have built a substantial body of in-house technology that we bring to projects to produce high-quality, efficient results for our clients.

RanchoTK Analysis Module

  • Best-Practice Method

  • Robust Testing and QC

  • Included Documentation

RanchoTK Plotting Module

  • Consistent Styling

  • Standardized Inputs

  • Efficiently Create Complex Visualizations


SMART Converter

SMART Converter is a platform that collects data from clinical data sites: it simultaneously checks for updates, downloads the data, and converts it to a FAIR format. It minimizes human time spent on data updates and facilitates rapid data ingestion into a platform of choice.
