
How we help you

1. Higher AI performance

2. Shorter R&D cycle

3. Lower R&D cost
1. Higher AI performance
We offer millions of high-quality radiology and pathology scans, with optional annotations, plus clinical and molecular data for AI model training and validation, helping improve accuracy, reliability, and functionality. Our commercial-use datasets, collected from Japanese hospitals and clinics, are particularly well suited to R&D targeting Japanese or other Asian populations. Longitudinal data for pharma and life sciences are also available.

Radiological and pathological scans
Radiological scans (X-ray, CT, MRI, mammography, PET, etc.), ultrasound scans, and pathology scans (WSI).

Annotations
Specialist physicians are involved in the creation of annotations, including disease segmentations, bounding boxes, classification labels, diagnostic labels, and findings information.

Clinical information
Age, sex, height, weight, medical history/family history, surgical history, chief complaint, major symptoms/progress, nursing observation records, referring department, diagnosis, etc. Molecular diagnostic results may also be available, such as ER, PgR, and HER2 status (positive/negative) for breast cancer.


2. Shorter R&D cycle
Our commercial-use datasets are ethically sourced, de-identified, and meticulously curated, so they are ready to use for your R&D in medical AI, drug discovery AI, and clinical research.

Ready for secondary use
Ethically sourced and prepared for secondary use.

De-identified
Rigorously de-identified to ensure legal compliance.

Carefully curated
Data unsuitable for medical AI, such as cases with complications or excessive noise, are excluded. Our comprehensive standardization of images, clinical data, molecular data, and annotations minimizes additional pre-processing and verification.
3. Lower R&D cost
You do not need to invest heavily in collecting, selecting, annotating, and pre-processing medical data yourself. Our datasets are versatile and ready for use across a wide range of R&D applications, such as medical AI, drug discovery AI, and clinical research.
Medical AI
Examples: Image diagnosis, prognosis, and dose distribution creation.
Drug discovery AI
Examples: Drug screening, biomarker discovery, and therapeutic target identification.
Clinical research
Examples: Clinical trial design, drug toxicity/effectiveness evaluation, and histopathological assessment.

Get started now
Looking for de-identified datasets from Japan?
Looking for high-quality annotated medical image datasets?
Looking for radiological/pathological scans and clinical/molecular data for specific diseases?
FAQ
Radiological scans: DICOM or NIfTI
Radiology image annotations: NRRD or NIfTI for segmentation, and JSON for localization (e.g., bounding boxes)
Pathological scans: TIFF, DICOM, iSyntax, or NDPI
Pathology image annotations: GeoJSON
Clinical data (including molecular data and radiology/pathology reports): Excel
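Since pathology image annotations are delivered as GeoJSON, a quick way to inspect them is with a short script. The following is a minimal sketch in Python using only the standard library. It assumes a standard GeoJSON FeatureCollection with Polygon features; the `label` property name and the sample coordinates are hypothetical, so check the actual files for the schema used in your dataset.

```python
import json

# Hypothetical sample annotation. The "label" property is an assumption,
# not a documented schema for any specific dataset.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"label": "tumor"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[100, 200], [300, 200], [300, 400], [100, 400], [100, 200]]]
      }
    }
  ]
}
"""


def polygon_area(ring):
    """Shoelace formula for a closed ring of [x, y] points (pixel units)."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def summarize_annotations(geojson_text):
    """Return a label, bounding box, and area for each Polygon feature."""
    collection = json.loads(geojson_text)
    summaries = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Polygon":
            continue  # this sketch ignores other geometry types
        exterior = geometry["coordinates"][0]  # first ring is the outer boundary
        xs = [point[0] for point in exterior]
        ys = [point[1] for point in exterior]
        summaries.append({
            "label": (feature.get("properties") or {}).get("label"),
            "bbox": (min(xs), min(ys), max(xs), max(ys)),
            "area": polygon_area(exterior),
        })
    return summaries


if __name__ == "__main__":
    for summary in summarize_annotations(SAMPLE_GEOJSON):
        print(summary)
```

Summaries like these make it easy to spot empty or implausibly small regions before training.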
We check the consistency of image quality, imaging conditions, diagnosis names, radiology/pathology findings, clinical data, and annotation content. When necessary, we exclude inappropriate cases, such as those with noise or missing data, and standardize the data format.
Annotations are performed by specialists such as radiologists, pathologists, orthopedic surgeons, or radiologic technologists with expertise in medical imaging, depending on the disease and modality. Depending on the difficulty and requirements, primary annotation may be performed by technologists and then reviewed by specialist physicians. Annotation types vary by dataset and may include segmentation, bounding boxes, classification labels, diagnosis labels, and findings information.
Yes. We provide datasets after removing or processing information that could identify individuals, such as patient IDs, names, dates of birth, examination dates, and other identifiers contained in clinical data, metadata, and image data. If personal information is embedded in the images, or if the images contain facial features or other information that may lead to individual identification, we perform masking or other appropriate processing as necessary while minimizing the impact on analysis.
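If you want to confirm on your side that delivered metadata contains no identifier fields, a simple check can be scripted. The sketch below is illustrative only and is not a description of our internal de-identification pipeline. The field names (`patient_id`, `patient_name`, `date_of_birth`, `exam_date`) are hypothetical; real clinical tables and DICOM headers use their own keys.

```python
import copy

# Hypothetical field names chosen for illustration. Replace them with the
# column names or header keys used in your own data.
DIRECT_IDENTIFIERS = {"patient_id", "patient_name", "date_of_birth", "exam_date"}


def deidentify_record(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of a metadata record with identifier fields removed."""
    cleaned = copy.deepcopy(record)  # leave the caller's record untouched
    for key in list(cleaned):
        if key in identifiers:
            del cleaned[key]
    return cleaned


def find_remaining_identifiers(record, identifiers=DIRECT_IDENTIFIERS):
    """List identifier fields still present, as a post-processing check."""
    return sorted(key for key in record if key in identifiers)


if __name__ == "__main__":
    raw = {"patient_id": "A-001", "patient_name": "X", "age": 54, "sex": "F"}
    clean = deidentify_record(raw)
    print(clean)
    print(find_remaining_identifiers(clean))
```

A field-name check like this does not detect identifiers burned into pixel data or written into free-text reports, which need separate review.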
Yes. The data are obtained and provided in a form suitable for secondary use, based on appropriate procedures at each medical institution. For each dataset, we confirm the necessary acquisition conditions, such as ethics review, opt-out procedures, informed consent, and internal approval at the medical institution, before providing the data.
Yes. The datasets can be used for R&D purposes, including medical AI, AI-driven drug discovery, medical devices, and clinical research. They can also be used in collaboration with external contractors and for regulatory submissions, academic publications, and marketing materials. We can also provide datasets to companies outside Japan.
Pricing varies based on data volume, rarity, and whether it includes clinical data, molecular data, or annotation. Please contact us for further details.
We issue an invoice for payment via bank transfer through Wise or SMBC Direct.
For custom datasets, you can specify requirements such as disease, modality, body part, number of cases, imaging conditions, manufacturer, slice thickness, clinical data, molecular data, annotation type, and exclusion criteria. In addition to extracting cases from existing datasets, we can also discuss additional annotations or the construction of new datasets based on your requirements.
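When preparing a custom dataset inquiry, it can help to write the requirements down in a structured form first. The following Python sketch is one way to do that. The `DatasetRequest` class and all of its field names are hypothetical and are not an official request form or API; they simply mirror the requirement types listed above.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DatasetRequest:
    """Hypothetical structure for a custom dataset inquiry (not an official form)."""
    disease: str
    modality: str
    body_part: str
    num_cases: int
    max_slice_thickness_mm: Optional[float] = None
    manufacturers: List[str] = field(default_factory=list)
    annotation_types: List[str] = field(default_factory=list)
    include_clinical_data: bool = False
    include_molecular_data: bool = False
    exclusion_criteria: List[str] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the request looks complete."""
        problems = []
        if self.num_cases <= 0:
            problems.append("num_cases must be positive")
        if self.max_slice_thickness_mm is not None and self.max_slice_thickness_mm <= 0:
            problems.append("max_slice_thickness_mm must be positive")
        for name in ("disease", "modality", "body_part"):
            if not getattr(self, name).strip():
                problems.append(f"{name} must not be empty")
        return problems

    def to_dict(self) -> dict:
        """Plain dictionary form, convenient for pasting into an email or JSON file."""
        return asdict(self)


if __name__ == "__main__":
    request = DatasetRequest(
        disease="breast cancer",
        modality="mammography",
        body_part="breast",
        num_cases=500,
        annotation_types=["segmentation"],
        include_molecular_data=True,
    )
    print(request.validate())
    print(request.to_dict())
```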
Yes, we offer samples of 2–5 cases to help you evaluate our scans, DICOM headers, clinical information, and annotation methodology against your specific needs.
For existing datasets, delivery typically takes 1–2 weeks. For custom datasets or datasets requiring additional annotation, the delivery timeline depends on the number of cases, disease, modality, annotation requirements, and availability of clinical data; as a general guideline, it takes 1–2 months. In principle, data are provided as password-protected files through secure cloud storage such as Box.
Yes, we provide medical feedback, customized AI development, and regulatory support for entering the Japanese medical device market. Our team comprises medical AI and MLOps engineers, software developers, diagnostic radiologists, radiation oncologists, pathologists, and dataset managers.
Callisto DataHub is provided by Callisto Inc., which operates a medical imaging data platform for medical AI and clinical research.
Need a custom dataset?
Tell us what you're looking for.
