The resulting model is a fine-tuned version of microsoft/swin-tiny-patch4-window7-224 on the imagefolder dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7695
  • Accuracy: 0.7275

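The Microsoft Swin-Tiny Transformer used in this project is a vision transformer designed for computer vision tasks. It is the smallest variant of the Swin Transformer family (about 28M parameters). Instead of attending over all image patches at once, it computes self-attention inside local, shifted windows. This keeps the cost linear in image size and makes the model efficient for image classification and other vision tasks. The checkpoint name encodes its setup: 4×4 pixel patches, 7×7 patch windows and 224×224 input images.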
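The numbers in the checkpoint name make a quick worked example of why windowed attention is cheaper than global attention. The sketch below only does the arithmetic for the first Swin stage (later stages merge patches into coarser grids); it does not touch the model itself, and swin_stage1_layout is a hypothetical helper name.

```python
def swin_stage1_layout(image_size: int, patch_size: int, window_size: int) -> dict:
    """First-stage patch grid and window partition implied by a Swin checkpoint name."""
    patches_per_side = image_size // patch_size          # 224 // 4 = 56
    windows_per_side = patches_per_side // window_size   # 56 // 7 = 8
    tokens = patches_per_side ** 2                        # 56 * 56 patch tokens
    windows = windows_per_side ** 2                       # 8 * 8 local windows
    tokens_per_window = window_size ** 2                  # 7 * 7 tokens per window
    return {
        "tokens": tokens,
        "windows": windows,
        "tokens_per_window": tokens_per_window,
        # pairwise attention scores if every token attended to every other token
        "global_attention_pairs": tokens ** 2,
        # pairwise scores when attention is restricted to each window
        "windowed_attention_pairs": windows * tokens_per_window ** 2,
    }


# "swin-tiny-patch4-window7-224": 224x224 input, 4x4 patches, 7x7 windows
layout = swin_stage1_layout(224, 4, 7)
print(layout)
# {'tokens': 3136, 'windows': 64, 'tokens_per_window': 49,
#  'global_attention_pairs': 9834496, 'windowed_attention_pairs': 153664}
```

Windowed attention computes about 64 times fewer attention scores here, and the gap grows with image size because the window size stays fixed.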


Skin Cancer MNIST: HAM10000

HAM10000 is a large collection of multi-source dermatoscopic images of pigmented lesions. The Kaggle version used here contains 10,015 images together with a metadata CSV file.


The model is trained to classify the following seven skin conditions:

  • Actinic-keratoses: Actinic keratoses, also known as solar keratoses, are precancerous skin lesions caused by sun exposure. They are a potential hazard as they may develop into skin cancer if left untreated.

  • Basal-cell-carcinoma: Basal cell carcinoma is a common type of skin cancer that rarely spreads but can be locally invasive. It is considered a hazard and should be treated promptly.

  • Benign-keratosis-like-lesions: Benign keratosis-like lesions are non-cancerous skin growths. They are typically harmless and not considered a hazard.

  • Dermatofibroma: Dermatofibroma is a benign skin nodule that often appears as a small, firm bump. It is generally not a hazard and does not pose a significant health risk.

  • Melanocytic-nevi: Melanocytic nevi are common moles or birthmarks on the skin. While they are usually harmless, some may have the potential to become hazardous if they undergo malignant transformation.

  • Melanoma: Melanoma is a highly malignant form of skin cancer that can spread to other parts of the body if not detected early. It is a significant hazard and requires prompt medical attention.

  • Vascular-lesions: Vascular lesions include various skin conditions related to blood vessels, such as hemangiomas and port-wine stains. While they are generally not cancerous, treatment may be required for cosmetic or medical reasons.

Live Demo

The fine-tuned model can be tried out in a live demo form. Uploaded images are used only for prediction and are not stored. A color scale from green to purple indicates how hazardous the predicted skin condition is.

Make sure the photo shows only the skin lesion.
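How the demo maps a prediction to a color is not described here. As a purely hypothetical sketch, the snippet below gives each label a hazard level based on the condition descriptions above and picks a color for the top prediction. The levels, the intermediate colors and the names HAZARD_LEVELS, COLORS and hazard_color are all assumptions, not the demo's actual configuration.

```python
# Hypothetical hazard levels, ordered using the condition descriptions above.
# These levels and colors are assumptions, not the demo's real configuration.
HAZARD_LEVELS = {
    "Benign-keratosis-like-lesions": 0,
    "Dermatofibroma": 0,
    "Vascular-lesions": 1,
    "Melanocytic-nevi": 1,
    "Actinic-keratoses": 2,
    "Basal-cell-carcinoma": 3,
    "Melanoma": 4,
}
COLORS = ["green", "yellow", "orange", "red", "purple"]  # low -> high hazard


def hazard_color(predictions: list) -> str:
    """Return the color for the highest-scoring label in a pipeline-style result."""
    top = max(predictions, key=lambda p: p["score"])
    return COLORS[HAZARD_LEVELS[top["label"]]]


example = [
    {"score": 0.26, "label": "Melanoma"},
    {"score": 0.72, "label": "Melanocytic-nevi"},
]
print(hazard_color(example))  # yellow
```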

Dataset Setup

  1. Log in to your Kaggle account, open your account settings, scroll down to the API section and click Expire API Token to remove old tokens.
  2. Click Create New API Token and download the kaggle.json file to your machine.

Run the following commands in Google Colab.

! pip install -q kaggle

Upload your kaggle.json file to Google Colab:

from google.colab import files
files.upload()  # opens a file picker; select the kaggle.json you downloaded

Move the file to the location the Kaggle client expects and restrict its permissions:

! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
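The same three steps can be done from Python, for example outside Colab. A minimal sketch, assuming the Kaggle client reads its token from ~/.kaggle/kaggle.json (its default location); install_kaggle_token is a hypothetical helper name.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_path: str, kaggle_dir: str = "~/.kaggle") -> Path:
    """Copy kaggle.json into kaggle_dir and make it readable/writable by the owner only."""
    target_dir = Path(kaggle_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p ~/.kaggle`
    target = target_dir / "kaggle.json"
    shutil.copy(token_path, target)                # like `cp kaggle.json ~/.kaggle/`
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)  # like `chmod 600` (rw-------)
    return target


# Usage in a notebook after uploading the file:
# install_kaggle_token("kaggle.json")
```

Restricting the file to mode 600 matters because the token grants full API access to your Kaggle account.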

Download the Skin Cancer MNIST: HAM10000 dataset with photos of skin cancer and the csv file containing the metadata, then unzip it.

! kaggle datasets download -d kmader/skin-cancer-mnist-ham10000

Downloading to /content
100% 5.20G/5.20G [00:43<00:00, 134MB/s]

! unzip skin-cancer-mnist-ham10000.zip
  inflating: ham10000_images_part_2/ISIC_0034312.jpg  
  inflating: ham10000_images_part_2/ISIC_0034313.jpg  
  inflating: ham10000_images_part_2/ISIC_0034314.jpg  
  inflating: ham10000_images_part_2/ISIC_0034315.jpg  
  inflating: ham10000_images_part_2/ISIC_0034316.jpg  
  inflating: ham10000_images_part_2/ISIC_0034317.jpg  
  inflating: ham10000_images_part_2/ISIC_0034318.jpg  
  inflating: ham10000_images_part_2/ISIC_0034319.jpg  
  inflating: ham10000_images_part_2/ISIC_0034320.jpg  
  inflating: hmnist_28_28_L.csv      
  inflating: hmnist_28_28_RGB.csv    
  inflating: hmnist_8_8_L.csv        
  inflating: hmnist_8_8_RGB.csv      

As you can see in HAM10000_metadata.csv, the image_id column holds the file names of the photos in the HAM10000_images_part_1 and HAM10000_images_part_2 directories, while the dx column gives the diagnosis, i.e. the lesion type.

import pandas as pd

df = pd.read_csv('HAM10000_metadata.csv')

The first rows of the resulting DataFrame:

       lesion_id      image_id   dx  dx_type   age     sex     localization
0    HAM_0000118  ISIC_0027419  bkl    histo  80.0    male            scalp
1    HAM_0000118  ISIC_0025030  bkl    histo  80.0    male            scalp
2    HAM_0002730  ISIC_0026769  bkl    histo  80.0    male            scalp
3    HAM_0002730  ISIC_0025661  bkl    histo  80.0    male            scalp
4    HAM_0001466  ISIC_0031633  bkl    histo  75.0    male              ear
5    HAM_0001466  ISIC_0027850  bkl    histo  75.0    male              ear
6    HAM_0002761  ISIC_0029176  bkl    histo  60.0    male             face
7    HAM_0002761  ISIC_0029068  bkl    histo  60.0    male             face
8    HAM_0005132  ISIC_0025837  bkl    histo  70.0  female             back
9    HAM_0005132  ISIC_0025209  bkl    histo  70.0  female             back
10   HAM_0001396  ISIC_0025276  bkl    histo  55.0  female            trunk
11   HAM_0004234  ISIC_0029396  bkl    histo  85.0  female            chest
12   HAM_0004234  ISIC_0025984  bkl    histo  85.0  female            chest
13   HAM_0001949  ISIC_0025767  bkl    histo  70.0    male            trunk
14   HAM_0001949  ISIC_0032417  bkl    histo  70.0    male            trunk
15   HAM_0007207  ISIC_0031326  bkl    histo  65.0    male             back
16   HAM_0001601  ISIC_0025915  bkl    histo  75.0    male  upper extremity
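Before training, it is worth checking how the dx classes are distributed, because an accuracy figure is only meaningful next to the accuracy of always predicting the most common class. A small pandas sketch; class_balance and majority_baseline are hypothetical helper names, and the toy Series stands in for df["dx"].

```python
import pandas as pd


def class_balance(dx: pd.Series) -> pd.DataFrame:
    """Count each diagnosis code and its share of all rows, most frequent first."""
    counts = dx.value_counts()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


def majority_baseline(dx: pd.Series) -> float:
    """Accuracy of a classifier that always predicts the most frequent class."""
    return float(dx.value_counts(normalize=True).iloc[0])


# On the real data: class_balance(df["dx"]) and majority_baseline(df["dx"])
toy = pd.Series(["nv", "nv", "nv", "mel", "bkl", "nv"])
print(class_balance(toy))
print(majority_baseline(toy))  # 0.6666666666666666
```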

Next we create a dict that maps each abbreviation in the dx column to the full name of the lesion type, and use it to add a new disease column to the DataFrame:

lesion_type_dict = {
    'nv': 'Melanocytic-nevi',
    'mel': 'Melanoma',
    'bkl': 'Benign-keratosis-like-lesions',
    'bcc': 'Basal-cell-carcinoma',
    'akiec': 'Actinic-keratoses',
    'vasc': 'Vascular-lesions',
    'df': 'Dermatofibroma'
}

disease = []
for i in range(len(df)):
    disease.append(lesion_type_dict[df.loc[i, 'dx']])
df['disease'] = disease
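The loop can also be written as a single vectorized Series.map call. A sketch that additionally fails loudly if the metadata contains a dx code missing from the dict; add_disease_column is a hypothetical name, and the tiny DataFrame stands in for the real metadata.

```python
import pandas as pd

lesion_type_dict = {
    'nv': 'Melanocytic-nevi',
    'mel': 'Melanoma',
    'bkl': 'Benign-keratosis-like-lesions',
    'bcc': 'Basal-cell-carcinoma',
    'akiec': 'Actinic-keratoses',
    'vasc': 'Vascular-lesions',
    'df': 'Dermatofibroma',
}


def add_disease_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'disease' column mapped from the 'dx' codes."""
    out = df.copy()
    out["disease"] = out["dx"].map(lesion_type_dict)  # unknown codes become NaN
    unknown = out.loc[out["disease"].isna(), "dx"].unique()
    if len(unknown) > 0:
        raise ValueError(f"unmapped dx codes: {sorted(unknown)}")
    return out


meta = pd.DataFrame({"image_id": ["ISIC_0027419", "ISIC_0025030"], "dx": ["bkl", "mel"]})
print(add_disease_column(meta)["disease"].tolist())
# ['Benign-keratosis-like-lesions', 'Melanoma']
```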

Next we create an images directory with one subdirectory per lesion type.

import os
from pathlib import Path
data_dir = Path('images')

for dis in lesion_type_dict.values():
    path = data_dir / dis
    os.makedirs(path, exist_ok=True)  # create images/<lesion type>/ if it does not exist yet

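Each photo is then copied into the subdirectory of its lesion type. An image is stored in either HAM10000_images_part_1 or HAM10000_images_part_2, so we check the first directory and fall back to the second:

import os
import shutil
from pathlib import Path

for i in range(len(df)):
    image_name = df.loc[i, "image_id"] + '.jpg'
    target_dir = Path('images') / df.loc[i, "disease"]
    f = Path('HAM10000_images_part_1') / image_name
    if os.path.isfile(f):
        shutil.copy(f, target_dir)
    else:
        shutil.copy(Path('HAM10000_images_part_2') / image_name, target_dir)
    if i % 1000 == 0:
        print(i)  # progress indicator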
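The same copy step can be written as a reusable function that looks for each image in a list of source directories and reports images it cannot find instead of crashing. A minimal sketch; sort_images_by_class is a hypothetical helper, and for the real data records would be zip(df["image_id"], df["disease"]).

```python
import shutil
from pathlib import Path


def sort_images_by_class(records, source_dirs, dest_root):
    """Copy <image_id>.jpg from the first source dir that holds it into dest_root/<label>/.

    Returns the image ids that were not found in any source dir.
    """
    dest_root = Path(dest_root)
    missing = []
    for image_id, label in records:
        target_dir = dest_root / label
        target_dir.mkdir(parents=True, exist_ok=True)
        for src_dir in source_dirs:
            src = Path(src_dir) / f"{image_id}.jpg"
            if src.is_file():
                shutil.copy(src, target_dir / src.name)
                break
        else:  # no source dir contained the image
            missing.append(image_id)
    return missing


# Usage with the HAM10000 layout:
# sort_images_by_class(zip(df["image_id"], df["disease"]),
#                      ["HAM10000_images_part_1", "HAM10000_images_part_2"], "images")
```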

To download the pre-trained transformer and build the dataset, we install two Hugging Face libraries, datasets and transformers:

! pip install -q datasets transformers

Next, log in to the Hugging Face Hub. Create an access token in your Hugging Face account settings and paste it into the login prompt:

from huggingface_hub import notebook_login

notebook_login()


After a successful login, we need to install Git LFS and set up the Git credential store.

! apt -qq install git-lfs
! git config --global credential.helper store

Then we create a new dataset from the previously created images directory using the imagefolder loader from datasets, which takes each image's label from the name of its subdirectory.

from datasets import load_dataset
from pathlib import Path

data_dir = Path('images')
ds = load_dataset("imagefolder", data_dir=str(data_dir))

The new dataset contains 10,015 rows, one per photo, with the features image and label, where label encodes the lesion type shown in the photo.

ds

DatasetDict({
    train: Dataset({
        features: ['image', 'label'],
        num_rows: 10015
    })
})

ds["train"].features

{'image': Image(decode=True, id=None),
 'label': ClassLabel(num_classes=7, names=['Actinic-keratoses', 'Basal-cell-carcinoma', 'Benign-keratosis-like-lesions', 'Dermatofibroma', 'Melanocytic-nevi', 'Melanoma', 'Vascular-lesions'], id=None)}

ds["train"].features["label"]

ClassLabel(num_classes=7, names=['Actinic-keratoses', 'Basal-cell-carcinoma', 'Benign-keratosis-like-lesions', 'Dermatofibroma', 'Melanocytic-nevi', 'Melanoma', 'Vascular-lesions'], id=None)
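The class names in the ClassLabel output are in alphabetical order of the subdirectory names, so Melanocytic-nevi is label 4 and Melanoma is label 5. The sketch below rebuilds that numbering from the folder names alone, which can help when checking predictions outside the datasets library; label_mappings is a hypothetical helper.

```python
def label_mappings(class_names):
    """Build label2id / id2label, numbering classes in sorted (alphabetical) order."""
    names = sorted(class_names)
    label2id = {name: i for i, name in enumerate(names)}
    id2label = {i: name for i, name in enumerate(names)}
    return label2id, id2label


folder_names = [
    "Melanocytic-nevi", "Melanoma", "Benign-keratosis-like-lesions",
    "Basal-cell-carcinoma", "Actinic-keratoses", "Vascular-lesions", "Dermatofibroma",
]
label2id, id2label = label_mappings(folder_names)
print(label2id["Melanoma"], id2label[4])  # 5 Melanocytic-nevi
```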

Data Preprocessing

Now we choose the pre-trained Hugging Face checkpoint to fine-tune, along with the batch size.

model_checkpoint = "microsoft/swin-tiny-patch4-window7-224" # pre-trained model from which to fine-tune
batch_size = 32 # batch size for training and evaluation

from transformers import AutoFeatureExtractor

feature_extractor = AutoFeatureExtractor.from_pretrained(model_checkpoint)

The configuration of the loaded feature extractor:

ViTFeatureExtractor {
  "do_normalize": true,
  "do_resize": true,
  "feature_extractor_type": "ViTFeatureExtractor",
  "image_mean": [ … ],
  "image_std": [ … ],
  "resample": 3,
  "size": 224
}

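Next we set up two transformation pipelines: train_transforms for the training data and val_transforms for the validation data. They make sure images are resized, augmented (training only), converted to tensors and normalized with the checkpoint's mean and standard deviation before being fed into the model.

from torchvision.transforms import (
    CenterCrop, Compose, Normalize, RandomHorizontalFlip,
    RandomResizedCrop, Resize, ToTensor,
)

normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)
# training: augmentation (assumed) via random resized crop and horizontal flip;
# feature_extractor.size is the int 224 in the transformers version used here
train_transforms = Compose(
    [RandomResizedCrop(feature_extractor.size), RandomHorizontalFlip(), ToTensor(), normalize]
)

# validation: deterministic resize and center crop, no augmentation
val_transforms = Compose(
    [Resize(feature_extractor.size), CenterCrop(feature_extractor.size), ToTensor(), normalize]
)

def preprocess_train(example_batch):
    """Apply train_transforms across a batch."""
    example_batch["pixel_values"] = [
        train_transforms(image.convert("RGB")) for image in example_batch["image"]
    ]
    return example_batch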

def preprocess_val(example_batch):
    """Apply val_transforms across a batch."""
    example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]]
    return example_batch
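The last two steps of each pipeline are simple arithmetic: ToTensor scales 8-bit pixels to [0, 1] and reorders the array to channels-first, and Normalize computes (x - mean) / std per channel. A NumPy sketch of just those two steps; to_tensor_and_normalize is a hypothetical helper, and the mean and std of 0.5 below are example values, whereas the real pipeline uses the checkpoint's image_mean and image_std.

```python
import numpy as np


def to_tensor_and_normalize(image_uint8, mean, std):
    """Mimic ToTensor + Normalize: HWC uint8 in [0, 255] -> CHW float, (x - mean) / std."""
    x = image_uint8.astype(np.float32) / 255.0   # ToTensor scales pixels to [0, 1]
    x = np.transpose(x, (2, 0, 1))               # HWC -> CHW, as ToTensor does
    mean = np.asarray(mean, dtype=np.float32)[:, None, None]  # broadcast per channel
    std = np.asarray(std, dtype=np.float32)[:, None, None]
    return (x - mean) / std


white = np.full((2, 2, 3), 255, dtype=np.uint8)  # tiny all-white RGB image
print(to_tensor_and_normalize(white, [0.5] * 3, [0.5] * 3)[:, 0, 0])  # [1. 1. 1.]
```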

We split the dataset into a training set (90%) and a validation set (10%):

splits = ds["train"].train_test_split(test_size=0.1)
train_ds = splits['train']
val_ds = splits['test']
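With 10,015 images, a 10% test share does not divide evenly. Assuming train_test_split rounds the test share up (as scikit-learn's function of the same name does), the arithmetic reproduces the 9,013 training and 1,002 validation examples reported in the training log; split_sizes is a hypothetical helper.

```python
import math


def split_sizes(n_rows: int, test_size: float) -> tuple:
    """Return (n_train, n_test) when the test share is rounded up."""
    n_test = math.ceil(test_size * n_rows)  # 0.1 * 10015 = 1001.5 -> 1002
    return n_rows - n_test, n_test


print(split_sizes(10015, 0.1))  # (9013, 1002)
```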



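Both splits get their transforms, which the datasets library applies on the fly whenever examples are accessed:

train_ds.set_transform(preprocess_train)
val_ds.set_transform(preprocess_val)

Next we load the pre-trained model. Its classification head is replaced with a new one for our seven classes, so we pass the label mappings along:

from transformers import AutoModelForImageClassification, TrainingArguments, Trainer

labels = ds["train"].features["label"].names
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for i, label in enumerate(labels)}

model = AutoModelForImageClassification.from_pretrained(
    model_checkpoint,
    label2id=label2id,
    id2label=id2label,
    ignore_mismatched_sizes=True,  # the checkpoint's 1000-class ImageNet head does not match our 7 classes
)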

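Next we set the hyperparameters and build the Trainer. Training is slow in this setup: one epoch takes about 1.5 hours (the run below took 1:45:41), so three epochs take 4.5 hours or more.

from datasets import load_metric  # removed in datasets 3.0; newer setups use evaluate.load("accuracy")
import numpy as np
import torch

model_name = model_checkpoint.split("/")[-1]

args = TrainingArguments(
    f"{model_name}-finetuned-skin-cancer",
    remove_unused_columns=False,  # keep the "image" column so the transforms can read it
    evaluation_strategy = "epoch",  # renamed eval_strategy in newer transformers versions
    save_strategy = "epoch",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
)  # learning rate, warmup etc. are left at their defaults here

metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    """Accuracy of the argmax predictions on the validation set."""
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return metric.compute(predictions=predictions, references=eval_pred.label_ids)

def collate_fn(examples):
    pixel_values = torch.stack([example["pixel_values"] for example in examples])
    labels = torch.tensor([example["label"] for example in examples])
    return {"pixel_values": pixel_values, "labels": labels}

trainer = Trainer(
    model,
    args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    tokenizer=feature_extractor,  # saved alongside each checkpoint
    compute_metrics=compute_metrics,
    data_collator=collate_fn,
)

train_results = trainer.train()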

Now the training is running…

***** Running training *****
  Num examples = 9013
  Num Epochs = 1
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 128
  Gradient Accumulation steps = 4
  Total optimization steps = 70

[70/70 1:45:41, Epoch 0/1]
Epoch  Training Loss  Validation Loss  Accuracy
0      0.691100       0.769540         0.727545

***** Running Evaluation *****
  Num examples = 1002
  Batch size = 32
Saving model checkpoint to swin-tiny-patch4-window7-224-finetuned-skin-cancer/checkpoint-70
Configuration saved in swin-tiny-patch4-window7-224-finetuned-skin-cancer/checkpoint-70/config.json
Model weights saved in swin-tiny-patch4-window7-224-finetuned-skin-cancer/checkpoint-70/pytorch_model.bin
Feature extractor saved in swin-tiny-patch4-window7-224-finetuned-skin-cancer/checkpoint-70/preprocessor_config.json
Feature extractor saved in swin-tiny-patch4-window7-224-finetuned-skin-cancer/preprocessor_config.json

Training completed. Do not forget to share your model on huggingface.co/models =)

Loading best model from swin-tiny-patch4-window7-224-finetuned-skin-cancer/checkpoint-70 (score: 0.7275449101796407).
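The "Total optimization steps = 70" line follows from the batch settings. A sketch of the arithmetic, assuming one device and that only complete gradient-accumulation cycles are counted (integer division), which matches the logged value; optimization_steps is a hypothetical helper.

```python
import math


def optimization_steps(n_train, per_device_batch, grad_accum, n_devices=1, epochs=1):
    """Number of optimizer updates for the given data size and batch settings."""
    batches_per_epoch = math.ceil(n_train / (per_device_batch * n_devices))  # 9013 / 32 -> 282
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)              # 282 // 4 -> 70
    return updates_per_epoch * epochs


effective_batch = 32 * 4 * 1  # per-device batch x accumulation steps x devices
print(effective_batch, optimization_steps(9013, 32, 4))  # 128 70
```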

The rest is optional but useful: the following lines save the trained model, log and save the training metrics, and save the trainer's state for future reference or to continue training later.

trainer.save_model()
trainer.log_metrics("train", train_results.metrics)
trainer.save_metrics("train", train_results.metrics)
trainer.save_state()

metrics = trainer.evaluate()
# some nice to haves:
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)


***** Running Evaluation *****
  Num examples = 1002
  Batch size = 32

[32/32 04:21]

***** eval metrics *****
  epoch                   =       0.99
  eval_accuracy           =     0.7275
  eval_loss               =     0.7695
  eval_runtime            = 0:04:31.03
  eval_samples_per_second =      3.697
  eval_steps_per_second   =      0.118
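The evaluation figures are consistent with each other: 1,002 examples at batch size 32 need 32 steps (the [32/32] progress bar), and dividing by the 4:31.03 runtime gives the reported rates. A small check; eval_throughput is a hypothetical helper.

```python
import math


def eval_throughput(n_examples: int, batch_size: int, runtime_s: float) -> dict:
    """Steps and per-second rates for one evaluation pass."""
    steps = math.ceil(n_examples / batch_size)  # the last batch may be smaller
    return {
        "steps": steps,
        "samples_per_second": round(n_examples / runtime_s, 3),
        "steps_per_second": round(steps / runtime_s, 3),
    }


runtime = 4 * 60 + 31.03  # eval_runtime 0:04:31.03 in seconds
print(eval_throughput(1002, 32, runtime))
# {'steps': 32, 'samples_per_second': 3.697, 'steps_per_second': 0.118}
```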


Finally, we test the model on a melanoma photo named Melanoma.jpg and look at the scores it assigns to each lesion type:

from transformers import pipeline

pipe = pipeline("image-classification", "swin-tiny-patch4-window7-224-finetuned-skin-cancer")

from PIL import Image
import requests

url = ''  # set to the URL of Melanoma.jpg
image = Image.open(requests.get(url, stream=True).raw)

pipe(image)
[{'score': 0.7181050181388855, 'label': 'Melanocytic-nevi'},
 {'score': 0.2611122727394104, 'label': 'Melanoma'},
 {'score': 0.011896616779267788, 'label': 'Benign-keratosis-like-lesions'},
 {'score': 0.00416933186352253, 'label': 'Actinic-keratoses'},
 {'score': 0.0018915936816483736, 'label': 'Dermatofibroma'}]
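The scores come from a softmax over the model's seven output logits, and the pipeline returns the five highest by default (its top_k parameter). A NumPy sketch of that post-processing; top_k_predictions is a hypothetical helper, and the logits in the example are made up, not taken from the model.

```python
import numpy as np


def top_k_predictions(logits, id2label, k=5):
    """Softmax over raw logits, then return the k most likely labels, best first."""
    logits = np.asarray(logits, dtype=np.float64)
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    probs = exp / exp.sum()
    best = np.argsort(probs)[::-1][:k]   # indices sorted by descending probability
    return [{"score": float(probs[i]), "label": id2label[int(i)]} for i in best]


# Made-up logits for three classes, only to show the output format
id2label = {0: "Melanoma", 1: "Melanocytic-nevi", 2: "Dermatofibroma"}
for pred in top_k_predictions([2.0, 3.0, -1.0], id2label, k=2):
    print(pred)  # Melanocytic-nevi first, then Melanoma
```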

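Note that for this melanoma photo the top prediction is Melanocytic-nevi (score 0.72), with Melanoma in second place (score 0.26).

That’s all folks!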