PaddleHub
PaddleHub

Customized Data

In the training of a new task, it is a time-consuming process starting from zero, without producing the desired results probably. You can use the pre-training model provided by PaddleHub for fine-tune of a specific task. You just need to perform pre-processing of the customized data accordingly, and then input the pre-training model to get the corresponding results. Refer to the following for setting up the structure of the dataset.

I. Image Classification Dataset

When migrating a classification task using custom data with PaddleHub, you need to slice the dataset into training set, validation set, and test set.

Data Preparation

Three text files are needed to record the corresponding image paths and labels, plus a label file to record the name of the label.

├─data:
  ├─train_list.txt:
  ├─test_list.txt:
  ├─validate_list.txt:
  ├─label_list.txt:
  └─...

The format of the data list file for the training/validation/test set is as follows:

Path-1  label-1
Path-2  label-2
...

The format of label_list.txt is:

Classification 1
Classification 2
...

Example: Take Flower Dataset as an example, train_list.txt/test_list.txt/validate_list.txt:

roses/8050213579_48e1e7109f.jpg 0
sunflowers/45045003_30bbd0a142_m.jpg 3
daisy/3415180846_d7b5cced14_m.jpg 2

label_list.txt reads as follows.

roses
tulips
daisy
sunflowers
dandelion

Dataset Loading

For the preparation code of dataset, see flowers.py. hub.datasets.Flowers() It automatically downloads the dataset from the network and unzip it into the $HOME/.paddlehub/dataset directory. Specific usage:

from paddlehub.datasets import Flowers

flowers = Flowers(transforms)
flowers_validate = Flowers(transforms, mode='val')
  • transforms: Data pre-processing mode.

  • mode: Select data mode. Options are train, test, val. Default is train.

II. Image Coloring Dataset

When migrating a coloring task using custom data with PaddleHub, you need to slice the dataset into training set and test set.

Data Preparation

You need to divide the color images for coloring training and testing into training set data and test set data.

├─data:
  ├─train:
      |-folder1
      |-folder2
      |-……
      |-pic1
      |-pic2
      |-……

  ├─test:
    |-folder1
    |-folder2
    |-……
    |-pic1
    |-pic2
    |-……
  └─……

Example: PaddleHub provides users with a dataset for coloring `Canvas dataset. It consists of 1193 images in Monet style and 400 images in Van Gogh style. Take Canvas Dataset as an example, the contents of the train folder are as follows:

├─train:
      |-monet
          |-pic1
          |-pic2
          |-……  
      |-vango
          |-pic1
          |-pic2
          |-……

Dataset Loading

For dataset preparation codes, refer to canvas.py. hub.datasets.Canvas() It automatically downloads the dataset from the network and unzip it into the $HOME/.paddlehub/dataset directory. Specific usage:

from paddlehub.datasets import Canvas

color_set = Canvas(transforms, mode='train')
  • transforms: Data pre-processing mode.

  • mode: Select data mode. Options are train, test. The default is train.

III. Style Migration Dataset

When using custom data for style migration tasks with PaddleHub, you need to slice the dataset into training set and test set.

Data Preparation

You need to split the color images for style migration into training set and test set data.

├─data:
  ├─train:
      |-folder1
      |-folder2
      |-...
      |-pic1
      |-pic2
      |-...

  ├─test:
    |-folder1
    |-folder2
    |-...
    |-pic1
    |-pic1
    |-...
  |- 21styles
    |-pic1
    |-pic1
  └─...

Example: PaddleHub provides users with data sets for style migration MiniCOCO dataset. Training set data and test set data is from COCO2014, in which there are 2001 images in the training set and 200 images in the test set. There are 21 images with different styles in 21styles folder. Users can change the images of different styles as needed. Take MiniCOCO Dataset as an example. The content of train folder is as follows:

├─train:
      |-train
          |-pic1
          |-pic2
          |-……  
      |-test
          |-pic1
          |-pic2
          |-……
      |-21styles
          |-pic1
          |-pic2
          |-……

Dataset Loading

For the preparation codes of the dataset, refer to minicoco.py. hub.datasets.MiniCOCO() It automatically downloads the dataset from the network and unzip it into the $HOME/.paddlehub/dataset directory. Specific usage:

from paddlehub.datasets import MiniCOCO

ccolor_set = MiniCOCO(transforms, mode='train')
  • transforms: Data pre-processing mode.

  • mode: Select data mode. Options are train, test. The default is train.