Reference guide
Data handling for input function
data_loader(csv_filename, params[, labels, …]) |
Loads, preprocesses (data augmentation, padding) and feeds the data |
padding_inputs_width(image, target_shape, …) |
Given an input image, pads it and returns an image of size target_shape. |
augment_data(image, max_rotation) |
Data augmentation on an image (padding, brightness, contrast, rotation) |
random_rotation(img, max_rotation, crop) |
Rotates an image with a random angle. |
random_padding(image, max_pad_w, max_pad_h) |
Pads the border of an image with a random number of rows and columns |
serving_single_input(fixed_height, min_width) |
Serving input function needed for export (in TensorFlow). |
Config for training
Alphabet(lookup_alphabet_file, blank_symbol) |
Object for alphabet / symbols units. |
TrainingParams(**kwargs) |
Object for parameters related to the training. |
Params(**kwargs) |
Object for general parameters |
import_params_from_json(model_directory, …) |
Read the exported json file with parameters of the experiment. |
Model
deep_cnn(input_imgs, input_channels, …) |
CNN part of the CRNN network. |
deep_bidirectional_lstm(inputs, params, …) |
Recurrent part of the CRNN network. |
crnn_fn(features, labels, mode, params) |
CRNN model definition for tf.Estimator. |
get_words_from_chars(characters_list, …[, …]) |
Joins separated characters to form words. |
Loading exported model
PredictionModel(model_dir, session, signature) |
Helper class to load an exported model and apply it to image segments for transcription. |
-
tf_crnn.data_handler.augment_data(image: tf.Tensor, max_rotation: float = 0.1) → tf.Tensor
Data augmentation on an image (padding, brightness, contrast, rotation)
Parameters: - image – Tensor
- max_rotation – float, maximum permitted rotation (in radians)
Returns: Tensor
-
tf_crnn.data_handler.data_loader(csv_filename: Union[List[str], str], params: tf_crnn.config.Params, labels=True, batch_size: int = 64, data_augmentation: bool = False, num_epochs: int = None, image_summaries: bool = False)
Loads, preprocesses (data augmentation, padding) and feeds the data
Parameters: - csv_filename – filename or list of filenames
- params – Params object containing all the parameters
- labels – transcription labels
- batch_size – batch size
- data_augmentation – flag to enable data augmentation
- num_epochs – feeds the data ‘num_epochs’ times
- image_summaries – flag to show image summaries or not
Returns: data_loader function
-
tf_crnn.data_handler.padding_inputs_width(image: tf.Tensor, target_shape: Tuple[int, int], increment: int) → Tuple[tf.Tensor, tf.Tensor]
Given an input image, pads it and returns an image of size target_shape. There are 3 cases:
- image width > target width : simple resizing to shrink the image
- image width >= 0.5*target width : pad the image
- image width < 0.5*target width : replicates the image segment and appends it
Parameters: - image – Tensor of shape [H,W,C]
- target_shape – final shape after padding [H, W]
- increment – reduction factor due to pooling between input width and output width, this makes sure that the final width will be a multiple of increment
Returns: (image padded, output width)
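The three cases above depend only on how the image width compares to the target width. The following is a minimal sketch of that decision alone; it assumes nothing about how the library performs the resize, pad, or replicate operations, and the function name `padding_case` is hypothetical (it is not part of tf_crnn).

```python
def padding_case(image_width: int, target_width: int) -> str:
    """Return which of the three documented cases applies to an image."""
    if image_width > target_width:
        # Image is too wide: it would be shrunk by resizing.
        return "resize"
    if image_width >= 0.5 * target_width:
        # Image fills at least half of the target width: it would be padded.
        return "pad"
    # Image is narrower than half the target: the segment would be
    # replicated and appended.
    return "replicate"
```

For a target width of 200, a 300-pixel-wide image falls in the first case, a 150-pixel-wide image in the second, and a 50-pixel-wide image in the third.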
-
tf_crnn.data_handler.random_padding(image: tf.Tensor, max_pad_w: int = 5, max_pad_h: int = 10) → tf.Tensor
Pads the border of an image with a random number of rows and columns
Parameters: - image – image to pad
- max_pad_w – maximum padding in width
- max_pad_h – maximum padding in height
Returns: a padded image
-
tf_crnn.data_handler.random_rotation(img: tf.Tensor, max_rotation: float = 0.1, crop: bool = True) → tf.Tensor
Rotates an image with a random angle. See https://stackoverflow.com/questions/16702966/rotate-image-and-crop-out-black-borders for formulae
Parameters: - img – Tensor
- max_rotation – maximum angle to rotate (radians)
- crop – boolean to crop or not the image after rotation
Returns:
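When crop is enabled, the rotated image has to be cropped to the largest axis-aligned rectangle that contains no border pixels. The sketch below follows the formula commonly given for that problem in the Stack Overflow discussion linked above. Whether random_rotation uses exactly this computation is an assumption, and the function name `largest_rotated_rect` is hypothetical.

```python
import math


def largest_rotated_rect(w: float, h: float, angle: float) -> tuple:
    """Size (width, height) of the largest axis-aligned rectangle that fits
    inside a w x h rectangle rotated by `angle` (in radians)."""
    if w <= 0 or h <= 0:
        return 0.0, 0.0

    width_is_longer = w >= h
    side_long, side_short = (w, h) if width_is_longer else (h, w)

    # The solution is symmetric in the sign and quadrant of the angle.
    sin_a, cos_a = abs(math.sin(angle)), abs(math.cos(angle))

    if side_short <= 2.0 * sin_a * cos_a * side_long or abs(sin_a - cos_a) < 1e-10:
        # Half-constrained case: two corners of the crop touch the long sides.
        x = 0.5 * side_short
        wr, hr = (x / sin_a, x / cos_a) if width_is_longer else (x / cos_a, x / sin_a)
    else:
        # Fully constrained case: the crop touches all four sides.
        cos_2a = cos_a * cos_a - sin_a * sin_a
        wr = (w * cos_a - h * sin_a) / cos_2a
        hr = (h * cos_a - w * sin_a) / cos_2a
    return wr, hr
```

With an angle of 0 the function returns the original size, and for the default max_rotation of 0.1 radians the crop stays close to the original dimensions.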
-
tf_crnn.data_handler.serving_single_input(fixed_height: int = 32, min_width: int = 8)
Serving input function needed for export (in TensorFlow). Features to serve:
- images : greyscale image
- input_filename : filename of image segment
- input_rgb: RGB image segment
Parameters: - fixed_height – height of the image to format the input data with
- min_width – minimum width to resize the image
Returns: serving_input_fn
-
class
tf_crnn.config.Alphabet(lookup_alphabet_file: str = None, blank_symbol: str = '$')
Object for alphabet / symbols units.
Variables: - _blank_symbol (str) – Blank symbol used for CTC
- _alphabet_units (List[str]) – list of elements composing the alphabet. The units may be a single character or multiple characters.
- _codes (List[int]) – Each alphabet unit has a unique corresponding code.
- _nclasses (int) – number of alphabet units.
-
alphabet_units
-
blank_symbol
-
check_input_file_alphabet(csv_filenames: List[str], discarded_chars: str = ';|\t\n\r\x0b\x0c', csv_delimiter: str = ';') → None
Checks that the labels of the input files contain only characters that are in the Alphabet.
Parameters: - csv_filenames – list of csv filenames
- discarded_chars – discarded characters
- csv_delimiter – character delimiting field in the csv file
Returns:
-
codes
-
classmethod
create_lookup_from_labels(csv_files: List[str], export_lookup_filename: str, original_lookup_filename: str = None)
Create a lookup dictionary for csv files containing labels. Exports a json file with the Alphabet.
Parameters: - csv_files – list of files to get the labels from (should be of format path;label)
- export_lookup_filename – filename to export alphabet lookup dictionary
- original_lookup_filename – original lookup filename to update (optional)
Returns:
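To illustrate the idea of building a lookup from label files, here is a minimal sketch that collects the alphabet units found in `path;label` rows and assigns each one a code. It is not the library's implementation: it assumes single-character units, assigns codes in sorted order, reads from an in-memory string rather than from files, and the JSON layout is hypothetical. The helper name `build_lookup` is also hypothetical.

```python
import csv
import io
import json


def build_lookup(csv_text: str, delimiter: str = ";") -> dict:
    """Map every character found in the label column to an integer code."""
    units = set()
    for row in csv.reader(io.StringIO(csv_text), delimiter=delimiter):
        if len(row) < 2:
            continue  # skip malformed rows without a label field
        units.update(row[1])  # assumption: one alphabet unit per character
    # Assumption: codes are assigned in sorted order of the units.
    return {unit: code for code, unit in enumerate(sorted(units))}


# Two sample rows in the documented path;label format.
sample = "img/a.png;cab\nimg/b.png;bad\n"
lookup = build_lookup(sample)

# Serialize the lookup the way one might export it to a json file.
lookup_json = json.dumps(lookup, sort_keys=True)
```

The same mapping can then be used to check that every character of a new label is present in the lookup, which is the kind of validation check_input_file_alphabet describes.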
-
n_classes
-
class
tf_crnn.config.Params(**kwargs)
Object for general parameters
Variables: - input_shape (Tuple[int, int]) – input shape of the image to batch (this is the shape after data augmentation). The original image will either be resized or padded depending on its original size
- input_channels (int) – number of color channels for input image
- csv_delimiter (str) – character to delimit csv input files
- string_split_delimiter (str) – character that delimits each alphabet unit in the labels
- num_gpus (int) – number of gpus to use
- lookup_alphabet_file (str) – json file that contains the mapping alphabet units <-> codes
- csv_files_train (str) – csv filename which contains the (path;label) of each training sample
- csv_files_eval (str) – csv filename which contains the (path;label) of each eval sample
- output_model_dir (str) – output directory where the model will be saved and exported
- keep_prob_dropout (float) – keep probability
- num_beam_paths (int) – number of paths (transcriptions) to return for ctc beam search (only used when predicting)
- data_augmentation (bool) – if True augments data on the fly
- data_augmentation_max_rotation (float) – max permitted rotation to apply to image during training (radians)
- input_data_n_parallel_calls (int) – number of parallel calls to make when using Dataset.map()
-
keep_prob_dropout
-
class
tf_crnn.config.TrainingParams(**kwargs)
Object for parameters related to the training.
Variables: - n_epochs (int) – number of epochs to run the training (default: 50)
- train_batch_size (int) – batch size during training (default: 64)
- eval_batch_size (int) – batch size during evaluation (default: 128)
- learning_rate (float) – initial learning rate (default: 1e-4)
- learning_decay_rate (float) – decay rate for exponential learning rate (default: .96)
- learning_decay_steps (int) – decay steps for exponential learning rate (default: 1000)
- evaluate_every_epoch (int) – evaluate every ‘evaluate_every_epoch’ epoch (default: 5)
- save_interval (int) – save the model every ‘save_interval’ step (default: 1e3)
- optimizer (str) – which optimizer to use (‘adam’, ‘rms’, ‘ada’) (default: ‘adam’)
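The learning_rate, learning_decay_rate and learning_decay_steps parameters describe an exponential learning rate schedule. As a rough illustration, the sketch below applies the standard continuous exponential decay formula with the documented defaults. Whether the library uses the continuous or the staircase variant is not stated here, so treat this as an assumption. The function name `decayed_learning_rate` is hypothetical.

```python
def decayed_learning_rate(
    step: int,
    learning_rate: float = 1e-4,
    decay_rate: float = 0.96,
    decay_steps: int = 1000,
) -> float:
    """Continuous exponential decay: lr * decay_rate ** (step / decay_steps)."""
    return learning_rate * decay_rate ** (step / decay_steps)
```

With the defaults, the learning rate is multiplied by 0.96 every 1000 steps, so it starts at 1e-4 and reaches 9.6e-5 after 1000 steps.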
-
tf_crnn.config.import_params_from_json(model_directory: str = None, json_filename: str = None) → dict
Read the exported json file with parameters of the experiment.
Parameters: - model_directory – Directory where the model was exported
- json_filename – filename of the file
Returns: a dictionary containing the parameters of the experiment
-
tf_crnn.model.crnn_fn(features, labels, mode, params)
CRNN model definition for tf.Estimator. Combines deep_cnn and deep_bidirectional_lstm to define the model and adds loss computation and CTC decoder.
Parameters: - features – dictionary with keys : ‘images’, ‘images_widths’, ‘filenames’
- labels – string containing the transcriptions. Flattened (1D) array with encoded label (one code per character)
- mode – TRAIN, EVAL, PREDICT
- params – dictionary with keys: ‘Params’, ‘TrainingParams’
Returns:
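The CTC decoder turns per-timestep codes into a transcription. The library is documented as using a CTC beam search when predicting (see num_beam_paths). The sketch below shows only the simpler collapse rule that underlies CTC decoding: merge consecutive repeats, then drop the blank code. It is an illustration of the principle, not the decoder used by crnn_fn; the function name `ctc_collapse` and the choice of blank code are hypothetical.

```python
from itertools import groupby


def ctc_collapse(codes, blank):
    """Collapse a per-timestep code sequence into a label sequence."""
    # Step 1: merge runs of identical consecutive codes into one code.
    merged = [code for code, _ in groupby(codes)]
    # Step 2: remove the blank code, which only separates repeated symbols.
    return [code for code in merged if code != blank]


# Blank (0) between the two 1s keeps them as separate symbols.
decoded = ctc_collapse([1, 1, 0, 1, 2, 2, 0, 0, 3], blank=0)
```

Here `decoded` is `[1, 1, 2, 3]`: the first two 1s merge into one, while the blank between them and the third 1 preserves the repetition.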
-
tf_crnn.model.deep_bidirectional_lstm(inputs: tf.Tensor, params: tf_crnn.config.Params, summaries: bool = True) → tf.Tensor
Recurrent part of the CRNN network. Uses a bidirectional LSTM.
Parameters: - inputs – output of deep_cnn
- params – parameters of the model
- summaries – flag to enable bias and weight histograms to be visualized in Tensorboard
Returns: Tuple : (tensor [width(time), batch, n_classes], raw transcription codes)
-
tf_crnn.model.deep_cnn(input_imgs: tf.Tensor, input_channels: int, is_training: bool, summaries: bool = True) → tf.Tensor
CNN part of the CRNN network.
Parameters: - input_imgs – input images [B, H, W, C]
- input_channels – input channels, 1 for greyscale images, 3 for RGB color images
- is_training – flag to indicate training or not
- summaries – flag to enable bias and weight histograms to be visualized in Tensorboard
Returns: tensor of shape [batch, final_width, final_height x final_features]
-
class
tf_crnn.loader.PredictionModel(model_dir: str, session: tf.Session = None, signature: str = 'predictions')
Helper class to load an exported model and apply it to image segments for transcription.
Variables: - session (tf.Session) – tf.Session within which to run the loading process
- model – loaded exported model
Parameters: - model_dir – directory containing the saved model files.
- session – tf.Session to load the model
- signature – which signature to use to select the type of input :
- predictions (default) : input a grayscale image
- rgb_images : input an RGB image
- filename : input the filename of the image segment