trainTransformationModel

Trains a simple regression model that fits the output of a bias (solo) model to experimental data.

The transformation input file is a JSON file that names a solo model and gives the experimental data it should be fit to. It may occasionally be appropriate to chain several transformation models together. Currently, the easiest way to do this is to use the first transformation model as the solo model for the second transformation. A better approach is to write your own custom transformation model.
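A minimal sketch of the chaining approach, assuming the first configuration has already been loaded into a Python dict. The second configuration is a copy of the first with `solo-model-file` pointing at the saved first transformation model and a fresh `output-prefix`. The helper `chain_config` and all paths are hypothetical; where the first model actually ends up on disk depends on its `output-prefix`.

```python
import copy


def chain_config(first_config: dict, first_model_file: str, new_prefix: str) -> dict:
    """Return a config for a second transformation stacked on the first.

    chain_config is a hypothetical helper, not part of BPReveal.
    first_model_file should name the saved first transformation model.
    """
    second = copy.deepcopy(first_config)  # leave the original config untouched
    second["settings"]["solo-model-file"] = first_model_file
    # A new prefix keeps the second run from overwriting the first run's outputs.
    second["settings"]["output-prefix"] = new_prefix
    return second


first = {"settings": {"output-prefix": "models/transformation_1",
                      "solo-model-file": "models/solo.model"}}
# The ".model" file name here is a placeholder, not a documented naming rule.
second = chain_config(first, "models/transformation_1.model", "models/transformation_2")
print(second["settings"]["solo-model-file"])  # models/transformation_1.model
```

Copying the whole configuration keeps data, heads, and training settings identical between the two stages, so only the model being transformed changes.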

BNF

<transformation-input-configuration> ::=
    {"settings" : <transformation-settings-section>,
        <data-section>,
        <head-section>, <verbosity-section>}

<transformation-settings-section> ::=
     {"output-prefix" : "<string>",
      "epochs" : <integer>,
      "max-jitter" : <integer>,
      "early-stopping-patience" : <integer>,
      "batch-size" : <integer>,
      "learning-rate" : <number>,
      "learning-rate-plateau-patience" : <integer>,
      "solo-model-file" : "<file-name>",
      "input-length" : <integer>,
      "output-length" : <integer>,
      "profile-architecture" : {<transformation-architecture-specification>},
      "counts-architecture" : {<transformation-architecture-specification>} }

<transformation-architecture-specification> ::=
    <simple-transformation-architecture-specification>
  | "name" : "passthrough"

<simple-transformation-architecture-specification> ::=
    "name" : "simple",
    "types" : [<list-of-simple-transformation-types>]

<list-of-simple-transformation-types> ::=
    <simple-transformation-type>
  | <simple-transformation-type>, <list-of-simple-transformation-types>

<simple-transformation-type> ::=
    "linear"
  | "sigmoid"
  | "relu"

Parameter Notes

Most of the parameters for the transformation model are the same as for a solo model, and they are described at trainSoloModel.

solo-model-file: The name of the file (or directory, since that is how Keras saves models) that contains the solo model.
passthrough: This transformation leaves the solo model's output unchanged; it does not regress anything.
simple: This transformation applies the specified functions to the output of the solo model and adjusts their parameters to best fit the experimental data. A linear model applies \(y = m x + b\) to the solo predictions (which, remember, are in log-space), a sigmoid applies \(y = m_1 \operatorname{sigmoid}(m_2 x + b_2) + b_1\), and a relu applies \(y = m_1 \operatorname{relu}(m_2 x + b_2) + b_1\). In other words, there is a linear model both before and after the sigmoid or relu activation. Generally, you need these more complex functions when the solo model is not a good fit for the experimental bias.
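To make the formulas concrete, these scalar functions sketch passthrough and the three simple transformation types for a single log-space value, using the parameter names from the equations above. They are illustrative only: in the real model the parameters are trained rather than passed in, and how several entries in `types` are combined is not shown here. None of these names are BPReveal's API.

```python
import math


def passthrough(x: float) -> float:
    """Return the solo prediction unchanged; nothing is fit."""
    return x


def linear(x: float, m: float, b: float) -> float:
    """y = m*x + b, applied to a log-space solo prediction x."""
    return m * x + b


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_transform(x: float, m1: float, b1: float, m2: float, b2: float) -> float:
    """y = m1 * sigmoid(m2*x + b2) + b1: linear, then sigmoid, then linear."""
    return m1 * sigmoid(m2 * x + b2) + b1


def relu_transform(x: float, m1: float, b1: float, m2: float, b2: float) -> float:
    """y = m1 * relu(m2*x + b2) + b1: linear, then relu, then linear."""
    return m1 * max(0.0, m2 * x + b2) + b1


print(linear(2.0, m=1.5, b=-0.5))                                   # 2.5
print(sigmoid_transform(0.0, m1=2.0, b1=1.0, m2=1.0, b2=0.0))       # 2.0
print(relu_transform(-1.0, m1=2.0, b1=1.0, m2=1.0, b2=0.0))         # 1.0
```

The inner linear map \(m_2 x + b_2\) shifts and scales where the nonlinearity bends, while the outer \(m_1, b_1\) rescale its output back into log-space.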

History

Before BPReveal 3.0.0, there was a cropdown transformation option. It turned out to be mathematically inappropriate, and so it was removed.

Also in BPReveal 3.0.0, a parameter named sequence-input-length was renamed to just input-length.
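Old configuration files can be updated mechanically. The sketch below renames `sequence-input-length` to `input-length` and refuses architectures that no longer exist. It assumes the old key lived in the `settings` section and that the removed option appeared as an architecture `name` such as `"cropdown"`; both are assumptions, and `migrate_pre_3_config` is a hypothetical helper, not part of BPReveal.

```python
import copy


def migrate_pre_3_config(old: dict) -> dict:
    """Update a pre-3.0.0 transformation config (hypothetical helper).

    Assumes sequence-input-length lived in the settings section.
    """
    new = copy.deepcopy(old)  # do not modify the caller's config
    settings = new["settings"]
    if "sequence-input-length" in settings:
        settings["input-length"] = settings.pop("sequence-input-length")
    for key in ("profile-architecture", "counts-architecture"):
        name = settings.get(key, {}).get("name")
        if name not in ("simple", "passthrough"):
            # Missing architectures and removed names such as cropdown
            # have no 3.0.0 equivalent, so they are rejected.
            raise ValueError(f"{key}: unsupported transformation {name!r}")
    return new


old = {"settings": {"sequence-input-length": 3090,
                    "profile-architecture": {"name": "passthrough"},
                    "counts-architecture": {"name": "simple", "types": ["linear"]}}}
print(migrate_pre_3_config(old)["settings"]["input-length"])  # 3090
```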

API

bpreveal.trainTransformationModel.main(config): Build and train the transformation model.
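A minimal sketch of driving `main` from Python, assuming that `config` is the transformation input file parsed into a dict. `load_config` and `train_from_file` are hypothetical helper names; only `bpreveal.trainTransformationModel.main` comes from this page.

```python
import json


def load_config(path: str) -> dict:
    """Read a transformation input file into a dict."""
    with open(path) as f:
        return json.load(f)


def train_from_file(path: str) -> None:
    """Hypothetical wrapper: load a JSON config and train with it."""
    # Imported here so load_config works even without BPReveal installed.
    from bpreveal import trainTransformationModel
    trainTransformationModel.main(load_config(path))
```

Calling `train_from_file("transformation.json")` would then build and train the model described in that file.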

Schema

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "trainTransformationModel",
    "description": "Schema for trainTransformationModel.py",
    "type": "object",
    "properties": {
        "train-data": {"type": "string"},
        "val-data": {"type": "string"},
        "settings": {
            "type": "object",
            "properties": {
                "output-prefix": {"type": "string"},
                "epochs": {"type": "integer"},
                "max-jitter": {"type": "integer"},
                "early-stopping-patience": {"type": "integer"},
                "batch-size": {"type": "integer"},
                "learning-rate": {"type": "number"},
                "learning-rate-plateau-patience": {"type": "integer"},
                "solo-model-file": {"type": "string"},
                "input-length": {"type": "integer"},
                "output-length": {"type": "integer"},
                "profile-architecture": {
                    "$ref": "#/definitions/transformation-architecture-specification"
                },
                "counts-architecture": {
                    "$ref": "#/definitions/transformation-architecture-specification"
                }
            },
            "required": ["output-prefix", "epochs", "max-jitter",
                "early-stopping-patience", "batch-size", "learning-rate",
                "learning-rate-plateau-patience", "solo-model-file",
                "input-length", "output-length", "profile-architecture",
                "counts-architecture"],
            "not": {"required": ["architecture", "transformation-model"]}
        },
        "heads": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "num-tasks": {"type": "integer"},
                    "profile-loss-weight": {"type": "number"},
                    "head-name": {"type": "string"},
                    "counts-loss-weight": {"type": "number"},
                    "counts-loss-frac-target": {"type": "number"}
                },
                "anyOf": [
                    {"required": ["counts-loss-weight"]},
                    {"required": ["counts-loss-frac-target"]}
                ],
                "required": ["num-tasks", "profile-loss-weight", "head-name"],
                "not": {"required": ["use-bias-counts"]}
            }
        },
        "verbosity": {"$ref": "/schema/base#/definitions/verbosity"}
    },
    "required": ["heads", "train-data", "val-data", "settings",  "verbosity"],
    "definitions": {
        "transformation-architecture-specification": {
            "oneOf": [
                {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string", "enum": ["passthrough"]}
                    },
                    "required": ["name"]
                },
                {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string", "enum": ["simple"]},
                        "types": {
                            "type": "array",
                            "items": {
                                "type": "string",
                                "enum": ["linear", "sigmoid", "relu"]
                            }
                        }
                    },
                    "required": ["name", "types"]
                }
            ]
        }
    }
}
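For a quick pre-flight check without a JSON Schema library, the sketch below mirrors the `required` list for `settings` and the two branches of `transformation-architecture-specification`. It deliberately ignores scalar types, the `heads` rules, and the `not` clauses, so the full schema above, checked with a real validator such as the `jsonschema` package, remains authoritative. The function names are hypothetical.

```python
REQUIRED_SETTINGS = (
    "output-prefix", "epochs", "max-jitter", "early-stopping-patience",
    "batch-size", "learning-rate", "learning-rate-plateau-patience",
    "solo-model-file", "input-length", "output-length",
    "profile-architecture", "counts-architecture",
)
ALLOWED_TYPES = {"linear", "sigmoid", "relu"}


def check_architecture(spec: dict) -> None:
    """Raise ValueError unless spec matches the passthrough or simple branch."""
    if not isinstance(spec, dict):
        raise ValueError("architecture must be an object")
    name = spec.get("name")
    if name == "passthrough":
        return
    if name == "simple":
        types = spec.get("types")
        if not isinstance(types, list):
            raise ValueError('"simple" requires a "types" array')
        # isinstance check first, so unhashable items do not crash the set lookup.
        bad = [t for t in types if not (isinstance(t, str) and t in ALLOWED_TYPES)]
        if bad:
            raise ValueError(f"unknown transformation types: {bad}")
        return
    raise ValueError(f"unknown architecture name: {name!r}")


def check_settings(settings: dict) -> None:
    """Raise ValueError if a required setting or architecture is invalid."""
    missing = [k for k in REQUIRED_SETTINGS if k not in settings]
    if missing:
        raise ValueError(f"missing settings: {missing}")
    for key in ("profile-architecture", "counts-architecture"):
        check_architecture(settings[key])
```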