How to use MolDescriptor

Calculating molecular data from the input field

Single molecule as input

MolDescriptor takes a "simplified molecular-input line-entry system" (SMILES) string as input. After entering a SMILES string, the user can select the descriptors to be calculated by checking the descriptor checkboxes.

Multiple molecules as input

MolDescriptor can take multiple SMILES as input, separated by commas. Example of multiple molecules as input:

n1ccccc1, CN1C=NC2=C1C(=O)N(C(=O)N2C)C, CC(CCOC(=O)CC(C)O)O
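A minimal sketch of how such a comma-separated input could be split into individual SMILES strings. The helper name split_smiles is hypothetical and is not taken from MolDescriptor's source; the sketch only assumes that standard SMILES strings contain no commas.

```python
def split_smiles(raw: str) -> list[str]:
    """Split a comma-separated input string into individual SMILES strings."""
    # Strip surrounding whitespace and drop empty entries (e.g. a trailing comma).
    return [part.strip() for part in raw.split(",") if part.strip()]


smiles_list = split_smiles(
    "n1ccccc1, CN1C=NC2=C1C(=O)N(C(=O)N2C)C, CC(CCOC(=O)CC(C)O)O"
)
print(smiles_list)
```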

Error handling

When you select the "Exclude Invalid SMILES from result" option, any invalid SMILES entries will be excluded from the displayed table on the website. If you do not select this option and an invalid SMILES is entered, the table will show an "Error" row indicating the invalid SMILES.

Calculating molecular data from uploading a CSV

Format of uploaded CSV

The CSV to be uploaded should contain one column of SMILES.

Example of a CSV in table format
C
O=C=O
CCO
C1CCCCC1

Calculating molecular data from a CSV file is done by first selecting the descriptors to calculate, then clicking the "Upload CSV" button. The selected descriptors are then calculated automatically for the contents of the CSV, provided it is in the correct format.
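As an illustration of the expected format, a one-column CSV like the example above can be read with Python's standard csv module. This is a sketch only: the function name read_smiles_column is hypothetical, and it assumes the file has no header row, which the example suggests but the text does not state.

```python
import csv
import io


def read_smiles_column(csv_text: str) -> list[str]:
    """Return the first column of every non-empty row as a SMILES string."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in reader if row and row[0].strip()]


# Same content as the example CSV: one SMILES per line, no header (assumed).
example_csv = "C\nO=C=O\nCCO\nC1CCCCC1\n"
smiles_from_csv = read_smiles_column(example_csv)
print(smiles_from_csv)
```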

Downloading the results as a CSV

After providing the input field with one or more SMILES, the user can select the descriptors to be calculated. Pressing the "Download CSV" button will download a CSV file containing the SMILES and the selected descriptors.

When the user inputs the following molecules and selects the checkboxes for Number of Atoms, Molecular Weight, Polar Surface Area, and clogP, the system will produce the following CSV:

Input: CCN(C)C(=O)C, c1ccccc1Cl, C1COCCO1

SMILES,Number of C atoms,Number of Cl atoms,Number of H atoms,Number of N atoms,Number of O atoms,Number of atoms total:,MolecularWeight,PSA,clogP
CCN(C)C(=O)C,5,,11,1,1,18,101.084063972,20.310000000000002,0.48460000000000003
c1ccccc1Cl,6,1,5,,,12,112.00797784,0.0,2.34
C1COCCO1,4,,8,,2,14,88.052429496,18.46,0.03320000000000001

The name of the downloaded CSV will contain the current RDKit version: data_RDKit_[current-version].csv
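A file name of this form could be built from the installed RDKit version as sketched below. The function name results_filename is hypothetical; how MolDescriptor itself obtains the version string is not documented here, so the use of rdkit.__version__ is an assumption.

```python
import rdkit


def results_filename() -> str:
    """Build a name of the form data_RDKit_[current-version].csv."""
    # rdkit.__version__ holds the installed RDKit version string.
    return f"data_RDKit_{rdkit.__version__}.csv"


filename = results_filename()
print(filename)
```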

Error handling

By selecting the "Exclude Invalid SMILES from result" option, any invalid SMILES will be omitted from the CSV output. If this option is not selected, a new column named "Error" will be added to the CSV to highlight the invalid SMILES entries.
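The two behaviours can be sketched with the standard csv module. Everything named here is hypothetical (the function rows_to_csv, the use of None to mark an invalid SMILES, and the cell text "Invalid SMILES"); only the "Error" column name comes from the description above.

```python
import csv
import io


def rows_to_csv(rows, descriptor_names, exclude_invalid):
    """Serialize (smiles, values) pairs; values is None for an invalid SMILES."""
    fieldnames = ["SMILES", *descriptor_names]
    if not exclude_invalid:
        fieldnames.append("Error")  # extra column only when invalid rows are kept
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for smiles, values in rows:
        if values is None:
            if exclude_invalid:
                continue  # omit the invalid entry entirely
            writer.writerow({"SMILES": smiles, "Error": "Invalid SMILES"})
        else:
            writer.writerow({"SMILES": smiles, **values})
    return buffer.getvalue()


# "C1CC" has an unclosed ring and stands in for an invalid entry.
rows = [("C1COCCO1", {"PSA": 18.46}), ("C1CC", None)]
with_errors = rows_to_csv(rows, ["PSA"], exclude_invalid=False)
without_errors = rows_to_csv(rows, ["PSA"], exclude_invalid=True)
print(with_errors)
print(without_errors)
```

Missing cells are left empty, so valid rows have a blank "Error" cell and invalid rows have blank descriptor cells.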

Descriptors

Processing and Visualization Options

Exclude Invalid SMILES from result

When processing a list of SMILES strings, not all of them might be valid representations of molecules. Invalid SMILES strings can cause issues when trying to compute descriptors or visualize molecules.

By checking the "Exclude Invalid SMILES from result" option, you ensure that:

  • Any SMILES strings that can't be interpreted as valid molecules will be automatically excluded from the results.
  • You'll receive results only for valid molecular structures, so the output contains no error entries.

This is particularly useful when batch processing multiple SMILES at once, as you won't have to manually filter out problematic strings.
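One common way to detect an invalid SMILES with RDKit is to check whether Chem.MolFromSmiles returns None. The sketch below assumes that approach; the function name filter_smiles and the returned tuple layout are hypothetical and not taken from MolDescriptor's code.

```python
from rdkit import Chem


def filter_smiles(smiles_list, exclude_invalid):
    """Return (smiles, mol) pairs; mol is None for entries that fail to parse."""
    results = []
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)  # None when the SMILES cannot be parsed
        if mol is None and exclude_invalid:
            continue  # drop the invalid entry
        results.append((smiles, mol))
    return results


batch = ["CCO", "C1CC", "c1ccccc1Cl"]  # "C1CC" has an unclosed ring
kept = filter_smiles(batch, exclude_invalid=True)
everything = filter_smiles(batch, exclude_invalid=False)
print([smiles for smiles, _ in kept])
```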

Display Molecule

When users submit a SMILES representation of a molecule, our system can generate a visual depiction of that molecule. If the "Display Molecule" option is checked, each submitted SMILES will be processed to generate an image representation of the molecule.

How It Works

  1. Retrieving Input: The application first retrieves the list of SMILES strings either from the uploaded CSV file or directly from the input field.
  2. Validation: If the "Exclude Invalid SMILES" option is checked, the system filters out invalid SMILES representations.
  3. Image Generation:
    • For each valid SMILES string, the system converts it into a molecular object using RDKit.
    • If the "Image" option is among the selected descriptors, the system generates an image representation of the molecule using the Draw.MolToImage function of RDKit.
    • This image is converted to PNG format, written to an in-memory buffer, and then encoded as a base64 string for embedding directly within the web page. The image is therefore viewable on the website; it is not included in the downloadable CSV.
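The image-generation step can be sketched as follows. Draw.MolToImage is the RDKit function named above; the helper name smiles_to_base64_png, the image size, and the HTML img tag are illustrative assumptions.

```python
import base64
from io import BytesIO

from rdkit import Chem
from rdkit.Chem import Draw


def smiles_to_base64_png(smiles, size=(300, 300)):
    """Render a SMILES string as a base64-encoded PNG, or None if it is invalid."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    image = Draw.MolToImage(mol, size=size)  # returns a PIL image
    buffer = BytesIO()
    image.save(buffer, format="PNG")  # write the PNG bytes to memory
    return base64.b64encode(buffer.getvalue()).decode("ascii")


encoded = smiles_to_base64_png("c1ccccc1Cl")
# A data URI lets the page embed the image without a separate file.
html_img = f'<img src="data:image/png;base64,{encoded}" alt="c1ccccc1Cl">'
```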

RDKit Chem

Molecular Weight

The exact molecular weight of a molecule is the sum of the masses of the most abundant isotope of each of its constituent atoms (the monoisotopic mass), rather than the sum of the average atomic weights.

RDKit Function: Descriptors.ExactMolWt(molecule)
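A short usage sketch for the first molecule of the CSV example in this document. Descriptors.MolWt, the average molecular weight, is not mentioned in this document and is shown only for contrast.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles("CCN(C)C(=O)C")

exact_weight = Descriptors.ExactMolWt(mol)  # based on the most abundant isotopes
average_weight = Descriptors.MolWt(mol)     # based on average atomic weights

print(f"exact: {exact_weight:.6f}, average: {average_weight:.3f}")
```

The exact value should match the MolecularWeight column of the CSV example (101.084063972), while the average weight is slightly higher.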

Polar Surface Area (PSA)

The topological polar surface area is a sum over all polar atoms, primarily oxygen and nitrogen, also including their attached hydrogens. It can be used as a measure of drug transport properties.

RDKit Function: Descriptors.TPSA(molecule)
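As a usage sketch, the call below is applied to the three molecules from the CSV example in this document. Chlorobenzene contains no oxygen or nitrogen, so its TPSA is zero, whereas the two ether oxygens of 1,4-dioxane both contribute.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

examples = ["CCN(C)C(=O)C", "c1ccccc1Cl", "C1COCCO1"]

# Map each SMILES to its topological polar surface area.
tpsa_values = {
    smiles: Descriptors.TPSA(Chem.MolFromSmiles(smiles)) for smiles in examples
}

for smiles, tpsa in tpsa_values.items():
    print(f"{smiles}: {tpsa:.2f}")
```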

QED (Quantitative Estimate of Drug-likeness)

The Quantitative Estimate of Drug-likeness (QED) is a measure designed to reflect how "drug-like" a compound is in terms of physicochemical properties. A QED score closer to 1 typically indicates a more drug-like compound.

RDKit Function: QED.qed(molecule)
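A minimal usage sketch, applied to one of the molecules from the multiple-input example earlier in this document. No expected value is given here because none appears in this document; the score lies between 0 and 1.

```python
from rdkit import Chem
from rdkit.Chem import QED

mol = Chem.MolFromSmiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")

qed_score = QED.qed(mol)  # a float between 0 and 1
print(f"QED = {qed_score:.3f}")
```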

Number of Atoms

This provides the total count of atoms in a molecule, which can be useful for understanding the size and complexity of the molecule.

RDKit Function: molecule.GetNumAtoms()
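In RDKit, GetNumAtoms() counts only the atoms present in the molecule graph, so hydrogens are included only after they have been added explicitly. The totals in the CSV example in this document (18 atoms for CCN(C)C(=O)C) include hydrogens, which suggests that hydrogens are added before counting. That is an inference, not something the document confirms. A sketch under that assumption, with per-element counts via collections.Counter:

```python
from collections import Counter

from rdkit import Chem

mol = Chem.MolFromSmiles("CCN(C)C(=O)C")
heavy_atom_count = mol.GetNumAtoms()  # hydrogens are implicit here: 7

mol_with_h = Chem.AddHs(mol)          # make hydrogens explicit (assumed step)
total_atom_count = mol_with_h.GetNumAtoms()  # 18

# Count atoms per element symbol, e.g. for "Number of C atoms".
per_element = Counter(atom.GetSymbol() for atom in mol_with_h.GetAtoms())
print(total_atom_count, dict(per_element))
```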

Solvent Accessible Surface Area (SASA)

The Solvent Accessible Surface Area (SASA) is a measure of the surface area of a biomolecule that is accessible to a solvent. It's important in understanding interactions like protein-ligand binding.

Calculating Solvent Accessible Surface Area

  1. Generate 3D coordinates for the molecule using AllChem.EmbedMolecule(molecule, AllChem.ETKDG())

  2. Classify atoms and get radii with rdFreeSASA.classifyAtoms(molecule)

  3. Use rdFreeSASA.CalcSASA(molecule, radii, confIdx=-1, opts=sasa_opts) with the appropriate SASA options.
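The three steps above can be sketched as follows. Adding explicit hydrogens before embedding and using default SASAOpts are assumptions: the document does not say whether hydrogens are added or which options MolDescriptor passes as sasa_opts.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdFreeSASA

# Explicit hydrogens are added before embedding (an assumption, see above).
mol = Chem.AddHs(Chem.MolFromSmiles("C1COCCO1"))

# Step 1: generate 3D coordinates; the return value is -1 if embedding fails.
embed_status = AllChem.EmbedMolecule(mol, AllChem.ETKDG())

# Step 2: classify atoms and obtain one radius per atom.
radii = rdFreeSASA.classifyAtoms(mol)

# Step 3: compute the SASA for the default conformer.
sasa_opts = rdFreeSASA.SASAOpts()  # default options (assumed)
sasa = rdFreeSASA.CalcSASA(mol, radii, confIdx=-1, opts=sasa_opts)

print(f"SASA = {sasa:.2f}")
```

Because the 3D coordinates come from a randomized embedding, the result can differ slightly between runs unless a random seed is fixed.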

RDKit Crippen

clogP

The calculated logarithm of the partition coefficient between n-octanol and water (clogP). It's a measure of the hydrophobicity of the molecule and plays an important role in ADME predictions.

RDKit Function: Crippen.MolLogP(molecule)
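A usage sketch with the three molecules from the CSV example in this document. A higher value indicates a more hydrophobic molecule, so chlorobenzene ranks above the amide, which ranks above 1,4-dioxane.

```python
from rdkit import Chem
from rdkit.Chem import Crippen

examples = ["CCN(C)C(=O)C", "c1ccccc1Cl", "C1COCCO1"]

# Map each SMILES to its Crippen logP estimate.
logp_values = {
    smiles: Crippen.MolLogP(Chem.MolFromSmiles(smiles)) for smiles in examples
}

# Sort from most to least hydrophobic.
ranked = sorted(logp_values, key=logp_values.get, reverse=True)
print(ranked)
```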