How to use MolDescriptor
Calculating molecular data from the input field
Single molecule as input
MolDescriptor takes a SMILES ("simplified molecular-input line-entry system") string as input. After entering a SMILES string, select the descriptors to be calculated by checking the corresponding descriptor checkboxes.
Multiple molecules as input
MolDescriptor can take multiple SMILES strings as input, separated by commas. Example of multiple molecules as input:
n1ccccc1, CN1C=NC2=C1C(=O)N(C(=O)N2C)C, CC(CCOC(=O)CC(C)O)O
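As a minimal sketch of how such a comma-separated input could be split into individual SMILES strings, consider the snippet below. The helper name `split_smiles_input` is hypothetical and is not taken from MolDescriptor itself; the sketch assumes that the SMILES strings contain no commas of their own.

```python
def split_smiles_input(raw: str) -> list[str]:
    """Split a comma-separated input string into individual SMILES strings."""
    # Strip surrounding whitespace and drop empty fragments such as "C,,O".
    return [part.strip() for part in raw.split(",") if part.strip()]


smiles_list = split_smiles_input(
    "n1ccccc1, CN1C=NC2=C1C(=O)N(C(=O)N2C)C, CC(CCOC(=O)CC(C)O)O"
)
print(smiles_list)
```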
When you select the "Exclude Invalid SMILES from result" option, any invalid SMILES entries will be excluded from the displayed table on the website. If you do not select this option and an invalid SMILES is inputted, the table will show an "Error" row indicating the invalid SMILES.
Calculating molecular data from uploading a CSV
Format of uploaded CSV
The CSV to be uploaded should contain a single column of SMILES strings.
Example of a CSV in table format |
---|
C |
O=C=O |
CCO |
C1CCCCC1 |
To calculate molecular data from a CSV file, first select the descriptors you want to calculate, then click the "Upload CSV" button. The selected descriptors are then calculated automatically for every SMILES in the file, provided the file is in the correct format.
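The expected file layout can be illustrated with a short sketch that reads a one-column CSV into a list of SMILES strings. The function name `read_smiles_column` is hypothetical, and the sketch assumes the file has no header row; how MolDescriptor itself parses the upload is not specified here.

```python
import csv
import io


def read_smiles_column(csv_text: str) -> list[str]:
    """Return the first cell of every non-empty row as a SMILES string."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in reader if row and row[0].strip()]


# Same content as the example table above, one SMILES per line.
example_csv = "C\nO=C=O\nCCO\nC1CCCCC1\n"
print(read_smiles_column(example_csv))
```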
Downloading the results as a CSV
After entering one or more SMILES strings in the input field, the user can select the descriptors to be calculated. Pressing the "Download CSV" button will download a CSV file containing the SMILES and the selected descriptors.
When the user inputs the following molecules and selects the checkboxes for Number of Atoms, Molecular Weight, Polar Surface Area, and clogP, the system will produce the following CSV:
Input: CCN(C)C(=O)C, c1ccccc1Cl, C1COCCO1
SMILES | Number of C atoms | Number of Cl atoms | Number of H atoms | Number of N atoms | Number of O atoms | Number of atoms total: | MolecularWeight | PSA | clogP |
---|---|---|---|---|---|---|---|---|---|
CCN(C)C(=O)C | 5 | | 11 | 1 | 1 | 18 | 101.084063972 | 20.310000000000002 | 0.48460000000000003 |
c1ccccc1Cl | 6 | 1 | 5 | | | 12 | 112.00797784 | 0.0 | 2.34 |
C1COCCO1 | 4 | | 8 | | 2 | 14 | 88.052429496 | 18.46 | 0.03320000000000001 |
The name of the downloaded CSV will contain the current RDKit version: data_RDKit_[current-version].csv
By selecting the "Exclude Invalid SMILES from result" option, any invalid SMILES will be omitted from the CSV output. If this option is not selected, a new column named "Error" will be added to the CSV to highlight the invalid SMILES entries.
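The naming scheme and the optional "Error" column could be sketched as follows. This is an illustration under stated assumptions, not MolDescriptor's actual export code: the function `build_csv`, the boolean "valid" flag on each row, the cell text "Invalid SMILES", and the placeholder version string are all hypothetical.

```python
import csv
import io


def build_csv(rows, exclude_invalid, rdkit_version):
    """Return (file_name, csv_text) for a list of result rows.

    Each row is a dict with a "SMILES" key, descriptor values, and a
    boolean "valid" flag (a hypothetical convention for this sketch).
    """
    if exclude_invalid:
        rows = [row for row in rows if row["valid"]]

    # Descriptor columns in first-seen order; "valid" is internal only.
    columns = ["SMILES"]
    for row in rows:
        for key in row:
            if key != "valid" and key not in columns:
                columns.append(key)

    # Add the "Error" column only when an invalid entry is still present.
    if any(not row["valid"] for row in rows):
        columns.append("Error")

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        record = {key: value for key, value in row.items() if key != "valid"}
        if not row["valid"]:
            record["Error"] = "Invalid SMILES"  # assumed cell text
        writer.writerow(record)
    return f"data_RDKit_{rdkit_version}.csv", buffer.getvalue()


rows = [
    {"SMILES": "C1COCCO1", "valid": True, "PSA": 18.46},
    {"SMILES": "C1CC", "valid": False},
]
# The version string is a placeholder; rdkit.__version__ would supply the real one.
file_name, csv_text = build_csv(rows, exclude_invalid=False, rdkit_version="2023.09.1")
print(file_name)
print(csv_text)
```

Calling `build_csv(rows, exclude_invalid=True, ...)` on the same rows drops the invalid entry and omits the "Error" column entirely.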
Descriptors
Processing and Visualization Options
Exclude Invalid SMILES from result
When processing a list of SMILES strings, not all of them might be valid representations of molecules. Invalid SMILES strings can cause issues when trying to compute descriptors or visualize molecules.
By checking the "Exclude Invalid SMILES from result" option, you ensure that:
- Any SMILES strings that can't be interpreted as valid molecules will be automatically excluded from the results.
- You'll receive results only for valid molecular structures, so the output needs no manual cleanup.
This is particularly useful when batch processing multiple SMILES at once, as you won't have to manually filter out problematic strings.
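One way such filtering can be done with RDKit is sketched below. It relies on `Chem.MolFromSmiles` returning `None` for strings that cannot be parsed; the helper name `partition_smiles` is hypothetical, and whether MolDescriptor validates input in exactly this way is an assumption.

```python
from rdkit import Chem, RDLogger

# Silence RDKit's parser messages for invalid input.
RDLogger.DisableLog("rdApp.*")


def partition_smiles(smiles_list):
    """Split SMILES strings into (valid, invalid) lists using RDKit's parser."""
    valid, invalid = [], []
    for smiles in smiles_list:
        # MolFromSmiles returns None when the string cannot be parsed or sanitized.
        if Chem.MolFromSmiles(smiles) is None:
            invalid.append(smiles)
        else:
            valid.append(smiles)
    return valid, invalid


# "C1CC" opens a ring that is never closed, so it is rejected.
valid, invalid = partition_smiles(["CCO", "C1CC", "c1ccccc1Cl"])
print(valid, invalid)
```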
Display Molecule
When users submit a SMILES representation of a molecule, our system can generate a visual depiction of that molecule. If the "Display Molecule" option is checked, each submitted SMILES will be processed to generate an image representation of the molecule.
How It Works
- Retrieving Input: The application first retrieves the list of SMILES strings either from the uploaded CSV file or directly from the input field.
- Validation: If the "Exclude Invalid SMILES" option is checked, the system filters out invalid SMILES representations.
- Image Generation:
  - For each valid SMILES string, the system converts it into a molecular object using RDKit.
  - If the "Image" option is among the selected descriptors, the system generates an image representation of the molecule using the Draw.MolToImage function of RDKit.
  - This image is converted into PNG format, buffered, and then encoded into a base64 string for embedding directly within the web page. This ensures that the image is viewable on the website but is not included in the downloadable CSV.
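The image generation steps above can be sketched roughly as follows. The function name `smiles_to_base64_png` and the HTML snippet are hypothetical; only `Chem.MolFromSmiles` and `Draw.MolToImage` come from RDKit, and the sketch assumes Pillow is installed so that `MolToImage` returns a PIL image.

```python
import base64
import io

from rdkit import Chem
from rdkit.Chem import Draw


def smiles_to_base64_png(smiles: str) -> str:
    """Render a SMILES string as a PNG and return it base64-encoded."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    image = Draw.MolToImage(mol)      # PIL image of the 2D depiction
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")  # write PNG bytes into the in-memory buffer
    return base64.b64encode(buffer.getvalue()).decode("ascii")


encoded = smiles_to_base64_png("c1ccccc1Cl")
# A data URI lets the image be embedded in the page without a separate file.
html_tag = f'<img src="data:image/png;base64,{encoded}" alt="molecule">'
```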
RDKit Chem
Molecular Weight
The exact molecular weight (monoisotopic mass) of a molecule is the sum of the masses of its constituent atoms, using the mass of the most abundant isotope of each element rather than the average atomic weight.
RDKit Function: Descriptors.ExactMolWt(molecule)
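A short sketch of the call, using the first molecule from the CSV example earlier in this guide. `Descriptors.MolWt` is shown alongside only for comparison: it is RDKit's average molecular weight and is not one of the descriptors listed here.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles("CCN(C)C(=O)C")
exact = Descriptors.ExactMolWt(mol)  # based on the most abundant isotopes
average = Descriptors.MolWt(mol)     # based on average atomic weights
print(exact, average)
```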
Polar Surface Area (PSA)
The topological polar surface area is a sum over all polar atoms, primarily oxygen and nitrogen, also including their attached hydrogens. It can be used as a measure of drug transport properties.
RDKit Function: Descriptors.TPSA(molecule)
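As an illustration, the sketch below computes the TPSA for the three molecules from the CSV example earlier in this guide. The dictionary `tpsa_by_smiles` is just a convenient container for this example.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

tpsa_by_smiles = {}
for smiles in ["CCN(C)C(=O)C", "c1ccccc1Cl", "C1COCCO1"]:
    mol = Chem.MolFromSmiles(smiles)
    tpsa_by_smiles[smiles] = Descriptors.TPSA(mol)

# Chlorobenzene has no polar N or O atoms, so its TPSA is zero.
print(tpsa_by_smiles)
```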
QED (Quantitative Estimation of Drug-likeness)
The Quantitative Estimate of Drug-likeness (QED) is a measure designed to reflect how "drug-like" a compound is in terms of physicochemical properties. A QED score closer to 1 typically indicates a more drug-like compound.
RDKit Function: QED.qed(molecule)
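A minimal sketch of the call, using the caffeine SMILES from the multiple-molecule input example. No particular score is quoted here, since only the range of the result is relied upon.

```python
from rdkit import Chem
from rdkit.Chem import QED

mol = Chem.MolFromSmiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")  # caffeine
score = QED.qed(mol)  # value between 0 and 1
print(score)
```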
Number of Atoms
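This provides the total count of atoms in a molecule, which can be useful for understanding the size and complexity of the molecule. GetNumAtoms() counts only the atoms present in the molecular graph, so hydrogens are included in the count only if they have been added explicitly (for example with Chem.AddHs).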
RDKit Function: molecule.GetNumAtoms()
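The sketch below shows one way to obtain a total atom count and per-element counts like those in the CSV example earlier in this guide. It assumes hydrogens are added with `Chem.AddHs` before counting, which matches the totals in that example but is not confirmed as MolDescriptor's own procedure.

```python
from collections import Counter

from rdkit import Chem

mol = Chem.MolFromSmiles("CCN(C)C(=O)C")
heavy_atoms = mol.GetNumAtoms()       # hydrogens are still implicit here

mol_with_h = Chem.AddHs(mol)          # make hydrogens explicit atoms
total_atoms = mol_with_h.GetNumAtoms()

# Count atoms per element symbol, e.g. for "Number of C atoms".
element_counts = Counter(atom.GetSymbol() for atom in mol_with_h.GetAtoms())
print(heavy_atoms, total_atoms, dict(element_counts))
```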
Solvent Accessible Surface Area (SASA)
The Solvent Accessible Surface Area (SASA) is a measure of the surface area of a biomolecule that is accessible to a solvent. It's important in understanding interactions like protein-ligand binding.
Calculating Solvent Accessible Surface Area
- Generate 3D coordinates for the molecule using AllChem.EmbedMolecule(molecule, AllChem.ETKDG()).
- Classify atoms and get radii with rdFreeSASA.classifyAtoms(molecule).
- Use rdFreeSASA.CalcSASA(molecule, radii, confIdx=-1, opts=sasa_opts) with the appropriate SASA options.
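The following is a rough sketch of a SASA calculation for a small molecule, not a reproduction of MolDescriptor's code. It deviates from the steps above in one place, and this is an assumption of the example: instead of `rdFreeSASA.classifyAtoms`, it takes van der Waals radii from RDKit's periodic table, a common choice for small molecules that carry no residue information. The default `rdFreeSASA.SASAOpts()`, the fixed random seed, and the use of ethanol are likewise choices made only for this illustration.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdFreeSASA

# Explicit hydrogens are needed for a sensible 3D structure.
mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))

params = AllChem.ETKDG()
params.randomSeed = 42  # fixed seed so the embedding is reproducible
# EmbedMolecule returns the conformer id, or -1 if embedding fails.
if AllChem.EmbedMolecule(mol, params) == -1:
    raise RuntimeError("3D embedding failed")

# Assumption: van der Waals radii stand in for the classifyAtoms step.
periodic_table = Chem.GetPeriodicTable()
radii = [periodic_table.GetRvdw(atom.GetAtomicNum()) for atom in mol.GetAtoms()]

sasa_opts = rdFreeSASA.SASAOpts()  # default algorithm and probe radius
sasa = rdFreeSASA.CalcSASA(mol, radii, confIdx=-1, opts=sasa_opts)
print(sasa)
```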
RDKit Crippen
clogP
clogP is the calculated logarithm of the partition coefficient between n-octanol and water. It is a measure of the hydrophobicity of the molecule and plays an important role in ADME predictions.
RDKit Function: Crippen.MolLogP(molecule)
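As a brief sketch, the call below computes clogP for the three molecules from the CSV example earlier in this guide. The dictionary `clogp_by_smiles` exists only for this illustration.

```python
from rdkit import Chem
from rdkit.Chem import Crippen

clogp_by_smiles = {
    smiles: Crippen.MolLogP(Chem.MolFromSmiles(smiles))
    for smiles in ["CCN(C)C(=O)C", "c1ccccc1Cl", "C1COCCO1"]
}
# Higher values indicate a more hydrophobic molecule.
print(clogp_by_smiles)
```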