PENENANG JIWA

Tuesday, January 11, 2011

SMILES

Simplified Molecular Input Line Entry Specification

Introduction:
The simplified molecular input line entry specification or SMILES is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.
The original SMILES specification was developed by Arthur Weininger and David Weininger in the late 1980s. It has since been modified and extended by others, most notably by Daylight Chemical Information Systems Inc. In 2007, an open standard called "OpenSMILES" was developed by the Blue Obelisk open-source chemistry community. Other 'linear' notations include the Wiswesser Line Notation (WLN), ROSDAL and SLN (Tripos Inc).
In July 2006, IUPAC introduced InChI as a standard for formula representation. SMILES is generally considered to have the advantage of being slightly more human-readable than InChI; it also has a wide base of software support and extensive theoretical (e.g., graph theory) backing.

 

 

A Simplified Chemical Language

SMILES (Simplified Molecular Input Line Entry System) is a line notation (a typographical method using printable characters) for entering and representing molecules and reactions. Some examples are:



SMILES       Name               SMILES              Name
CC           ethane             [OH3+]              hydronium ion
O=C=O        carbon dioxide     [2H]O[2H]           deuterium oxide
C#N          hydrogen cyanide   [235U]              uranium-235
CCN(CC)CC    triethylamine      F/C=C/F             E-difluoroethene
CC(=O)O      acetic acid        F/C=C\F             Z-difluoroethene
C1CCCCC1     cyclohexane        N[C@@H](C)C(=O)O    L-alanine
c1ccccc1     benzene            N[C@H](C)C(=O)O     D-alanine

Reaction SMILES                           Name
[I-].[Na+].C=CCBr>>[Na+].[Br-].C=CCI      displacement reaction
(C(=O)O).(OCC)>>(C(=O)OCC).(O)            intermolecular esterification
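
The reaction examples follow a reactants>agents>products layout, with "." separating the disconnected components inside each part. A minimal Python sketch of pulling those pieces apart is shown here; the function name split_reaction is hypothetical, and the code does plain string splitting only, with no check that each component is valid SMILES.

```python
def split_reaction(reaction_smiles):
    """Split 'reactants>agents>products' into three lists of component strings."""
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    # '.' separates disconnected components; an empty part yields an empty list
    return tuple([c for c in part.split(".") if c] for part in parts)


reactants, agents, products = split_reaction("[I-].[Na+].C=CCBr>>[Na+].[Br-].C=CCI")
print(reactants)  # ['[I-]', '[Na+]', 'C=CCBr']
print(agents)     # []
print(products)   # ['[Na+]', '[Br-]', 'C=CCI']
```

In the displacement reaction the middle (agent) field is empty, which is why the two ">" characters sit next to each other.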

SMILES contains the same information as might be found in an extended connection table. The primary reason SMILES is more useful than a connection table is that it is a linguistic construct, rather than a computer data structure. SMILES is a true language, albeit with a simple vocabulary (atom and bond symbols) and only a few grammar rules. SMILES representations of structure can in turn be used as "words" in the vocabulary of other languages designed for storage of chemical information (information about chemicals) and chemical intelligence (information about chemistry).
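
To make the "simple vocabulary" concrete, here is a rough Python sketch that breaks a SMILES string into its atom, bond, ring-closure and branch symbols. The pattern TOKEN and the function tokenize are illustrative names, and the pattern is an assumption that covers only the symbols appearing in the examples above; it is not a complete SMILES grammar and does not validate chemistry.

```python
import re

# Assumed token pattern, limited to the symbols used in the example tables.
TOKEN = re.compile(r"""
    \[[^\]]+\]        # bracket atom such as [OH3+], [C@@H] or [235U]
  | Br | Cl           # two-letter organic-subset atoms (tried before B and C)
  | [BCNOPSFI]        # one-letter organic-subset atoms
  | [bcnops]          # aromatic atoms
  | %\d{2} | \d       # ring-closure labels
  | [-=\#:/\\]        # bond symbols
  | [().]             # branch parentheses and the disconnection dot
""", re.VERBOSE)


def tokenize(smiles):
    """Return the list of symbols in a SMILES string, left to right."""
    tokens, pos = [], 0
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at position {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


print(tokenize("CC(=O)O"))           # ['C', 'C', '(', '=', 'O', ')', 'O']
print(tokenize("N[C@@H](C)C(=O)O"))  # ['N', '[C@@H]', '(', 'C', ')', 'C', '(', '=', 'O', ')', 'O']
```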

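Part of the power of SMILES is that unique SMILES exist. A molecule can usually be written as several valid SMILES strings (ethanol, for instance, can be written CCO, OCC or C(O)C), and a unique SMILES is the single string picked out by a canonicalization algorithm. With standard SMILES, the name of a molecule is synonymous with its structure; with unique SMILES, the name is universal. Anyone in the world who uses unique SMILES to name a molecule will choose the exact same name, provided they apply the same canonicalization algorithm.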
Another important property of SMILES is that it is quite compact compared to most other methods of representing structure. A typical SMILES takes 50% to 70% less space than an equivalent connection table, even a binary one. For example, a database of 23,137 structures, with an average of 20 atoms per structure, uses only 1.6 bytes per atom when represented with SMILES. In addition, ordinary compression of SMILES is extremely effective: the same database was reduced to 27% of its original size by Ziv-Lempel compression (i.e. 0.42 bytes per atom).
These properties open many doors to the chemical information programmer. Examples of uses for SMILES are:
  • Keys for database access
  • Mechanism for researchers to exchange chemical information
  • Entry system for chemical data
  • Part of languages for artificial intelligence or expert systems in chemistry
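
As a small illustration of the first use, SMILES strings can serve directly as lookup keys. The Python sketch below uses a plain dictionary as a hypothetical stand-in for a database; the names compounds and lookup are made up for this example.

```python
# Hypothetical in-memory "database" keyed by SMILES from the example table
compounds = {
    "CC": "ethane",
    "O=C=O": "carbon dioxide",
    "C#N": "hydrogen cyanide",
    "CC(=O)O": "acetic acid",
    "c1ccccc1": "benzene",
}


def lookup(smiles):
    """Return the stored name for a SMILES key, or None if the key is absent."""
    return compounds.get(smiles)


print(lookup("CC(=O)O"))  # acetic acid
print(lookup("OC(C)=O"))  # None: another valid spelling of acetic acid, but a different string
```

The second lookup fails even though both strings describe the same molecule, which is why keys of this kind are only reliable when every string is first converted to unique SMILES.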
The rest of this chapter is a concise exposition of the SMILES encoding rules. For further information, the reader is referred to "SMILES 1. Introduction and Encoding Rules", Weininger, D., J. Chem. Inf. Comput. Sci. 1988, 28, 31.



