Security Theater and the Blockchain Project
Conclusions: Improving the security of sensitive scientific data is a worthwhile goal. Hash-based audit trails like the one proposed in the UCSF paper might offer some benefit in very limited cases. But claims of immutable and tamper-proof data protection should be viewed with great skepticism. Bitcoin solved a very specific problem through a brilliant combination of existing technologies and economic incentives. Cherry-picking Bitcoin's technologies neutralizes its hard-won security guarantees. Before committing valuable resources to a blockchain project, consider discussing these questions with your team: What specific secu...
Source: Depth-First - September 18, 2019 Category: Chemistry Authors: Richard L. Apodaca Source Type: blogs
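
The limits of a hash-based audit trail can be sketched in a few lines. The example below is a minimal illustration, not the scheme from the UCSF paper: the record strings, the `GENESIS` placeholder, and the function names are all assumptions made for this sketch. It chains SHA-256 digests so that each entry depends on everything before it.

```python
import hashlib

# Hypothetical placeholder digest that precedes the first record.
GENESIS = "0" * 64


def link(prev_digest: str, record: str) -> str:
    """Digest of a record combined with the digest of the previous entry."""
    return hashlib.sha256((prev_digest + record).encode("utf-8")).hexdigest()


def build_chain(records: list[str]) -> list[tuple[str, str]]:
    """Return (record, digest) pairs, each digest covering all earlier records."""
    chain = []
    prev = GENESIS
    for record in records:
        prev = link(prev, record)
        chain.append((record, prev))
    return chain


def verify_chain(chain: list[tuple[str, str]]) -> bool:
    """Recompute every digest and check internal consistency only."""
    prev = GENESIS
    for record, digest in chain:
        if link(prev, record) != digest:
            return False
        prev = digest
    return True


# Made-up records for illustration.
original = build_chain(["assay 1: 4.2 nM", "assay 2: 3.9 nM"])

# Editing one record in place breaks the chain and is detected.
edited_in_place = [("assay 1: 0.1 nM", original[0][1]), original[1]]

# Rebuilding the whole chain after the edit yields a perfectly valid chain.
rebuilt = build_chain(["assay 1: 0.1 nM", "assay 2: 3.9 nM"])

print(verify_chain(original), verify_chain(edited_in_place), verify_chain(rebuilt))
```

In this sketch the rebuilt chain verifies just as well as the original. The edit only becomes visible to someone holding a trusted copy of the original final digest, which is the kind of external guarantee that hashing alone does not supply.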

Compiling InChI to WebAssembly Part 1: Hello InChI
Conclusion: This article presents a step-by-step procedure for cross-compiling a representative C codebase to WebAssembly using Emscripten. Although there are a few small complications to pay attention to, the procedure for generating a native binary looks almost identical to the one for generating WebAssembly. As such, the instructions here should serve as a model for cross-compiling other C codebases. For the moment, the result isn't much to look at. The next post in this series will show how to create a full-blown JavaScript API that generates InChIs from molfile strings.
Source: Depth-First - May 16, 2019 Category: Chemistry Authors: Richard L. Apodaca Source Type: blogs

JavaScript for Cheminformatics, Part 2
Conclusions: Its first release written in a matter of days by a then-obscure company, JavaScript may be the most unlikely success story in all of software. Having emerged from a dark period lasting until the mid-2000s, today's JavaScript is a full-featured programming language wrapped in a rich, mass-deployed environment. Chemistry has been extremely slow to follow the direction charted by the rest of the software industry, but glimpses into the future are everywhere.
Source: Depth-First - May 1, 2019 Category: Chemistry Authors: Richard L. Apodaca Source Type: blogs

The SMILES Substructure Search Fallacy
Conclusion: Text matching on SMILES strings seems like a natural, simple solution to the substructure search problem. But this is a mirage. The futility of this approach may only become apparent after weeks or even months of effort. Regardless of the specific implementation, any robust solution must compare molecules as graphs.
Source: Depth-First - April 13, 2019 Category: Chemistry Authors: Richard L. Apodaca Source Type: blogs
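
A few lines of Python are enough to see why text matching fails. The sketch below is an illustration only, and the helper name `naive_contains` and the chosen SMILES strings are assumptions of this example rather than anything from the original post. It treats substructure search as a plain substring test and collects cases where that test disagrees with the chemistry.

```python
def naive_contains(query: str, target: str) -> bool:
    """Naive 'substructure search': is the query SMILES a substring of the target?"""
    return query in target


# Each case: (description, query, target, answer a graph-based search would give)
cases = [
    # Ethanol written from the carbon end: the C-O bond is found.
    ("ethanol, CCO", "CO", "CCO", True),
    # The same molecule written from the oxygen end: the C-O bond is missed.
    ("ethanol, OCC", "CO", "OCC", True),
    # Cyclopentylamine: the C-N bond is hidden behind a ring-closure digit.
    ("cyclopentylamine", "CN", "C1CCCC1N", True),
    # Sodium chloride contains no nitrogen, but the text "N" appears in "[Na+]".
    ("sodium chloride", "N", "[Na+].[Cl-]", False),
]

# Collect every case where substring matching gives the wrong answer.
wrong = [
    name
    for name, query, target, expected in cases
    if naive_contains(query, target) != expected
]
print(wrong)
```

Three of the four cases come out wrong: two false negatives caused by atom ordering and ring-closure digits, and one false positive caused by a two-letter element symbol.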

The Maximum Matching Problem
Conclusion: Maximum matchings find their way into a few important cheminformatics and computational chemistry contexts. To the untrained observer, the maximum matching problem might appear trivial. Indeed, in the case of bipartite graphs it is. However, the need to deal with odd cycles for general graphs vastly increases the complexity of the solution. This article doesn't describe such a solution in detail, but a future article will.
Source: Depth-First - April 3, 2019 Category: Chemistry Authors: Richard L. Apodaca Source Type: blogs
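
For the bipartite case, a short augmenting-path search is enough. The sketch below is one common textbook approach, not necessarily the one used in the article, and the function name and the example graph are assumptions of this illustration. It does not handle the odd cycles that arise in general graphs.

```python
def max_bipartite_matching(adj: dict[str, list[str]]) -> dict[str, str]:
    """Maximum matching in a bipartite graph via repeated augmenting-path search.

    `adj` maps each left-side vertex to the right-side vertices it may pair with.
    Returns a dict mapping matched left vertices to their right partners.
    """
    match_right: dict[str, str] = {}  # right vertex -> left vertex currently paired

    def try_augment(left: str, visited: set[str]) -> bool:
        # Look for a free right vertex, re-routing earlier pairings if needed.
        for right in adj[left]:
            if right in visited:
                continue
            visited.add(right)
            if right not in match_right or try_augment(match_right[right], visited):
                match_right[right] = left
                return True
        return False

    for left in adj:
        try_augment(left, set())

    return {left: right for right, left in match_right.items()}


# Hypothetical example graph: left vertices a, b, c; right vertices x, y, z.
graph = {"a": ["x", "y"], "b": ["x"], "c": ["y", "z"]}
matching = max_bipartite_matching(graph)
print(len(matching), matching)
```

Here vertex `a` first takes `x`, then gives it up so that `b`, which has no other option, can be matched. That re-routing step is the augmenting path.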

Chemical Line Notations for Deep Learning: DeepSMILES and Beyond
Conclusion: The method used to represent molecular structure can have far-reaching effects on the utility and flexibility of machine learning models developed from them. DeepSMILES offers a fascinating first step in the direction of molecular encoding schemes optimized for consumption and generation by neural networks. The field is wide open.
Source: Depth-First - March 19, 2019 Category: Chemistry Authors: Richard L. Apodaca Source Type: blogs

Class-Free Object-Oriented Programming
Conclusion: Classes are but one path to OOP. Although classes may have once served a vital optimization role on resource-strapped computers and within languages with primitive OOP tooling, their potential for harm should not be ignored. It may not be time just yet to abolish classes, but it's pretty clear that the reasons to keep them around are dwindling.
Source: Depth-First - March 4, 2019 Category: Chemistry Authors: Richard L. Apodaca Source Type: blogs

The Language of Organic Chemistry
Conclusion: An array of powerful tools awaits problems that can be recast in terms of computational linguistics. But as the work highlighted here shows, finding the right interface and working at scale could prove tricky. Nevertheless, maximum common substructure has been established as a linguistic unit in organic chemistry, and as such could offer a roadmap for travellers wanting to make the journey.
Source: Depth-First - February 20, 2019 Category: Chemistry Authors: Richard L. Apodaca Source Type: blogs

Distributed Chemistry
Conclusion: Lab automation is a good match for any problem that can be reduced to a parallelizable search through some well-understood space. For years, the majority of such efforts have centered on analytical chemistry and biochemistry. Although a network of ChemPU units optimizing azo dye colors is a far cry from a graduate student optimizing a palladium-coupling reaction, it's not that far off. The availability of platforms like ChemPU, built from cheap, off-the-shelf components, infinitely hackable, and under the control of potentially very sophisticated software, could help transform those areas of experimental chemistr...
Source: Depth-First - February 12, 2019 Category: Chemistry Authors: Richard L. Apodaca Source Type: blogs

Chemception: Deep Learning from 2D Chemical Structure Images
Conclusion: I doubt that the current accuracy of Chemception's predictions would be of practical use today. Rather, Chemception provides a platform from which such systems may eventually emerge. Recent history suggests that such an emergence may be closer than it seems. Chemception offers a glimpse into a future in which lightly processed chemical datasets can be fed directly into off-the-shelf data learning pipelines to yield highly accurate predictive models. In this future, an iteratively hand-crafted molecular representation is no longer necessary. Instead, the system adapts itself to a much more raw form of structural da...
Source: Depth-First - February 5, 2019 Category: Chemistry Authors: Richard L. Apodaca Source Type: blogs

The NextMove Patent Reaction Dataset
Conclusion: Organic synthesis underpins much of modern society. As such, it's hard to overstate the utility of an open, machine-readable, freely reusable, annotated reaction corpus as large and complete as the NextMove reaction dataset. The theoretical and practical applications that have already appeared hint at some of the things that are now possible.
Source: Depth-First - January 28, 2019 Category: Chemistry Authors: Richard L. Apodaca Source Type: blogs

Scanner-Driven Parser Development
Conclusions: Parsers developed with a Scanner end up resembling their underlying grammar, making them easier to write, understand, and maintain than parsers written with alternative approaches. Provided that the language to be parsed can be represented in LL(1) form, Scanner-Driven Parser Development leads to running code quickly while avoiding unnecessary detours into theory.
Source: Depth-First - January 22, 2019 Category: Chemistry Authors: Richard L. Apodaca Source Type: blogs

Debugging ES Modules in Node.js and Mocha Using VS Code
Conclusions: ES Modules can be debugged through Mocha tests on VS Code with Reify. No transpilation, no build system, and no alternative testing frameworks are needed. Simply require Reify through your project's .vscode/settings.json file.
Source: Depth-First - January 18, 2019 Category: Chemistry Authors: Richard L. Apodaca Source Type: blogs

Computing Extended Connectivity Fingerprints
Conclusions: ECFPs offer three features rooted in their construction that make them especially useful compared to other options: Simplicity. The base ECFP algorithm is composed of very few units, making implementation straightforward; Flexibility. Many variations on the base algorithm are possible; Readily Computed. The radius of perception can be changed to strike a balance between computational complexity and functionality.
Source: Depth-First - January 14, 2019 Category: Chemistry Authors: Richard L. Apodaca Source Type: blogs
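
The role of the radius can be illustrated with a heavily simplified circular fingerprint. The sketch below is not the published ECFP algorithm: it uses only element symbol and degree as the starting atom identifier, a CRC32 of a Python `repr` as the hash, and no duplicate-structure removal. The function names and the molecule encoding (a list of element symbols plus a list of bonded index pairs) are assumptions made for this example.

```python
import zlib


def stable_hash(values) -> int:
    """Deterministic 32-bit hash of a nested tuple/list of ints and strings."""
    return zlib.crc32(repr(values).encode("utf-8"))


def circular_fingerprint(atoms: list[str], bonds: list[tuple[int, int]],
                         radius: int = 2) -> set[int]:
    """Collect atom-neighborhood identifiers out to `radius` bonds (simplified)."""
    neighbors: dict[int, list[int]] = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Iteration 0: each atom is described by its element and its degree.
    ids = {i: stable_hash((atoms[i], len(neighbors[i]))) for i in neighbors}
    features = set(ids.values())

    # Each further iteration folds in the neighbors' previous identifiers,
    # so the neighborhood an identifier describes grows by one bond.
    for _ in range(radius):
        ids = {
            i: stable_hash((ids[i], sorted(ids[j] for j in neighbors[i])))
            for i in neighbors
        }
        features.update(ids.values())
    return features


# Ethanol with two different atom orderings (C-C-O and O-C-C).
ethanol_a = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = circular_fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
small = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)], radius=0)
print(ethanol_a == ethanol_b, small <= ethanol_a)
```

Because identifiers depend only on each atom's environment, the two atom orderings produce the same feature set. Raising `radius` adds features that describe larger neighborhoods at the cost of more hashing work.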

The Horrifying Future of Scientific Communication
Conclusion: maybe. Demonstrably worse? Authors of papers appearing in MDPI journals can expect none of the prestige that authors of Science papers enjoy. Imprimatur matters. Likewise, readers of MDPI journal articles will likely approach any new article with either no knowledge of the publisher or a negative impression. Conclusion: yes. Attracts fringe elements? The paper criticized by Lowe was authored not by trained research toxicologists, but by an "Independent Scientist and Consultant" and a computer scientist with a bachelor's degree in biophysics. Conclusion: yes. Criticized as dangerous? Lowe isn't alone in his critic...
Source: Depth-First - May 9, 2013 Category: Chemists Authors: Richard Apodaca Source Type: blogs