
Elsevier Connect

Chemists unite! InChI is now built for collaboration and scale

November 21, 2024

By Ann-Marie Roche

Photo depicting chemists (Source: sanjeri/E+ via Getty Images)


The International Chemical Identifier (InChI) is going through a massive transformation. Meet two of its champions who are helping bring InChI to the masses.

The InChI story is something like a Hollywood movie. It’s about the quest to encode the essential structural information of a chemical into a character string so it can be used, shared and found by any interested party. It features an unsung cast of quirky characters. And now, there’s even a happy ending that offers a new beginning for any organization seeking to pursue its chemical interests without sacrificing IP.

The International Chemical Identifier (InChI), pronounced IN-chee, is widely used to identify the presence of a particular small molecule across the web or in a specific set of data. It has now been reinvented to take on other market-viable compounds such as organometallics, polymers and nanomaterials. In addition, once purely driven by memberships, InChI is now backed by a hybrid business model that allows customers to participate in a way that best suits their needs.

Sharing the love for open innovation

The InChI Wheel offers an overview of the current state of InChI's ever-expanding universe. Source: InChI Trust

“Gerd is realizing this wonderful transition from the old way InChI operated into a whole new way of thinking,” says Dr Pieder Caduff, Elsevier’s Senior Manager of Enabling Technologies & Innovation and longtime champion of InChI’s potential.

Pieder is referring to Gerd Blanke, InChI Trust’s Technical Director, who is overseeing the transformation. “It’s going from a single point of failure driven by membership fees, where these contributors only got a limited outcome,” Pieder explains, “to a hybrid model that includes not only memberships but also in-kind contributions providing capacity and capacity development from other domains.

Pieder Caduff


“And by being scalable, extensible, open-source and collaborative, the new InChI represents a shift to a more practical, market-driven approach that can support existing and novel user communities.”

Gerd and Pieder share a long-standing passion. In the 1990s, they were colleagues at MDL Information Systems, the pioneer of chemical structure and reaction storage and retrieval, which Elsevier later acquired.

The release of version 1.07 in July 2024 from InChI’s new home on GitHub is a radical leap forward for the vision they share with the InChI Trust. Now, everyone can get a piece of the InChI action.


Gerd Blanke, PhD

Try InChI

The web demo version allows you to draw a chemical structure and calculate the InChI, all within the confines of your own browser.

A future-resistant vision

Rapidly boosting its capacity in terms of both quality and quantity, InChI can now welcome new partners who want to go beyond the system’s roots in identifying small molecules and towards other market-oriented compounds.

With InChI’s source code and documentation now on the developer platform GitHub, anyone can contribute to making chemical compounds more identifiable across data stores. In the process, a community of chemists and companies, including fierce competitors, is being formed that sees the benefits of making chemistry data more sustainable and FAIR (Findable, Accessible, Interoperable, and Reusable) to drive innovation.

Birth of the InChI

In 1999, the InChI project was undertaken to solve a problem that had existed since the dawn of chemistry: How can a chemist be sure of what another chemist is talking about? Yes, you can draw a compound and share a picture. But how would you then search for it, particularly on the web?

Spurred by the rise of the internet, a group of cheminformatics researchers, initially including Drs Stephen Heller and Stephen Stein at the National Institute of Standards and Technology (NIST), started to develop a standardized chemical identifier using an easily searchable string of text. Later, the work shifted to the International Union of Pure and Applied Chemistry (IUPAC).

In the delightful video Birth of the InChI, we see the two Steves happily bickering about who came up with the original idea. Regardless, the end product had clear and universal advantages:

  • It was non-proprietary (and hence free to use by any party).

  • It could be computed from structural information (so it didn’t require a bothersome bureaucracy to sign off on each identifier).
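To make the "searchable string of text" idea concrete, here is a minimal Python sketch that splits an InChI into its slash-separated layers. The ethanol string is the standard InChI for that molecule; the helper name `split_inchi_layers` is hypothetical and not part of any InChI software. It assumes the simple case where the formula layer comes second and each later layer starts with a single-letter prefix, which does not hold for every InChI.

```python
def split_inchi_layers(inchi: str) -> dict:
    """Split an InChI string into its layers (simplified illustration only)."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    # The first segment is the version, e.g. "1S" for a standard InChI.
    layers = {"version": parts[0]}
    # Assumption: the second segment is the chemical formula.
    if len(parts) > 1:
        layers["formula"] = parts[1]
    # Later segments begin with a one-letter prefix such as "c" (connections)
    # or "h" (hydrogens). A repeated prefix would overwrite the earlier entry.
    for part in parts[2:]:
        if part:
            layers[part[0]] = part[1:]
    return layers


ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
layers = split_inchi_layers(ETHANOL)
for name, value in layers.items():
    print(name, value)
```

Because the identifier is just text derived from the structure, two parties who compute it independently for the same compound can compare or search for the result without consulting a central registry.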

Soon, the value of having a standardized system became apparent as it was implemented in chemistry databases and toolkits worldwide, including the world’s largest chemical database, Elsevier’s Reaxys.

The InChIKey: Size matters

Still, a problem remained: the InChI needed to be shorter for easy searching. So a new version, the InChIKey, was released in 2007. While directly derived from the original InChI, it is always exactly 27 characters long.

Diagram of the InChIKey, a hashed version of InChI that allows for a compact representation and for searching. Source: InChI Trust
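The fixed length makes an InChIKey easy to recognize in free text. The Python sketch below checks only the shape of a key: a block of 14 uppercase letters, a block of 10, and a final single letter, joined by hyphens, for 27 characters in total. The function name `looks_like_inchikey` is hypothetical, and the check says nothing about whether the key corresponds to a real compound.

```python
import re

# Shape of an InChIKey: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text: str) -> bool:
    """Return True if text has the 27-character InChIKey shape."""
    return INCHIKEY_PATTERN.fullmatch(text) is not None


# Standard InChIKey for ethanol.
ETHANOL_KEY = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"

print(len(ETHANOL_KEY))
print(looks_like_inchikey(ETHANOL_KEY))
print(looks_like_inchikey("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
```

Since the key is a hash, it cannot be turned back into a structure; it works as a compact lookup handle, while the full InChI carries the structural layers.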

Soon after, in 2009, the InChI Trust took over responsibility for the development, implementation and promotion of InChI.

“The Trust followed a very traditional model, with members making contributions for the further development of the InChI,” Gerd explains. “With only a limited membership, the Trust could only afford one developer to take on the full responsibility. And as a result, it sometimes took years between releases.”

The rise of GitHub and open-source

Despite the limitations, the developer, Igor Pletnov, a Moscow professor of inorganic analytics, became a beloved figure in the InChI development community as he slowly but surely proved the value of a shared and standardized chemical identifier.

However, with Prof Pletnov’s death in 2021, just as he had almost finished a critical bug fix release, InChI reached a crossroads. “He had all the code on one machine at the most important university in Russia, and unfortunately, it was unavailable,” says Gerd, who was then brought in to take over.

“His son helped us retrieve some backups of the code from the new release. But the entire test environment was lost. And the code was also very much a black box, and nobody really understood it. So, basically, we had to start from scratch.”

“The world had already changed anyway,” Gerd adds. “The world is now about open-source and the role GitHub can play. And this involved a completely new development and testing pipeline, so it’s easier to maintain and extend the standard.”

Laying a new foundation

The latest version (1.07) has been released on GitHub for anyone to build on, with IUPAC as an active partner. “It’s essential IUPAC is involved,” says Gerd. “After all, they are the ones who set the actual standards. So it’s a clear message that we’re not just working in the wilderness.”

Another essential part of developing the InChI to be more open and sustainable is to provide a test suite, including test code, data and documentation. “The test suite allows in-house data to be processed behind an organization’s firewall,” Gerd explains. “This way, we set objective and transparent quality criteria and enable convenient and replicable testing. Such a test environment is indispensable for collaborative development, especially with external contributors.”

Looping in a larger community

One early working relationship proved catalytic. “I was lucky to get in contact with Prof Sonja Herres-Pawlis of the Institute of Inorganic Chemistry at RWTH Aachen, who needed some extensions in InChI for her area of expertise,” Gerd recalls. “She found funding to sponsor two developers. And thanks to that and the involvement of other partners and sponsors, such as German FAIR data champions NFDI4Chem, the Data Literacy Alliance-DALIA and the Volkswagen Foundation, we could move forward.

“In addition, the Beilstein Institute provides additional chemoinformatic resources, and thereby acts as a great example of how organizations can support InChI with program development capacity and domain knowledge. This integrated approach allowed us to make the modernized software publicly available via GitHub as open source under the MIT License.”

Various working groups also exist, including ones for polymers and mixtures, Markush structures, reactions and organometallics. “It’s snowballing now thanks to everyone being able to bring in their in-kind contributions,” says Gerd.

From membership drive to a community-driven hybrid business model

“This is the magic of having an open, collaborative and scalable model that allows people to participate if they want to support the initiative,” says Pieder. “And Gerd, as dedicated project lead, is here to ensure that all these contributions fit together by following the same rules, which also ensures more frequent and reliable updates.”

“Part of my role is convincing others to support us,” adds Gerd. “And this part of the job only gets easier as we get more channels and working groups. People can more easily find a way that suits their needs. After all, we still need continuous income from our members to further the scientific activities and push the standard forward. And these are early days. So we are still very open to new ways of working together.”


The roadmap to innovation

Indeed, as older companies begin to regard their legacy data as a potential gold mine, especially if this data aligns with publicly available data, the demand for InChI seems set to grow.

“InChI is a neutral way to involve all these different information providers,” Pieder says. “And that’s why it’s an excellent idea to have this common ground of a trust tied to the standards body. It becomes a place where everyone can work together, share the costs and later reap the benefits. The roadmap is now there for all of us to drive innovation.”

In other words, any interested party can now join the trust and co-star in their own InChI sequel.
