10 tips when unlocking AI for R&D
November 30, 2023
Ann-Marie Roche
© istock.com/metamorworks
The first edition of our new webinar series explores the perils and pitfalls of generative AI for R&D. Four experts talk about how to take advantage of new technology without losing sight of the big picture.
Elsevier’s four-part webinar series AI in innovation: Unlocking R&D with data-driven AI outlines the issues that can derail your AI projects and how to prepare yourself for these innovations and those around the corner. In the first edition, a panel of AI and data experts explore the perils, pitfalls and promise of generative AI for R&D.
Moderated by Elsevier’s Commercial Director for Corporate Markets Zen Jelenje, the panel consisted of Elsevier’s VP of Data Science Life Sciences Mark Sheehan and two experts from Elsevier’s SciBite: Director of Data Science & Professional Services Dr Joe Mullen and Head of Ontologies Dr Jane Lomax.
With Elsevier’s history of providing enriched and curated scientific data in AI-driven solutions such as Reaxys and Embase, this episode focuses on the questions our scientists, data scientists and computational chemists get from customers about AI and, more recently, large language models (LLMs).
As Zen explains: “These aren’t simple questions, and we definitely don’t have all the answers. But today, we have a diverse team from Elsevier and SciBite to explore some of these topics.”
Do watch the whole episode; a lot was covered. Meanwhile, here are some tips from the panel for navigating these changing times.
Tip #1: Get your data in order.
It’s easy to get distracted by all the noise and hype around LLMs, particularly ChatGPT. But to take advantage of any AI technology, you need to start with your data.
“Your data need to be well organized, well-structured and FAIR, meeting the principles of Findability, Accessibility, Interoperability and Reusability,” Joe says. “Only then will you be ready and flexible enough to quickly and seamlessly latch onto the best solution for the problem you want to solve” (see Tip #2).
Tip #2: Don’t rush to a “solution.” Start by asking, “What’s the specific problem I want to solve?”
“You've got to remain focused on identifying what the problems are, and only then look at the ever-evolving solutions to solve those problems,” Joe says.
“Instead of thinking of it as whether to invest in AI,” adds Zen, “you need to ask the question, ‘How does this improve my research?’”
Tip #3: Don’t consider LLMs as an all-in solution — especially for life sciences. (However, LLMs can still be part of the solution.)
At the end of the day, scientific progress is built on provenance, transparency and reproducibility. And LLMs like ChatGPT are simply not built for that, at least for now. Currently, much of Elsevier’s work is built on ontologies. “These use language to create a model of a domain,” Jane explains. “It's a codification of what humans understand about a particular domain: facts as we now understand them. And I think that's always going to be something that's necessary and useful.
“LLMs, on the other hand, are probabilistic models that are really powerful at generating and understanding human language,” Jane adds. “They’re amazing, and we use them internally.” But unfortunately, LLMs also hallucinate, and the information is not properly sourced. So in the longer term, many hope “to have an LLM with an ontology-based factual backbone, and then you’ll have something truly powerful,” she says.
“I also think that LLMs can bring value to one of our main aims at SciBite,” says Joe. “And that’s supporting data democratization — improving the access and interpretation of data. But LLMs won’t be able to supply this by themselves due to their limitations.”
Tip #4: Don’t underestimate scaling.
“One piece of advice: don’t underestimate the difficulty in being able to scale these types of technologies to production,” says Jane. “When we started with this three years ago, we ended up having to take a step back and first build the infrastructure and invest in the skills. We learned a lot through that process, but it was quite a learning curve. So, if you're investing in this, don’t overlook this. Come chat with us.”
Tip #5: Think operationalization.
“New technology brings new holistic cost considerations,” Joe says. “There are costs associated with rolling out some of these larger models: monetary costs, time costs, disk and carbon footprint costs, and so on and so forth.”
Tip #6: Get your hands dirty (while failing fast, learning fast and moving on).
“I read a McKinsey report the other day about whether you want to be a taker, a shaper or a maker in the AI space,” Mark says. “Are you going to wait until it’s fully cooked? Nothing wrong with that. And it can depend on the industry or your company’s appetite for risk and investment.” But for Elsevier, the road was clear: jump in now.
“And definitely having the right team in place is important,” adds Zen. “And since some of the questions we try to address in scientific research are really specific to the domain, it's also harder to wait for somebody else to do it for us.”
“It’s actually very fulfilling to bring the team together to work on new innovations using the latest technologies,” Mark says. “But it’s important to acknowledge there will be bumps on the road on that digital transformation journey. There will be mistakes and there will be failures. But it’s also incredibly rewarding when you get it right. You need to learn from your mistakes, pick yourself up and move forward.”
“And this is what we’ve been doing for the last 12 to 18 months in terms of GenAI, specifically LLMs. We’re getting our subject matter experts, our data scientists and our data analysts together to really get their hands dirty and ask, ‘What can I do now that I couldn't do yesterday?’ It's like you're building your muscles up in this space. You’re learning as you go.”
Tip #7: Think modularity.
“Our enrichment pipelines continue to become more automated and feature more of the latest AI technologies as we iterate,” says Mark. “And certainly, it's not the case that as soon as a new technology comes in, we throw out what we had before. It works well that we have a mix of rule-based technologies and machine learning technologies. And now we're exploring the latest GenAI technologies. These can all be complementary.”
“We always try to find a way to integrate all these different pipelines, datasets and capabilities into what my team calls a Lego set,” Mark adds. “It's a great way to approach things in a modular and flexible way without getting too obsessed about the latest or greatest technologies.”
Tip #8: Stay on top of what’s happening.
It might be simpler to wait for others to fail and then adopt. But here you risk being left behind and losing any competitive edge. As Joe points out: “Around 10 years ago, AI was beating humans at Space Invaders. Around five years ago, AI got better at Go. Just a few weeks ago, AI started beating humans in real-time drone racing. AI is evolving at such a pace, you need to keep yourself skilled up and aware of what's going on around you.
“And again, this is about getting your hands dirty. Reading a few articles and blogs isn’t enough. But it’s a difficult balance: keeping on top of things without getting sucked in, while just trying to identify those problems you want to solve.”
Tip #9: Keep humans in the loop.
As the panel discussed, subject matter experts (SMEs) remain essential to validate the output of any AI algorithm, and more so when it comes to LLMs. For instance, SMEs can act as prompt engineers, asking LLMs the right questions so the resulting output is easier to validate.
“Prompt engineering is actually a skill that we should all have some appreciation and understanding of,” Joe says. “It's not as straightforward as some people might expect. You need to be able to relay your understanding of the world to an LLM … and this comes back again to the real importance of SMEs when applying it to scientific domains that really require some expertise.”
Tip #10: While waiting on regulatory decisions, aim to be responsible.
“If you ask me about the regulatory environment today, this webinar would be out of date in a month or so,” Mark says. And indeed: watch this space. But meanwhile, you should aim to be responsible. “Regulations are all about governments coming in saying we need to manage this space because we’re concerned about the future. But it could start with responsible AI where the actual practitioners go ‘How can we be responsible and ethical about how we approach this?’ And at Elsevier, we’ve really tried to bake this into our daily work from the start with our Responsible AI principles.”
For the full iceberg of insight, watch the webinar. And in the meantime, don’t forget to get your data in order (see Tip #1)!