5 min read

Fixing Hallucination Challenges in Legal AI

Written by
CogniSwitch
Published on
September 10, 2024

Rajan: Yeah, I think we are live now. Live. Awesome. Welcome, Amit. Welcome, Dilip. So today we are going to have a very interesting discussion on what is happening in legal AI and the challenges at the intersection of the legal industry and AI. I said, let's make it interesting; the legal industry can sound like a very boring topic, but legal discussions are never boring, whether they are happening in the US or in India. So let's quickly dive into the discussion. But let me first start with a quick introduction.

Amit, you want to spend 15-20 seconds just introducing yourself, what ContractKen does, and the things you are working on? And then after that we will move to Dilip.

Amit: Sure. Hey everyone. My name is Amit. I am the founder of ContractKen, and we are essentially an AI copilot inside Microsoft Word which helps lawyers review and draft contracts. That is essentially what we do. Our USP is that we let all of this be done in a very private, safe and reliable manner. And we are going to talk about the reliability aspect in detail today.

Rajan: Dilip, you want to jump in next?

Dilip: Yeah. Hi, I'm Dilip, and I am the founder of CogniSwitch. What CogniSwitch does is create a digital brain that helps enterprises reliably run their agents. Whether it is a copilot, an autonomous agent, or a conversational agent, we use that AI brain to ensure these agents work reliably within enterprises.

Rajan: Yeah. And just to introduce myself, my name is Thyagarajan; I go by the short form Rajan. I am one of the co-founders of Upekkha, an AI accelerator for founders building global software businesses. So, I will start with Amit. Amit, whenever a technology shift happens, what you see is that there are a few people, and a few industries, who are early adopters.

Take any technology shift, whether it is mobile or cloud. You typically see that the finance industry is the first adopter, because they have the money and the inclination to find out what the technology can do for them and how it can solve their problems. Geoffrey Moore described this spectrum in his book: early adopters, early majority, late majority, laggards. Usually, you see legal as a laggard on that spectrum of adopting technology.

But if you look at the GenAI platform shift over the last three years, we have been talking about legal and GenAI more than any other industry, more than even finance. So what is so interesting about the legal industry and GenAI, where people are talking about it as an opportunity space, while all past experience says that legal is a laggard in adopting technology?

Amit: Yeah, I think some of this bad rap is actually uncalled for. Lawyers have always adopted technology, although not always the fanciest things in the market, and they have been working with various machine learning and text-processing engines for the last 10 or 15 years. And all of us know that law is extremely language-specific and text-heavy; all the tasks are very text-oriented. So it's no secret that large language models find a lot of applications across the range of legal tasks. AI has been around in various shapes and forms in the business of law, and a little bit in the practice of law too, for the last 10 or 15 years.

But when the whole ChatGPT thing blew up a couple of years ago, I think lawyers got very, very intrigued. In all the conferences I attended and all the conversations I was having, everyone was extremely curious and wanted to be educated: what can it do? Does it really deliver what it promises? What sort of use cases are the right ones for lawyers to start experimenting with?

And what has happened over the last 24-odd months, and I think we are close to 24 months now since the GPT-3.5 release, is that the conversations have shifted from the initial buzz and hype, to education-related discussions in 2023, to testing and evaluations, real pilot projects, real proofs of concept, and then how to move those into production in 2024. So as we speak, the industry, and when I say industry it is both lawyers working in law firms and lawyers working in enterprise legal teams, has fully accepted that AI is here to stay and that it is going to transform a lot of legal tasks.

Whether it is case law research, summarization, document drafting, patents, or e-discovery, everything these days is finding a lot of AI applications. So as you said, Rajan, this is not the space that usually adopts this sort of technology first. But from what I hear, it's one of the top three industries that has picked up the use of GenAI, and I think adoption is going to accelerate, because with everyone I talk to, you really do not have to sell them on the concept of AI. There is no discussion around "is this really applicable in our context of contracts?" Everyone is really curious to understand: how are you solving a particular problem? What are your capabilities? How are you different from another provider, and are you the right fit for my use case? So the initial hesitation and anxiousness towards a new technology is not there. Now, having said that, we are of course going to discuss the whole issue of reliability and hallucinations. That, I think, is the big hurdle which we still need to...

Rajan: So, before we go to the challenges, paint the picture of the opportunities. You are saying it is not a matter of surprise; people have been using this for the last 15-20 years, and maybe the opportunities in software and legal are large enough. Can you paint the picture in terms of market size, or the types of opportunities broadly at the intersection of legal and AI?

Amit: Yeah, I mean it is really, really huge, whichever way you look at the market. Some estimates put the dollar value of legal services delivered annually in the US at close to 750 billion dollars. Or you can look at the law firms; most of them report their revenues. Or look at the total number of lawyers, and I am specifically talking about the US market, though we can of course extrapolate worldwide. There is a humongous amount of work. And then there is the whole aspect that large parts of society do not get access to justice because law is so expensive, right? So that's the other angle: AI and this kind of technology can actually make access to justice possible for a lot of underserved sections of society.

Rajan: So what you are saying is that about 750 billion dollars is spent on legal professionals, and that still doesn't serve the legal needs of a large part of the population. Which means that if it were more affordable, the spend might be much larger than 750 billion dollars.

Amit: Yeah, absolutely. I mean, you could have a separate conversation just on this topic: whether you actually want to increase legal activity, especially litigation, which not everyone is a fan of, or whether you want to give more people access, a better understanding of their rights, and the ability to claim what is rightfully theirs. So it's a huge market with lots of different types of work. And as you are probably aware, traditionally, when law students go through their education, their apprenticeships, their internships and their careers...

...most of the skills they learn come from reading documents, shadowing experienced lawyers, and learning by doing. These are huge texts, hundreds of pages long, and it's really a skill you need to develop: to read, understand, comprehend, and be able to refer to that text in a relevant context. So all of this is actually being welcomed by a lot of people who feel that much of this work is better suited for machines than humans, and that humans should be using the output of these algorithms to do the things machines cannot. That's the direction the industry is moving in.

Rajan: So, I see a market map on the screen right now. Are there really that many legal tech and legal AI startups in the world right now? What are the different types of problems they are solving?

Amit: Yeah. So, I think this is just the early stage; you can probably multiply this by two or three if you want the total number. And there are two or three big categories. There is transactional law, which is primarily corporate law work to do with contracts, documents, disputes and things like that. There is a big section of work related to litigation and trials, the work that happens in and around courts. Then there is the management of operations: intake of work, management of your practice, billing, things like that. And then there are areas like patents, which are a subsection of law but a huge area in themselves. So there are many areas, and I think there is no practice area which is untouched by AI right now.

Rajan: So out of the 750 billion, where is it spent most? Where is the use case for generative AI most prominent? Is it contracts? Is it patents?

Amit: So contracts is definitely one of the biggest application areas, as most of us can imagine, in drafting, review and negotiation. That is an area a lot of companies have been tackling for the last 10 or 15 years, and now with GenAI the whole scene seems to have changed, because AI can do like 10x the amount of things that were possible in the pre-GenAI world. Litigation, which requires a lot of legal research, case research and case law summarization, is another big application area. Patents is of course another, and e-discovery, which is again linked with litigation. So all of these areas, where you have large repositories of documents that need to be read through and referred to, to produce a final or interim output, or to check or query something, are finding tools, and there were existing tools too. So there are some very large players behind these tools.

Rajan: Amit, I know you have spent 15 years in the industry. Which of these legal AI companies are growing the fastest, and which are the biggest by revenue? If you were to pick three or four names from that complex market map we looked at, which ones have the highest revenue? Obviously, names that are out there.

Amit: OK, so I don't think the most recognizable names are all on there. Of course, Harvey and Casetext, which is now part of Thomson Reuters, are the two biggest players, and they are taking more of a mile-wide, inch-deep type of approach. They basically say: we will build the legal AGI. We will do contracts, drafting, litigation, research, everything through one single chat interface, with of course a RAG pipeline behind it.

Rajan: What is the size of their business? What is their revenue, or their growth rate over the last one or two years?

Amit: Yeah, I mean all of this is private, so your guess is as good as mine; you are probably closer to them, being based in SF. But we can extrapolate from the speed at which they have been closing rounds of funding, going from seed to Series A to Series B, and Casetext actually got bought by Thomson Reuters for over $600 million. So there are no public figures available, but it is definitely one of the big use cases of GenAI, and I am speaking across domains, not just legal. And then within the contracts space, I think Spellbook is one player which has really taken the lead, but there are lots of other companies, including ours, competing with Spellbook in various parts of the market.

Rajan: So what are some of the challenges? What friction is coming in the way for some of these? I am assuming they are at, let's say, $50 million or $100 million in revenue, or at least inching towards $50 million. And the TAM of the space is about 750 billion dollars, as you said. So there is a lot of room for growth in the narrative they are painting. What is the friction? If people are looking for a solution like this, is it that vendors are not able to deliver the solution to the expectations of the market?

Amit: Yeah. I think legal AI faces the issues typical of any GenAI application: privacy, uptime, changes in workflow, and the people aspects. Those apply to all domains, not just legal. But specifically in legal, reliability is the number one issue, and hence our discussion topic today.

Because lawyers essentially have a fiduciary duty to manage risk and produce reliable output. So whatever tools they use have to have a very, very high level of accuracy; those numbers need to be extremely high, if not 100%. Some of the events that happened back in 2023 and early 2024 got a lot of press: a lawyer used ChatGPT and it cooked up fictitious cases, or somebody used ChatGPT to create false narratives in an actual legal case. That brought a lot of bad rap, and a lot of focus and attention, to the fact that out of the box...

...using GPT-4 is not going to give you a reliable solution. You cannot use that in production, however cool the demo might look. And if you take a step back and peel the onion a little bit...

...the whole challenge is that, by definition, legal opinions are not atomic facts. At some level, law is essentially a contested concept. So when you are making an API call to an LLM, or running a RAG pipeline, deciding what to retrieve can be very challenging in a legal setting. A RAG system must be able to locate information from multiple sources across time and place. And if there is no set of available documents that definitively answers the question, then it has to come back with "this cannot be answered"; it has to refuse to answer.

Rajan: Amit, I'll come back to you on explaining RAG; you jumped straight into RAG. But I will also use this opportunity to switch to Dilip. Dilip, we are talking about the challenges GenAI applications face in getting adopted in the legal industry, and Amit called out reliability as a big one. Within your sphere of experience, and what you do at CogniSwitch, you look at multiple use cases; legal is one vertical among them. So what are your thoughts on the challenges of this use case within the legal industry? And are they the same across the other verticals you look at?

Dilip: Yeah. Rajan, before I go there, I have one point to add on your question about why legal is now ahead. And it is connected.

Rajan: Yeah, of course. Why don't we start with that?

Dilip: Yeah. So if you have ever written code, and I am sure lots of you in the audience have, then you realize that legal has humongous amounts of text. And creating programs around it is not everybody's cup of tea, because there are huge amounts of concepts: there are penal codes, there are huge amounts of case law, and there are the verdicts that come out of them. So it has not been easy for whoever has tried to automate it.

Now it is the ChatGPT era; everybody is getting a chance to try these things out, and that's a great thing. That's why you find so much interest and so many early-stage companies. That doesn't mean the task is finished; it's a path they are on. And by the way, LLMs have this great capability of understanding the patterns humanity uses in natural language, so it's a very natural fit. Having said that, reliability is definitely an issue, and I was salivating when I saw that early-stage map, because forget Harvey, forget Thomson Reuters or whoever else; all of them have a chance.

Because it's now a level playing field, and I think whoever fixes the reliability issues wins, because that's the other side of the coin: with all this humongous textual content comes the need for reliability. That's why human beings have always been in the middle in this industry especially. And if the cost is that high, then the expertise must be very limited, right?

So why is expertise so important, and therefore so costly? Because it's critical. A ruling is going to decide somebody's fate, whether monetary or, God forbid, their life. So reliability, as Amit said, is amazingly important. Now, you know the CogniSwitch story, but for the rest of the audience: we started off, and still are, in the business of building digital twins of knowledge, whether that knowledge is related to case law or to some disease.

We believe that if all this has to work in production, as Amit was saying, certain things have to be assured, especially in domains like financial services, insurance and life sciences. The reason is conceptual structure. An LLM is good at playing with humongous amounts of text, but there is a definite lack of conceptual clarity: the connections between various entities and concepts in a penal code, and how they relate. If a High Court or Supreme Court has made a ruling, how and why is that ruling connected to a penal code, and on which of multiple penal codes is the ruling itself based?

So these concepts and these connections are what an expert lawyer, or whoever is in the profession, carries around; it's not just text. This is a big issue in these kinds of industries. And what we have seen is that if you can build out at least some level of that expertise, not all of it, in an AI- or machine-usable form, then many of these great startups benefit, including what ContractKen is doing. You can already see the difference ContractKen is making compared to some of the others. It is basically because once you start employing those concepts together with the huge capability of the LLMs, it becomes a real breakthrough. So I think it is critical that this comes into play.

Rajan: Yeah. So, to your earlier point about legal and LLMs: the point you made is about unstructured data, which LLMs are really good at dealing with. Before this, all the technology we had, the data ingested into a computer, a database, or a data warehouse, all had structural relationships. And toying with that is a very different class of problem compared to dealing with volumes of data like case law.

And that's where the LLMs come in. And that's where, as you said, an LLM is fundamentally something that predicts the next word; that's the basis of the technology. Another way to say it is that it compresses a whole lot of unstructured data and draws implicit patterns from it. But that is pattern matching; it is not conceptual.

And that lack of concepts is why, when a question is asked, it may come up with an answer that looks like a valid pattern but is not the right pattern or the right concept. And that is where the hallucination, the reliability issue, comes up.

Dilip: Sorry, Rajan. What I was trying to say is that it's not like asking "what is the capital of India?" and the answer is Delhi. It is not the same when it comes to complicated domains, especially regulated ones like legal and insurance. That is what I was trying to say.

Rajan: Yeah. Which to me is a surprise. If the use case is something like writing a blog post, and it hallucinates a little bit, that is almost a feature: you are expressing yourself creatively. But if it comes up with a reference to a case law, saying this is what happened in a particular case, and that turns out to be fictitious, then lives depend on that. Judgments and reputations depend on that.

So to me it is very interesting that the legal industry is adopting this where the downside of going wrong is very, very high; how they fix it and address it is an important parameter. If there were a use case in, say, sales or marketing, and it impacted the revenue number, there would be rollbacks. But here in legal, the ship seems to be going at full throttle, and I am very excited and curious about that.

Amit: Yeah. And I think the whole hallucinations issue was brought front and center by Stanford University's empirical study, which came out about six months back. We have a snapshot of their study, which looked at GPT-4 and at legal research tasks only; nothing to do with contracts or anything. So these are specific questions around case law, legal research, and things like that.

They looked at three of the industry-leading tools which had launched their own AI-assisted legal research capabilities, and the Stanford team had access to them as well as to GPT-4. What they did was create a set of 200 tasks, split into four or five categories, and ran them against these tools and GPT-4. And they had, of course, experts evaluating the answers. As Dilip said, these experts already knew what the right answer was, so they were evaluating whether...

Rajan: So, I can only see Lexis, Westlaw and GPT-4.
Amit: Yeah, and Practical Law. They didn't include Harvey or some of the other tools. As you can see, the results in red are hallucinations, the ones in yellow are incomplete or sometimes refusals, and the ones in green are accurate. So you can see their conclusion, which was actually very hotly debated and refuted by some of these industry players for various reasons.

And we can get into that: about 15 to 30% of all the output was hallucinated. As you said, Rajan, this is almost unacceptable in the legal domain. You can't have a third or half of your queries answered with a hallucination. So there was a lot of back and forth, and we will talk a little bit about what hallucinations mean in the context of legal a couple of slides from now.

But this created a lot of effort around concerted data set creation and measurement of the quality of the responses coming out of these tools. And now I think all the serious players are building some kind of confidence score into their GenAI output. We are doing the same at our end. Although in contracts we are not really generating responses the way legal research tools do, even for the recommendations our contract tool makes, we attach a confidence score.
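
As an aside for readers: a confidence score of the kind Amit mentions can be as simple as reusing the retrieval similarity of the chunks an answer was built from. Here is a minimal, hypothetical Python sketch; real products use more sophisticated, proprietary scoring.

```python
from dataclasses import dataclass, field

@dataclass
class ScoredAnswer:
    text: str
    confidence: float          # 0.0 to 1.0
    sources: list[str] = field(default_factory=list)

def score_answer(answer: str, retrieved: list[tuple[str, float]],
                 review_floor: float = 0.75) -> ScoredAnswer:
    """Attach a confidence score derived from the retrieval similarity of
    the chunks the answer was generated from. A crude heuristic, standing
    in for whatever proprietary scoring a vendor actually uses."""
    if not retrieved:
        return ScoredAnswer("No supported answer found.", 0.0)
    similarities = [sim for _, sim in retrieved]
    confidence = sum(similarities) / len(similarities)
    if confidence < review_floor:
        answer = "[Low confidence: requires human review] " + answer
    return ScoredAnswer(answer, confidence, [cid for cid, _ in retrieved])
```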

One last point I want to make on this: GPT-4 is what you would call a closed-book answer. The other three tools have applied normal RAG, the standard RAG technology, which you can call open book. And even with that...

Rajan: Amit, can I just interrupt you here? Can you explain what RAG means and why it is relevant here?

Amit: Yeah. RAG is as simple as saying that you want the answer to be verified against a reference source, which is a source of truth, a ground truth. So if I am asking a question about a very specific sub-subsection or aspect of a case law, the tool needs access to that actual case law, and not depend on whatever hazy representation exists in the large language model's training. A lot of these large language models, including GPT-4 and Claude, are trained on case law; that's a well-known fact. But how that data was prepared, and how accurately it can be fetched at runtime, is a big question mark, and that results in hallucinations. So that's why these players have all built basic RAG pipelines.

But in spite of that, you see the results. And that's why basic RAG does not work in legal.
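
For readers who want the mechanics: here is a minimal sketch of the open-book RAG loop being described, including the refusal behaviour legal use cases demand. `embed`, `vector_index` and `llm_complete` are placeholders for whichever embedding model, vector store and LLM client you actually use, not any particular product's API.

```python
def answer_with_rag(question: str, embed, vector_index, llm_complete,
                    k: int = 5, min_similarity: float = 0.8) -> str:
    """Open-book answering: retrieve first, then generate only from what was found."""
    query_vector = embed(question)
    hits = vector_index.search(query_vector, top_k=k)   # [(chunk_text, similarity)]
    grounded = [(text, sim) for text, sim in hits if sim >= min_similarity]
    if not grounded:
        # No source document answers the question: refuse rather than guess.
        return "This cannot be answered from the available sources."
    context = "\n\n".join(text for text, _ in grounded)
    prompt = (
        "Answer ONLY from the sources below and cite a source for every claim. "
        "If the sources do not answer the question, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return llm_complete(prompt)
```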

Rajan: Yeah. So, somebody gave me this analogy: a general LLM query, the ChatGPT or Claude query that you use, is like answering a general knowledge question, a GK question. But a GK answer is not good enough. So you actually get the person to study the 10th standard textbook, or in this case legal case studies, and then answer questions in that particular domain, so that they are a lot more grounded. It is like teaching a person who is good at general knowledge the specific knowledge of a particular subject, so that the answers are a lot more specific and closer to the domain. As opposed to saying, oh yeah, I know from general knowledge that the Shah Bano case happened in 19X3. The specific understanding, say that in 2023 this particular Supreme Court case happened, will only come when you study those legal knowledge books, right?

Amit: And you have access to that book, so it's an open-book exam. You have access to the reference text when you are answering a question. But even that is not sufficient for legal, and I think that's the whole point of what Dilip is going to talk about later in the conversation: you need a specialized type of RAG technique.

So if you can put up the slide, I will quickly delve a little deeper into the sorts of hallucinations you usually see in legal. A hallucination is something which is factually incorrect or incorrectly grounded. Factual hallucinations could be unfaithful to the training data, unfaithful to what the user is asking for, or unfaithful to the true facts of the world; those are ways to think about it. If you go to the next slide: there are two axes along which to think about hallucinations. There is a correctness aspect, how accurate is the tool's response, and there is groundedness, how well is the tool's response supported by the cited sources. Because all these tools come with citations; they have a basic RAG pipeline built in. And if you go to the next slide, we will talk about where the right use cases lie.

On correctness, along one axis, the response can be correct, it can be incorrect, or the tool can refuse. Groundedness is about whether the tool is providing actual propositions that make valid references. A response can be mis-grounded when the tool has answered the query correctly but has cited the wrong case, document or reference source. And this, I think, is very dangerous, especially in legal, because you might get the correct response based on the wrong source, draw very different conclusions, and go down a very wrong path. So having a response that is both correct and grounded is paramount. Rajan, you were talking about marketing and some other spaces where a correct answer alone can work in more cases, but in this space you need citations.
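
The two-axis rubric Amit describes translates almost directly into code. In the sketch below the judgments themselves are assumed to come from human experts, as in the Stanford study; the code only captures the taxonomy.

```python
from enum import Enum

class Correctness(Enum):
    CORRECT = "factually right"
    INCORRECT = "factually wrong"
    REFUSED = "declined to answer"

class Groundedness(Enum):
    GROUNDED = "valid citations that support the answer"
    MISGROUNDED = "right answer, but the cited source does not say it"
    UNGROUNDED = "citations missing or fabricated"

def is_hallucination(c: Correctness, g: Groundedness) -> bool:
    """Wrong answers hallucinate; so do right answers built on wrong sources."""
    if c is Correctness.REFUSED:
        return False   # a refusal is incomplete, not a hallucination
    return c is Correctness.INCORRECT or g is not Groundedness.GROUNDED
```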

Rajan: Yeah. So on this RAG part, Dilip, can you jump in and explain why and how RAG solves, or does not solve, or is itself the source of hallucination? And what approaches have you seen that actually fix some of the issues with basic RAG and get reliability to a place where it is really useful for people in the legal industry?

Dilip: Rajan, your voice is breaking.

Rajan: Yeah. So can you talk about RAG and what the alternatives are that fix its issues?

Dilip: Yeah. So first of all, for those of you who might not know, RAG stands for Retrieval-Augmented Generation. And there is a subtle difference between plain ChatGPT or GPT-4 and RAG. Can you guys hear me?

Yeah? OK. On one side there is GPT-4, which we can assume is ChatGPT, and on the other side, as you said, RAG is used. The reason is that when training happened, the models would have seen content related to penal codes and maybe some of the existing case law. But you need a RAG pipeline because there are now changes to the penal code, or new case laws have come out.

So there are external sources from which you need to pull in relevant data, rather than depending only on the patterns or data the LLM has seen. That is the big difference: closed book, as you said, means depending only on what the LLM knows. With RAG, you are also bringing in retrieval from external sources and augmenting what the LLM knows. That's retrieval-augmented generation.

Now, look at how RAG typically works. There are huge amounts of natural-language text, which are cut up, chunked, into pieces, and those pieces are indexed and stored somewhere. When a request comes in, when somebody asks a question: if it is directly a question about the provisions of a penal code, maybe the LLM can answer it directly; the closed book works. But if you are asking about a case that happened recently, there is some external source from which you need to understand what happened in the case. So the case information is chunked up, and you bring in those chunks of text, which the LLM has certainly not seen before, alongside the definite information it has seen about the penal code and probably some other case law. Now, as you rightly said, a large language model is a probabilistic engine based on patterns.

Rajan: Like a parrot.

Dilip: That's a slightly derogatory word, but yes: it uses a kind of probability, with the matrices it has, to keep predicting, the way older AI also kept predicting things. It can work, but there are challenges. The retrieved text is not the full context. Somebody outside the LLM is retrieving certain pieces of text they think are relevant for this particular question, and we can get into the details of how that is done, but to keep it simple: you then give it to the LLM, which, let's assume in full faith, takes it and uses the patterns it has internally to generate a response.

That is why they are called generative responses: it generates, actually predicts, the next set of text, which is the answer, and many times this works well. But there are the gaps we talked about: the LLM lacks the concepts and the relationships between the structures, and it has no way to know whether the most relevant pieces of text have been pulled out by the external system that brings it the text. It all adds to the problems we are now collectively facing.

Rajan: So why does hallucination happen? What is the source of the hallucination? Is it because of the specific way these systems are designed?

Dilip: Imagine you give a physics textbook to someone who doesn't know physics at all, and I am just making this up: conceptually you don't understand gravity, and now you have to explain what happened when Newton saw the apple fall. It's going to be very difficult. Even if the system has seen some text about it, maybe a one-page write-up, it is not going to find it easy to actually explain. And in the legal domain, I am sure the questions are not simple. A question connects up multiple things: things related to the case, things related to the penal code, multiple contextual pieces. And that is where the hallucination happens. Because pattern-wise, "execution" could be the execution of a contract, or, God forbid, the execution of a human being. Both figure in the legal domain. Now you can get confused, right?

Rajan: But Dilip, isn't that actually handled by embedding models? Especially in how the language models are designed: if you look at the attention part of it, to get into the technicalities, these two words, execution in a contract and execution as killing someone, have two different meanings, and the context tells them apart. So at least the newer Transformer models are supposed to learn this and give the right meaning of execution in each case. But still the hallucination comes, beyond that context. Why is that?
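
Rajan is right that contextual models assign different vectors to the same word in different contexts. A quick way to see it, assuming the Hugging Face transformers library and a standard BERT checkpoint:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embedding_for(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual vector of `word` as it appears in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (num_tokens, 768)
    first_piece = tokenizer(word, add_special_tokens=False)["input_ids"][0]
    position = (inputs["input_ids"][0] == first_piece).nonzero()[0].item()
    return hidden[position]

a = embedding_for("The execution of the contract was completed on time.", "execution")
b = embedding_for("The court halted the execution of the prisoner.", "execution")
# Same word, two different vectors; similarity well below 1.0 shows context-dependence.
print(torch.cosine_similarity(a, b, dim=0).item())
```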

Dilip: So let's first look at the external side; let's keep the LLM aside, since we have already maligned it quite a bit. Take the other side, which is RAG the way we are doing it today. These chunks, the pieces of text I mentioned, are basically 200-300 words, maybe 600 tokens. And those pieces of text are sent to an endpoint which converts those words into an embedding.

That's why people say you should also know which model you are using to create those embeddings. Eventually, when you give it those 200 or 300 words, what you get in return is a vector, which is just numbers. And these are what is kept inside an index. Now, when you have a question, maybe it is 20 words, that is also embedded and a vector comes out. And you are now taking that 20-word vector and matching it against an index made out of 300-word vectors.

If the corpus is big enough, you can potentially get many hits. Of course, the RAG community has done some fantastic work on re-ranking and on various forms of RAG, because the first problem was that the top 10 results were not relevant for the particular question. They might have matched on cosine similarity, and there are multiple similarity mechanisms used to decide which chunks of text to pull out, which are too technical to get into here. But those don't ensure that the most relevant chunk for this particular question comes out; maybe it's ranked in the last 20. People realized that.
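
In code, the two-pass retrieve-then-rerank pattern Dilip outlines looks roughly like this. `embed` and `cross_scorer` stand in for an embedding endpoint and a cross-encoder; both are assumptions, not any specific product.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(question: str, chunks: list[str], embed, top_k: int = 10):
    """First pass: rank every 200-300 word chunk by cosine similarity
    against the (much shorter) question vector."""
    q_vec = embed(question)
    scored = [(chunk, cosine(q_vec, embed(chunk))) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

def rerank(question: str, candidates, cross_scorer):
    """Second pass: a cross-encoder rescores question+chunk pairs jointly,
    catching relevance that raw cosine similarity misses."""
    rescored = [(chunk, cross_scorer(question, chunk)) for chunk, _ in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```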

Rajan: Dilip, how are you then fixing it? OK, I think we are getting too technical; the moment we said embedding, the moment we said vector, we lost 90% of the audience. To recap: we take some words, we translate those words into another representation, which is numbers. We match two sets of numbers and see how similar they are. And execution in one context will be similar to contract, in another context it will be similar to killing. That is what is happening in the back end.

But is this what you are saying is the source of hallucination? And what are you doing to help fix it? How is CogniSwitch, or what other approaches beyond CogniSwitch have people taken in the industry, solving this hallucination issue in a way that people building legal AI apps can adopt?

Dilip: Yeah. So before we get into what CogniSwitch is doing: over the last two or three months or more, you must have heard about something called Graph RAG.

Rajan: Yeah, there is a meme on Twitter that there are like 16 types of RAG: basic RAG, Graph RAG, Extension RAG and so on. So tell me more about Graph RAG.

Dilip: There is also a variation people call knowledge graph RAG, because a plain graph can be different from a knowledge graph. Anyway, it became very popular because Microsoft released a paper on GraphRAG. The idea is: can we take the content and do more with it than just embeddings and vectors? We still need those.

But can we first take this content, extract the knowledge and insights in it, and transform it into a different information structure? And can we automate that process? And, like magic, people figured out that the LLM itself can help in that task: taking large corpora of text, which are very unstructured, or even if structured are still one-dimensional natural language, and using NLP and other mechanisms, including LLMs, on them.

We use LLMs a lot to extract and mine relevant information so that we can create a more connected, richer, deeper information structure, on which we can start providing conceptual connections when a question comes in. For example, if the question is about a certain type of crime, you can find out: if this is a crime, which provisions of which penal codes apply, and what provisions could potentially come into play.

This graph is nothing but entities and relationships. People who have programmed using object-oriented structures will know entities and relationships, along with properties of those entities and relations. So we mine that information. For example, if it is related to some violation, that violation is an entity, there could be properties attached to it, and it comes under the provision of a certain penal code. That penal code could be another entity, and there could be a relation between the two, as the sketch below shows.
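
A toy version of that information structure, with invented legal entities purely for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    kind: str                     # e.g. "Violation", "PenalCodeSection", "Ruling"
    properties: dict = field(default_factory=dict)

@dataclass
class Relation:
    source: Entity
    label: str                    # e.g. "falls_under", "interprets"
    target: Entity

# All names and codes below are hypothetical.
violation = Entity("Breach of confidentiality", "Violation")
section = Entity("Section 72A", "PenalCodeSection", {"act": "a hypothetical act"})
ruling = Entity("High Court ruling, 2021 (hypothetical)", "Ruling")

graph = [
    Relation(violation, "falls_under", section),
    Relation(ruling, "interprets", section),
]

# A question about the violation can now be answered by walking typed relations
# instead of hoping the right chunk of text surfaces from a vector index.
for rel in graph:
    print(f"{rel.source.name} --{rel.label}--> {rel.target.name}")
```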

Rajan: So, an information structure. Would it be right to paraphrase it like this: when we are just using GPT, or OpenAI directly, it is taking a general knowledge quiz about, say, a legal question. When you apply simple RAG, you are making sure that GPT system is given the context, made to study the legal text. But it is probably just mugging it up. So when it is asked a question, it picks things up without conceptual understanding: it looks up the index and says, somewhere I have heard words like this, and gives you an answer that is much closer to the legal textbook than to the general knowledge it was trained on.

Then with graph-based RAG, the AI system is made to understand what is in the legal textbook a lot more conceptually, to form conceptual graphs in its memory, in its internal system, and answer from that. So the answers are a lot more conceptual. And if you throw a conceptual curveball, the person who mugged up for the exam cannot answer. Would that be a good comparison of RAG versus Graph RAG?

Dilip: It's a very good example. Even in a closed-book setting, if the learner had actually understood the concepts first, then when a problem comes up where those concepts must be applied together with some external information, they could potentially handle it. You are right: the person who mugged it up will struggle when a problem comes that he or she has not seen before, and the person who can apply the concepts will do well. And by the way, if you could combine these two, some magic could happen. That is what CogniSwitch is trying to do. Imagine one side of your brain is that mugger and the other is the fellow with the concepts. Now put those two together, plus some basic rules your teacher told you: even if this one says something and that one says something, there are rules you have to apply from your own mind. Imagine a three-piece system that works together.
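
The "three-piece system" can be sketched as a pipeline: neural retrieval, a symbolic graph lookup, and deterministic rules applied last. Every component below is a placeholder; this is an illustration of the architecture, not CogniSwitch's implementation.

```python
def hybrid_answer(question: str, vector_search, graph_query, rules,
                  llm_complete) -> str:
    """Neuro-symbolic answering: fuzzy recall plus typed facts plus hard rules."""
    text_chunks = vector_search(question)     # neural side: fuzzy recall (the mugger)
    graph_facts = graph_query(question)       # symbolic side: connected concepts
    context = "\n".join(list(text_chunks) + list(graph_facts))
    draft = llm_complete(
        f"Using only this context, answer:\n{context}\n\nQuestion: {question}"
    )
    for rule in rules:                         # deterministic checks come last
        violation = rule(draft, graph_facts)   # e.g. "every cited section must exist"
        if violation:
            return f"Answer withheld: {violation}"
    return draft
```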

Rajan: Awesome. So we should jump in and take some questions. For those listening in, please put your questions in the comments box and I will pick them up one by one. There is one question from Manoj, and either Amit or Dilip can pick it up. His question is: does this help in understanding the criticality or relevance of subjects, so that the system does not treat an outlier event as an important event? I am assuming Manoj is referring to Graph RAG. Who wants to take it, Dilip or Amit?

Dilip: Yeah, go ahead Amit.

Amit: Yeah. I mean, in terms of structured data, outlier detection and treatment is a very regular technique used for building and using machine learning models. But in GenAI, how the outliers are treated by the model is something only the people who trained it and prepared its data know.

For example, if there is some very unique, peculiar law in a particular US state which does not exist in any other state, is that going to impact the response to a query which is somehow related to that peculiarity? Maybe, yes. So I think the short answer is: to manage those types of things, you definitely need to build guardrails.

And I think CogniSwitch is one step in that direction, where you can use guardrails and RAG more intelligently. But a lot of these questions depend on who has trained the model and how they prepared the data.
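
One concrete guardrail of the kind Amit alludes to is a citation check: before an answer goes out, verify that every source it cites actually exists in the knowledge base. A minimal sketch, with a simplified, made-up citation format:

```python
import re

def citation_guardrail(answer: str, known_sources: set[str]) -> str:
    """Reject answers that cite sources absent from the knowledge base."""
    cited = set(re.findall(r"\[([^\]]+)\]", answer))   # citations like [Doe v. Roe]
    unknown = cited - known_sources
    if unknown:
        return f"Blocked: answer cites unknown sources {sorted(unknown)}"
    return answer

# Usage (hypothetical case names):
# citation_guardrail("Damages were capped [Doe v. Roe].", {"Doe v. Roe"})
```
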
Rajan: This is like coming up with a question that is out of syllabus, right?

Dilip: Can I add to that?

Rajan: Yeah, go ahead.

Dilip: The other thing we need to consider: you classify an event as an outlier because you probably have not seen some of the patterns that could make it a non-outlier. As human beings, our analysis says this is an outlier, but some of these AI models may come and tell us it is not, because they process humongous amounts of events and data and can see patterns that we are not seeing and are attributing to other causes. So they can find those connections.

Rajan: Yeah, that is a good, interesting way to look at it. Let's go to the next one, which is from Jiten. He is asking: if GenAI gets better at reasoning, can it help reduce hallucination and improve reliability? And can reasoning be made better by building knowledge graphs? Dilip, you want to go on this one?

Dilip: Yeah, sure. So quickly: GenAI is a system, and it's not just the LLM. If you think about generative AI, the LLM is a very key component, but thinking that everything will be done by the LLM is a folly. You have multiple pieces; RAG itself is one part of the GenAI system, and various types of RAG are now coming up, as you said. Reasoning capabilities are another part.

For example, people create agents that divide your main task into multiple tasks. A multi-hop question, a complex question, gets split into smaller, simpler questions, and these agents find answers to each of those simple questions; then you take those and have an LLM compose the final answer. Those kinds of things have been tried, so definitely everybody is working hard to reduce hallucinations. But also look at how a knowledge graph works: everything is deterministic. It is, in a way, symbolic, like rules and logic; the connections are deterministic, so these are very symbolic mechanisms. So you are combining the great neural capability of an LLM with the great deterministic, symbolic capabilities of a knowledge-graph kind of structure.
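
The decomposition pattern just described can be sketched in a few lines; `llm_complete` and `answer_simple` are placeholders for a model client and a single-hop RAG lookup, not any specific framework's API.

```python
def answer_multihop(question: str, llm_complete, answer_simple) -> str:
    """Split a complex question into sub-questions, answer each, then compose."""
    decomposition = llm_complete(
        "Split this question into the smallest independent sub-questions, "
        f"one per line:\n{question}"
    )
    sub_questions = [q.strip() for q in decomposition.splitlines() if q.strip()]
    findings = [f"{q}\n-> {answer_simple(q)}" for q in sub_questions]
    return llm_complete(
        "Compose a final answer from these verified findings only:\n\n"
        + "\n\n".join(findings)
        + f"\n\nOriginal question: {question}"
    )
```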

That is what I meant by the left and the right brain: you bring them together. And most of the time this knowledge itself keeps updating; if knowledge had stood still, we would not have come to where we are today. So an external, symbolic, knowledge-graph kind of system that keeps getting dynamically updated, coupled very neatly with a neural capability, can definitely create a very reliable system. We can do it today.

Rajan: The next one, and again I will ask you to jump in. This is from Sandeep. He is asking: will access to private models like Llama address the hallucination challenge to a certain extent? I thought Llama was more publicly available, but anyway, the spirit of the question is: can Llama address hallucination challenges to a certain extent?

Amit: I mean, I don't think model A or model B can actually solve hallucination, because it is not a bug; it is a feature of how large language models work. It's wrong to think of it as a bug to be solved; it is a situation to be managed in the context of your business use case. In the case of legal AI, the hallucination issues are much more acute in the legal research space, but companies like ours, building in the contracts space, are learning from how those companies solve hallucination, and we are trying to apply similar techniques. You need a knowledge base, either your company's knowledge base or some industry knowledge base; you need some sort of playbooks; and then you need a human in the loop.

And that, I think, brings us back to the topic we started with, Rajan: how is the adoption going? This is one of the reasons why the initial excitement and the pilots have now resulted in everyone spending a lot more time and effort. It is actually difficult to evaluate these things: you need a lot of expertise, structured data, and a set of tasks to come up with a quantitative measure of whether a particular system is performing well on hallucination. So that's where the industry is moving in the context of legal. But Dilip, if there is anything specific you want to add...

Rajan: Yeah, maybe we should move to the next two questions. So, the next one, Dilip, I think you should chime in: how do you continuously update the knowledge base of a RAG system? Any examples you can share?

Dilip: It depends on what kind of RAG system you are using; like Rajan said, there are different variations of RAG systems out there. But typically, the way it is done today, updating is not a simple, trivial task; it is a costly and heavy task. It is not easy to just go and update. And there are also temporal issues: certain things get time-barred, for example, or an offer might be valid only for 14 days. So it's a topic in itself to get into how it is possible. Whether it's any type of RAG, Graph RAG, Fusion RAG or another RAG, it is possible, but it's an involved task. It is not something where you take a LlamaIndex sample notebook, use some defaults and start writing code. For a proof of concept, great; for production, no.
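
The temporal point deserves a sketch: if facts can expire, the knowledge base needs validity dates and retrieval needs to filter on them. The schema below is illustrative, not any particular product's.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Fact:
    text: str
    valid_from: date
    valid_until: date | None = None   # None means no known expiry

    def is_valid(self, on: date) -> bool:
        return self.valid_from <= on and (
            self.valid_until is None or on <= self.valid_until
        )

kb = [
    Fact("Settlement offer of X is open.", date(2024, 9, 1),
         date(2024, 9, 1) + timedelta(days=14)),   # expires after 14 days
    Fact("Section Y was amended.", date(2023, 4, 1)),
]

def retrieve_valid(facts: list[Fact], on: date) -> list[Fact]:
    """Only surface facts that are still in force on the query date."""
    return [f for f in facts if f.is_valid(on)]

print([f.text for f in retrieve_valid(kb, date(2024, 10, 1))])  # offer has lapsed
```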

Rajan: Yeah, there is one more question; let's take that before we move towards wrapping up. "I hope the data privacy and security related challenges are addressed well, along with hallucination and RAG." Either of you, go for it.

Amit: Yeah, I think that's a separate topic. RAG, whichever flavor of it you are deploying, is more about making the system reliable and contextual, whereas privacy and data security, well, there are elements of those you can address within a RAG system, within a GenAI system; as Dilip mentioned, GenAI is not a model but a system. But it is managed slightly differently, depending on what your privacy stack looks like.

Dilip: In fact, if security and privacy are an issue, some of the open-source models have become nearly as good as some of the closed ones. And I think that opens up the opportunity to run all of this right within your firewall, within your own private cloud. So some of those challenges are also getting addressed today.

Rajan: So, Dilip, Amit, we are right up against our time. I want to ask each of you for closing comments on our topic today: the opportunity in the legal space, the challenges, and the approach to solving some of the immediate friction, so that we can unlock this from where it is today to the total potential that is possible.

Amit: Yeah, from a legal and contracts standpoint, it is a big opportunity. But you need to handle the privacy and reliability issues with real purpose, not just for narrative purposes. And if you do that, lawyers and companies are willing to adopt. So it's a big opportunity, and we will see it continue to grow.

Rajan: So you are saying that if that particular technical problem is solved, then real, large businesses can be built.

Dilip: Yeah. For me, it's good that these things happened; in my opinion, it's good that ChatGPT happened. People experienced the power of something, and that leads to widespread use. You know, Rajan, we were hollering about knowledge graphs and graph-related mechanisms for the last two years. And when people use something more, they start to appreciate the good things about it, but they also start seeing some of the challenges. And once you see those, everybody works towards solving them.

And I think this cycle is required, which also tells me this is not going away; it is going to be here. People are putting in real effort, and these are opportunities I see for startups in both the legal and the tech domains to work together to iron these things out. As we work closely with early adopters and design partners, these are the things that come out, and any nimble, fast, highly focused set of startups could actually tackle some of these problems. I am very sure that is where the future is going. I saw what you wrote today, Rajan: I think you said AGI is already here, and I think that is the truth, though it depends on who is defining AGI. So this is not some fancy fad that comes and goes. This is going to be here. And I think it is a great opportunity, both for users to utilize these technologies and for multiple startups to benefit and create great businesses.

Rajan: So, the takeaway I want to leave the audience with is this: amongst the many industry verticals out there, lawyers and legal use cases are very real, very near, very grounded compared to many other use cases. So there is an opportunity to build a business there, but reliability, privacy, hallucination and the associated issues need to be solved. And if you use Graph RAG, or knowledge-graph-based RAG, you can solve them to a much greater extent than they are being solved today, and that will lead to adoption and to building that business. That's the takeaway I want to leave the audience with. Amit, Dilip, thank you so much for spending this hour and shining some light on the opportunities and challenges of legal AI. We will meet in another similar session; till then, bye folks.

Dilip: Thank you. It's a pleasure. Thanks everyone.

Amit: Thanks everyone for joining.

