OpenAI CPO Kevin Weil says their o1 model can now write legal briefs that previously were the domain of $1000/hour associates: "what does it mean when you can suddenly do $8000 of work in 5 minutes for $3 of API credits?" (Tsarathustra, @tsarnick, October 19, 2024) OpenAI's Chief Product…
The legal profession revolves around logic. Legal arguments are formed based on the rules of logic. A fine tuned model would absolutely demolish any human opponent here.
Take chess for example. There’s a predefined set of rules. Chess was one of the first games where ML models beat humans.
Sure, the rules of law are far more complex than the rules of chess. The underlying principle is still the same.
Law = rules. Action A is legal. Action B isn’t. Doing X + Y + Z constitutes action A and so on. Legal arguments have to abide by all rules of logic. Law is one of the most logic heavy fields out there.
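The "law = rules" picture above can be sketched as a toy rule system. This is purely illustrative: the actions, element names, and legality table are all invented for the example, not drawn from any real statute.

```python
# Toy sketch of the "law = rules" view: an action is constituted
# by a set of elements, and each action is either legal or not.
# All names here are invented for illustration.

RULES = {
    # "Doing X + Y + Z constitutes action A"
    "action_A": {"X", "Y", "Z"},
    "action_B": {"X", "W"},
}

LEGALITY = {"action_A": True, "action_B": False}

def classify(elements: set[str]) -> list[str]:
    """Return every action whose required elements are all present."""
    return [action for action, required in RULES.items() if required <= elements]

def is_legal(elements: set[str]) -> bool:
    """Conduct is legal only if no illegal action is constituted."""
    return all(LEGALITY[action] for action in classify(elements))

print(classify({"X", "Y", "Z"}))  # ['action_A']
print(is_legal({"X", "W"}))       # False
```

Of course, the whole dispute in this thread is about whether real law actually reduces to a lookup table like this.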
As for LLMs not being able to reason, it’s very debatable. Whether they reason or not depends upon your definition of “reasoning”. Debating definitions here is useless however, as the end result speaks for itself.
If LLMs can pass certification exams for lawyers, then it means one of two things: either they can genuinely reason, or the exams are flawed.
Law = rules. Action A is legal. Action B isn’t. Doing X + Y + Z constitutes action A and so on. Legal arguments have to abide by all rules of logic. Law is one of the most logic heavy fields out there.
You’re ignoring the whole job of a judge, where they put the actions and laws into a procedural, historical, and social context (something which LLMs can’t emulate) to reach a verdict.
You know what’s way closer to “pure logic”? Programming. And you know what the quality of the code LLMs shit out is like? It’s quite bad.
Debating definitions here is useless however, as the end result speaks for itself.
I also don’t agree with your assessment. If an LLM passes a perfect law exam (a thing that doesn’t really exist) and afterwards only invents laws and precedent cases, it’s still useless.
You’re ignoring the whole job of a judge, where they put the actions and laws into a procedural, historical, and social context (something which LLMs can’t emulate) to reach a verdict.
LLMs would have no problem doing any of this. There’s a discernible pattern in any judge’s verdict. LLMs can easily pick this pattern up.
You know what’s the quality of the code LLMs shit out?
LLMs in their current form are “spitting out” code in a very literal way. Actual programmers never do that. No one is smart enough to code by pure intuition: we write code, look at it, run it, see the warnings and errors, fix them, and repeat. No programmer writes code and gets it right on the first try.
LLMs till now have had their hands tied behind their backs. They haven’t been able to run the code by themselves at all. They haven’t been able to do recursive reasoning.
TILL NOW.
The new o1 model (I think) is able to do that. It’ll only get better from here. Look at the sudden jump in the quality of code output. There’s a strong reason why I believe this as well.
I heavily use LLMs for my code. They tend to write shit code on the first pass. I give them the output, the issues with the code, semantic errors if any, and so on. By the third or fourth round, the code they write is perfect. I’ve stopped needing to manually type out comments and so on; LLMs do that for me now (of course, I supervise what they write and don’t blindly trust it). Using LLMs has sped up my coding by at least 4x (and I’m not even using a fine-tuned model).
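The write-run-fix loop described above can be sketched as a small harness. Note that `request_fix` is a hypothetical stand-in for an LLM call (here it just patches a known typo so the sketch runs end to end); only the run-and-feed-back-errors structure is the point.

```python
import subprocess
import sys
import tempfile

def run_snippet(code: str) -> tuple[bool, str]:
    """Run a Python snippet in a subprocess; return (success, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return result.returncode == 0, result.stderr

def request_fix(code: str, error: str) -> str:
    """Hypothetical stand-in for an LLM repair call.
    A real loop would send `code` and `error` to a model; here we
    just fix a known typo so the example is self-contained."""
    return code.replace("pritn", "print")

def write_run_fix(code: str, max_rounds: int = 4) -> str:
    """Feed errors back until the snippet runs cleanly or rounds run out."""
    for _ in range(max_rounds):
        ok, error = run_snippet(code)
        if ok:
            return code
        code = request_fix(code, error)
    raise RuntimeError("still failing after feedback rounds")

fixed = write_run_fix('pritn("hello")')  # buggy first draft
print(fixed)  # the repaired snippet that now runs
```

This is exactly the "third or fourth time I get back to it" loop, just automated: the model never has to be right on the first pass, only to converge.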
I also don’t agree with your assessment. If an LLM passes a perfect law exam (a thing that doesn’t really exist) and afterwards only invents laws and precedent cases, it’s still useless.
There’s no reason why it would do that. The underlying function behind verdicts and legal arguments has been the same, and will remain the same, because it’s based on logic and human morals. Tackling morals is easy because LLMs have been trained on human data: their morals are a reflection of ours. And if we want to specify our morals explicitly, we can make them law (we already have for the ones that matter most), which makes things even easier.
LLMs would have no problem doing any of this. There’s a discernible pattern in any judge’s verdict. LLMs can easily pick this pattern up.
That’s worse! You do see how that’s worse right?!?
You are factually correct, but those are called biases. That doesn’t mean that LLMs would be good at that job. It means they can do the job with comparable results for all the reasons that people are terrible at it. You’re arguing to build a racism machine because judges are racist.
I think you’re conflating formal and informal logic. Programmers are excellent at defining a formal logic system which the computer follows, but the computer itself isn’t particularly “logical”.
What you describe as:
Action A is legal. Action B isn’t. Doing X + Y + Z constitutes action A and so on.
is a particularly nasty form of logic called abstract reasoning. Biological brains are very good at that! Computers, a lot less so…
[Using a test designed to measure that](https://arxiv.org/abs/1911.01547), humans average ~80% accuracy. The current best algorithm (last I checked…) reaches 31% accuracy. [LLMs can get up to ~17% accuracy](https://arxiv.org/pdf/2403.11793), with the addition of some prompt engineering and other fancy tricks. So they are technically capable… just really bad at it…
Now, law is marketed as a very logical profession but, at least Western, modern law is more akin to combative theater. The law as written serves as the base worldbuilding, with case law serving as additional canon. The goal is to put on a performance that tricks the audience (typically the judge, jury, and opposing counsel) into believing the argument is far more logical and internally consistent than it actually is.
That is essentially what LLMs are designed to do. Take some giant corpus of knowledge and return some permutation of it that maximizes the “believability” based on the input prompt. And it can do so with a shocking amount of internal logic and creativity. So it shouldn’t be shocking that they’re capable of passing bar exams, but that should not be conflated with them being rational, logical, fair, just, or accurate.
And neither should the law. Friendly reminder to fuck the police and the corrupt legal system they enforce.
The bar exam is just one part of a larger set of qualifications
The bar exam is just a (closed-book) proxy for the actual skills and knowledge being tested. While it is a reasonable proxy for humans, it is a poor one for computers.
I disagree with your first statement. Law is about the application of rules, not the rules themselves. In a perfect world, it would be about determining which law takes precedence in the matter at hand, a task that is itself outside of AI capabilities, as it involves weighing moral and ethical principles against each other. But in reality it often comes down to arguing why my interpretation of reality is the correct one.
And I believe them 100%.
The legal system does not revolve around logic, and even if it did: LLMs can’t reason, so they’d be useless anyway.
Yes, the end result does speak for itself: they can’t.
Yes, the exams are flawed. This podcast episode takes a look at these supposed AI lawyers.
Ok, so you just ignore the reports and continue to coast on feels over reals. Cool.
Another report contradicting you
Stop believing the hype. Sam Altman is lying to you.
Ok. What test would an LLM need to pass to convince you that it is capable of being a lawyer?
The legal system absolutely does not revolve around logic. Legal arguments, especially in court, are formed based on emotional appeal.