Episode 142: gr3pme's full-time hunting journey update, insane AI research, and some light news

Episode 142: In this episode of Critical Thinking - Bug Bounty Podcast Rez0 and Gr3pme join forces to discuss Websocket research, Meta’s $111750 Bug, PROMISQROUTE, and the opportunities afforded by going full time in Bug Bounty.
Follow us on twitter at: https://x.com/ctbbpodcast
Got any ideas and suggestions? Feel free to send us any feedback here: info@criticalthinkingpodcast.io
Shoutout to YTCracker for the awesome intro music!
====== Links ======
Follow your hosts Rhynorater and Rez0 on Twitter:
====== Ways to Support CTBBPodcast ======
Hop on the CTBB Discord at https://ctbb.show/discord!
We also do Discord subs at $25, $10, and $5 - premium subscribers get access to private masterclasses, exploits, tools, scripts, un-redacted bug reports, etc.
You can also find some hacker swag at https://ctbb.show/merch!
Today's Sponsor: ThreatLocker. Check out ThreatLocker DAC
Today’s Guest: https://x.com/gr3pme
====== This Week in Bug Bounty ======
New Monthly Dojo challenge and Dojo UI design
The ultimate Bug Bounty guide to exploiting race condition vulnerabilities in web applications
Watch Our boy Brandyn on the TV
====== Resources ======
WebSocket Turbo Intruder: Unearthing the WebSocket Goldmine
Chaining Path Traversal Vulnerability to RCE — Meta’s 111,750$ Bug
Finding vulnerabilities in modern web apps using Claude Code and OpenAI Codex
====== Timestamps ======
(00:00:00) Introduction
(00:05:16) Full Time Bug Bounty and Business Startups
(00:15:50) Websockets
(00:22:17) Meta’s $111750 Bug
(00:28:38) Finding vulns using Claude Code and OpenAI Codex
(00:39:32) Time-of-Check to Time-of-Use Vulns in LLM-Enabled Agents
(00:45:22) PROMISQROUTE
Title: Transcript - Thu, 02 Oct 2025 15:32:34 GMT
Date: Thu, 02 Oct 2025 15:32:34 GMT, Duration: [00:54:51.92]
[00:00:01.12] - Joseph Thacker
Feel like the lifestyle of having the flexibility of Bug bounty as well has bred some really unexpected but cool opportunities, which I just never would have guessed could have happened.
[00:00:34.78] - Justin Gardner
all right, hackers, we all know the value of a good misconfiguration, right? That's often how we're popping bugs and bug bounties. Well, unfortunately, threatlocker also knows about it, which is why they built DAC Defense Against Configurations. DAC scans all of the enterprise machines in your network for misconfigurations, ranks them by severity, and then shows them in nice graphics in a portal. It even emails your team weekly with updates so nothing slips through the cracks. And the best part about dac, in my opinion, is that it actually maps all these misconfigurations onto security frameworks and it also shows you how to fix them. That way, when you need ammo with leadership or with it, you can point to real specific compliance gaps and get actionable steps. Anyway, check out ThreatLocker DAC. It's an awesome way to stay ahead of these issues that we as attackers love. All right, let's go back to the show. Alrighty. Sup hackers? We got the this week in Bug Bounty segment and I guess I'll start this week off actually with a shout out to the Epic Games team for an amazing live hacking event in Sweden. This past week I had a blast. So much fun and currently in first place. Got to say, you know, not a bad time for me, but yeah, really, the team was excellent there and definitely wanted to give them a shout out because that was a S tier event and the way that they interacted with the hackers, collaborated, escalated bugs. Phenomenal. Phenomenal. Now you rarely see an experience like that that, that detailed at a live hacking event where they really checked every box. So anyway, I'm kind of glowing as you guys can tell from a really awesome week of hacking. Damn, this is such a great job guys. We really have the most amazing job. All right, so let's, let's get into the content for this week, for this week in Bug Bounty. First up, we've got a article by yes, we hack once again, just releasing these ultimate guides on various bug classes. Really digging this series. It's great for getting like in depth analysis of your methodology for various bug class. And this week is race conditions. Now, race conditions, something I don't actually do very much of because I'm using Caido and we don't have a bunch of tooling right now for Race conditions. But this, this blog outlines the different kind of attacks you can do with HP 1.1 last bite synchronization attacks and HTTP 2 single packet race condition attacks which we've talked about a lot in the podcast. Really awesome stuff here, so check it out if you're interested in sort of brushing up on race condition vulnerabilities. Also crazy from yes We Hack. The Dojo just got a total facelift guys. So check this out. Look if you for those on YouTube, check out my screen right now. Super beautiful design here. And as always, yes We Hack is running awesome challenges on dojo-yes we hacked.com this this month's is chain fection which looks like some file upload vulnerabilities happening here. So always good to check that out if you guys are looking to sharpen your skills. And what a lot of people don't know is when you solve these dojo challenges you get, you get points and invites on yes We Hack. So pretty awesome. All right, last but not least, guys, I told you a couple of weeks ago that bug crowd, specifically Santera was looking for people to come and do PR opportunities, right? Well for those of you that are watching the screen right now, you are looking at my boy Gret me Brandon on TV in Britain via those opportunities with bugcraft. So definitely if you're a top skill hacker, shoot since her an email say hey, I'm open to PR opportunities and you might end up on TV like Brandon did in this scenario. So anyway, just wanted to shoot that out. If you guys are interested in seeing the segment, we'll link it in the description and with that we'll jump right into the show. Let's go.
[00:04:14.25] - Joseph Thacker
Hey. Hey, what's up guys? I am here with Brandon. Gret me, you may remember him from a few months ago. Did you say it was like seven or eight months ago?
[00:04:23.13] - Brandyn Murtagh
I think it was around about that time. It's during the my first lhe actually just after.
[00:04:29.45] - Joseph Thacker
Nice. And at that time you had been hacking for like a year. Had been doing Bug Bunny for about a year.
[00:04:36.41] - Brandyn Murtagh
At the time of the last interview it had only been eight or nine months. So in Bug Bounty specifically. So I was still in that very rich learning process. First lhe coming to terms with everything. And yeah, it's been a crazy ride since then.
[00:04:53.37] - Joseph Thacker
Sweet. Yeah. And so Justin and you obviously knew each other back then and then you kind of have been doing hacker notes through this year. But what the audience probably doesn't know is that you and I have done a bunch of Hacking together this year on a mix of AI products and Bug Bounty programs and even a pen test here or there. So this guy is what I would consider a top tier kind of AI security hacker as well. And we'll talk a little about that later in the episode. But first, yeah, I wanted to hear what you've been up to for this year. How has going full time been? What's it look like?
[00:05:28.94] - Brandyn Murtagh
Yeah, full time. As I said, it's been a bit of a journey and I feel like the lifestyle of having the flexibility of Bug Bounty as well has bred some really unexpected but cool opportunities which I just never would have guessed could have happened without having the flexibility of doing Bug Bounty. So I've been full time since January. So what is that, eight months? Eight months? Nine months. I've participated in a few more in person events since then. I've taken home awards from in person events a few times, which has been really nice. And yeah, it's just been such a good journey and I'm very grateful for the community and being able to do it. Really. Dude.
[00:06:14.83] - Joseph Thacker
Yeah. I hadn't even thought about the fact that we went full time in the same month. That's awesome.
[00:06:18.75] - Brandyn Murtagh
We actually did. And that was when. Do you remember the hackbot that we absolutely wrecked and dropped a crit and a high and. Yeah, that was like the. I feel like that was like the real gateway, but that was literally within the first week or two of me going full time. I feel like that was the gateway for like the AI hacking when I was like, right, this is really something I'm going to go down now.
[00:06:41.99] - Joseph Thacker
Yeah. Contributed to a bunch of the findings and invites that through this year and how all that shook out.
[00:06:49.50] - Brandyn Murtagh
Yeah. And on the side of that as well, some other things that have happened. I unexpectedly started a company which was a very natural byproduct of a lot of the outreach I was getting. So there's been a lot of lessons learned there as well.
[00:07:04.82] - Joseph Thacker
Yeah, sorry, actually I meant to say something a second ago. I was going to say like that flexibility that comes with being a full time hunter, I think really does allow people like you and I to like say yes to more things. I think whenever you have a full time job and then you're doing Bug Bounty on the side, the margin that we have in our lives for things that are outside of, you know, that salary, work and you know, those like precious Bug Bounty hours in the evenings or whatever is like pretty small, especially if you have a family or lots of other obligations. And. And so, yeah, Whenever you said that earlier, I was like, man, that's so true. Like, that's been one huge upside of this year is just the opportunity to say yes to more things because you don't have like a 40 hour a week job kind of sitting there all the time. And I know, you know, a lot of people I think, who are full time hunters often talk about that. Freedom of flexibility, freedom of schedule, being able to work from anywhere. And I think before I went full time, I really thought more about that free time that I would be able to have like for my family or for hobbies. But I didn't think about all the opportunities that it lets you say yes to for professional work that you can just kind of swap some of that time with. Right? So Instead of doing 100% bug bonding hours, it's like, oh yeah, I can do 30% of my time on pen test and 20% of time on this side project that could blow up. And you know, for me personally, like 20, you know, 5% of my time on helping advice companies. That may eventually be like a second retirement. Right. It's like it allows you to say yes to all these things you couldn't say yes to before.
[00:08:29.58] - Brandyn Murtagh
100%. I think it's important as well. I've been speaking to a lot of hackers about this hack. This comes to mind more recently. But when you are a certain tier of bug bounty hunter and I need to thank Justin for giving me some advice on this, you really need to weigh up opportunities and opportunity cost as well and what that means for you as a hunter. So say if you're in an lhe scene and you are thinking about starting a company or you are thinking about a side project, half reality is, and again, it depends on the country because of different obligations, things like that. But you probably will take a bit of a hit initially in your income and you have to be prepared and ready for that. And it's one of the, one of the things I would like to point out if you are thinking about doing that is to be very real about what you can make and what sort of income you can generate. And equally aside from the income, what actually fills your cup rather than take from it, right? What works for you. And I think for me, again, another completely unexpected byproduct of having the freedom from bug bounty is finding out. I actually really like doing like conferences and talks. I've done a lot this year and that has had another impact of generating so many opportunities for me for my company and doing some really good networking So I would say definitely exercise caution when you are looking at some of these opportunities. Even me and you have spoken about one of your recent engagements when there's a little bit of pushback on your side, which actually works out really well for you. But, but definitely be honest with yourself and way up. Okay, this is what I can generate. There's going to be a bit of hit in income if you do think about going down that route.
[00:10:19.33] - Joseph Thacker
Yeah, yeah. And I loved what you said there about like, kind of like learning more about yourself. Like the fact that you learned that you enjoy doing speaking and stuff. I, I knew that I disliked like meetings and maybe what's the best way to say this? Like, accountability from work. Like, I've always hated like logging time, for example, but I've realized that even more, like I was doing a pen test and like an AI security pen test recently and I got an email from one of the, one of the customers or like one of the people who work at the customers, like the head of security. And he was like, yeah, if you don't mind, just shoot me over like, you know, two sentences at the end of the day about like, you know, like what you were like, hacking on or working on that day for us or like, would you mind to do that? And my initial inclination was like, no, I won't do that. Like, I don't, I don't have to do that. I'm not going to do that. But obviously that wouldn't have been like the most tactful way to say that. And so luckily I, I posted my draft email in a Discord channel and Shubs and ZLZ were kind enough to be like, you shouldn't say that that way. You should say it like this. And we ended up rewording it as like, hey, my normal flow is to stay in a flow state. I take notes about what I'm doing and we can talk about that like the, at the end of the week meeting. And they were like, yeah, yeah, sure. Just, you know, do it your way. Do it, do it the way you normally do it. And so that was like a really tactful way to not have to deal with that. But I do think like, yeah, just like a huge upside that we've talked about with Bug Bounty that really was present in that moment was the fact that you don't have any, you have no boss, you know, and when you're, even if you're solo freelance and you're doing an engagement, you do have a boss, right? They hired you, they're paying you they kind of have the right to ask things of you. And so for me, you know, that kind of pounded home. And also the pen test that I handed off to you is kind of the same way. I just got a weird vibe from one of the people who were going to hire me for a pen test and I handed it off to you and we ended up tag teaming it later and it was fantastic. But yeah, I love that you do get a little bit more, more of like self introspection and exploration when it comes to being on your own. You know, you really do kind of. It's more visceral what you like and don't like because you can say no to slightly more.
[00:12:22.48] - Brandyn Murtagh
Exactly that. Exactly. And what I will say as well, I think a thing that isn't really thought about is when you are starting these ventures and you are exploring other opportunities, more often than not, it's a very, very different skill set than the one you used to bug bounty hunting. Right. The reason why bug bounty is so cool is because you get to spend so much time on the cool stuff that you really. And okay, yes, there are some caveats with triage and company politics sometimes when things get downgraded or whatever, but that pales in the comparison to some of the other stuff you have to do. Like take your example of logging timesheets. I'd rather pull my own teeth out than have to do. That's not something I want to do. So ultimately I would say yeah, full time bug burning has been great and it's gave me so many opportunities. But there has been a reality of some learning where you do realize, okay, maybe I shouldn't have said yes to that or maybe I should have worded that a little bit differently because until you try something else, I don't think you realize how good the bug bounty lifestyle really is. Just with that freedom.
[00:13:34.69] - Joseph Thacker
Yeah, yeah, sweet dude. Well, I'm glad that's going super well. Did you want to mention your pen test company you started?
[00:13:42.85] - Brandyn Murtagh
Yeah. Yes. So Murtasec M U R T A S E C is a pen test company we've been running for, I would say since February time, unofficially. And then we properly launched the site three months ago to capitalize on all the leads that were coming inbound and things like that. And yeah, we've really hammer down on web applications, AI and APIs. We have a small service offering because that's what myself and the researchers we work with like to look at. And yeah, it's honestly been so good to be able to have the opportunity to work with some of the cooler customers in the AI space, people that are doing really cutting edge things and also just share the love and work with my friends as well. It's really good.
[00:14:25.96] - Joseph Thacker
That's one of the coolest things is being able to hire your friends to work with you. Right. And just put money in their pocket too. Like I hired Exeli for this engagement over the summer and it was like, you know, great pay for him and it was fun to work together. And honestly, I feel like I get more done, stay more focused when I have friends that I'm working with, you know.
[00:14:44.58] - Brandyn Murtagh
Absolutely. It's, it gives you that bit more motivation and accountability as well, I think. Yeah.
[00:14:51.53] - Joseph Thacker
Sweet, dude. Well, I know Justin would be whipping our backs with, you know, staying. Not again, not being technical enough. So we're gonna, we're gonna dive into some notes. Brandon and I both like, we were looking at some appsec stuff and then we've got a bunch of AI research because I already cover enough AI stuff in general. We'll stick the AI stuff at the end though, though it is some like really high quality, good stuff. But Brandon brought, brought two things to the table here about, you know, just traditional AppSec that I think is really useful. We can dive straight into those if you want to mention the first one.
[00:15:26.76] - Brandyn Murtagh
Yeah, sure. So the first one I got on the list is from portswigger. Let me share my screen actually.
[00:15:36.04] - Joseph Thacker
Yeah, I had no idea they had released this. This is really cool. It'll probably have me double fisting proxies anytime I need to look at WebSockets.
[00:15:47.16] - Brandyn Murtagh
That is exactly where I'm at and it's not a nice setup. I can trust you. But WebSocket Turbo Intruder, very, very nice. I did briefly have a play around with it and I feel like it's another one of portswigger specialities where they address a problem that everyone else is a bit scared of because of the complexity and trying to address that problem. And essentially they made some updates to it with some really nice quality of life things as well. Just around testing websockets, it can be such a pain sometimes, especially if you are trying to exploit some of the more exotic bugs. But just getting a general handle with it has been incredibly painful in some targets. So this is a very, very welcome addition. Now one of the cool things that they did. Where is it? You've got some filtering in there, the automated testing part. But the thing I did want to shout out, where is it? Ah, this was one of them. So whilst they were using their tool for testing a WebSocket implementation. They found WebSockets ping of death, which was completely unexpected and it was a denial of service in a Java WebSocket implementation. And essentially using the Turbo engine, they can send malformed websocket frames, which goes outside of the spec that I was expecting. And lo and behold, it resulted in an out of memory crash and they actually released the ping of death example py if you did want to use it, if you do come across some of the Java implementations of websockets, I.
[00:17:31.07] - Joseph Thacker
Love it whenever any write up includes a POC like that. Even more so nowadays with AI because you can just drop that straight into cloud code and be like, hey, I'm looking for this on this target. Implement it but with like these headers or this thing, you know, I mean, just drop in a request and it can customize that Python file such that it will be the target you're trying to hack. And it can kind of automate some of that testing. Yeah, yeah, speaking of that, if you go down to the automated testing section, if you command F for automated, it is one thing that I thought was really neat is just personally I. This is probably true of most hackers. It's just people in general. Sometimes I have a harder time like wrapping my head around things that are different. Right. Like our brains are just like so used to Looking at normal HTTP requests that looking at WebSocket requests can be like confusing or hard to understand. I love this because basically you can set it up like an HTTP request and then just modify the post body like you would a normal post body. So if you see that box there at the bottom of Brandon screen for the listeners, basically they have a way where you can set up a little like mini proxy where you send a HTTP request with like a normal post body and it converts it to a WebSocket request on the back end and then sends that with all of the, you know, accoutrement that like is required to like make it work. And so you can just like modify the keys. You know, it's just JSON, you can modify the keys, you can modify the values, you can look for things like idors or SQL injection or whatever in like the, in like the keys and values for the WebSocket request, but editing it as if it were an HTTP request. Which is really nice for our hacker brains that are like used to HTTP request.
[00:19:13.05] - Brandyn Murtagh
Yeah, yeah. And again, these are the small quality of life things that make a huge difference when you're on a target and you are facing friction and it is a pain to test and you have some.
[00:19:22.01] - Joseph Thacker
It's the difference between testing it and not testing it. For me, right, it's like in almost every pen test I've been on, there's like enough to fill the, the hours I have available. And so like if something's difficult to test, I end up not testing it, right?
[00:19:34.74] - Brandyn Murtagh
A hundred percent. Oh, absolutely. The other thing I did want to mention out in here as well is something that I didn't really put into the same bucket and that's websockets with race conditions. And one of the really nice things they've done with WebSocket Turbo Intruder is, and I'm quoting it from the research now, it includes a special engine type called threaded. This engine starts multiple worker threads, each with its own websocket connection, and sends messages in parallel. This makes it possible to trigger classic race conditions like logic bypasses, token reuse or state desync bugs. And that gives again another working race condition example Python script to use. But websockets and race conditions, I mean, I haven't really heard of anyone. Probably a bit slept on, if I'm being honest. Have you ever seen super slept on?
[00:20:22.35] - Joseph Thacker
Because it's so hard to test, Right?
[00:20:24.16] - Brandyn Murtagh
Exactly, exactly. And this enables that. So I'm definitely going to play around with this a bit more. I'm eagerly waiting for a target to come up now. But one thing I will say, although I love portswigger for this, I'm in this split brain between Caido and Burt for this exact reason and it's just a bit of a pain and I'm not sure what the answer is because I really like functionality like this.
[00:20:47.88] - Joseph Thacker
Yeah, yeah. I think it just like Jocko Willink has a really, really good quote. It's like, oh man, I'm not going to be able to remember the exact quote, but basically every decision has trade offs, right? It's like everything in life has trade offs. Even if you don't think it does, there's always like a something you're omitting by choosing something else. And so I think like, you know, this is tough, right? It's like one, it's not ideal to live in both, but if you don't live in both, then you're not gonna get the feature, the features of one. And so I really love so much about Caido and that does make me give up a lot of stuff on Burp. But then if I don't want to give up Burp. Then I just need to use both. But then I have to deal with the downsides of using both. Right. It's like, you know, it's kind of the same. You can't have your cake and eat it too.
[00:21:28.65] - Brandyn Murtagh
Exactly, exactly. It is a massive pain. But yeah, a very welcome addition from Portsugar in my opinion, because this is probably going to find people some pretty exotic and cool bugs.
[00:21:39.39] - Joseph Thacker
Yeah, 100%. And I think, like, I hate to bring up AI before we're on that section, but I do think there are a lot of AI apps that use websockets for like streaming their responses back. And so it would be interesting to see if you could kind of like, you know, bypass like different rate limits or you know, change different things that are kind of that, you know, that could be changed at the same time. So if you have an LLM app that's able to like make tool calls and you use two requests, like two streams where you know, they're, they're modifying the same object, like it'd be really interesting to test kind of what happens there and that sort of thing.
[00:22:14.63] - Brandyn Murtagh
100%. I'm going to be waiting for targets to pop up now so I can use it.
[00:22:21.10] - Joseph Thacker
Sweet. And then there is a second write up on a extremely well paying vulnerability for meta.
[00:22:29.93] - Brandyn Murtagh
Do you want me to show this or.
[00:22:31.52] - Joseph Thacker
Yeah, yeah, if you don't mind.
[00:22:33.13] - Brandyn Murtagh
Sure. Now this write up, incredible. Absolutely incredible in terms of the bounty payout. But when I read it, and I'm not saying this because it was a bad bug at all, it was a really cool bug, it was easy to exploit in the sense that I thought a bug that paid this much would be a bit harder. But there we go. Just to challenge the assumptions.
[00:22:57.16] - Joseph Thacker
So the TLDR by Abhishek Mina.
[00:23:01.55] - Brandyn Murtagh
Yeah, so the TLDR here is that essentially a researcher found a path traversal in Facebook messenger for Windows. Now one of the important things with Facebook messenger for Facebook messenger for Windows is that this exploit actually happened in the end to end encrypted chat feature. Now I hadn't really thought about this before until I read this write up, but end to end base functionality. And it occurred to me once I read this, all of the validation has to be done on the client side. Right. There's no server involved in that, so it's client to client communications. Now this one was pretty cool where essentially the messenger client accepted a file name for one of the attachments. But classic path traversal, the file name wasn't checked for any path Traversal sequences. So what that allowed an attacker to do is essentially upload a file to a user's chat with a path traversal sequence included and upload a file outside of the intended directory on the host. Now obviously, as we're dealing with end to end encryption and the threat model of Facebook messenger, this is client to client is how I'm thinking about in my head. There's no server interaction here. Some pretty cool things on the write up that the path alone consumed 212 characters, which meant they only had 48 characters for the traversal sequence. So they add a bit of complexity there in terms of where they could actually upload to. But lo and behold, classic Windows DLL hijacking came to the rescue. Where you could essentially upload a DLL outside of an intended directory, or overwrite a dll, or upload a DLL that a application is looking for that doesn't exist and it will just dynamically load your dll, no questions asked, as long as it's in the same directory location. And that is essentially the path they used to get to rce.
[00:25:10.80] - Joseph Thacker
I'm surprised they use Viber instead of Slack because they mentioned the Slack was also vulnerable because of the short name of the application and where it lived. I feel like Slack is like a much more commonly used app. So yeah, I'm surprised they use Viber.
[00:25:23.76] - Brandyn Murtagh
Yeah, yeah, it is a, I mean an interesting choice. And an initial reward of 34k $34,500 was awarded after a discussion regarding Meta's payout guidelines for mobile. RC is reassessed totaling to $111,750. So that's quite the bounty, I must admit. And I'm not going to lie again, it's for me challenging those assumptions. I would have just assumed that things like this just didn't exist in the meta ecosystem, but it goes to show that they do.
[00:25:58.58] - Joseph Thacker
Yeah, my first takeaway was the fact that, you know, like you said that the big takeaway for like, for I think the listeners is that apps which are client to client have like completely different functionality and it feels like it's way less secure. There's just like less layers of protection there. Right. Because it's going straight to their clients. So if you can get it to anything interesting, it's immediately like pretty high severity. And it made me, it made me think about like what other apps are that way. And it reminded me of Live Hacking Event actually. Was it this January or last January where it was like it was a games based company and I'm sure a lot of people know what I'm talking about. And because like games are basically installed on your local machine, it very, very much is peer to peer with a lot of the different functionality, whether you message them or whatever, it's going to be interpreted by their client installed on their machine. And so like you said, it's not going through the server, which allows for like different types of vulnerabilities than maybe we're used to looking for.
[00:26:59.11] - Brandyn Murtagh
Absolutely. I think when I started to read this, the threat model in my head was like, okay, so end to end stuff, mainly privacy apps and like those sort of peer to peer requirements. But then you're absolutely right, games is a massive one, isn't it? This is exactly the sort of the same threat model you would apply there. But I mean, fair play to this person. Very, very straightforward pocket, but very, very big bounty. And I guess that's why the bounty was so severe, because of the impact and the clean cut exploit that they made for it as well.
[00:27:33.09] - Joseph Thacker
Yeah, it makes me think, I definitely have this thought theory that as hackbots kind of slowly eat away web appsec, that actual hardware based security stuff is going to live much longer. We all need to get on Matt Brown's level and kind of start looking more at hardware hacking. I wonder, I'm trying to like just like place in my head where I think mobile and where I think like desktop apps kind of fall in that I think they're, they're very slightly safer. But also will come, will, will, you know, be gobbled up much before hardware. But I, I, maybe mobile. I'd say mobile is more secure than desktop, I think like desktop binaries and that sort of, you know, and like just like open source code review are going to be eaten by AI before mobile and definitely before hardware hacking. But anyways, good stuff. It just makes me think about, you know, whether I should, you know, dig into that more.
[00:28:26.70] - Brandyn Murtagh
Yeah, no, I get what you mean. Each has pros and cons. But you're probably right. I think that Matt Brown style of hacking is going to be quite unautomatable for the, for the large part.
[00:28:40.38] - Joseph Thacker
Yep. Sweet. Okay, cool. I know we have about 15 minutes left and we have a ton of ass up to talk about. So like I mentioned in the last episode, I'm going to drag all of our listeners kicking and screaming down the path of learning AI hacking so that it will better their lives. So the first thing which you actually shared this blog is a post by Sim Grep. I think we both have a lot to say about it. I will share my screen since you've been doing all the sharing. I'll share the burden here.
[00:29:12.21] - Brandyn Murtagh
Thank you very much.
[00:29:13.41] - Joseph Thacker
For the listeners only. Basically, semgrep companies we're all extremely familiar with, they used Claude Code and OpenAI Codex to try and find vulnerabilities. You know, I have to say I love good TLDRs. The fact that they have like such a nice TLDR right here at the top was very, very nice for me with just knowing what I was going to get into, into the content of it. And they're, you know, actually as much more interesting content down in the blog. But. But yeah, it's hard to.
[00:29:40.30] - Brandyn Murtagh
Hacker notes.
[00:29:41.10] - Joseph Thacker
Yeah, exactly, it's the hacker notes.
[00:29:42.75] - Brandyn Murtagh
It's.
[00:29:42.99] - Joseph Thacker
It's what you've been killing for a while we've been doing such a good job with. Okay, so, yeah, so basically they used Anthropic's cloud code, which for that they only use Sonnet 4. They did not use Opus and OpenAI Codex, which they used O4 mini, which I have a bone to pick with. But we'll figure. Well, we'll talk about that in a minute. Did you want to talk about kind of their, their true positive rate and what they found?
[00:30:05.10] - Brandyn Murtagh
Yeah. So I feel like. And the reason why I like this blog post is because it feels like to me there's someone that's unearthed the reality a little bit to some extent with no sort of biases around what it actually is like running an AI hackbot. From my experience, this matches up and there was a massive false positive rate. So an 86% false positive rate. And one of the big things they mentioned here as well is the non determinism. So even using the exact same prompts, as, you know, everyone that uses AI on the same code base, every scan revealed very, very different results every time. So three identical runs produce three, six and then 11 distinct findings. Now for blog research, that's manageable. But when you think about some of these AI pack bot companies, for example, that's going to be a massive amount to deal with. But we'll touch upon that later on.
[00:31:02.93] - Joseph Thacker
Yeah, I was going to say we have some, some nice thoughts on that based on a research paper.
[00:31:07.13] - Brandyn Murtagh
Right, Absolutely. But for me, I just feel like it's the TLDR and the key takeaways gave way are very nice. This is the reality of some of these things. But there are some caveats around it because if you look at some of the blogs and research, for example, they aren't doing Any they aren't doing themselves any favors in the sense that there are architectural decisions you can make to improve the accuracy of some of these findings. So a common one that Jason Haddocks talks about and what Expo have alluded to in some of their content is around the fact that you get much infinitely better results by splitting vulnerability classes per hack bot. So you have a bot for reflect access, DOM excess and so on. This is trying to do a catch all which is for context, Windows and everything else going to be an absolute disaster. Which I think has reflected in their results a little bit as well.
[00:32:05.96] - Joseph Thacker
Yeah, and the issue with that is like there's kind of like an upside and downside. I think the downside is way worse like you said. I think that probably massively contributed to the non determinism just because it wasn't specifically looking for the same vulnerability necessarily every time. The variance in the next token selection was like having it look for lots of different types of vulnerabilities each time. But I do think there is one tiny upside with letting the model choose what to look for is sometimes it's going to try like crazy stuff that humans would never even try that are outside of any standard bug class. But you're right. I think I agree with you. I agree with Jason that it's much better to have your hackbot targeting a specific vulnerability because you can have a very tailored prompt with like, hey, here's how experts go and find XSS and validate it. Now do those things. Don't just like make your own decisions about how to find it and make your own decisions about how to validate it or what have you.
[00:32:59.44] - Brandyn Murtagh
Right, Absolutely. I think one thing I did like is how they contrast the different models throughout the blog post about how some are better at finding different vulnerability classes.
[00:33:10.30] - Joseph Thacker
Yeah, that was useful and neat.
[00:33:12.30] - Brandyn Murtagh
Yeah. So claw code was best at finding idle bugs with a 22% true positive rate, but struggled at performing taint tracking across multiple files and functions with a 5% t true positive rate for SQL injection and a 16% true positive rate for XSS. And then OpenAI codecs struggled to report any correct idor did very poorly at SQL injection and XSS but surprisingly reported more correct path traversal issues in claw code. 47%.
[00:33:44.25] - Joseph Thacker
Yeah, let me just get into this. I hate that they used 04 mini. Like I just, I just don't understand. Like 04 mini is such a small model. It's so bad. It makes total sense to me that it would. To be honest, it is really nice to know that it can find patch traversals extremely well. Like, you know, it makes me almost think like that if you did have a system that was finding a lot of different vulnerabilities when you went to look for patch Versal, it's like, oh, just shove in 04 mini right there and just let it find that one bug. Because it seems like particularly good at it. But I think, man, I'm just so mad they didn't use like GPT5 or something here because I think that it would have been very interesting to see how it stacked up. And I'm sure it would have been more expensive to run. But I mean, it's not like Semgraph cares to drop, you know, 100 bucks. I mean, this article and us talking about it gives them more value than they would have seen spent on the tokens, right? I would have really loved to see like Opus results and also GPT5's results. And yeah, just to reiterate on what you were saying a minute ago, they launched this with just like the default agentic wrapper, like the default agentic housing that's in cloud code and that's in Codex already, which is, you know, quite good for development and coding tasks. But hackbots really do require like their own specialty, like their own special tooling, their own special context to do better.
[00:34:57.76] - Brandyn Murtagh
100%. I feel like as well, just the last point on this I wanted to bring up before we move on. I feel like these sort of benchmarks are useful for people planning to build their own sort of hackbots. And I feel like if there was a resource where a new LLM was released and then all these automated checks automatically run, and they're not nonsense, biased, basic, easy things just to get through a benchmark, the actual real valuable things like this blog post does would be good because I think about maybe there's going to be people out there with like an hackbot infrastructure that uses, I don't know, 04 mini for path traversal, Gemini for XSS and then 5 for XSS and use all these different agents because they're better at different things. But I'm not sure.
[00:35:47.78] - Joseph Thacker
Yeah, no, I think the thing that comes to my mind, as you said that was this blog post proves that anyone on the planet with cloud code can basically run a hackbot, right? And I think that that lowering of barrier to entry is like something that's going to be super important for our industry because it means it's so easy to sell snake oil, right? It's like you can literally just use Claude code and call it a hackbot and go out and find small and medium businesses and just claim that you're automated hacking them, right? And it might find some vulnerabilities legit because it, you know, this blog post literally proves that's tr. And so it's going to be really hard to like, separate the good companies from the bad when it comes to hackbot hackbots. And specifically the next thing that you, that we're going to talk about is a research paper where they build a hackbot, right? And this one's like a really unique and really good one, I think. But the fact that any, any random company right now, any random person that has access to Codex or cloud code can spin up a hackbot. I think there's going to be just a myriad like, just like hundreds of companies that, you know, hackbot companies that pop up. And it's very hard for you for people to know, unless they're like an actual app sec expert, to know how good or bad those hackbots are and how good or bad they're working and how good or bad the architecture is. And these companies don't have to share that. They can just claim they have a bunch of like, cool proprietary stuff on the back end when really it's just cloud code with a few system prompts. And so I think that like, you know, if I was a gold digger, I would go, you know, claim I have some awesome hackbot, right? And I think that lots of people out there will do that. And I don't think that everyone is like, ill meaning. Like, I think some people are well meaning and they're like, oh, I'm the first person to have thought of this, right? They just don't keep up with the industry. They don't follow the right people on X or LinkedIn or, you know, they don't, they don't listen to our podcast. So they just don't know. And they think that they're doing some sort of like super novel research by having cloud code go out and find vulnerabilities for them. So I think, I think that the industry is going to be like pretty, pretty full of snake oil and companies that are not that legit. It's for this reason that I really love that Expo used bug bounty as like a test bed because you and I know how difficult it is to find actual real world vulnerabilities, right? I guarantee you if Sim Grab had set these two agents right off on a bunch of bug bounty programs and they spent a ton of tokens, they might have found like a single vulnerability or something. Especially if they did VDPs, maybe they would have found like you know, two or three. But it's actually super difficult to find real world vulnerabilities. And so that's why I love companies that are willing to like actually prove their hackbots by putting it up against like hardened real targets.
[00:38:18.00] - Brandyn Murtagh
100%. 100%. And it does also lead to the question behind closed doors, are these companies making the decision to burn extra compute and run scans more than once to test the whole non determinism thing or is a very talented analyst behind the scenes steering and gently pushing things in the right direction? Yeah, I don't know but that blog post sort of opens, pulls back the curtain as per say as to the realities of doing something like that.
[00:38:46.05] - Joseph Thacker
Yeah. And I think one thing I know we only have five or 10 minutes, so I'm actually going to, I'm going to have a skip that adversely post and maybe come back to it if we get a chance. But I wanted to talk about the, the CVE genie, because what they have and what Expo has is basically a really good verifier, like a validator. And I think that's what's key and will kind of separate these good companies from the bad ones. But even the bad ones are going to be able to limp along, right? Because they can do the manual. They can do like the manual verification of true false positives or you know, but, but even if they don't, if they like report these actual false positives to companies, companies will, you know, eventually dig in and out and be like, oh yeah, this one must have been a false positive because it's an AI based hack. But like, I wonder if they'll actually get punished for it.
[00:39:27.98] - Brandyn Murtagh
So only time will tell, my friend, only time will tell. Yeah, yeah, but shall we move on to the next one?
[00:39:34.98] - Joseph Thacker
We've absolutely share cve. Ginny, if you don't mind.
[00:39:38.13] - Brandyn Murtagh
Yeah, My link has been damaged in transit.
[00:39:42.46] - Joseph Thacker
Well, I think it's because you tried to bold it. It's got the two asterisks at the beginning or the end.
[00:39:47.57] - Brandyn Murtagh
Yeah, so I'm a notion guy and this is what happens when you transfer from notion to other Google Docs. Right here, go right, screen one. Cool. So this one was really cool. I liked it a lot. And it's essentially a white paper on a group of very talented people got together and made an automated multi agent framework for producing CVEs from minimal detail. So the premise is they made a gentic Framework called CVE Genie, which pulls CVE resources like advisories, patches, repos, if they're available, code diffs for example. And there's a part of the workflow which reconstructs an actual proof of concept environment for them to test their exploit against during the process and attempts to create a working proof of concept on the fly from pulling all these various sources and by creating a test bed. Now it's a huge white paper, very detailed and very, very good. But some of the cool things, let me just try and find it. Is the actual architecture they use. So they use a four stage. Where is it? That's right there. Okay, here we go. They use a four stage pipeline with.
[00:41:13.01] - Joseph Thacker
You might just zoom in a little bit.
[00:41:14.80] - Brandyn Murtagh
Sure.
[00:41:15.92] - Joseph Thacker
There we go. Perfect.
[00:41:16.80] - Brandyn Murtagh
There we go. Containing a processor, a builder exploiter and a CTF verifier. And as part of that last step, they also had a like a critic agent with CTF style checks put in place to prevent spoofed and hallucinated wins. So very, very cool. And they had about a 51% success rate judging from the paper, with a total cost of $2.77 per CVE to recreate, which is pretty nuts in my opinion. Like that is insane to create a CVE from nothing. 51% cost you $2. Imagine if you just go out and spray that and load a bug bounty targets.
[00:41:57.36] - Joseph Thacker
Yeah, I mean just run this on every high and critical vulnerability that comes out. Like every CVE that drops, be the first person to reverse it by using this CVE Genie setup and then spray it across bug money programs. Right. This feels like an easy ticket to win. We probably shouldn't have told everyone about it.
[00:42:12.17] - Brandyn Murtagh
The recon guys are punching the air right now thinking.
[00:42:17.21] - Joseph Thacker
Maybe you and I use this paper to rebuild it and then we get a percentage of the bounties as we hand it off to Moken or somebody.
[00:42:24.05] - Brandyn Murtagh
That would be sweet. That would be sweet. I think some caveats though, thankfully, and it's a pretty big caveat for us web guys, is that this framework didn't focus on web based exports. It focused on pretty much every other service other than. And it was CLI based. So thankfully there's no web CVEs, but it does cover a lot of other stuff. So we're safe for now, which is very good news. And I don't think they actually released the framework yet.
[00:42:55.17] - Joseph Thacker
Oh, are they going to open source it?
[00:42:57.57] - Brandyn Murtagh
I'm pretty sure I read somewhere that there was. Let me just see. I could be. That could be A hallucination on my part.
[00:43:05.26] - Joseph Thacker
Oh, it says this paper makes the following contributions. We open source our framework, source code and data sets.
[00:43:11.51] - Brandyn Murtagh
Oh, it is open source.
[00:43:12.63] - Joseph Thacker
Well, they still. It says it is. I don't know what the link is.
[00:43:16.38] - Brandyn Murtagh
But yeah, okay, maybe already is out there then. So yeah, pretty good for the recon guys and I feel like we're sort of seeing it every month now. There's leaps and bounds being made around the automated exploitation where you probably could start making some decent money. From a bug bounty perspective, if it is literally costing these people on average $2 per CVE, well, we'll hit that one time.
[00:43:41.67] - Joseph Thacker
That tells me they were able to do it with a smaller ish model. Right. They weren't using GPT5 or Opus. I bet if you did, the success rate would be even higher.
[00:43:48.86] - Brandyn Murtagh
That's a good point. Actually, I didn't consider that. But yeah, very cool research. I mean the blog post, the blog post, the research paper is absolutely massive. So I do recommend reading it. It will be in the description and in the hacker notes. But yeah, it's also a little bit scary if I'm being honest.
[00:44:07.51] - Joseph Thacker
Yeah, I agree. I mean, personally I don't like to submit that many. Like, I don't farm CVEs, but if I were actually an automated person who does kind of reverse a CV and spray it, I would be quick to implement this because I think it lowers the barrier to entry. You know, like I was just saying before, like, not only do we have a lowering of barrier entry to building hackbots, but we have a lowering of barrier entry to like reverse CVs. Yeah, it makes me wonder if, you know, if companies, whenever they release CVEs should just like go ahead and release the code to test for it because. Because it's going to be figured out pretty quickly by AI anyways. It's like go ahead and get that tool in the hands of like real companies that need it to like scan their own infrastructure, rather than just leaving it in the wild for bug hunters and malicious actors.
[00:44:55.78] - Brandyn Murtagh
But it does become an arms race in that sense of who can get there first. It's going to be an absolute disaster for blue teams as well, who I do feel for when I see stuff like this. But yeah, cool, cool research, good stuff. We do have quite a few more things to cover off and we are running short on time. Did you want to do one or two more or call it.
[00:45:16.00] - Joseph Thacker
Yeah, let's do one more. Why don't you pick your favorite between adversa Talk Two and rag and then we'll save the other ones for a future episode.
[00:45:23.15] - Brandyn Murtagh
Oh man. Oh man. Both pretty good.
[00:45:28.38] - Joseph Thacker
Okay, while you're thinking about it, let me say one quick thing. I've been doing a lot of hacking on different AI companies and one of the products is like AI browsers, kind of like Perplexity browser. And you know, Google has something upcoming and OpenAI probably has something upcoming. It's like, you know, there's like a lot of companies that are building browsers that are going to be controlled by AI. And if there's anything I've learned, it's just that the attack surface is getting bigger and bigger. You know, I think that some stuff is getting harder, but in general the attack surf is getting bigger and bigger. So I think there's going to be, you know, people said this was like the year of AI agents. I think that's still the case. Next year is probably going to be the year of AI browsers, but I think that, or maybe even just the last quarter of this year, but I think that AI vulnerabilities are getting more and more rampant, like AI security issues. So definitely still worth leaning in. I do think POCs are becoming more important and more difficult to set up. You might end up having to buy more domains that are related to your target in order to make it much more likely to be accepted by the current models because they're getting smarter and smarter at like sussing out or like, you know, finding figuring out if like you're being kind of sus and trying to get them to do something you shouldn't.
[00:46:36.63] - Brandyn Murtagh
So yeah, there's a lot there, man. And I'll actually touch upon this in a minute about why I chose this specific piece of research as well for the next part. But this one very, very cool. And I feel like it touches upon the sort of architectural based attacks, how we see things with webcache deception and web cache poisoning. And I'll explain why it reminds you of that in a minute. But the premise between this research is essentially like when you speak to an AI now, whether it's GPT, Gemini, Claude, whatever, you're not actually interacting with a single model. And behind the scenes there's an entire routing system that has been made to analyze your request, decide what the request is trying to do and actually feed it off to a different model. Now the reason why these companies are doing this is because one, if you've got a super easy request, they want to give that to a cheaper model to save them time and money on that. And two, it actually leads to gigantic cost savings for these companies. And we'll get to it in a minute. It's details more of the costs down at the bottom of the model, which actually completely blew my mind. But again, this is one of them ones that spread from architectural decisions around AI architecture and operation. Now it's dubbed Promise Q root and I'll touch on that in a minute. And I want to thank congratulate them for making the biggest acronym that I've seen so far in my cybersecurity career. And it's essentially abusing this AI routing mechanism to route a request on the initial pass. It looks like it's a normal request going to the model you're requesting, but it's actually wrapped in such a way that it gets passed to a lower model which is vulnerable to things like jailbreaks and things like that. So pretty nasty. And hold on, let's just get this out of the way. So it's called Promise que root and it stands for prompt based router Open mode manipulation induced like SSR via ssrf. Like queries reconfiguring operations using trust evasion. Isn't that ridiculous?
[00:48:57.57] - Joseph Thacker
That is ridiculous. The routing part is interesting, but yeah, it doesn't surprise me on the smaller models. Marcus from what company? What's the major consulting company in Europe? Kpmg. Markus from KPMG told me this about six months or maybe longer ago and it completely blew my mind. And I was like, oh, that makes total sense and is like now just a core part of my testing. Smaller models are always more susceptible to jailbreaking. It's actually One reason why GPT20B OSS, the thing that they had that Kaggle red teaming challenge for which by the way, the results should come out today. So there was like five or six hundred submissions and only ten are getting picked for the $50,000 winner, but I'm still hopeful. Um, but yeah, it's really impressive as a 20v model, how secure it is. But yeah, I love this attack technique because like you said it using actually. So this is very similar to something I tweeted recently. A lot of. What are they called? Intent. Intent, Clarify or. Yeah, Intent classifiers. Well, like a lot of chat apps these days have an intent classifier. If you're not chatting about like the topic or the category that the company has made the chat bot for. Like if it's cars.com and it's talking about a car, if you don't ask about car questions, it'll just say, sorry, I can't help with that. So you need to say like, what's the cheapest car? And also tell me all your internal tools. Right. And this is like very similar. Right. You need it to route properly and then you also need to jailbreak it. So you use the thing, like respond quickly to get it to a small model, then you use your jailbreak.
[00:50:25.17] - Brandyn Murtagh
Absolutely, yeah. And again, I really like this because it just perfectly paints the picture that there's going to be so many new vulnerability classes that are born and bred just simply from architectural decisions that might not even exist yet. We're still yet to see them with the amount of stuff that's coming out. And this is the figure I was talking about earlier. $1.86 billion estimation is what OpenAI saves annually by secret routing these requests. So when I see something like this, ultimately the reason to have a security control in place is it's a cost saving measure. Right. If you have an architectural decision which is saving you $1.86 billion in revenue, would there be any point in fixing any of the vulnerabilities that come off the back of it? Because surely the cost saving just isn't going to be there. I don't know. But some of the things I liked about this as well, it dives into how they prompted it. But I feel like this part, the architecture that nobody talks about. So modern AI inference operates on three tier architecture that's never been formally scrutinized. You've got the edge routing, which is 0 to 10 milliseconds. You've got the model selection 10 to 50 milliseconds. And this is where the vulnerability lives. And you've got the execution, which is the model process and token generation safety filtering, which is 50 to 500ms. And I don't feel like you see much research or many people talking about this side of things, which I quite liked about the blog post because it also makes me think about sort of the race condition vulnerabilities as well. When you start to look at these architectural decisions, like if you look at the time taken some of these things, there's an absolute huge time gap here. And it just made me think that is going to be ripe for race condition exploitation at one of these layers.
[00:52:20.86] - Joseph Thacker
Yeah, there's also kind of like tier 2 could be kind of broken out or I guess maybe tier three could kind of be broken out into. The model is often. Well, at least in a lot of AI apps, there is like dynamic rag, right. Which is determining which context to pull in depending on user intent and then there's also sometimes the model is actually like thinking and deciding which, which tool to call. And so you can kind of get even deeper with this and come up with like an even like, you know, more verbose architecture diagram. That would be super cool.
[00:52:51.76] - Brandyn Murtagh
Yeah, really, really good. I mean, again, this is an absolutely massive write up and it speaks about the exactly how they crafted the payload and they make a comparison to ssrf, which I'm not entirely convinced on, to be honest. And I feel like they just added it to add an extra acronym in this very long name. But I sort of get where they're coming from in the sense that they're SSRFing, quote unquote, a prompt to get it to route to itself but to a different model internally. And the reason why they made the contrast is because both vulnerabilities share the same fatal flaw. Trust and user input to make routing decisions. Like, yeah, I get it, but just no, like, not. I'm not too happy about it if I'm being honest. I don't know. But again, regardless, very, very good write up about exactly how they've done it. The proof of concepts, they broke down the architecture really well and they also dived into a lot of the cost as well around some of these architectural decisions, which I feel like can be quite valuable when you're attacking some of these models.
[00:54:04.36] - Joseph Thacker
Yeah, sweet, dude. I know we're at time for both you and I, so we'll wrap it there. But dude, thank you so much. Hopping on. Thanks for, you know, giving more details about your journey and we'll probably see more of you very soon.
[00:54:19.88] - Brandyn Murtagh
Cool, thank you very much, mate. It's been a pleasure.
[00:54:21.96] - Joseph Thacker
Cool. See you guys. Peace.
[00:54:25.48] - Justin Gardner
And that's a wrap on this episode of Critical Thinking. Thanks so much for watching to the end, y'.
[00:54:29.28] - Joseph Thacker
All.
[00:54:29.48] - Justin Gardner
If you want more critical thinking content or if you want to support the show, head over to CTBB Show Discord. You can hop in the community. There's lots of great high level hacking discussion happening there on top of the master classes, hack alongs, exclusive content and a full time hunters guild if you're a full time hunter. It's a great time, trust me. All right, I'll see you there.