Thoughts / suggestions on refactoring an old 210k line codebase?

after being a mediocre coder for 25 years, i’ve had so much fun again coding with cursor… some stuff was just so repetitive that i lost the fun in it over the years.

i’m like 100x more efficient with cursor and python than with phpstorm and php5.6 :frowning:

anyways, i’m curious on suggestions for refactoring a 210k line php5.6 codebase… it was poorly written from the start… it’s terrifying to think of, but kind of an oddly fun sounding challenge with something like cursor one day…

i recently tried giving large chunks of large files to o1, just to see if it could simplify the overall functionality of different functions… but there are just SO many conditional statements… it’s so scary… i feel like i’d need something more in the end, with tracking old players who are stuck at various points or something… idk… i don’t think 1000 unit tests would even be enough…

i love TDD principles with cursor, and i know with twitter/x, they basically set up all the unit testing while switching codebases… i wonder how many lines of code that was too… i think my app probably has more whistles and band aids than old twitter code… haha.

not sure if it was george hotz or someone else explaining how unit testing is just so important… and i’ve really come to agree with that…

Interesting question!
If you are familiar with the code base, start by breaking it into ai-digestible chunks yourself. tell the llm the inputs and the ultimate goal of your codebase, tell it that it’s 210k lines, and that that is too much for an llm. write down as much as possible and tell it to help you refactor the code top-down.
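
A rough sketch of the chunking step, in case it helps. It assumes the PHP files can be cut at lines that start a function, which is a naive heuristic and will not hold for every file. The names `chunk_php_source` and `MAX_CHUNK_LINES` are made up for illustration.

```python
import re
from typing import List

# Hypothetical size budget; tune it to whatever the model handles comfortably.
MAX_CHUNK_LINES = 300

# Naive heuristic: a line that starts a PHP function, with optional modifiers.
FUNCTION_START = re.compile(
    r"^\s*(?:(?:public|private|protected|static|final|abstract)\s+)*function\b"
)


def chunk_php_source(source: str, max_lines: int = MAX_CHUNK_LINES) -> List[str]:
    """Split PHP source into chunks, cutting only where a function starts.

    A chunk is closed at the first function start after it has reached
    max_lines, so a chunk can exceed the budget if a function is long.
    """
    chunks: List[str] = []
    current: List[str] = []
    for line in source.splitlines():
        if FUNCTION_START.match(line) and len(current) >= max_lines:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Each chunk can then be pasted into the llm together with the high-level description of what the codebase is supposed to do.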

once you have the high level framework, pull the correct parts of the code and tell it to build it bottom up. depending on how your old code was written, you might be able to use the output of your old code as unit tests.
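
Using the old code's output as tests is sometimes called a characterization or golden-master test. Here is a minimal sketch of the idea. `legacy_score` and `refactored_score` are hypothetical stand-ins; in practice the legacy side would call into the real PHP code somehow (a script, an HTTP endpoint), and that part is not shown.

```python
import json


def legacy_score(level, bonus_active):
    """Stand-in for a call into the old code (hypothetical)."""
    score = level * 10
    if bonus_active and level > 3:
        score += 5
    return score


def refactored_score(level, bonus_active):
    """Stand-in for the rewritten version (hypothetical)."""
    bonus = 5 if bonus_active and level > 3 else 0
    return level * 10 + bonus


def record_golden(func, cases):
    """Run the old code over every case and keep its outputs as expected values."""
    return {json.dumps(case): func(*case) for case in cases}


def find_mismatches(func, golden):
    """Return (inputs, expected, actual) for every case where the new code differs."""
    mismatches = []
    for key, expected in golden.items():
        args = json.loads(key)
        actual = func(*args)
        if actual != expected:
            mismatches.append((args, expected, actual))
    return mismatches


# Cover every combination of the inputs we care about.
cases = [(level, bonus) for level in range(0, 8) for bonus in (False, True)]
golden = record_golden(legacy_score, cases)
print(find_mismatches(refactored_score, golden))
```

An empty list means the new code agrees with the old code on every recorded case. The recorded outputs could be saved to a JSON file so the old code only has to run once.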

good luck.


ah, didn’t think of using its outputs as part of the unit tests… could definitely do that… could build some sort of mini api that returns database states… i’m also curious if i should just keep letting this idea simmer while context lengths improve so it can comprehend more at once…
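
A small sketch of what the "database state" comparison could look like: snapshot the tables before and after an action, then diff the two. It uses an in-memory SQLite database and a made-up `players` table purely as stand-ins; the real game's database and schema are not known here.

```python
import sqlite3


def snapshot(conn, tables):
    """Return {table: sorted rows} so two database states can be compared."""
    state = {}
    for table in tables:
        # Table names come from a fixed list here, never from user input.
        rows = conn.execute(f"SELECT * FROM {table}").fetchall()
        state[table] = sorted(rows, key=repr)
    return state


def diff_states(before, after):
    """Return {table: (removed_rows, added_rows)} for tables whose rows changed."""
    changes = {}
    for table in before:
        old, new = set(before[table]), set(after[table])
        if old != new:
            changes[table] = (sorted(old - new, key=repr), sorted(new - old, key=repr))
    return changes


# Hypothetical schema standing in for the game's real tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, stage TEXT, gold INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [(1, "tutorial", 0), (2, "midgame", 250)],
)

before = snapshot(conn, ["players"])
# Pretend a game action ran and changed player 1.
conn.execute("UPDATE players SET stage = 'midgame', gold = 50 WHERE id = 1")
after = snapshot(conn, ["players"])
print(diff_states(before, after))
```

Running the same action against the old and the new code and comparing the two diffs gives a test that does not care how tangled the logic in between is. The set-based diff assumes rows are unique, which holds when every table has a primary key.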

i think the trickiest part with the codebase will be all the states… it’s an old passive game that has so many different states, linear progression, dynamic progression… all interacting in a tangled web.

i guess i’ll just keep working on my larger and larger python codebase for another thing i’m working on, as i’ve already noticed issues there with various files getting too large and things not being modular enough…

Precisely this!

Plug the following: break it down from naive to expert.

First, ask a thing: "tell me what this is"

START AS DUMB as that.

"provide a detailed .rmd, use appropriate language and frameworks and references."

BUILD A TABLE FOR THAT WHICH YOU DO NOT KNOW.

Be as verbose as possible.