Edit/Disclaimer: this is a repost of something I put in
r/LocalLLaMA, but with some tweaks for the r/cpp crowd.
This post is more focused on the content of the dataset
itself; the post over in r/LocalLLaMA is more focused on
the details of the finetune.
Hi all,
I've recently been thinking about putting together a
community-sourced coding dataset for finetuning models,
with a heavy focus on cpp and systems programming.
My goal is to eventually have a model that understands
concepts like memory ownership, thread safety,
optimization, etc. Right now, a lot of the coding
knowledge of small (<100B) local models centers around
languages like js, py, html, etc.
So far I'm thinking the categories I would need look
something like this:
- generation: basic prompt/code output
- optimization: here's slow/bloated code, make it better
- debugging: I'm getting this error, please fix
- organization: code review, interface design,
  restructuring, tradeoff decisions
- tool_calling: exercises involving tool use and
  interpreting results
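To make the categories a bit more concrete, here is a rough sketch of what one dataset record could look like in C++. Nothing here is settled: `Category`, `Sample`, `parse_category`, `is_valid`, and the `prompt`/`response` fields are all made-up names for illustration, and the only thing taken from the list above is the five category tags.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical tags, one per category in the list above.
enum class Category { Generation, Optimization, Debugging, Organization, ToolCalling };

// Tag strings in the same order as the enum values.
constexpr std::array<std::string_view, 5> kCategoryNames = {
    "generation", "optimization", "debugging", "organization", "tool_calling"};

constexpr std::string_view to_string(Category c) {
    return kCategoryNames[static_cast<std::size_t>(c)];
}

// Returns the matching category, or std::nullopt for an unknown tag.
std::optional<Category> parse_category(std::string_view s) {
    for (std::size_t i = 0; i < kCategoryNames.size(); ++i) {
        if (kCategoryNames[i] == s) return static_cast<Category>(i);
    }
    return std::nullopt;
}

// Assumed minimal record: a category tag plus a prompt/response pair.
struct Sample {
    Category category;
    std::string prompt;
    std::string response;
};

// Assumed sanity check: reject records with an empty prompt or response.
bool is_valid(const Sample& s) {
    return !s.prompt.empty() && !s.response.empty();
}
```

A closed enum keeps typos in community submissions from silently creating new categories; a real schema would probably need more fields (compiler output, tool traces, etc.), which I've left out here.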
Curious to see what the people over here think about this
kind of thing. I imagine many people in here have used
local AI to help code in cpp before - where do you guys
feel like local models could use the most improvement?
Thanks in advance for all the help!