Reinforcement Learning Environments Engineer – Cybersecurity

Full Time · Onsite · Mid · Toronto, Ontario · Posted 1 hour ago
  • Full Time
  • Canada

Preference Model
📍 Toronto, Ontario, Canada
cybersecurity

About Us

Preference Model is building automated ML research engineering. Existing frontier models are brittle when applied to real-world ML tasks. The present bottleneck is the lack of high-quality RL training environments. Our first step is to build RL environments that reflect real-world complexity, with diverse tasks and robust reward functions. Our founding team previously worked on Anthropic’s data team, building the data infrastructure and datasets behind Claude. We are partnering with …

To apply for this job please visit www.adzuna.ca.
