/r/learnmachinelearning
A subreddit dedicated to learning machine learning. Feel free to share any educational resources for machine learning.
Also, we are a beginner-friendly subreddit, so don't be afraid to ask questions! This includes questions that are non-technical but still highly relevant to learning machine learning, such as how to approach a machine learning problem systematically.
I'm a college student looking to learn ML and then DL. I know Python, NumPy, pandas, Matplotlib, and the math. I also did Andrew Ng's ML Specialization course.
Now I'm stuck at the point where I don't know what to do next. Should I learn EDA and preprocessing, or start learning ML algorithms? If so, where and how can I learn these? I need your guidance, guys. Please help me out. Thanks in advance!
(Edit: please give some upvotes so this post stays near the top, because it would be helpful for people like me.)
Offer Details:
Starting today, Coursera is offering a 40% discount on its annual Coursera Plus subscription. You can gain unlimited access to over 7,000 courses, including Professional Certificates from top industry leaders like Google, Meta, Microsoft, and IBM, all for $239 (regularly $399) for 12 months. See the main article for details.
I have just completed the Mathematics for Machine Learning book and I am unable to find resources with practice questions. Can anyone suggest some?
Hi everyone! I'm a student working on a machine learning project and I'm in need of a dataset. Ideally, I’m looking for a dataset that has a few thousand samples with around 15 features that I can preprocess and then use for training ML algorithms. I’ve received suggestions on general sources for datasets, but I’m looking for particular datasets that are well-suited for hands-on learning and experimentation. Any specific recommendations would be very appreciated!
Thank you in advance!
I am trying to implement a seq2seq model in PyTorch for translation. The problem is that the model keeps generating the same sequence. My goal is to implement attention for seq2seq and then eventually move to transformers. Can anyone look at my code? (Kaggle notebook also attached.)
class Encoder(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers):
        super(Encoder, self).__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.embedding = nn.Embedding(self.vocab_size, self.embedding_dim)
        self.lstm = nn.LSTM(self.embedding_dim, self.hidden_dim, self.num_layers, batch_first=True)

    def forward(self, x):
        x = self.embedding(x)
        output, (hidden_state, cell_state) = self.lstm(x)
        return output, hidden_state, cell_state


class Decoder(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers):
        super(Decoder, self).__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.embedding = nn.Embedding(self.vocab_size, self.embedding_dim)
        self.lstm = nn.LSTM(self.embedding_dim, self.hidden_dim, self.num_layers, batch_first=True)
        self.fc = nn.Linear(self.hidden_dim, self.vocab_size)

    def forward(self, x, h, c):
        x = self.embedding(x)
        output, (hidden_state, cell_state) = self.lstm(x)
        output = self.fc(output)
        return output, h, c


class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super(Seq2Seq, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, X, Y):
        output, h, c = encoder(X)
        decoder_input = Y[:, 0].to(torch.int32)
        output_tensor = torch.zeros(Y.shape[0], Y.shape[1], FR_VOCAB_SIZE).to(device)
        # output_tensor[:, 0] = Y[:, 0]  # Set same start token which is "<START>"
        for i in range(1, Y.shape[1]):
            output_d, h, c = decoder(decoder_input, h, c)
            # output shape: (batch_size, fr_vocab_size)
            decoder_input = torch.argmax(output_d, dim=1)
            # output shape: (batch_size, 1)
            output_tensor[:, i] = output_d
        return output_tensor  # output shape: (batch_size, seq_length)


class Seq2Seq2(nn.Module):
    def __init__(self, encoder, decoder):
        super(Seq2Seq2, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, X, Y):
        output, h, c = encoder(X)
        decoder_input = Y[:, :-1].to(torch.int32)
        output_tensor, h, c = self.decoder(decoder_input, h, c)
        return output_tensor


encoder = Encoder(ENG_VOCAB_SIZE, 32, 64, 1).to(device)
decoder = Decoder(FR_VOCAB_SIZE, 32, 64, 1).to(device)
model = Seq2Seq2(encoder, decoder).to(device)

lr = 0.001
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
loss_fn = nn.CrossEntropyLoss(ignore_index=0)

epochs = 20
for epoch in range(epochs):
    running_loss = 0.0
    progress_bar = tqdm(train_dataloader, desc=f"Epoch {epoch+1}", leave=False)
    for X, Y in progress_bar:
        Y_pred = model(X, Y)
        # Y = Y[:, 1:]
        # Y_pred = Y_pred[:, :-1, :]
        Y_pred = Y_pred.reshape(-1, Y_pred.size(-1))  # Flatten to (batch_size * seq_length, vocab_size)
        Y_true = Y[:, 1:]
        Y_true = Y_true.reshape(-1)  # Flatten to (batch_size * seq_length)
        loss = loss_fn(Y_pred, Y_true)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Update running loss and display it in tqdm
        running_loss += loss.item()
        progress_bar.set_postfix(loss=loss.item())
    print(f"Epoch {epoch+1}, Loss = {running_loss/len(train_dataloader)}")
I’m planning to learn machine learning, but I’m at the start of my computer science degree and feeling a bit overwhelmed with all the options out there. I’d love some guidance on where to begin, especially since I want to do a master's in machine learning and want to be a standout applicant.
Some questions I have:
Thank you for your answers in advance :)
I'm tracking a value over time with noisy measurements and am interested in knowing both the estimate of the underlying value at a given instant and the estimated error in that value at each instant (essentially, value plus or minus error, both as functions of time).
For example, if the real value did a step function in time, the measured value would have some transition as it jumps to the new value, and during that transition, the error would spike.
I've been trying a Bayesian linear dynamical system with a Kalman filter (it's possible that I've implemented it wrong), but it seems to get increasingly certain, even when it's horribly wrong. Any suggestions for good algorithms for this type of problem?
Also, the measurement noise is Gaussian, and I roughly know its distribution, if that helps at all.
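For concreteness, here is a minimal 1D Kalman filter sketch (a random-walk state model; the process-noise variance q and measurement-noise variance r are made-up illustrative values, not the asker's). The reported variance only stays honest if q admits that the underlying value can move; with q too small the filter becomes increasingly certain and lags badly at step changes.

    import numpy as np

    def kalman_1d(measurements, q=1e-3, r=0.1**2, x0=0.0, p0=1.0):
        """Minimal 1D Kalman filter for a random-walk state model.

        q : process-noise variance (how much the true value may drift per step)
        r : measurement-noise variance (the known Gaussian sensor noise)
        Returns arrays of the state estimate and its variance at each step.
        """
        x, p = x0, p0
        xs, ps = [], []
        for z in measurements:
            # Predict: state stays the same, uncertainty grows by q
            p = p + q
            # Update: blend prediction and measurement via the Kalman gain
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1 - k) * p
            xs.append(x)
            ps.append(p)
        return np.array(xs), np.array(ps)

    # Example: a noisy step function; est +/- sqrt(var) gives value and error over time
    rng = np.random.default_rng(0)
    truth = np.concatenate([np.zeros(50), np.ones(50)])
    z = truth + rng.normal(scale=0.1, size=truth.size)
    est, var = kalman_1d(z, q=1e-3, r=0.1**2)

If a single fixed q can't cover both the flat and the jumping regimes, common next steps are an adaptive filter (inflate q when the innovation z - x stays large) or an interacting-multiple-model filter.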
Hi guys, I got an offer from a small insurance company for a data scientist role, working on predicting customer behavior that feeds into a risk equation (including deployment and monitoring), but I think this role lacks work-life balance. My current role is machine learning engineer, mainly working on research and proofs of concept using generative AI like GPT, at a big insurance company with great work-life balance. The offer I got is about 10% higher than my current salary. Please share some advice, as I'm struggling to make a wise decision.
While training a classification neural network I keep getting very volatile, "jumpy" test accuracy. This is still the early stage of fine-tuning the network, but I'm curious whether this has any well-known implications about the model. How can I get it to stabilize at a higher accuracy? I appreciate any feedback or thoughts on this.
Hello, I am trying to work out the formula for the partial derivatives of the MSE (mean squared error) loss with respect to the parameters of a single-hidden-layer perceptron network. There are m input neurons and n hidden neurons. The weights of the hidden layer form an m x n matrix, and the bias of the hidden layer is an n-dimensional vector. The weights of the output layer form an n x k matrix, and the bias of the output layer is a k-dimensional vector.
Given these dimensions, I define a true output matrix Y that holds the true values for all samples. I have N samples and k output neurons, giving a k x N matrix. The same holds for Ypred, which represents our predicted values.
Define the loss function based on MSE as (1/N) times the sum of (Yi - Ypredi)**2.
One can easily see that d(MSE)/d(Ypred) = (2/N)(Ypred - Y).
Now I am trying to find d(MSE)/d(Wo), where Wo represents the weights of the output layer. I tried to use the chain rule: d(MSE)/d(Wo) = d(MSE)/d(Ypred) x d(Ypred)/d(Wo), with Ypred = Wo^T * Xh + Bo, where Xh is the output of the hidden layer and Bo is the bias vector bo repeated as a column N times. However, I am stuck here, because the derivative of a matrix (Ypred) with respect to a matrix (Wo) is a tensor, right? How do I simplify the above relationships and continue with the other parameters?
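For what it's worth, here is a sketch of the standard shortcut that avoids forming the fourth-order tensor: differentiate one scalar entry of the loss with respect to one entry of Wo and collect the results back into a matrix. Under the conventions above (Wo is n x k, Xh is n x N, Ypred = Wo^T Xh + Bo is k x N), and assuming the hidden layer is Xh = f(Zh) with Zh = Wh^T X + Bh for some activation f (an assumption added here), the gradients come out as:

    \Delta_o = \frac{2}{N}\,(Y_{\mathrm{pred}} - Y) \in \mathbb{R}^{k\times N},\qquad
    \frac{\partial\,\mathrm{MSE}}{\partial W_o} = X_h\,\Delta_o^{\top} \in \mathbb{R}^{n\times k},\qquad
    \frac{\partial\,\mathrm{MSE}}{\partial b_o} = \Delta_o\,\mathbf{1}_N \in \mathbb{R}^{k}

    \Delta_h = \left(W_o\,\Delta_o\right)\odot f'(Z_h) \in \mathbb{R}^{n\times N},\qquad
    \frac{\partial\,\mathrm{MSE}}{\partial W_h} = X\,\Delta_h^{\top} \in \mathbb{R}^{m\times n},\qquad
    \frac{\partial\,\mathrm{MSE}}{\partial b_h} = \Delta_h\,\mathbf{1}_N \in \mathbb{R}^{n}

Here X is the m x N input matrix and 1_N is the all-ones vector of length N; multiplying by it just sums over the samples, since each bias is shared across them.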
Any help, even just the answers, would be appreciated. Thanks.
Does anyone have any good resources that make the math behind diffusion models crystal clear?
I am currently trying to build a simple multi-class image classifier. I want to use a pretrained model for image embeddings. However, to reliably differentiate the classes in my task, the model also needs to take into account the text/numbers displayed in the image. The number of texts per image to be classified is not fixed.
Most vision encoders have a fairly small input size, which makes the text unintelligible to the model, so the required text has to be extracted with a different approach, for example OCR tools.
My idea would be to run a detection + recognition OCR tool, embed the recognized text using a text encoder, and then add positional embeddings based on the bounding-box location in the image.
However, given the n embedded texts plus the embedded image, what would be the best way to combine them and feed them into a classification head, for example?
In general, is the approach I am trying to take feasible, or are there other approaches that would ensure the text in the image is taken into account, in addition to the general image structure?
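One possible way to combine them (a sketch only: the embedding dimensions, the pretrained backbones, and attention pooling as the fusion mechanism are all assumptions, not a definitive recipe) is to project the image embedding and the n text embeddings into a common dimension and let the image embedding attend over the text tokens before concatenating and classifying:

    import torch
    import torch.nn as nn

    class ImageTextFusionClassifier(nn.Module):
        """Hypothetical fusion head: one image embedding plus a variable number
        of OCR text embeddings (already augmented with positional information)."""
        def __init__(self, img_dim=768, txt_dim=384, d=256, num_classes=10):
            super().__init__()
            self.img_proj = nn.Linear(img_dim, d)
            self.txt_proj = nn.Linear(txt_dim, d)
            # The image embedding acts as the query over the n text embeddings
            self.attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
            self.head = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, num_classes))

        def forward(self, img_emb, txt_embs, txt_mask):
            # img_emb: (B, img_dim); txt_embs: (B, n_max, txt_dim) padded; txt_mask: (B, n_max), True where padded
            q = self.img_proj(img_emb).unsqueeze(1)                           # (B, 1, d)
            kv = self.txt_proj(txt_embs)                                      # (B, n_max, d)
            pooled, _ = self.attn(q, kv, kv, key_padding_mask=txt_mask)      # (B, 1, d)
            fused = torch.cat([q.squeeze(1), pooled.squeeze(1)], dim=-1)     # (B, 2d)
            return self.head(fused)

    # Example with made-up shapes: batch of 2 images, up to 5 OCR snippets each
    model = ImageTextFusionClassifier()
    logits = model(torch.randn(2, 768), torch.randn(2, 5, 384), torch.zeros(2, 5, dtype=torch.bool))

Simpler alternatives that also handle a variable count are mean-pooling the text embeddings, or feeding the image token plus all text tokens into a small transformer encoder; attention pooling just avoids imposing a fixed number of texts.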
Thank you guys in advance!
Hello,
I have a background in pure mathematics, and I would like to better understand the dynamics of stochastic gradient descent (SGD): for example, speed of convergence, guarantees of convergence, continuous approximations of SGD... but in the stochastic case, that is, not just classical convex optimization where the objective function is fully known.
Do you have any references to get up to date? I would prefer recent papers. Thank you very much!
Hello everyone.
I have a question. I am just starting my journey in machine learning, and I have encountered a problem.
I need to make a neural network that would determine from an image whether the camera was blocked during shooting (by a hand, a piece of paper, or an ass - it doesn't matter). In other words, I need to make a classifier. I took mobilenet, downloaded different videos from cameras, made a couple of videos with blockages, added augmentations and retrained mobilenet on my data. It seems to work, but periodically the network incorrectly classifies images.
Question: how can such a classifier be improved? Or is my approach completely wrong?
It's been almost 120 days since I started learning ML. I have only learned basic terminology and basic statistics and am applying ML libraries to do projects, but I want to learn ML properly (the math).
Will it be worth it?
Please also suggest any other resources.
Thank you
Logit reg - Accuracy: 0.90

Confusion matrix:
[[67472   499]
 [ 6679   511]]

True positives: 511
True negatives: 67472
False negatives: 6679
False positives: 499

Sensitivity: 0.07
Specificity: 0.99
Positive predictive value: 0.51
Negative predictive value: 0.91

Classification report:
              precision    recall  f1-score   support
           0       0.91      0.99      0.95     67971
           1       0.51      0.07      0.12      7190
    accuracy                           0.90     75161
I'm performing a Frequent Pattern Mining analysis on a dataframe in pandas.
Suppose I want to find the most frequent patterns for columns A, B and C. I find several patterns; let's pick one: (a, b, c). The problem is that, with high probability, this pattern is frequent just because a is very frequent in column A per se, and the same with b and c. How can I discriminate patterns that are frequent for this trivial reason from ones that are frequent for interesting reasons? I know there are many metrics to do so, like lift, but they are all binary metrics, in the sense that I can only calculate them on two-column patterns, not three or more. Is there a way to do this for a pattern of arbitrary length?
One way would be calculating the lift on all possible subsets of length two:
lift(A, B)
lift((A, B), C)
and so on
but how do I aggregate all the results to make a decision?
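One generalization that works for a pattern of arbitrary length is to compare the observed support of the whole pattern against the support it would have if its items were independent, i.e. the product of the single-column frequencies (essentially lift extended to itemsets). A hedged pandas sketch, where the dataframe, column names and values are placeholders:

    import pandas as pd

    def itemset_lift(df, pattern):
        """pattern: dict like {"A": "a", "B": "b", "C": "c"}.
        Lift much greater than 1 means the combination occurs more often than
        expected under independence; lift near 1 flags patterns that are
        frequent only because each item is frequent on its own."""
        joint = (df[list(pattern)] == pd.Series(pattern)).all(axis=1).mean()
        expected = 1.0
        for col, val in pattern.items():
            expected *= (df[col] == val).mean()
        return joint / expected if expected > 0 else float("nan")

    # Hypothetical usage on a dataframe with columns A, B, C
    # lift = itemset_lift(df, {"A": "a", "B": "b", "C": "c"})

Related whole-itemset measures include leverage (joint support minus the independence product) and all-confidence; they make the same comparison in different ways, so any of them can be used as the single aggregate score for a pattern of arbitrary length.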
Any advice would be really appreciated.
I'm a developer but a newbie in AI, and this is the first question I've ever posted about it.
Our non-profit site hosts data about people, such as biographies. I'm looking to build something like ChatGPT that could help users search through and make sense of this data.
For example, if someone asks, "how many people died of covid and were married in South Carolina" it will be able to tell you.
Basically an AI driven search engine based on our data.
I don't know where to start looking or coding. I gather that I need an LLM and datasets to train the AI, but how do I find the model, how do I install it, and what UI do we use to train the AI on our data? Our site is powered by WordPress.
Basically I need a guide on where to start.
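To make the starting point concrete, here is a hedged sketch of the retrieval half of the usual approach (retrieval-augmented generation): embed every biography once, embed the user's question, pull back the closest biographies, and pass those to an LLM as context to compose the answer. The model name, the toy data, and the pipeline shape are assumptions, not a recommendation; for count-style questions you would eventually want structured fields (marital status, cause of death, state) plus a query layer rather than pure text search.

    # Sketch of semantic search over biography texts (the retrieval step of RAG).
    # Assumes: pip install sentence-transformers; `bios` is exported from the WordPress database.
    from sentence_transformers import SentenceTransformer
    import numpy as np

    bios = [
        {"name": "Person A", "text": "Born in South Carolina, married in 1975, died of COVID-19 in 2021."},
        {"name": "Person B", "text": "Lifelong resident of Oregon, unmarried, passed away in 2019."},
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder
    doc_vecs = model.encode([b["text"] for b in bios], normalize_embeddings=True)

    def search(question, top_k=5):
        q = model.encode([question], normalize_embeddings=True)[0]
        scores = doc_vecs @ q                        # cosine similarity (vectors are normalized)
        best = np.argsort(-scores)[:top_k]
        return [(bios[i]["name"], float(scores[i])) for i in best]

    print(search("people who died of covid and were married in South Carolina"))
    # The retrieved biographies would then be passed to an LLM prompt to compose the answer.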
Thanks in advance!
Hi everyone,
I recently completed my MCA (Master of Computer Applications) and unfortunately, I wasn't able to secure a placement during campus recruitment. I’m now feeling a bit lost, as many of my peers have already landed jobs, and I’m concerned about the impact of this study gap on my job prospects. I’ve decided to focus on building a career in machine learning, but I’m not sure where to start, given that I’m a fresher without prior experience in this field.
Could anyone guide me on how to begin my journey in machine learning from scratch? What are the essential skills I need to acquire, and what resources (books, courses, projects) would be helpful for a beginner like me?
Additionally, in the current job market conditions, do you think it’s realistic to land a job in machine learning? Are there specific strategies I should adopt to stand out in this competitive job market?
Any advice or personal experiences would be greatly appreciated!
Thanks in advance!
That's it; he explains practical concepts so well. Andrew Ng is good too, but mostly for theory.
As the title says: we offer Perplexity AI PRO voucher codes for a one-year plan.
To Order: https://cheapgpts.store/Perplexity
Payments accepted:
New to Machine Learning? Start Here with a Beginner-Friendly Roadmap!😌
Machine learning can seem daunting, but with the right roadmap, anyone can get started. This post lays out a clear, beginner-friendly plan to help newcomers navigate the world of ML. From understanding basic algorithms to working with Python and PyTorch, you’ll find resources to start building and deploying your own models. Say goodbye to confusion and hello to actionable steps toward ML mastery.
Ready to begin your ML journey? Head over to r/learnmachinelearning and start with this guide! 👇🏽
I implemented basic gradient descent for linear regression, first in NumPy and then using PyTorch. However, with the same data, parameter initialization, and learning rate, one converges (NumPy, left) while the other diverges (PyTorch, right).
Here is the code for each:
Numpy:
import math
import matplotlib.pyplot as plt
import numpy as np
n = 50
np.random.seed(1)
x = np.linspace(0, 2*math.pi, n)
y = np.sin(x)
y += np.random.normal(scale=0.1, size=len(y))
alpha = 0.15
m = 0
b = 0
losses = []
fig, axs = plt.subplots(2)
while True:
    axs[0].plot(x, m*x+b)
    axs[0].scatter(x, y)
    axs[1].plot(losses)
    plt.draw()
    plt.waitforbuttonpress()
    for ax in axs:
        ax.clear()
    b -= alpha * 1/n * sum(b + m*x[i] - y[i] for i in range(n))
    m -= alpha * 1/n * sum((b + m*x[i] - y[i]) * x[i] for i in range(n))
    mse = sum((y - (m*x+b))**2)/n
    losses.append(mse)
Pytorch:
import math
import matplotlib.pyplot as plt
import numpy as np
import torch.nn
n = 50
np.random.seed(1)
x = np.linspace(0, 2*math.pi, n)
y = np.sin(x)
y += np.random.normal(scale=0.1, size=len(y))
x = torch.from_numpy(x)
y = torch.from_numpy(y)
x = x.reshape(-1, 1)
y = y.reshape(-1, 1)
alpha = 0.15
m = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD([m, b], lr=alpha)
losses = []
fig, axs = plt.subplots(2)
while True:
    y_est = m * x + b
    loss = loss_fn(y_est, y)
    losses.append(loss.item())
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    axs[0].plot(x, y_est.detach().numpy())
    axs[0].scatter(x, y)
    axs[1].plot(losses)
    plt.draw()
    plt.waitforbuttonpress()
    for ax in axs:
        ax.clear()
Even when I drop the LR to 0.1 they still behave the same, so I don't think it's a small rounding error or similar.
I've read that cosine similarity isn't usually used as an activation function in practice, and I wonder why that is, specifically when the use case is training for similarity.
I am currently training a sentence-transformer neural net with linear activations, but it is assessed against labelled cosine similarity scores based on doc2vec vectors, so the match doesn't seem too great, as the output values can fall well outside the bounds of the cosine similarity function.
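For what it's worth, one common way to keep predictions inside the [-1, 1] range of the labels is not to use cosine similarity as a hidden-layer activation at all, but to compute it between the two sentence embeddings at the output (the usual bi-encoder setup). A minimal sketch, where the encoder is just a stand-in for whatever sentence transformer is being trained:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CosineSimilarityRegressor(nn.Module):
        """Scores a sentence pair by the cosine similarity of their embeddings,
        so predictions are naturally bounded in [-1, 1] like the labels."""
        def __init__(self, encoder: nn.Module):
            super().__init__()
            self.encoder = encoder  # any module mapping inputs -> (B, d) embeddings

        def forward(self, a, b):
            ea, eb = self.encoder(a), self.encoder(b)
            return F.cosine_similarity(ea, eb, dim=-1)  # (B,)

    # Toy usage with a stand-in encoder and MSE against labelled similarity scores
    enc = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
    model = CosineSimilarityRegressor(enc)
    pred = model(torch.randn(4, 128), torch.randn(4, 128))
    loss = F.mse_loss(pred, torch.tensor([0.9, 0.1, -0.3, 0.5]))

As a hidden-layer activation, the normalization in cosine similarity discards magnitude information and its gradients behave poorly near zero vectors, which is one commonly cited reason it is rarely used there; as the output of a similarity model, it is standard.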
I came across a recent video featuring Geoffrey Hinton where he said (I'm paraphrasing) in the context of humans learning languages, "(...) recent models show us that stochastic gradient descent is really how the brain learns (...)" and I remember him comparing "weights" to "synapses" in the brain. If we were to take this analogy forward - if weights are synapses in the brain, what would the learning rate be?
I just finished Neural Networks: Zero to Hero by Andrej Karpathy. I am trying to revise it again since it is so information-dense.
What other course should I take? I was looking at fast.ai: is it good, or should I go for CS231n? What should I do?
I have tried learning from it multiple times and from multiple versions of it. I just don't get how some people who go on to work at big-tech AI labs attribute their success to Fast.ai. I understand my learning style may differ from the intended audience's, but I'd like to hear from the people it benefited.
Firstly, the notebooks/book have little to do with the videos. Secondly, there is so much abstraction that it roughly doubles your work, since you need to look up how something is actually implemented in PyTorch. Thirdly, everything is a notebook, and I am not a fan of notebooks.