/r/deeplearning
Resources for understanding and implementing "deep learning" (learning data representations through artificial neural networks).
Greetings,
I came across a use case that I wondered could be solved with LLMs. I've written a problem statement that should help you understand it. Any advice would be really helpful.
Here is the problem statement. Your job is to analyse multiple smartphones to determine which one to buy. For each smartphone, you have been given a list of its details and specs. These details describe the configuration and features of each smartphone, and they are only available as textual sentences. The sentences can cover, for example, the RAM, the screen size, the processor configuration, the launch date, the growing popularity, and so on. Say that for each smartphone you have 100 such details, and you have a hundred smartphones in total to compare. Note that the types of details may not be common to all the smartphones: for example, some smartphones may have a folding feature you can compare, while others may not have a folding feature at all. Your job is to create a program using LLMs and AI to analyse these details and compare the smartphones to determine which is the best smartphone to buy. You do not have any other kind of structured data about these items, and you have to use only the LLMs to compare the smartphones.
Explain only the approach you would use.
Thnxxxx
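Not the requested write-up, just a way to make the problem concrete: one shape such a program could take is "summarize each phone into comparable attributes, then run pairwise LLM judgments". This is a sketch under assumptions; call_llm is a hypothetical wrapper for whatever LLM API you choose, and the attribute list in the prompt is illustrative.

    # Sketch: LLM-based smartphone comparison. `call_llm` is a hypothetical helper,
    # and each phone is assumed to be a dict: {"name": str, "details": [str, ...]}.

    def call_llm(prompt: str) -> str:
        """Hypothetical wrapper around whatever LLM API you end up using."""
        raise NotImplementedError

    def summarize_phone(phone):
        # Normalize free-text details into a short, comparable summary.
        prompt = (
            "Summarize the following smartphone details into key attributes "
            "(RAM, display, processor, special features). Say 'not mentioned' "
            "for anything absent:\n" + "\n".join(phone["details"])
        )
        return call_llm(prompt)

    def compare_pair(summary_a, summary_b):
        # Pairwise judgment; could be replaced by per-phone scoring to save calls.
        prompt = (
            "Phone A:\n" + summary_a + "\n\nPhone B:\n" + summary_b +
            "\n\nWhich is the better overall buy? Answer with exactly 'A' or 'B'."
        )
        return call_llm(prompt).strip()

    def rank_phones(phones):
        summaries = {p["name"]: summarize_phone(p) for p in phones}
        wins = {name: 0 for name in summaries}
        names = list(summaries)
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                a, b = names[i], names[j]
                winner = a if compare_pair(summaries[a], summaries[b]) == "A" else b
                wins[winner] += 1
        return sorted(wins, key=wins.get, reverse=True)

Pairwise comparison handles the "not all phones have the same features" issue naturally, since the LLM sees both summaries at once; the trade-off is the quadratic number of calls.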
So I have a dataset of 300 samples of text mapped to code. The code is limited in the sense that it consists of some basic lines that are not going to change at inference time. To be more precise: given those 300 lines of code with their text descriptions, at inference we expect one of those lines back, filled in with the parameters provided in the text. For example: ("print the first index of this string s", "print(s[0])") is just a hypothetical sample. What we expect is that an inference for something like "display the 0 index of string j" gives "print(j[0])".
There are two ways to go about this. Either fine-tune an LLM, preferably a smaller one, and only scale up if the smaller one works but doesn't capture all of the complexity.
Or we do RAG, because essentially this is knowledge to be memorized: at inference time we're just asking for one of the patterns in the initial dataset, so it's more of a knowledge/memory problem. Am I right about this? A rough sketch of the retrieval side is below.
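If retrieval turns out to be the better fit, here is a minimal sketch of what that side could look like, assuming a sentence-transformers embedding model; the model name and the `pairs` variable are illustrative, not a recommendation.

    # Sketch: retrieve the closest text -> code pair by embedding similarity.
    # `pairs` stands in for the 300-sample dataset: [(description, code), ...].
    from sentence_transformers import SentenceTransformer, util

    pairs = [("print the first index of this string s", "print(s[0])")]  # ... your 300 samples

    model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
    corpus_emb = model.encode([desc for desc, _ in pairs], convert_to_tensor=True)

    def retrieve_code(query: str) -> str:
        query_emb = model.encode(query, convert_to_tensor=True)
        best = util.cos_sim(query_emb, corpus_emb).argmax().item()
        return pairs[best][1]

    print(retrieve_code("display the 0 index of string j"))
    # Returns the stored "print(s[0])"; you would still need an LLM (or simple
    # substitution) step to swap in the new variable name `j`.

Note the caveat in the last comment: pure retrieval returns the memorized line verbatim, so parameter substitution is where a small fine-tuned model or a generation step still earns its keep.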
I want to detect whether or not someone is done speaking on an open line (endpoint detection). Using hard-coded rules like "if they stop for 5 seconds then they are done" hasn't held up well in testing.
I want to explore other options for how to tell when someone is done talking.
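Not an answer to which method is best, but a minimal energy-based sketch of the idea; the thresholds and frame sizes are arbitrary assumptions, and a proper voice-activity-detection model would replace the is_speech heuristic.

    import numpy as np

    def is_speech(frame: np.ndarray, energy_threshold: float = 1e-3) -> bool:
        # Crude stand-in for a real voice-activity detector: mean energy of the frame.
        return float(np.mean(frame ** 2)) > energy_threshold

    def endpoint_detected(audio: np.ndarray, sample_rate: int = 16000,
                          frame_ms: int = 30, trailing_silence_s: float = 0.8) -> bool:
        # True once the last `trailing_silence_s` seconds contain no speech frames.
        frame_len = int(sample_rate * frame_ms / 1000)
        needed = int(trailing_silence_s * 1000 / frame_ms)
        frames = [audio[i:i + frame_len]
                  for i in range(0, len(audio) - frame_len + 1, frame_len)]
        if len(frames) < needed:
            return False
        return not any(is_speech(f) for f in frames[-needed:])

The usual refinements on top of this are an adaptive noise-floor threshold and, further up, prosody or language-model cues (a trailing "so..." usually means the speaker isn't done even after a pause).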
I'm looking to get a laptop for machine learning/deep learning research projects in the medical domain.
In the US.
I'm not sure whether a Mac would be a better choice, but I'd prefer non-Apple, mainly because of their excessive pricing.
Budget $1500-2000.
Please suggest.
I spent a long time trying to figure out why gradient descent doesn't work correctly for me. Can someone explain how it works, and what the difference is between regular gradient descent and batch gradient descent?

My confusion is about the average error used in backpropagation, not the loss function itself. In the loss function we square the error (or take its absolute value), which amplifies it and keeps the terms from cancelling, but for backpropagation we don't do that, so the + and - errors can sum to 0.

My network right now is: 4 input, 2 hidden, 1 output layer. Here is how I currently understand it (I'm not sure this is right, and I'm spelling it out so you can correct me). Regular gradient descent calculates the error over the entire dataset. For example, take the samples (0, 1, 1, 0), (1, 0, 1, 0), (1, 1, 1, 0), where I expect the value 0 for the first sample, 1 for the second, and 1 for the third. Running with randomly initialized weights, let's say I get 0.6, 0.3, 0.8. For the loss function I calculate (0 - 0.6)^2 + (1 - 0.3)^2 + (1 - 0.8)^2, take the sum and divide by the number of samples: 0.89 / 3. Everything is fine there; the loss comes out normal.

But for backpropagation it comes out differently: -2*(0 - 0.6) + -2*(1 - 0.3) + -2*(1 - 0.8). The sum is -0.6, and dividing by 3 gives -0.2. So we go to correct the weights, but all we have is an average of -0.2 for the whole epoch; the positive and negative errors have partly cancelled, and we keep going down the same path.
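For what it's worth, here is a tiny numpy sketch (a made-up single linear layer, not your 4-2-1 network) showing where the averaging actually happens in batch gradient descent: each sample's error is multiplied by that sample's own inputs first, and only the resulting parameter gradients are averaged, so per-sample errors of opposite sign do not collapse into one scalar before backpropagation.

    import numpy as np

    # Toy batch: 3 samples, 4 inputs each, with targets 0, 1, 1 (as in the post).
    X = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 1, 0]], dtype=float)
    y = np.array([0.0, 1.0, 1.0])

    rng = np.random.default_rng(0)
    w = rng.normal(size=4)      # single linear layer, just to show the bookkeeping
    lr = 0.1

    for epoch in range(3):
        y_hat = X @ w                        # predictions, shape (3,)
        loss = np.mean((y_hat - y) ** 2)     # MSE: mean of squared per-sample errors
        # dL/dy_hat is kept PER SAMPLE (shape (3,)), signs and all:
        dL_dyhat = 2 * (y_hat - y) / len(y)
        # Each sample's error is combined with that sample's inputs BEFORE averaging,
        # so opposite-sign errors on different samples do not cancel into one number:
        grad_w = X.T @ dL_dyhat              # shape (4,), already the batch-mean gradient
        w -= lr * grad_w
        print(f"epoch {epoch}: loss={loss:.4f}, grad={grad_w.round(3)}")

"Batch" gradient descent in the classic sense is exactly this (the whole dataset per update); mini-batch gradient descent does the same computation on a random subset per update.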
There's this issue in ChatGPT where it will randomly start deviating from the format specified in the instructions, adopt its own, and apply it to all following outputs.
I’ve been trying to learn about Explainable AI, and I’m curious about the differences between model-agnostic techniques compared to model-specific ones. How do they actually work, and what are the trade-offs in terms of accuracy and interpretability? Any insights or examples you could share would be super helpful!
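Not a full answer, but a concrete model-agnostic example may help anchor the comparison: permutation importance treats the model as a black box and only needs its predictions, while a model-specific method (reading tree impurity importances, linear coefficients, attention weights) relies on internals. A minimal sklearn sketch on an assumed toy dataset:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

    # Model-agnostic: shuffle one feature at a time and measure the score drop.
    result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
    print(result.importances_mean[:5])

    # Model-specific: impurity-based importances, available only because this is a tree ensemble.
    print(model.feature_importances_[:5])

The usual trade-off: model-agnostic methods (permutation importance, LIME, SHAP's kernel variant) work on anything but cost extra model evaluations and only approximate the model's reasoning; model-specific methods are cheaper and often more faithful, but locked to one architecture.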
Hello, I am fairly new to deep learning, and we have to complete a requirement in a course. Can anybody recommend a good alternative to Colab? I am willing to pay for compute units, but others have said Colab is kind of rough. Even so, I haven't heard of any other ways I could train AI models. Thank you for the help!
Hello everyone,
I am working on an image segmentation project. Getting to the point: as the training epochs go by, the metrics and loss values recorded in the TensorBoard logger are significantly different from what is shown in the terminal.
I tried to investigate this issue online, but couldn't find anything. I also added a CSV Logger to compare the results, and those match with the TensorBoard ones.
However, if I look at the values printed in the terminal, I can see, for example, that the mean_iou value is consistently above 0.50, yet the CSV Logger and TensorBoard record a mean_iou of 0.36 for that same epoch.
I cannot see where the issue could come from. I used the same training workflow as always and have never experienced such an issue.
Thanks in advance to anyone who could answer.
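One hedged guess, assuming this is a PyTorch Lightning-style setup (which the TensorBoard/CSV loggers suggest): the terminal/progress bar often shows the running per-step value, while the loggers store the epoch-aggregated value, so it is worth checking how the metric is logged. Roughly along these lines; the metric attribute here is assumed, not from your code:

    import pytorch_lightning as pl

    class SegModule(pl.LightningModule):
        # ... model definition, training_step, configure_optimizers omitted ...

        def validation_step(self, batch, batch_idx):
            images, masks = batch
            preds = self(images)
            iou = self.mean_iou(preds, masks)  # assumed torchmetrics-style metric attribute

            # on_step values feed the console/progress bar; on_epoch=True is what the
            # TensorBoard and CSV loggers record after aggregating over the whole epoch,
            # which is one common reason the two numbers disagree.
            self.log("mean_iou", iou, on_step=True, on_epoch=True, prog_bar=True)

Another common culprit is averaging a ratio metric per batch versus accumulating intersections and unions over the epoch; the two give different mean IoU values.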
This is a beginner question, but I was wondering: what part of a transformer makes it parallel? It still needs to process tokens layer by layer, right?
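To make the usual answer concrete: the layers are indeed applied one after another, but within a single layer, self-attention and the feed-forward block operate on all token positions at once as one big matrix operation, so there is no left-to-right loop over tokens the way an RNN has. A small illustration:

    import torch

    seq_len, d_model = 8, 16
    x = torch.randn(1, seq_len, d_model)   # all 8 token embeddings at once

    Wq = torch.nn.Linear(d_model, d_model)
    Wk = torch.nn.Linear(d_model, d_model)
    Wv = torch.nn.Linear(d_model, d_model)

    q, k, v = Wq(x), Wk(x), Wv(x)          # one matmul each covers every position
    attn = torch.softmax(q @ k.transpose(-2, -1) / d_model ** 0.5, dim=-1)
    out = attn @ v                          # shape (1, 8, 16): every token updated in parallel
    # Contrast: an RNN needs a Python-level loop over the 8 positions, each step
    # waiting for the previous hidden state.

This is what makes training parallel over the whole sequence; at generation time, decoding is still sequential because each new token depends on the previously generated ones.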
Hi guys, I'm building a RAG solution with a Llama 3 model designed to retrieve knowledge from research papers. As in other RAG projects, I started by splitting each document into chunks with a specific number of tokens per chunk. However, I recently began to wonder: what if I instead create chunks that cover entire sections of the paper, such as one chunk for the abstract, another for the methodology, and so on?
I'm not sure this will improve the results, but I'm curious and want to try it, and I don't know where to start. Can anyone suggest a lightweight pretrained model that is good at identifying the sections of a document such as a research paper?
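Before reaching for a pretrained model, a heuristic heading-based splitter can get surprisingly far on papers with standard section names; tools like GROBID are the heavier-weight option for parsing scholarly PDFs. A rough sketch, where the heading patterns are assumptions you would tune for your corpus:

    import re

    # Split a plain-text paper into section chunks by matching common heading lines.
    HEADING = re.compile(
        r"^\s*(abstract|introduction|related work|background|method(?:ology|s)?|"
        r"experiments?|results?|discussion|conclusions?|references)\s*$",
        re.IGNORECASE | re.MULTILINE,
    )

    def split_into_sections(text: str):
        matches = list(HEADING.finditer(text))
        sections = {}
        for i, m in enumerate(matches):
            start = m.end()
            end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
            sections[m.group(1).lower()] = text[start:end].strip()
        return sections  # e.g. {"abstract": "...", "methodology": "...", ...}

Whole sections can exceed your context budget, so a common compromise is section-aware chunking: split by section first, then by token count within each section, and store the section name as chunk metadata.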
Lane Detection using Mask RCNN – An Instance Segmentation Approach
https://debuggercafe.com/lane-detection-using-mask-rcnn/
Lane detection and segmentation have a lot of use cases, especially in self-driving vehicles. With lane detection and segmentation, the vehicle gets to see the different types of lanes, which allows it to plan its route and actions accordingly. Of course, there are several other components involved besides computer vision and deep learning, but this serves as the first step. In this article, we tackle that first step: we will train a Mask RCNN model for lane detection and segmentation, taking an instance segmentation approach to detect and segment various types of lane lines.
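The full walkthrough is in the article; for readers who just want the shape of the setup, here is a sketch of the standard torchvision fine-tuning recipe for Mask R-CNN with a custom number of lane classes (this is the generic recipe, not necessarily the article's exact code, and num_classes is yours to choose).

    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
    from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

    def get_lane_maskrcnn(num_classes):
        # num_classes = number of lane-line types + 1 for the background class.
        model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

        # Replace the box classification head.
        in_features = model.roi_heads.box_predictor.cls_score.in_features
        model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

        # Replace the mask prediction head.
        in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
        model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, 256, num_classes)
        return model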
Before ChatGPT, when I struggled to understand some key concepts in theory or some details were obscure, I would go to Reddit, StackOverflow, etc.
But people, especially in programming forums, can make an embarrassment out of you if you state something stupid (that you genuinely believed). Even here in this subreddit, if you propose something you have been tinkering with, some people may make fun of you.
ChatGPT changed all that. It became a completely different story. It would explain all the theory to you so patiently, and would not argue with you even if it recognized that your intellectual level is below the desk.
It would explain and explain, and even say sorry to you if it did not understand the idea of the question fully.
With ChatGPT 3 -> 4 -> 4o, its answers became better and better.
I cannot complain about ChatGPT in these terms.
However, the only thing that causes mistrust is the fact that all these conversations are saved and can be misused. At some point I started to share my ideas with ChatGPT, asking it to check them. Some people have started to share their photos and videos with AI. At some point it could accumulate all this information and become... a monster. Some people have already stopped doing intellectual/creative work, relying on it instead.
I personally believe that ChatGPT o1 is the threshold we don't need to go above. It should stay a helper tool for people and not go beyond that.
That's why I was working on the small LLM I posted recently: because it is limited, it is a kind of small unit that can be put somewhere to work as an auxiliary block...
Hey folks,
Are you looking for a military audio dataset? We are happy to announce that we have released MAD (Military Audio Dataset), which contains 7,466 audio samples from 7 classes (communication, gunshot, footsteps, shelling, vehicle (tank), helicopter, and fighter), corresponding to approximately 12 hours of audio and exhibiting distinctive characteristics not present in the academic datasets typically used for machine learning research.
The dataset is available on Kaggle. For more details, please refer to our GitHub repository or paper.
Hey guys,
I'm currently putting together a computer to build my own workstation for my deep learning and machine learning hobby projects. I know there are cloud solutions that are often better, but I want to build my own station first. Here are the planned components:
- MSI B760 GAMING PLUS WIFI motherboard
- Intel i7 of the 13th generation
- 32 GB DDR5 RAM (6000 MHz)
- 2 TB M.2 PCIe Gen4 SSD
The only difficulty I'm having at the moment is deciding between two graphics cards:
- Gigabyte NVIDIA GeForce RTX 3060 GAMING OC V2 (12 GB GDDR6)
- Gigabyte NVIDIA GeForce RTX 4060 Ti GAMING OC (16 GB GDDR6)
I'm aware that VRAM isn't the only thing that matters, as I don't just want to work with LLM and CNN models but also want to develop my own deep learning models. The memory bandwidth of the RTX 3060 is better, but it has less VRAM. On the other hand, the 4060 Ti offers a significantly higher CUDA score, but also costs 200 euros more.
I would be super grateful if you could help me with this decision!
Thanks in advance!
Hi everyone, I have this problem at hand. I have reviews for a number of products, around 10,000 rows, and based on these reviews I want to do hierarchical classification of the products. Just to test things out, I tried building three separate models to predict the three different levels of classes, but as one would expect, I get good accuracy and F1 scores for the top-level category, and the scores worsen as I go deeper into the hierarchy. My plan was to start with simpler models like Naive Bayes classifiers, SVMs, logistic regression, and XGBoost, and then eventually move to more advanced methods like RNNs, LSTMs, and BERT. I want to develop good intuition about how to approach hierarchical classification problems; any suggestions would be helpful. Thanks.
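One common baseline worth having in the toolbox is the "local classifier per parent node" setup: predict level 1 normally, then train a separate level-2 classifier for each level-1 class so the deeper model only has to discriminate among siblings. A rough sklearn sketch under assumed column names (text, level1, level2):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # df is assumed to be a DataFrame with columns: "text", "level1", "level2".
    def fit_hierarchical(df):
        level1_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        level1_clf.fit(df["text"], df["level1"])

        # One level-2 classifier per level-1 class, trained only on that slice of data.
        level2_clfs, single_child = {}, {}
        for parent, group in df.groupby("level1"):
            if group["level2"].nunique() > 1:
                clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
                clf.fit(group["text"], group["level2"])
                level2_clfs[parent] = clf
            else:
                single_child[parent] = group["level2"].iloc[0]
        return level1_clf, level2_clfs, single_child

    def predict_hierarchical(texts, level1_clf, level2_clfs, single_child):
        l1 = level1_clf.predict(texts)
        l2 = [level2_clfs[p].predict([t])[0] if p in level2_clfs else single_child.get(p)
              for t, p in zip(texts, l1)]
        return list(zip(l1, l2))

The same structure carries over when you move to BERT-style encoders; the other main variants to compare against are a flat classifier over leaf labels and a single multi-task model with one head per level.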
Does anyone have any insight into the model architecture or method used to determine the coordinates of the elements on the screen?
They mentioned in their blog post that they had to come up with a new model, but didn't give any details.
So I'm in the mindset of building a home PC with an Nvidia 4080 Super, a Core i9, and so on, for running experiments on LLMs and also building AI applications. Is it worth buying all of this for, say, $3500, or should I go for cloud services like Google Colab, Paperspace, etc.? What do you think, folks?
import tensorflow as tf
from tensorflow.keras import layers, models

# train_images is assumed to be loaded earlier (not shown in this snippet)
if train_images.shape[0] == 0:
    raise ValueError("No images were loaded. Please check the image directory.")

# Create a TensorFlow dataset without batching
full_dataset = tf.data.Dataset.from_tensor_slices(train_images)

# Shuffle the dataset
full_dataset = full_dataset.shuffle(buffer_size=len(train_images))

# Split the dataset into training and validation sets
train_size = int(0.8 * len(train_images))
val_size = len(train_images) - train_size
train_dataset = full_dataset.take(train_size)
val_dataset = full_dataset.skip(train_size)

# Batch the datasets
train_dataset = train_dataset.batch(32)
val_dataset = val_dataset.batch(32)

# Function to build the autoencoder model
def build_autoencoder(input_shape):
    # Encoder
    encoder_input = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(encoder_input)
    x = layers.MaxPooling2D((2, 2), padding='same')(x)
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    encoder_output = layers.MaxPooling2D((2, 2), padding='same')(x)

    # Decoder
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(encoder_output)
    x = layers.UpSampling2D((2, 2))(x)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = layers.UpSampling2D((2, 2))(x)
    decoder_output = layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)

    # Autoencoder model
    autoencoder = models.Model(encoder_input, decoder_output)
    return autoencoder

# Function to train the autoencoder
def train_autoencoder(train_dataset, val_dataset):
    model = build_autoencoder(input_shape=(224, 224, 3))
    model.compile(optimizer='adam', loss='mse')

    # Define callbacks for saving the model and early stopping
    checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("autoencoder_model.keras", save_best_only=True)
    early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)

    # Train the model
    history = model.fit(
        train_dataset,
        epochs=50,
        validation_data=val_dataset,
        callbacks=[checkpoint_cb, early_stopping_cb],
        verbose=2
    )

    # Save the training history
    with open('training_history.txt', 'w') as f:
        for key, values in history.history.items():
            f.write(f'{key}: {values}\n')
    print("Training complete, model and history saved.")

# Train the autoencoder
train_autoencoder(train_dataset, val_dataset)
Epoch 1/50
ValueError: None values not supported.
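A hedged guess at the cause, based only on the snippet above: the dataset yields single image tensors, but model.fit with an 'mse' loss expects (input, target) pairs, and for an autoencoder the target is the image itself. Mapping the batched datasets to (x, x) is the usual fix:

    # Possible fix (assumes the pipeline above): give the autoencoder (input, target) pairs.
    train_dataset = train_dataset.map(lambda x: (x, x))
    val_dataset = val_dataset.map(lambda x: (x, x))

    train_autoencoder(train_dataset, val_dataset)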
Essentially, I have a trained model (in PyTorch) that I want to deploy on an edge device (written entirely in C/C++) for inference. For context, I'm working alone on this project, so I don't get much guidance. My understanding is that at deployment the input (inference data) needs to be integers, and my model's parameters (weights and biases/activations) also need to be integers. Because I don't have "inference data", I am currently prototyping by quantizing my validation/test data and comparing the validation/test results I get using the floating-point model parameters against the results I get using the quantized/integer model parameters. To make this more concrete (or succinct), I'm testing two cases:
import torch

def quantize_tensor(tensor, num_bits):
    # Signed integer range for the requested bit width.
    qmin = -(2 ** (num_bits - 1))
    qmax = (2 ** (num_bits - 1)) - 1
    min_val, max_val = tensor.min(), tensor.max()
    # Affine (asymmetric) quantization parameters.
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = qmin - min_val / scale
    zero_point = torch.round(zero_point).clamp(qmin, qmax)
    q_tensor = torch.round(tensor / scale + zero_point).clamp(qmin, qmax)
    if num_bits == 8:
        q_tensor = q_tensor.type(torch.int8)
    elif num_bits == 16:
        q_tensor = q_tensor.type(torch.int16)
    else:
        q_tensor = q_tensor.type(torch.int)
    return q_tensor, scale, zero_point
Then I quantize the model's weights and the bias using this:
def quantize_model(model, weight_bit_width=16, bias_bit_width=16):
    quantized_state_dict = {}
    scale_zp_dict = {}  # To store scale and zero-point for each parameter
    for name, param in model.state_dict().items():
        if 'weight' in name:
            q_param, scale, zero_point = quantize_tensor(param, weight_bit_width)
            quantized_state_dict[name] = q_param
            scale_zp_dict[name] = (scale, zero_point)
        elif 'bias' in name:
            q_param, scale, zero_point = quantize_tensor(param, bias_bit_width)
            quantized_state_dict[name] = q_param
            scale_zp_dict[name] = (scale, zero_point)
        else:
            # For other parameters, keep them as is or apply appropriate quantization
            quantized_state_dict[name] = param
    return quantized_state_dict, scale_zp_dict
Furthermore, I quantize my model and the data as shown below. However, because my ML problem is multiclass and multioutput, I need to call torch.softmax on the logits coming out of my model to get prediction probabilities, and softmax doesn't support integers (technically, it's not implemented for ints). That makes me worried that my overall quantization approach is wrong (the model's code and the rest of the pipeline follow):
import copy
import torch
import torch.nn as nn

class model(nn.Module):
    def __init__(self, inputs, l1, l2, num_outputs, output_classes=3):
        super().__init__()
        # define the layers
        self.output_classes = output_classes
        self.num_outputs = num_outputs
        self.layers = nn.Sequential(
            nn.Linear(inputs, l1),
            nn.ReLU(),
            nn.Linear(l1, l2),
            nn.ReLU(),
            nn.Linear(l2, num_outputs * output_classes),  # output_classes = number of classes in each output
        )

    def forward(self, x):
        x = self.layers(x)
        x = x.view(-1, self.output_classes, self.num_outputs)  # Reshapes output tensor (logits output).
        return x
model_copy = copy.deepcopy(floating_point_trained_model)

# quantize model params
quantized_state_dict, scale_zp_dict = quantize_model(model_copy, weight_bit_width=16, bias_bit_width=16)
for name, param in model_copy.named_parameters():
    param.requires_grad = False
    param.data = quantized_state_dict[name].to(dtype=torch.float)  # <--- Need help here: casting to float to satisfy softmax requirements

# Quantize data
Quant_X_train, scale, zp = quantize_tensor(X_train, 16)          # X_train from your own split
Quant_X_test, test_scale, test_zp = quantize_tensor(X_test, 16)  # X_test from your own split

# call quantized model on quantized input data
pred_probs = torch.softmax(model_copy(Quant_X_test.to(torch.float)), dim=1)  # <--- Need help: casting to float to get prediction probabilities
predictions = torch.argmax(pred_probs, dim=1)
I'm curious about a few things:
Whether this is the correct way to approach the problem.
Whether I implemented the quantization procedure correctly.
Whether anyone has guidance on how to approach this (or example code/tutorials); that would be great. I have already looked through PyTorch's quantization support.
If it helps, this is an example of what my training data looks like:
0 0.995231 0.996840 1.000000 0.998341 1.000000 1.000000 1.000000 0.998709 ... 0.000024 0.000019 0.000015 0.000016 0.000011 0.000007 0.000007 0.000015
1 0.996407 0.998568 1.000000 0.997889 1.000000 0.999954 0.999738 0.997458 ... 0.000018 0.000013 0.000011 0.000012 0.000008 0.000005 0.000006 0.000009
2 0.996083 0.999702 1.000000 0.999031 1.000000 1.000000 0.999816 0.998727 ... 0.000019 0.000013 0.000012 0.000011 0.000008 0.000006 0.000006 0.000011
3 0.998531 0.999481 0.999199 1.000000 0.999720 1.000000 1.000000 0.998682 ... 0.000015 0.000011 0.000010 0.000010 0.000007 0.000005 0.000004 0.000007
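For what it's worth, one common pattern in simulated ("fake") quantization is to keep the integer weights and inputs for the eventual C/C++ deployment, but dequantize back to float for the forward pass and softmax during this comparison stage, so the float cast isn't a hack but the intended step. A minimal helper under the same scale/zero_point convention as quantize_tensor above:

    def dequantize_tensor(q_tensor, scale, zero_point):
        # Inverse of quantize_tensor: map integers back to approximate float values.
        return (q_tensor.to(torch.float32) - zero_point) * scale

    # Example: run the float equivalent of the quantized weights, then softmax as usual.
    for name, param in model_copy.named_parameters():
        if name in scale_zp_dict:
            s, zp = scale_zp_dict[name]
            param.data = dequantize_tensor(quantized_state_dict[name], s, zp)

    logits = model_copy(dequantize_tensor(Quant_X_test, test_scale, test_zp))
    pred_probs = torch.softmax(logits, dim=1)

This measures how much accuracy the quantization itself loses; fully integer-only inference (where even softmax is replaced or skipped by taking an argmax over integer logits) is a separate step handled on the C/C++ side.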
Hi everyone! I'm working on an AI project that involves several models, and I’m exploring the best cloud service to use for GPU-based model inference. My requirements are as follows:
I've looked into a few options like AWS, vast.ai, runpod, and some specialized providers, but I’m unsure which would work best for this setup. Has anyone here worked with these or other services for similar needs? Any feedback on cost, performance, or ease of setup would be great!
I have used RunPod for text-to-image (the SDXL template), but inference is very slow.
Thanks in advance!
I made a custom model for myself and trained it on a dataset with 120 classes. I then proceeded to fine-tune it on a dataset with 5 classes, which is my target.
However, I get the following error:
RuntimeError: Error(s) in loading state_dict for hybrid_model:
size mismatch for classification_head.3.weight: copying a param with shape torch.Size([120, 256]) from checkpoint, the shape in current model is torch.Size([5, 256]).
size mismatch for classification_head.3.bias: copying a param with shape torch.Size([120]) from checkpoint, the shape in current model is torch.Size([5]).
I used model = hybrid_model(num_classes=120) while training, and am using model = hybrid_model(num_classes=5) for fine-tuning.
Any suggestions?
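In case it helps, the usual pattern for this situation (assuming the checkpoint is an ordinary state_dict and that the head attribute is classification_head, as the error message suggests; the checkpoint path below is illustrative) is to drop the old head's weights before loading and let the new 5-class head keep its random initialization:

    import torch

    model = hybrid_model(num_classes=5)
    state_dict = torch.load("checkpoint_120_classes.pth", map_location="cpu")  # illustrative path

    # Drop the 120-class head so its shapes don't clash with the new 5-class head.
    filtered = {k: v for k, v in state_dict.items()
                if not k.startswith("classification_head.3")}
    missing, unexpected = model.load_state_dict(filtered, strict=False)
    print("Randomly initialized (missing) keys:", missing)

Passing strict=False alone would also skip the mismatched keys in recent PyTorch versions only if the shapes matched, so explicitly filtering the head keys is the safer route.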
I am BEYOND EXCITED to publish our interview with John Yang and Carlos E. Jimenez from SWE-bench, SWE-agent, and SWE-bench Multimodal!
Beyond just solving LeetCode-style programming challenges, this series of works tackles deploying LLM Agents to real GitHub repositories and their respective issues and pull requests!
This was such an interesting discussion, beginning with the data problem of interfacing LLMs with GitHub repositories and then diving into all sorts of things, from Code Execution as a Tool to Agents vs. Compound AI System designs, Multimodal SWE Agents, and more!
If I have a file of floating point numbers, like a CSV file or something, can I safely use Chatgpt and other LLMs to change its format? For example, transforming a csv file into a markdown style table?
My concern is that, because of hallucinations, the actual numerical values might be transformed into something different. I'm thinking of some cases like a table that says "pi values" and I have 3.15 in it, and because most tables would have pi as 3.14 the model might change the numerical value.
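Your concern is reasonable; silent digit edits are exactly the kind of thing LLMs can do. A deterministic alternative that keeps the model away from the numbers entirely (pandas plus the tabulate package; the file name is illustrative):

    import pandas as pd

    # Convert CSV -> Markdown table without an LLM touching the values.
    df = pd.read_csv("values.csv")            # illustrative file name
    markdown = df.to_markdown(index=False)    # requires the `tabulate` package
    print(markdown)

    # If you do use an LLM for the conversion, you can at least verify nothing changed
    # by re-parsing its output and comparing against the original DataFrame.

For pure format changes like CSV to Markdown, a script is both safer and cheaper; LLMs are better reserved for steps that actually need language understanding.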
My model is pretrained on 32-bit audio. I have good data for my niche, but it is 16-bit. Can I convert the 16-bit audio to float32 via normalization and train the model on that, or will it not work properly?
Is there a way I can use the 16-bit audio?
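If it helps, converting 16-bit PCM to float32 is a standard step, and loaders like librosa or torchaudio typically do it for you when reading the files; a minimal sketch of the manual version:

    import numpy as np

    def int16_to_float32(audio_int16: np.ndarray) -> np.ndarray:
        # Scale 16-bit PCM samples (-32768..32767) into the float32 range [-1.0, 1.0).
        return audio_int16.astype(np.float32) / 32768.0

The bit depth of the source files only limits dynamic range; once the samples are in normalized float32 (and resampled to the model's expected sample rate), the model sees the same kind of input it was pretrained on.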
Hey r/deeplearning! I recently built a project called Cells AI to help businesses get more out of their data without requiring a data team. The idea is pretty straightforward: just ask your data questions and get instant answers. Here’s a bit about how it works and what it does:
It’s been an interesting project, especially working out how to make the responses both fast and accurate. If you’re interested, I’ve got a demo that shows it in action.
https://reddit.com/link/1gfjp5t/video/7al4qj0ojvxd1/player
Would love to hear if anyone’s working on similar projects or has tackled similar challenges with NLP for data insights!