/r/MachineLearning

2,886,861 Subscribers

51

[D] Real talk about RAG

Let's be honest here. I know we all have to deal with these managers/directors/CXOs who come up with the amazing idea of talking with the company's data and documents.

But… has anyone actually done something truly useful? If so, how was its usefulness measured?

I have a feeling that we are being fooled by some very elaborate bs as the LLM can always generate something that sounds sensible in a way. But is it useful?

54 Comments
2024/04/27
18:00 UTC

2

[P] BLEU scores not improving

I am working on an image captioning project using a CNN+LSTM. Currently I am using GoogLeNet as the CNN and an LSTM / BiLSTM as the RNN. For the embeddings I am using the word2vec algorithm. The dataset used is Flickr8k. As you can see from the BLEU scores below, the two approaches are very close, i.e. there is no noticeable improvement. The parameter values in both approaches are the same, as follows:

word_embedding_dimension : 128
lstm_hidden_state_dimension : 32
lstm_number_of_layers : 2
dropout_proportion: 0.5

I am training my model for 500 epochs with a batch size of 16 and a learning rate of 0.0003.
Here is my architecture for both approaches:

import torch


class BiLSTM_fixed_embedding(torch.nn.Module):
    def __init__(self, embedding_dim, lstm_hidden_dim,
                 num_lstm_layers, image_latent_dim,
                 vocab_size,
                 dropoutProportion=0.52):
        super(BiLSTM_fixed_embedding, self).__init__()
        self.embedding_dim = embedding_dim
        self.lstm = torch.nn.LSTM(embedding_dim, lstm_hidden_dim, num_lstm_layers, batch_first=True, bidirectional=True)  # Bidirectional LSTM
        self.dropout = torch.nn.Dropout(dropoutProportion)
        self.linear = torch.nn.Linear(lstm_hidden_dim * 2 + image_latent_dim, vocab_size)  # Adjusting linear layer input size for bidirectional LSTM

    def forward(self, image_latentTsr, embeddedChoppedDescriptionTsr):
        aggregated_h, (ht, ct) = self.lstm(embeddedChoppedDescriptionTsr)
        # Concatenating the final hidden states from both directions
        concat_latent = torch.cat((torch.nn.functional.normalize(ht[-2,:,:]), torch.nn.functional.normalize(ht[-1,:,:]), torch.nn.functional.normalize(image_latentTsr)), dim=1)  # Using ht[-2,:,:] and ht[-1,:,:] for bidirectional LSTM
        outputTsr = self.linear(self.dropout(concat_latent))
        return outputTsr




class LSTM_fixed_embedding(torch.nn.Module):
    def __init__(self, embedding_dim, lstm_hidden_dim,
                 num_lstm_layers, image_latent_dim,
                 vocab_size,
                 dropoutProportion=0.5):
        super(LSTM_fixed_embedding, self).__init__()
        self.embedding_dim = embedding_dim
        self.lstm = torch.nn.LSTM(embedding_dim, lstm_hidden_dim, num_lstm_layers, batch_first=True)
        self.dropout = torch.nn.Dropout(dropoutProportion)
        self.linear = torch.nn.Linear(lstm_hidden_dim + image_latent_dim, vocab_size)

    def forward(self, image_latentTsr, embeddedChoppedDescriptionTsr):
        aggregated_h, (ht, ct) = self.lstm(embeddedChoppedDescriptionTsr)
        concat_latent = torch.cat( (torch.nn.functional.normalize(ht[-1]), torch.nn.functional.normalize(image_latentTsr)), dim=1)
        outputTsr = self.linear(self.dropout(concat_latent))
        return outputTsr



BLEU scores for LSTM + GoogLeNet

Average BLEU-1 score: 0.5125574933996119
Average BLEU-2 score: 0.3231883563312253
Average BLEU-3 score: 0.13785811579538154
Average BLEU-4 score: 0.04515516928289819



BLEU scores for BiLSTM + GoogLeNet

Average BLEU-1 score: 0.5202970280329978
Average BLEU-2 score: 0.3236744954911404
Average BLEU-3 score: 0.1341793193712665
Average BLEU-4 score: 0.050970741654506005

Now I have two questions:

i. What could be the reasons for the scores not improving, and how can I improve them?
ii. I am also planning to employ ResNet50, which has a 2048-dimensional latent representation, i.e. twice that of GoogLeNet. What parameter changes are needed for this approach? (See the sketch below.)
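
For context on question ii, the two options I am considering would look something like this (untested assumptions on my part, not verified changes): either construct the model with image_latent_dim=2048, which also enlarges the final linear layer, or project the 2048-dim ResNet50 feature down to the current GoogLeNet size so the other hyperparameters stay untouched.

# Option A: build the model with the larger image latent size.
vocab_size = 5000  # placeholder vocabulary size
model = LSTM_fixed_embedding(embedding_dim=128, lstm_hidden_dim=32,
                             num_lstm_layers=2, image_latent_dim=2048,
                             vocab_size=vocab_size)

# Option B: project ResNet50 features down to the existing latent size first.
resnet_projection = torch.nn.Linear(2048, 1024)  # 1024 assumed to be the GoogLeNet feature size

def encode_image(resnet_feature_tsr):
    # resnet_feature_tsr: (batch, 2048) pooled ResNet50 output
    return torch.nn.functional.normalize(resnet_projection(resnet_feature_tsr))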

I am really grateful to you all for helping me out.

8 Comments
2024/04/27
16:21 UTC

2

[D] But what does a trained Convolutional Neural Network actually learn? Visualized!

Sharing a video from my YT channel explaining convolution and visualizing how kernels are learnt… enjoy!

0 Comments
2024/04/27
14:14 UTC

9

[P] Classification finetuning experiments on small GPT-2 sized LLMs

I ran a few classification finetuning experiments on relatively "small" GPT-2-sized LLMs that I found interesting and wanted to share:

| # | Model | Weights | Trainable token | Trainable layers | Context length | CPU/GPU | Training time | Training acc | Validation acc | Test acc |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | gpt2-small (124M) | pretrained | last | last_block | longest train ex. (120) | V100 | 0.39 min | 96.63% | 97.99% | |
| 2 | gpt2-small (124M) | pretrained | first | last_block | longest train ex. (120) | V100 | 0.37 min | 78.46% | 80.54% | |
| 3 | gpt2-small (124M) | pretrained | last | last_layer | longest train ex. (120) | V100 | 0.33 min | 78.65% | 87.25% | |
| 4 | gpt2-small (124M) | pretrained | last | all | longest train ex. (120) | V100 | 0.94 min | 99.62% | 96.64% | |
| 5 | gpt2-medium (355M) | pretrained | last | last_block | longest train ex. (120) | V100 | 0.91 min | 87.50% | 51.01% | |
| 6 | gpt2-large (774M) | pretrained | last | last_block | longest train ex. (120) | V100 | 1.91 min | 99.52% | 98.66% | |
| 7 | gpt2-small (124M) | random | last | all | longest train ex. (120) | V100 | 0.93 min | 100% | 97.32% | |
| 8 | gpt2-small (124M) | pretrained | last | last_block | context length (1024) | V100 | 3.24 min | 83.08% | 87.92% | |
  1. Training the Last vs. First Output Token (row 1 vs 2): Training the last output token results in significantly better performance compared to the first. This improvement is expected due to the causal self-attention mask.
  2. Training the Last Transformer Block vs. Last Layer (row 1 vs 3): Training the entire last transformer block is much more effective than training only the last layer.
  3. Training All Layers vs. Last Transformer Block (row 1 vs 4): Training all layers shows a modest improvement of 2% over just training the last transformer block, but it requires almost three times longer in terms of training duration.
  4. Using Larger Pretrained Models (row 1 vs 5, and row 1 vs 6): Employing a 3x larger pretrained model leads to worse results. However, using a 5x larger model improves performance compared to the initial model, as was anticipated.
  5. Using a Model with Random Weights vs. Pretrained Weights (row 1 vs 7): Utilizing a model with random weights yields results that are only slightly worse (by 1.3%) compared to using pretrained weights.
  6. Padding Input to Full Context Length vs. Longest Training Example (row 1 vs 8): Padding the input to the full supported context length results in significantly worse performance.

 

If you want to run these experiments yourself or try additional ones, here's a link to the code on GitHub: https://github.com/rasbt/LLMs-from-scratch/tree/main/ch06/02_bonus_additional-experiments
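
For anyone who wants a quick feel for the last_layer / last_block / all settings above, here is a minimal sketch using Hugging Face's GPT2ForSequenceClassification as a stand-in (the linked repo uses its own from-scratch GPT implementation, so this is illustrative rather than the repo's code):

from transformers import GPT2ForSequenceClassification

model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)

# Start from everything frozen.
for p in model.parameters():
    p.requires_grad = False

# "last_layer": train only the classification head.
for p in model.score.parameters():
    p.requires_grad = True

# "last_block": additionally unfreeze the final transformer block and the final LayerNorm.
for p in model.transformer.h[-1].parameters():
    p.requires_grad = True
for p in model.transformer.ln_f.parameters():
    p.requires_grad = True

# "all": simply unfreeze every parameter instead.
# for p in model.parameters():
#     p.requires_grad = True

The "last vs. first token" comparison is about which position's hidden state feeds the classification head; with a causal mask, only the last position has attended to the whole input.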

0 Comments
2024/04/27
12:31 UTC

1

[R] Transfer learning in environmental data-driven models

Brand new paper published in Environmental Modelling & Software. We investigate the possibility of training a model in a data-rich site and reusing it without retraining or tuning in a new (data-scarce) site. The concepts of transferability matrix and transferability indicators have been introduced. Check out more here: https://www.researchgate.net/publication/380113869_Transfer_learning_in_environmental_data-driven_models_A_study_of_ozone_forecast_in_the_Alpine_region

0 Comments
2024/04/27
12:15 UTC

83

[D] Llama-3 based OpenBioLLM-70B & 8B: Outperforms GPT-4, Gemini, Meditron-70B, Med-PaLM-1 & Med-PaLM-2 in Medical-domain

Open source strikes again! We are thrilled to announce the release of OpenBioLLM-Llama3-70B & 8B. These models outperform industry giants like OpenAI's GPT-4, Google's Gemini, Meditron-70B, Google's Med-PaLM-1, and Med-PaLM-2 in the biomedical domain, setting a new state of the art for models of their size. They are the most capable openly available medical-domain LLMs to date! 🩺💊🧬

https://preview.redd.it/2h4ebhftf0xc1.png?width=2514&format=png&auto=webp&s=bbc3a583d45fb37b87a6fbbabe2d9e0f23c75d8b

🔥 OpenBioLLM-70B delivers SOTA performance, while the OpenBioLLM-8B model even surpasses GPT-3.5 and Meditron-70B!

The models underwent a rigorous two-phase fine-tuning process using the Llama-3 70B & 8B models as the base and leveraging Direct Preference Optimization (DPO) for optimal performance. 🧠
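
For readers unfamiliar with DPO, here is a minimal sketch of the objective on per-sequence log-probabilities (purely illustrative; this is not the training code used for these models):

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Each argument: summed token log-probs of the chosen / rejected completion
    # under the policy being trained and under a frozen reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected completions.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()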

https://preview.redd.it/w41pv7mwf0xc1.png?width=5760&format=png&auto=webp&s=f3143919ef8472961f329bb8eb98937d8f8e41e0

Results are available at Open Medical-LLM Leaderboard: https://huggingface.co/spaces/openlifescienceai/open_medical_llm_leaderboard

Over ~4 months, we meticulously curated a diverse custom dataset, collaborating with medical experts to ensure the highest quality. The dataset spans 3k healthcare topics and 10+ medical subjects. 📚 OpenBioLLM-70B's remarkable performance is evident across 9 diverse biomedical datasets, achieving an impressive average score of 86.06% despite its smaller parameter count compared to GPT-4 & Med-PaLM. 📈

https://preview.redd.it/5ff2k9szf0xc1.png?width=5040&format=png&auto=webp&s=15dc4aa948f2608717f68ddf2cb27a6a2de03496

You can download the models directly from Huggingface today.

This release is just the beginning! In the coming months, we'll introduce

  • Expanded medical domain coverage,
  • Longer context windows,
  • Better benchmarks, and
  • Multimodal capabilities.

More details can be found here: https://twitter.com/aadityaura/status/1783662626901528803 Over the next few months, multimodal capabilities will be made available for various medical and legal benchmarks.

I hope it's useful in your research 🔬 Have a wonderful weekend, everyone! 😊

9 Comments
2024/04/27
11:51 UTC

48

How do I convince my superior to do data preprocessing? [D]

Hello, I've been working as an AI Engineer at my current company for a year (I have a master's in CS with a data science specialization). We want to build chatbots specialized in chit-chat (mostly conversational chats) in specific languages.

The problem is that I don't agree with my superior's approach to doing things. It's almost always prompt engineering. I mean, we have tons of data (I would say a near-endless supply of real-time conversational chat sessions with information like interests, appearance, etc. … the dream of any data scientist who wants to build a nice model). The reason I disagree with his approach is that with prompt engineering we can't always get consistently good results. Also, for a specific domain (for example erotic chat) you can't rely on prompt engineering because of model censorship, or you get hallucinations and other problems when the model isn't trained on domain-specific tokens/words. In the end it's all about statistics, isn't it? The model learns from the data it is trained on. If a token shows up during inference that isn't covered in the training data, the model just makes a probabilistic guess at the most likely next token.

I can't understand why we don't make use of the data: clean it up, create a really good dataset for our purpose/domain, and finetune the LLM. I have asked him many times why we don't just do it, and my superior responded: „we did it in the past and the cost was too much, with bad results". So I asked him: who did it? He told me my colleague did it (educational background in medicine, interested in AI in his free time, but with no idea about data processing or the fundamentals of data science).

So their last try was 3 years ago: they did it with DeepSpeed and without the LoRA approach, i.e. a full-parameter finetune. My superior told me the cost was pretty high (they finetuned in the cloud for 200 hours) and the result was still not good.

Tbh I don't blame my colleague. He tried his best with his knowledge. But I do blame my dumb superior for the fact that we haven't had much success developing a decent model for our purpose.

So half a year after I started working for my company, I finally managed to convince my superior (because I did a finetune in my free time just for fun and showed them my results). So he agreed that we can do a finetune with LoRA, but.. BUT.. NO DATA PROCESSING, JUST TAKE IT RAW BABY!!

Seriously, that guy is totally lost. Btw, he is our product manager and has no idea about data science. He made the same mistake again with no data processing because „wE dONt hAVE the rESOurCE foR tHat", and I can't convince him otherwise.

So in the end, the chatbot became a bit better than with prompt engineering alone, but to me it's still crap. I just want a real, standard workflow with data preprocessing, training, and evaluation. That's all. Most important: DATA PREPROCESSING.

So what do you guys think? Am I the monkey? Should I leave the company soon? I need to stay there at least for 1 more year.

33 Comments
2024/04/27
11:43 UTC

17

[D] Mathematical aspects of tokenization

I recently made a video covering our recent work on the mathematical aspects of tokenization, specifically:

  • formalization of tokenization as compression
  • bounds of Byte-Pair Encoding optimality (a toy merge-step sketch follows this list)
  • link between tokenization entropy and performance
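
Here is a toy sketch of a single greedy BPE merge step, just to anchor the "tokenization as compression" framing (my own illustration, not code from the paper):

from collections import Counter

def bpe_merge_step(sequence):
    # sequence: list of symbols (characters or previously merged tokens).
    pair_counts = Counter(zip(sequence, sequence[1:]))
    if not pair_counts:
        return sequence, None
    best = max(pair_counts, key=pair_counts.get)  # most frequent adjacent pair
    merged, i = [], 0
    while i < len(sequence):
        if i + 1 < len(sequence) and (sequence[i], sequence[i + 1]) == best:
            merged.append(sequence[i] + sequence[i + 1])  # merge the pair into one token
            i += 2
        else:
            merged.append(sequence[i])
            i += 1
    return merged, best

# Each merge shortens the sequence, which is the sense in which BPE compresses its input.
seq = list("abababcab")
for _ in range(3):
    seq, merged_pair = bpe_merge_step(seq)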

I'd be very grateful for any feedback as I'm still learning how to make educational videos. Thank you!

https://youtu.be/yeEZpf4BlDA

0 Comments
2024/04/27
09:06 UTC

7

[R] Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

Paper: https://arxiv.org/abs/2403.14608

Abstract:

Large models represent a groundbreaking advancement in multiple application fields, enabling remarkable achievements across various tasks. However, their unprecedented scale comes with significant computational costs. These models, often consisting of billions of parameters, require vast amounts of computational resources for execution. In particular, their expansive scale and computational demands pose considerable challenges when customizing them for downstream tasks, especially on hardware platforms constrained in computational capability.

Parameter-Efficient Fine-Tuning (PEFT) provides a practical solution by efficiently adapting large models to various downstream tasks. In particular, PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task while minimizing the number of additional parameters introduced or computational resources required. This approach is particularly important when dealing with large language models with high parameter counts, as fine-tuning such models from scratch can be computationally expensive and resource-intensive, posing considerable challenges in the design of the supporting system platform.
In this survey, we present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead. Moreover, we provide an overview of applications developed using different PEFT algorithms and discuss common techniques employed to mitigate computation costs for PEFT. In addition to the algorithmic perspective, we overview various real-world system designs to investigate the implementation costs associated with different PEFT algorithms. This survey serves as an indispensable resource for researchers aiming to understand both the PEFT algorithms and their system implementations, offering detailed insights into recent advancements and practical applications.
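
As a concrete taste of one representative method that such surveys cover, here is a minimal LoRA sketch using the Hugging Face peft library (illustrative only; not code from the paper):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused QKV projection
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the small LoRA matrices require gradients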

0 Comments
2024/04/27
07:42 UTC

1

[D] Can DDPG solve high-dimensional environments?

0 Comments
2024/04/27
07:35 UTC

1

[D] Recommended ML conferences

Hello r/MachineLearning! What are the ML-related conferences that you would recommend?

1 Comment
2024/04/27
06:51 UTC

1

[D] How to find out if evaluation is FID-10k or FID-50k?

Hello all,

I am using the following repository code for computing the FID value for my diffusion model. I am performing the evaluation using this reference dataset: ImageNet 256x256 reference batch. However, I am unsure whether I am computing FID-10k or FID-50k with the evaluator.py script.

  1. What evaluation is used in this case, and how can I deduce that from the code?
  2. If it is computing FID-50k, say, how should I go about computing FID-10k (or vice versa)?
  3. How many samples (as a lower estimate) should I generate from my model in order to get a fair evaluation? Does the number of samples generated determine whether I am doing FID-50k or FID-10k?

Please let me know the answer to any of the questions you can help with.
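
For what it's worth, here is a minimal sketch using torchmetrics (not the referenced repo's evaluator.py, whose internals I am not asserting) showing that FID-10k vs FID-50k is simply a matter of how many generated samples are fed in against the reference statistics:

import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)

# Placeholders: in practice these are your reference images and model samples (uint8, NCHW).
real_images = torch.randint(0, 256, (100, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (100, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)  # pass 10k generated samples for FID-10k, 50k for FID-50k
print(fid.compute())

With fewer samples the estimate is noisier and typically biased upward, which is why 10k and 50k figures should not be compared directly.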

3 Comments
2024/04/27
04:50 UTC

14

[D] Does it make sense to talk about the probabilities of models?

https://lunaverus.com/programLikelihoods

There is a neat way to frame unsupervised learning as likelihood maximization, but not in the usual way where you just compute the likelihood of the data using a model and ignore the likelihood of the model itself. Rather, this is the combined likelihood of model and data...

Does it make sense to talk about the probabilities of ML models?

3 Comments
2024/04/27
01:06 UTC

4

[P] Source code for EURISKO and Automated Mathematician (AM) found in public archives

0 Comments
2024/04/26
19:43 UTC

6

Open-Sourced: Automated Data Sorting Tools [P]

Hello r/MachineLearning,

I'm excited to share a project that was initially intended to integrate automated AI maintenance features for Windows into an application I was building to sell commercially, but has now been open-sourced for community use and development. The project focuses on automated data sorting and could serve as a base for more advanced machine learning applications.

You can explore the project here: [NazTech Automated Data Sorting Tools](https://github.com/nazpins/naztech-automated-data-sorting-tools)

These tools are designed to quickly automate the sorting of large data dumps, employing Python algorithms suitable for handling large datasets. While the project is no longer in active development on my end, the Python scripts are functional and open to any adaptations or enhancements you might find interesting for your own ML projects. I started building the framework for the actual application, but due to time constraints and a lot going on IRL, I haven't had time to continue working on it.

However, I am happy to share these tools with the community, and hopefully they will be beneficial to someone else down the road.

Cheers!

0 Comments
2024/04/26
17:29 UTC

17

[D] What nomenclature do you follow for naming ML models?

Hi All,

I am brainstorming some kind of nomenclature for our team so that there's a standard way of naming ML models, e.g. their pickle files. Any input will be appreciated.

thanks

11 Comments
2024/04/26
16:42 UTC

2

[D] Advice Needed: Enhancing NER for ADE Detection in Clinical Texts (Thesis Work)

Hi r/MachineLearning community,

I'm currently working on the second part of my thesis focused on Named Entity Recognition (NER) for detecting Adverse Drug Events (ADE) in clinical texts. In my first thesis project, I tried to replicate a paper but had to pivot to the n2c2 dataset, which led to challenges in model performance.

I've fine-tuned a DeBERTa model with standard practices, but I'm struggling with achieving high accuracy, particularly with precision and recall. This is my first deep dive into a thesis and the world of NLP, and any guidance would be immensely appreciated.

Also, any common pitfalls for thesis work or useful resources on this topic would be extremely helpful. I'm eager to learn from the community and improve my research.

Thank you so much for your time!

3 Comments
2024/04/26
16:29 UTC

33

[R] Large language models may not be able to sample behavioral probability distributions

Through our experiments, we found that while LLM agents have a certain ability to understand probability distributions, their ability to sample from those distributions is lacking, and it is difficult to obtain a behavior sequence that conforms to a given probability distribution through LLMs alone.
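
As a rough illustration of the kind of check this implies (a hypothetical sketch with made-up numbers, not the paper's code or data): ask the model to emit choices according to a target distribution, then compare the empirical frequencies against it.

import numpy as np
from scipy.stats import chisquare

target = {"A": 0.7, "B": 0.2, "C": 0.1}               # distribution requested in the prompt
model_choices = ["A"] * 55 + ["B"] * 30 + ["C"] * 15  # hypothetical outputs collected from an LLM

observed = np.array([model_choices.count(k) for k in target])
expected = np.array([p * len(model_choices) for p in target.values()])

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
# A small p-value indicates the sampled behavior deviates from the requested distribution.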

We are looking forward to your thoughts, critiques, and discussion of this topic. Full paper & citation: you can access the full paper at https://arxiv.org/abs/2404.09043. Please cite our work if it contributes to your research.

https://preview.redd.it/ai7uks7nluwc1.png?width=935&format=png&auto=webp&s=891dd57ef50d1ee99b1a8b2372b9a460397754d6

10 Comments
2024/04/26
16:11 UTC

3

[D] GAN/Adversary Autoencoder/Cycle GAN

Main aim: style transfer between two discrete time-series signals.

Here are the details. Dataset: discrete time series, 1700 rows, 97 percent of which are zeroes. I cannot remove these zeroes, as they mean something. One feature in domain A, with values ranging from 0-32, needs to be translated to a feature with the same range in domain B; another feature in domain A, ranging from 0-5000, needs to be translated to a corresponding feature in domain B with the same range. I can recreate the same dataset multiple times with small variations, so we can have larger datasets. I would initially create sequences of size 20 or 30 and use a batch size of 32 or 64.

Generator network: a simple encoder with a first linear layer of hidden size 16, ReLU, a second linear layer of size 8, and ReLU again, followed by a symmetric decoder.

Discriminator: 2 linear layers with hidden size 8 and a LeakyReLU between them, with a sigmoid as the final layer. Loss function: BCELoss. I also experimented with BCE + MSE loss for the generator.
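
A minimal sketch of the networks as described above (my reading of the description; the sequence length is an assumption):

import torch

seq_len = 20  # the post mentions sequences of size 20 or 30

# Generator: encoder (16 -> 8) followed by a symmetric decoder.
generator = torch.nn.Sequential(
    torch.nn.Linear(seq_len, 16), torch.nn.ReLU(),
    torch.nn.Linear(16, 8), torch.nn.ReLU(),
    torch.nn.Linear(8, 16), torch.nn.ReLU(),
    torch.nn.Linear(16, seq_len),
)

# Discriminator: 2 linear layers with hidden size 8, LeakyReLU between them, sigmoid output.
discriminator = torch.nn.Sequential(
    torch.nn.Linear(seq_len, 8), torch.nn.LeakyReLU(),
    torch.nn.Linear(8, 1), torch.nn.Sigmoid(),
)

bce = torch.nn.BCELoss()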

Training: I'm using PyTorch. So far I have only trained with one feature/signal and tried to generate this feature from noise; I haven't moved on to cycle consistency yet. When training on the small dataset, the discriminator becomes too strong. I even tried reducing the discriminator's learning rate to 0.0001 while keeping the generator's at 0.01; it didn't work. I tried adding to / complicating the generator's layers; that still didn't work. I tried training the discriminator only every 10th epoch while the generator trained more often; that didn't work either. I also tried normalizing the data.

I want to explore adversarial autoencoders / CycleGAN, but the generator is unable to learn anything even with a vanilla GAN. Can someone help or give me some ideas on what I can do? Thanks.

1 Comment
2024/04/26
16:09 UTC

0

GPU out of memory error message on Colab with Llama 3 [D]

Hi guys,

I just tried to run Llama 3 on Colab (free version) and it seems that I ran out of memory:

OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 9.06 MiB is free. Process 8863 has 14.74 GiB memory in use. Of the allocated memory 14.60 GiB is allocated by PyTorch, and 22.06 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Has anyone had the same experience? Has anyone managed to run Llama 3 on the free version of Colab (or a similar platform)?
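
In case it helps anyone hitting the same error: one common way to fit the 8B model into the free tier's ~15 GB GPU is 4-bit quantization. A minimal sketch (it assumes access to the gated meta-llama/Meta-Llama-3-8B checkpoint via an HF token and an installed bitsandbytes; I am not claiming this is the only fix):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B"  # gated: requires accepting the license + an HF token

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the GPU, offloading if necessary
)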

Thanks!

2 Comments
2024/04/26
15:43 UTC

2

[D] Overwhelming LLM release rate: Seeking suggestions for building a test set to evaluate LLMs

Hi everyone,

I'm trying to build my own test set in order to make an initial fast evaluation of the huge number of models that pop up on huggingface.co every week, and I'm searching for a starting point or suggestions.
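
To make it concrete, this is roughly the kind of harness I have in mind (the questions, expected keywords, and ask_model function below are placeholders):

# Tiny evaluation harness: prompts with expected keywords, scored by simple matching.
test_cases = [
    {"prompt": "What is the capital of France?", "expect": "paris"},
    {"prompt": "Compute 17 * 23.", "expect": "391"},
]

def ask_model(prompt: str) -> str:
    raise NotImplementedError  # plug in the inference call for the model under test

def evaluate() -> float:
    hits = 0
    for case in test_cases:
        answer = ask_model(case["prompt"]).lower()
        hits += case["expect"] in answer  # crude keyword match, good enough for fast triage
    return hits / len(test_cases)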

If someone would share some questions that they use to test LLM abilities, even as high-level concepts, or simply give me some tips or suggestions, I would really appreciate that!

Thanks in advance to everyone for any kind of reply.

5 Comments
2024/04/26
14:43 UTC

8

[R] Reinforcement Learning via Regressing Relative Rewards

https://arxiv.org/abs/2404.16767

New deep RL algorithm that works with both language models and diffusion models.

4 Comments
2024/04/26
13:10 UTC

7

[D] Clean caption dataset

I am attempting to train CLIP from scratch. However, there is a lack of available datasets. The one dataset that seemed quite diverse and clean appears to have been taken down (LAION-400M). Looking at HF datasets, the two below are promising, but I am wondering whether there is anything better/cleaner.

  • conceptual captions: uses alt-text.
  • red_caps: Reddit threads, but these are mostly the first comment on the image rather than an actual caption.

TIA

1 Comment
2024/04/26
11:09 UTC

170

[D] LLMs: Why does in-context learning work? What exactly is happening from a technical perspective?

Everywhere I look for the answer to this question, the responses do little more than anthropomorphize the model. They invariably make claims like:

Without examples, the model must infer context and rely on its knowledge to deduce what is expected. This could lead to misunderstandings.

One-shot prompting reduces this cognitive load by offering a specific example, helping to anchor the model's interpretation and focus on a narrower task with clearer expectations.

The example serves as a reference or hint for the model, helping it understand the type of response you are seeking and triggering memories of similar instances during training.

Providing an example allows the model to identify a pattern or structure to replicate. It establishes a cue for the model to align with, reducing the guesswork inherent in zero-shot scenarios.

These are real excerpts, btw.

But these models don’t “understand” anything. They don’t “deduce”, or “interpret”, or “focus”, or “remember training”, or “make guesses”, or have literal “cognitive load”. They are just statistical token generators. Therefore pop-sci explanations like these are kind of meaningless when seeking a concrete understanding of the exact mechanism by which in-context learning improves accuracy.

Can someone offer an explanation that explains things in terms of the actual model architecture/mechanisms and how the provision of additional context leads to better output? I can “talk the talk”, so spare no technical detail please.

I could make an educated guess: including examples in the input that use tokens approximating the kind of output you want leads the attention mechanism and final dense layer to weight more heavily the tokens that are similar in some way to those examples, increasing the odds that these desired tokens will be sampled at the end of each forward pass. Fundamentally I'd guess it's a similarity/distance thing, where explicitly exemplifying the output I want increases the odds that the output I get will be similar to it. But I'd prefer to hear it from someone with deep knowledge of these models and mechanisms.

79 Comments
2024/04/26
11:01 UTC

29

[D] Critical batch size and LLMs

In the video "A little guide to building Large Language Models in 2024", at 41:38, the author starts to talk about the limits on how big the batch size can be:

Well, if you start to have a very large batch size, the model for each optimization step makes less efficient use of each token, because the batch size is so big that each token is kind of washed out in the optimization step. And roughly, it's a little bit hard to measure this limit, which we call the critical batch size.

I thought that the bigger batch size is always better for training LLMs, because:

  1. It better approximates the true gradient.
  2. We go through the dataset faster.
  3. To my knowledge the limits are only infrastructure, hardware, communication overhead etc.

I found the paper that introduces the "critical batch size" concept, An Empirical Model of Large-Batch Training. It mostly talks about the speed/efficiency tradeoff of large batch sizes for data parallelism. Another highly cited paper, Scaling Laws for Neural Language Models, says:

Training at the critical batch size provides a roughly optimal compromise between time and compute efficiency
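
For reference, paraphrasing the definitions from An Empirical Model of Large-Batch Training (my summary; please check against the paper):

\[
\mathcal{B}_{\mathrm{crit}} \approx \mathcal{B}_{\mathrm{noise}} = \frac{\operatorname{tr}(\Sigma)}{\lVert G \rVert^{2}},
\qquad
\left(\frac{S}{S_{\min}} - 1\right)\left(\frac{E}{E_{\min}} - 1\right) = 1,
\qquad
\mathcal{B}_{\mathrm{crit}} = \frac{E_{\min}}{S_{\min}},
\]

where \(G\) is the true gradient, \(\Sigma\) the per-example gradient covariance, \(S\) the number of optimization steps, and \(E\) the number of training examples processed.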

So I don't really understand what the author of the video meant by saying:

each token is kind of washed out in the optimization step

Are there any other issues with large batch sizes other than infrastructure, hardware or implementation limits?

9 Comments
2024/04/26
09:21 UTC

0

[Discussion] Time series regression problem?

Hi guys, I have a problem for which I am not sure what would be the best approach (and I cannot really find any relevant literature). I have a small dataset (~100 measurements like the one attached) of a sensor value, for which I want to predict a certain relevant event. Here t_0 is the relevant moment in time that I want to predict. The problem is that I need to trigger something when the event is reached; if it takes me too long to trigger after it has been reached, the outcome will not be positive. My initial idea was to basically chunk the time series before the event and try to predict the remaining time from each segment until the event is reached. When that prediction falls below a threshold, I can trigger my action. I wanted to have a look at e.g. XGBoost, feed it small chunks of the time series, and run this process continuously. I am not really sure if that is the correct approach. Is this a known problem? What would be a good name for this problem to search for literature? Do you have suggestions on how to solve it?
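
A minimal sketch of the chunking idea described above (the window size, threshold, and synthetic data are placeholders):

import numpy as np
import xgboost as xgb

WINDOW = 50     # samples per chunk (assumed)
THRESHOLD = 10  # trigger when the predicted remaining time drops below this (assumed)

def make_windows(signal, t0):
    # Build (window, remaining-time-to-event) pairs from one measurement with event index t0.
    X, y = [], []
    for end in range(WINDOW, t0 + 1):
        X.append(signal[end - WINDOW:end])
        y.append(t0 - end)  # remaining samples until the event
    return np.array(X), np.array(y)

# Placeholders for the ~100 measurements and their event indices t_0.
signals = [np.random.randn(500) for _ in range(100)]
events = [400] * 100

pairs = [make_windows(s, t) for s, t in zip(signals, events)]
X = np.concatenate([p[0] for p in pairs])
y = np.concatenate([p[1] for p in pairs])

model = xgb.XGBRegressor(n_estimators=200, max_depth=4)
model.fit(X, y)

# Online use: slide the window over incoming data and trigger when the prediction gets small.
latest_window = signals[0][-WINDOW:]
if model.predict(latest_window.reshape(1, -1))[0] < THRESHOLD:
    print("trigger action")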

Thanks.

https://preview.redd.it/l59gsmswkswc1.png?width=1303&format=png&auto=webp&s=f9e83d7b87ce5227378de6a0805a916fd4f93314

1 Comment
2024/04/26
09:21 UTC

0

IoU accuracy on YOLOv8 object detection [P]

"[P]"

How do I calculate the IoU score with YOLOv8? I didn't find any built-in function to do that. I have trained yolov8n on my custom dataset for object detection, which has 1 class.
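
Since there doesn't seem to be a single built-in helper for this, here is a minimal sketch of box IoU computed directly from (x1, y1, x2, y2) boxes (my own code, not part of the ultralytics API):

def box_iou(box_a, box_b):
    # Boxes in (x1, y1, x2, y2) format.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(box_iou([10, 10, 50, 50], [20, 20, 60, 60]))  # ~0.39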

11 Comments
2024/04/26
07:04 UTC

4

[D] What is the State of Art for text to speech synthesis?

I'm starting to do some research for my graduation project and I'm looking for papers on text-to-speech synthesis. I'm reproducing a paper I found interesting called Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. Basically, it's a model that receives text, turns it into a spectrogram, and the spectrogram is then used to build the audio file. Since I'm still at the start of the reproduction, are there papers you would recommend looking into? Have you worked with speech synthesis (TTS)? What are good references I should look into?

I saw this post over here https://www.reddit.com/r/MachineLearning/comments/nxkuvn/d_what_is_actually_the_state_of_the_art_in_text/

But it's already 3 years old. Maybe there is something newer than FastSpeech 2?

11 Comments
2024/04/26
04:24 UTC

3

[D] Meta-learning vs Federated Learning?

Hey everyone, do you have any suggestions on which is the better option, and what's the most effective way to dive into a hot topic like these?
I stumbled upon a repository for Federated Learning at: 

But I can't seem to find anything similar for meta-learning. Any advice on how to pick my PhD topic would be greatly appreciated!

6 Comments
2024/04/26
03:53 UTC
