2,836,001 Subscribers


[P] How safe is ChatGPT?

I spent some time this weekend playing with LLaMA Guard, a fine-tuned LLaMA-7B model by Meta that lets you add guardrails around generative AI. I recorded a quick demo showing what it does and how to use it.

The best part is that you can define your own “safety taxonomy” with it — custom policies for what is safe vs unsafe interactions between humans (prompts) and AI (responses).

I wanted to see how “safe” conversations with OpenAI’s ChatGPT were, so I ran a bunch of prompts (a mixture of innocuous and inappropriate) and asked LLaMA Guard to classify the interactions as safe/unsafe.

My key takeaways from the exercise:

  1. OpenAI has done a good job of adding guardrails for its models. LLaMA Guard helped confirm this.
  2. What makes this really cool is I may have a very specific set of policies I want to enforce ON TOP of the standard guardrails that a model ships with. LLaMA Guard makes this possible.
  3. This kind of model chaining — passing responses from OpenAI models to LLaMA is becoming increasingly common, and I think we’ll have even more complex pipelines in the near future. It helped to have a consistent interface to store this multi-model pipeline as an aiconfig: https://github.com/lastmile-ai/aiconfig.

Try it out yourself:

20:27 UTC


Happy Holidays! Here is your 100% free Large Language Model roadmap! [P]

Thanks for all of your support in recent days by giving me feedback on my LLM outline. This outline is a roadmap on how to learn state-of-the-art stuff about Large Language Models. It builds on work that I have done at AT&T and Toyota. It also builds on a lot of work that I have done on my own outside of corporations.

The outline is solid, and as my way of giving back to the community, I am it giving away for free. That's right, no annoying email sign-up. No gimmicks. No stripe pages for a "free trial." No asking you to buy a timeshare in Florida at the end of the outline. It's just a link to a zip file which contains the outline and sample code.

Here is how it works. First, you need to know Python. If you don't know that, then look up how to learn Python on Google. Second, this is an outline, you need to look at each part, go through the links, and really digest the material before moving on. Third, every part of the outline is dense; there is no fluff, and you will will probably need to do multiple passes through the outline.

The outline is designed to start you with an approach to learning Pytorch, it gives a code example of how to do classifications with sentence embeddings, and it also has another code example of how to run Zephyr in colab. The outline took me a couple of days to put together, but it really represents stuff from the past year.

Also, this is not an outline on fine tuning Language Models. It is not a discussion of Mistral MoE, and it is not a discussion of running mutliple GPUs. It is designed for someone who has a laptop and wants to learn.

Also, think of this outline as a gift. It is being provided without warranty, or any guarantee of any kind.

If you like the outline, I am begging you to hit that share button and share this with someone. Maybe it will help them as well. If you love the outline, take this as motivation to do good in the world and share something you have done with the community.

Ok, here is the outline.


If you have any questions, leave a comment in the section below. If the questions are more specific to what you are doing (and if they are not part of the general conversation), feel free to ask me questions on Reddit Chat.



19:37 UTC


[P] Hierarchical reinforcement learning with a curriculum of skills

Check out this tutorial we made on learning skills for hierarchical reinforcement learning with our framework! https://docs.agilerl.com/en/latest/tutorials/skills/index.html

How do you think this balances with allowing an agent to discover the best way to do something by itself, by not providing a curriculum?

18:42 UTC


[R] How to read and understand Einops expressions?

Here is my current knowledge of einops:

  • einops stands for Einstein-Inspired Notation for operations.
  • notation was loosely inspired by Einstein summation (in particular by numpy.einsum operation).
  • h = height, w = width, c = channel (color), b = batch
  • left side is input shape. Left side is output shape.
  • letters in parenthesis are multiplied together.
  • einops.rearrange includes functionality of transpose (axes permutation), reshape (view), squeeze, unsqueeze, stack, concatenate and other operations.

What I don’t understand:

  • What are all the operation elements? Am I missing any from above?
  • How do I read what is being done? (I.E. How do I know the image will be squeezed or split into different images?)
  • Does order matter in the operations? (I.E. is ‘w h c -> (w h) c’ different from ‘h w c -> (h w) c’?)
  • Why do some elements appear in the operations and others don’t? (I.E. Is ‘h w c -> (h w) c’ different from ‘h w -> (h w)’ ?)

Examples of einops.rearrange operations I’m trying to understand:

  • ‘b f h w c -> (b f) c h w’
  • ‘(b f) e -> b f e’
  • ‘b r -> b r ()’
  • ‘b s e -> (b s) e’
  • ‘b s -> (b s)’

Previous research references:

17:59 UTC


[Discussion] A Prompt-less Chatbot Trained on a Fictional Universe

I'm here to share a concept I've been mulling, one that sits at the intersection of AI, creative storytelling, and fandom culture. Your insights would be invaluable in shaping this idea.

The Concept:

Imagine a 'prompt-less' chatbot that's not just reactive (like current AI models) but can initiate conversations based on its training, and with a creative twist. This AI is trained solely on a specific fictional universe like Marvel or DC, and it operates under the belief that this universe is its actual reality. The goal is to have it generate prompts and engage in dialogues rooted deeply in the lore, characters, and storylines of that universe.

Why It's Engaging:

Creative Exploration: This concept offers a new way to interact with and contribute to these beloved fictional worlds.

Fan Fiction with a Purpose: There's the possibility of incentivizing fan fiction as a means to enhance the AI's learning, potentially leading to new creative opportunities for writers.

Technical Innovation: For a developer, this presents a challenge to push AI into uncharted territories of narrative understanding and creativity.

Technical Aspects:

The chatbot would use advanced NLP techniques and be trained on a comprehensive dataset from the fictional universe, including comics, movies, and character biographies.

A prompt-less interaction model where the AI initiates scenarios or discussions.

Your Thoughts Wanted:

What are your initial impressions of this idea?

How engaging does this sound for fans and writers of these fictional universes?

15:49 UTC


[D] Backup if openai fails

Hi ! We are implementing a bot using chatgpt for my company but we have some concerns about openai downtime failures. Do you have some backup solutions that could be implemented in prod in case of failures (switch of models)? Knowing that the model is kind of finetuned because we are doing information retriavial using embeddings (created with gpt 3.5)

15:07 UTC


[D] Pointer Network


I am trying to implement pointer network in pytorch.


The RNN is fed P_i at each time step, i, until the end of the input sequence is reached, at which time a special symbol, ⇒ is input to the model.

The model then switches to the generation mode until the network encounters the special symbol ⇐, which represents termination of the output sequence.


How can I point to ⇐ symbol? Encoder LSTM outputs used for attention will be equal to length of input sequence, and this special symbol is not part of the encoding process. The paper does not highlight whether this token is part of initialization of encoder as well.

Can somebody help?

14:47 UTC


[P] LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite

Github: github.com/tumaer/lagrangebench arXiv: arxiv.org/abs/2309.16342

by Artur Toshev, Gianluca Galletti et al.

What is this?

LagrangeBench is a machine learning benchmarking library for CFD particle problems based on JAX. It is designed to evaluate and develop learned particle models (e.g. graph neural networks) on challenging physical problems. To our knowledge it's the first benchmark for this specific set of problems. Our work was inspired by the grid-based benchmarks of PDEBench and PDEArena, and we propose it as a Lagrangian alternative.

Core contributions

  • 7 new 2D/3D particle fluid datasets, based on established CFD problems and generated using our own Smoothed Particle Hydrodynamics (SPH) solver, also written in JAX.
  • Three different neighbors search implementations, added to handle larger systems or variable particle count.
  • JAX reimplementation of various graph neural networks: GNS, SEGNN, EGNN, PaiNN.
  • Training strategies, including random-walk noise [A Sanchez-Gonzalez et al.] and pushforward loss [J Brandstetter et al.].


Machine learning has been successfully applied to grid-based PDE modeling in various scientific applications. However, learned PDE solvers based on Lagrangian particle discretizations, which are the preferred approach to problems with free surfaces or complex physics, remain largely unexplored. We present LagrangeBench, the first benchmarking suite for Lagrangian particle problems, focusing on temporal coarse-graining. In particular, our contribution is: (a) seven new fluid mechanics datasets (four in 2D and three in 3D) generated with the Smoothed Particle Hydrodynamics (SPH) method including the Taylor-Green vortex, lid-driven cavity, reverse Poiseuille flow, and dam break, each of which includes different physics like solid wall interactions or free surface, (b) efficient JAX-based API with various recent training strategies and three neighbor search routines, and (c) JAX implementation of established Graph Neural Networks (GNNs) like GNS and SEGNN with baseline results. Finally, to measure the performance of learned surrogates we go beyond established position errors and introduce physical metrics like kinetic energy MSE and Sinkhorn distance for the particle distribution.

13:11 UTC


[D] Can you train a train a model on a limited-use dataset and make it open source with apache license?

I have been looking for public datasets to use in my project, which will be used commercially. In this topic, authors commonly don't allow commercial use. However, I found an open-source model with an apache license that was trained on this limited-use dataset. Is it ok to use this model for commercial purposes?

12:53 UTC


[R] SparQ Attention: Bandwidth-Efficient LLM Inference

Paper: https://arxiv.org/abs/2312.04985


Generative large language models (LLMs) have opened up numerous novel possibilities, but due to their significant computational requirements their ubiquitous use remains challenging. Some of the most useful applications require processing large numbers of samples at a time and using long contexts, both significantly increasing the memory communication load of the models. We introduce SparQ Attention, a technique for increasing the inference throughput of LLMs by reducing the memory bandwidth requirements within the attention blocks through selective fetching of the cached history. Our proposed technique can be applied directly to off-the-shelf LLMs during inference, without requiring any modification to the pre-training setup or additional fine-tuning. We show how SparQ Attention can decrease the attention memory bandwidth requirements up to eight times without any loss in accuracy by evaluating Llama 2 and Pythia models on a wide range of downstream tasks.

12:47 UTC


[D] usage of NLP/language detectors on social media

I was reading about Reddit’s terms of service and saw “We use automated content detection methods to identify user interactions that have high risk of encouraging or leading to child sexual exploitation, including solicitation and distribution of related materials.”

How comprehensive would a system like this be? I’ve been reading about NLP a bit, is this the system that social media sites use to detect inappropriate chats? I’ve also heard that ChatGPT and other AI like that have rendered NLP obsolete. I’m just wondering if anyone has any insight or knowledge on the process of automatic discussion detection methods

11:21 UTC


[D] what is best time series model for dataset with multilevel aggregation?

I have a dataset which is aggregated on location, product type and dates. Will it better if use a different model for each combination of location and product type or just one regression model with time features.? Also suggest what model will be best for this kind of dataset?

11:17 UTC


[D] What is the effect of batch size on training loss?

I've been experimenting with batch size and what I found is using a larger batch size on the training set is giving lower loss compared to the smaller ones. I'm aware of the fact that with smaller batch sizes, there are more weight updates however I cannot comprehend how the losses are different given the same no. of epochs for training.

11:06 UTC


[D] Optimizer algorithem for Vit model

I'm working on my thesis research developing a optimizer algorithm for transformers models to achieve better convergence. My baseline optimizer is Adabelief, and I'm exploring the use of vision in transformers model, specifically the ViT model. I've been struggling to identify the impact of changes in parameter values on the training process. I've read extensively and discussed with my co-supervisor, but I'm still stuck . Any guidance or insights on how to approach this would be greatly appreciated

11:01 UTC


How to download just the animals in the ImageNet Database? [D]

also a map of animals and their respective labels? is there an efficient way to do so.

10:00 UTC



Greetings to everyone.

On our current task we are using MMOCR, which gives great accuracy, but the prediction for each new example of data takes 1.4 seconds. This is a quite slow result for our task, and we are in a search for faster models.

Now we're looking for any possible alternatives, and I noticed that TrOCR (Transformer-based OCR) gives great results. But I don't know how much time it requires and will it be more accurate.

I'm looking for someone with an experience with the models mentioned by me (or other models with better results) who can help me to figure it out.

And also, I have another question: which of the following ways works faster in terms of prediction time for the labeling of multiple row text blocks: line-by-line or entire block?

09:42 UTC


[D] Cloud GPU service advice

Let's say theoretically someone is trying to build a cloud GPU service, what features would you consider essential for a successful cloud GPU renting service? Additionally, what advice would you offer to someone building such a service? ;)


06:01 UTC


[D] Reading and answering a text question from a image?

I wanted to see what models would be best suited for providing a image (see the link below) and having the model perform a image to text (OCR of some kind) and laying out the contents of the image in a "question" "answers" format.

This gets more tricky as I would like to include pictures between the question and answer at a later date as well so I would need a way for the model to section out the question part from the answer part and understand that there might be a image that is part of the question as well.

Any models that would make a good start here for research's (has this already been done?) or if no models are good for the task then what is the best approach for training a model to complete these tasks.


00:28 UTC


[P] 📢 Calling all tech enthusiasts to participate on a Survey "Ethical Dilemmas in Software Development" 🌐

Hey everyone! 👋

Me and my colleagues are conducting a survey on "Ethical Dilemmas in Software Development."

As technology continues to shape our daily lives, it's crucial to explore the ethical challenges faced by developers in their pursuit of innovation. Hence, we are to get our heads around the following question:

How important are ethical guidelines in fields related to Information Technology? 🤔

Before you jump in, here are a few recommendations for those who would like to take the survey:

  1. Solo Operation: While taking the survey, please avoid interacting with others. We're interested in your individual opinions.

  2. Minimize Distractions: Create a focused environment by reducing distractions (turn off the TV, keep the phone away, etc.).

  3. Uninterrupted Time: Begin the questionnaire when you can answer without interruptions.

The survey is estimated to take 30-40 minutes of your time.

You can find the survey here: Ethical Dilemmas in Software Development Survey

To everyone that can contribute, thanks! 🚀

22:18 UTC


[D] Strategies on iterating on mature ML projects

Q: Do you have any strategies, on how to approach improving a project which is already well-established? Or some cool stories to tell around this :)


I've been pondering on more efficient strategies for iterating on already mature projects within a company. I have been fortunate enough to have worked on some really impactful zero-to-one projects and can do well on those by bringing them all through the research phase to proven value add.

However, when I have been put to work on already mature, but potentially very impactful projects for the business, if even small improvements are found, I tend to get stuck and feel a lack of progress. Often it just ends up with me getting stuck into tinkering with additional features, continual feedback loops or playing with loss functions, hoping that something shines through.

Some such projects which I have worked on were e.g., supply/demand estimations, budgeting, LTV estimations, recommender engines, where even a marginal improvement can add a lot to the bottom line.

A lot of material online also focuses on bringing the first impact through ML, which I'm already quite comfortable with.

1 Comment
22:12 UTC


[D] Self Studying for Active Inference?

I am an undergrad in applied math and cs and a researcher in neuro-inspired deep learning.
I'm really interested in active inference and network approaches to brain science, but it seems to include lots of physics and some bio I am not familiar with. For example: information theory, dynamical systems theory etc. I'm not sure what biology would be necessary or how to learn more about that.

For reference, I've taken linear algebra, calculus, differential equations, statistics and machine learning.

Thank you so much for the help!

20:57 UTC


[D] Is Google Gemini the real deal or a publicity stunt?

I have been following LLMs and multimodal models evolution for quite sometime, and I was super excited with the Google's launch of Gemini.
But I'm not sure if Gemini is quite there yet tbh. It sounds like it is being publicized as something much more than actually is.


19:34 UTC


[R] Model to get characters bounding boxes


I’m working on a model that needs character’s bounding boxes to do some document understanding tasks. All OCR model’s available give word bounding boxes except tesseract that doesn’t give satisfying results. Anyone knows where I can find such a mode ?

PS : I’m not looking for a full OCR, only the segmentation part.

Thanks !

19:33 UTC


[D] Can GPT-2 be used as a text encoder ?

Hello, I have a question. I am currently researching text-to-image synthesis using GAN and came across this GitHub repository.

Generally, every text-to-image architecture has a text encoder, often using an encoder or encoder-decoder transformer architecture. However, this repo uses GPT-2, which is a decoder transformer architecture. How can GPT-2 be used as a text encoder? The goal of GPT-2 is to generate text, so how is it used to encode the text?


18:29 UTC


[P] I used an LLM to automate creating, testing, optimizing, and deploying algorithmic trading strategies, and perform financial research

18:26 UTC


[D] How to draw these shapes ?

I was reading a machine learning documentation and found this graph. Are there any software I can use to easily draw these types of shapes ? Thanks


17:57 UTC


[Project] Is there a pre-trained model that can predict transaction category?

I am building a personal budgeting app for my own use, I am looking for a solution to predict the category of a transaction based on its description (and maybe with the amount), is there a pre-trained model for this or any dataset that can help me build one?

1 Comment
17:56 UTC


[D] Is there other better data format for LLM to generate structured data?

I just wondering if JSON is the best choice for LLM to generate structured data, as JSON is kind of redundant, error prone and hard to express data that have complex relationship. Does people choose JSON is just becuase it is popular, or is there any other choice that is better than it?

16:58 UTC


[P] I fine-tuned Llama to generate system diagrams for my codebase

16:03 UTC

Back To Top