Fantastic Measures of Generalization — That Actually Work

October 17, 2021, 1:28 pm

In the next few posts, I am going to discuss how to use the generalization metrics included in the open-source weightwatcher tool. The goal is to develop a general-purpose tool that you can use,...

Is your layer over-fit? (part 2)

June 14, 2022, 1:53 pm

Say you are training a Deep Neural Network (DNN), and you see your model is over-trained. Or just not performing well. Is there a way to detect which layer is actually over-trained? (or over-fit, as...

Better than BERT: Pick your best model

July 22, 2022, 12:05 pm

Have you ever had to sort through HuggingFace to find your best model? There are over 54,000 models on HuggingFace! So it’s not an easy task. Most people just choose the most popular model, and this...

Deep Learning and Effective Correlation Spaces

February 1, 2023, 8:32 pm

AI has taken the world by storm. With recent advances like AlphaFold, Stable Diffusion, and ChatGPT, Deep Neural Networks (DNNs) have had their Sputnik moment. And yet, we really don’t understand why...

WeightWatcher 0.7: March 2023

March 20, 2023, 5:20 pm

First, let me say thanks to all the users in our great community: we have reached over 93K downloads as of March 2023! The latest release of the open-source weightwatcher tool includes several...

WeightWatcher new feature: fix_fingers='clip_xmax'

March 21, 2023, 3:03 pm

WeightWatcher 0.7 has just been released, and it includes a new and improved advanced feature for analyzing Deep Neural Networks (DNNs) called fix_fingers. To activate this, simply use: details =...

Evaluating Fine-Tuned LLMs with WeightWatcher

January 23, 2024, 11:49 pm

If you are fine-tuning your own LLMs, you need a way to evaluate them. And while there are over a dozen popular methods to choose from, each of them is biased toward a specific, narrowly scoped...

Evaluating Fine-Tuned LLMs with WeightWatcher Part II: PEFT / LoRa Models

January 27, 2024, 11:06 pm

Evaluating LLMs is hard, especially when you don’t have a lot of test data. In the last post, we saw how to evaluate fine-tuned LLMs using the open-source weightwatcher tool. Specifically, we looked at...

Evaluating LLMs with WeightWatcher Part III: The Magic of Mistral, a Story of...

January 29, 2024, 10:48 pm

Recently, the Mistral models have taken the LLM world by storm. The Mistral Mixture of Experts (MoE) 8x7B model outperforms other models in its weight class, such as LLaMA 2 70B and GPT-3.5. Here’s a...

SVDSmoothing LLM Layers with WeightWatcher

February 12, 2024, 11:07 pm

Recently, Microsoft Research published the LASER method, “Layer-Selective Rank Reduction”, in the very popular paper The Truth is in There: Improving Reasoning in Language Models with...

Describing Double Descent with WeightWatcher

March 1, 2024, 12:37 am

Double Descent (DD) is something that has surprised statisticians, computer scientists, and deep learning practitioners, but it was known in the physics literature in the 1980s. And while DD can seem...
