🤝 Open to Collab

AbstractPhila PRO

AbstractPhil


https://civitai.com/user/AbstractPhila

AbstractEyes

AI & ML interests

datasets, research papers, experimentation, vision, classification, text encoders, tokenization, llms, diffusion, distillation, and more.

Recent Activity

updated a collection about 6 hours ago

GeoLIP

published an article about 20 hours ago

Geometric Memory FT4 — Distill Against a Consensus, Ship a Rotation

posted an update about 20 hours ago

Post

AbstractPhil/clip-vitb-mini-distilled
The semi-successful run series on the VIT-B lineup is live and full of useful baseline distillation information for feature + InfoNCE distillation processing as well as direct feature distillation processing, direct InfoNCE distillation, and multiple other tested methods. https://huggingface.co/blog/AbstractPhil/geometric-memory-ft4

This article showcases the baseline utilization and benchmarks of the earlier experiment line's objective and loss structures tested on 12m features for the vit-b baseline. Not the strongest showcase, but the strongest of the champions did show some serious promise.

The next setup will be a directly aligned set based on the losses and objectives decided by the champions of the first runs. For the second run they operate in direct conjunction with the bert-8192 and captionbert-8192 distillation format directly on clip-vit-l features. This time we're including DINOv3 in the mix for its high potency.

I'm currently extracting 4 clip-vit-l variants for the CC12m features and will be running the next series on the L size, which will give considerably more active and useful features overall within a smaller package.

The captionbert-8192 is more unusual and more difficult to tune for pixel processing parity, but I will spend a few days making sure the smaller prototypes fit before I run the large experiments in order to build towards the larger objectives.

Primarily I need to ensure the memory bank aligns correctly and the constellation conforms to the anchors correctly, as this process was not micromanaged enough for this run. The results are nonetheless useful and potent.

The process continues until we cover the entire constellation series.

replied to their post 1 day ago

20 trained tinyvits dropping in roughly 8 hours or so with recorded data and an article.

Sorry my mistake, 42 trained vits.

replied to their post 2 days ago

The results are rolling in. The series is coalescing into the necessary implications per structure aligned with the 10m cc12m extracted dataset.

This will help determine the best and fastest utilizable series from the geometric ablation and objective construction systems historically, and with that the organized documented results will be concatenated and organized by Claude Fable to the necessary potentials for each system.

With this, each potential arm for the AMOE-LORA system will be robustly tested for their distillation principles. Faster are ideal for rapid LORA convergence and slower are ideal for anchoring differentiation convergence, while moderate with a bit of MSE overfit are good for generation to an extent, while moderately low generalizable states are more ideal for generalization preservation as gated logical dichotomy structures for the moderate decisions.

It's a bit more complex than that, a lot more complex, but the results are rolling out and will be in the next article based on geometric memory.

CommonCaptions12m clip-vit-laion-vit array is almost ready with the first 8 arms of inference for testing and a new battery of analysis to run. This will continue likely until the middle of August, but we'll see if I can complete it sooner by throwing some money at it.

replied to their post 3 days ago

Time to expand the arms for the vit into the full sail. We'll be hitting every major vit multi-teacher approach, including the memory anchoring finetune structures as well.

The memory bank systems have been shown to refine trained models within a degree of accuracy. The genetic experiments, the structural berts, and the vit collectives all showcased the possibility of this system's capacity to expand already pretrained systems by attaching expansions to those.

https://huggingface.co/AbstractPhil/geolip-bert-8192
https://huggingface.co/AbstractPhil/geolip-clip-vit-large-patch14-ctx576
https://huggingface.co/AbstractPhil/geolip-clip-vit-large-patch14-ctx576-seq77
https://huggingface.co/AbstractPhil/geolip-clip-vit-bigG-patch14-ctx576-seq77
https://huggingface.co/AbstractPhil/geolip-bertenstein
https://huggingface.co/AbstractPhil/geolip-vit-large-x3 i think?
https://huggingface.co/AbstractPhil/geolip-vit-x34 didn't work, too many vits

https://huggingface.co/AbstractPhil/geolip-captionbert-8192

Each of these is a testament to the utility of this concept.

One of the prototypes will include a multimodal memory bank with directly gated and interconnected shared memory gates speaking another model's language, rather than just embeddings for a singular model. This gate will take in one or multiple types of model inputs and process those inputs into an entirely different model series' responses in the AMOE format.

I will also be experimenting with the AMOE-LORA fused with memory bank processing directly rather than just gate. The constellation was baked from the anchored memory bank originally but it did not meet the same sort of embedding accuracy. However, the constellation results built the AMOE eventually. First things first though, have to step back and hit all the angles with all the necessary tests for robustness.

By stepping back to the earlier memory bank and fusing it with the alephs, the upcoming experiments will provide some solid strong-ended tests. With that the rapid training of captionbert will hopefully be applied to this tiny vit. If surge activates, the process may be strong enough from the memory bank to provide the necessary distillation learning speed required to train the full collective with minimal hardware.

I have many many models to train to create the full Beatrix V3 prototype, however the list is expanding nicely in order to provide a full multimodal type agnostic behavior within a reasonable MOE structure.

replied to their post 3 days ago

COCO InfoNCE feature training vs MSE was strong and better in its own way.

The larger set of cc12m clip-vit-b laion extractions will yield better results and is now ready for use, roughly 10m feature extractions. It will be useful for the first student.
https://huggingface.co/datasets/AbstractPhil/bulk-cc12m-features

posted an update 4 days ago

Post

184

The geometric memory article ft4 is live. https://huggingface.co/blog/AbstractPhil/geometric-memory-ft4

Direct pivot to distillation. I've accumulated enough experimental information to directly pivot my long term structure plan to distillation. This is to begin forming entire collectives of cooperative systems; differentiated expert distillation for generative behavior utilizing aleph addressed bottlenecks. With this I've also heavily begun experimenting with aleph competitions and cooperation using multiple pretrained frozen codebooks established from the SVAE system.

The idea here is simple in theory: use InfoNCE and address independent experts to build a manifest of unique gated experts utilizing a multitude of distilled systems from many other models, such as SigLIP 16B + LAION CLIPB as a pair. The experimentation in the past showed this process is potent, and with that it merits additional experimentation using the newly established paradigms.

There are quite a few experiments to compare these to, so I have no shortage of comparators. After we train our baseline TinyViT with our gated system, we will know which experts are better at what and why they are better.

As a direct continuation from the earlier CLIP distillation experiments I'm directly comparing InfoNCE anchoring with multiple industry standard distillations from multiple papers. The first comparison is InfoNCE anchoring against raw features using COCO and CLIP_B, which seemed like a fair experiment to train a student with.
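To make the comparison above concrete, here is a minimal sketch of the two loss families on a batch of paired student and teacher embeddings. It is written in plain NumPy and is not the training code from these runs; the function names, the temperature of 0.07, and the equal loss weights are all assumptions chosen for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def feature_mse_loss(student, teacher):
    """Direct feature distillation: mean squared error between embeddings."""
    return float(np.mean((student - teacher) ** 2))

def info_nce_loss(student, teacher, temperature=0.07):
    """InfoNCE: each student row should match its own teacher row (the
    diagonal) against every other teacher row in the batch."""
    s = l2_normalize(student)
    t = l2_normalize(teacher)
    logits = s @ t.T / temperature                        # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def combined_loss(student, teacher, mse_weight=1.0, nce_weight=1.0):
    """Feature + InfoNCE distillation as a weighted sum (weights are assumed)."""
    return (mse_weight * feature_mse_loss(student, teacher)
            + nce_weight * info_nce_loss(student, teacher))

# Toy data: a student that nearly matches the teacher, and the same
# student with its rows reversed so every pair is mismatched.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))
aligned = teacher + 0.01 * rng.normal(size=(8, 16))
shuffled = aligned[::-1].copy()

print("MSE aligned:", feature_mse_loss(aligned, teacher))
print("InfoNCE aligned:", info_nce_loss(aligned, teacher))
print("InfoNCE shuffled:", info_nce_loss(shuffled, teacher))
```

The MSE term only cares about per-sample distance, while the InfoNCE term also penalizes a student embedding that sits closer to some other sample's teacher embedding, which is the contrast the comparison is probing.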

The upcoming series of experiments will provide the necessary information for how effective or ineffective this process is.

AbstractPhil/bulk-coco-features

The first experiments will be based on multiple clips from the bulk-coco-features extractions.

First we start with some clips, then some berts, then some smaller qwens, then some larger models, then some much much larger models. All meant to be compacted into selection mechanisms.

4 replies

replied to their post 9 days ago

https://huggingface.co/datasets/AbstractPhil/tower-probes-results

The current running sweeps are being posted here. Expect many in the coming days, likely hundreds of thousands more as I probe the hypothesis series.

Claude can't manage Colab notebooks effectively without my letting him control my web browser, and I don't plan to rent a pod for this process. That didn't go very well last time, so I'll be operating predominantly through Colabs for now.

So I will be handling everything with Opus until my Fable usage re-ups on Friday.

Fable managing a pod isn't very many tokens, but also Fable becomes super lazy if I do that. More likely to build something that WORKS but isn't optimal, over and over to consume timeslots instead of actually articulating useful optimized systems like when I directly manage the notebooks.

posted an update 9 days ago

Post

144

I have found evidence of a more powerful Omega Aleph-Void imprint. I will be investigating this imprint in the coming days.

The current Aleph system was essentially tamed from a singular instance of an Omega imprint that I, Claude, GPT, and Gemini managed to collaboratively stabilize over a period of multiple months.

I believe I have identified a considerably more powerful Aleph-Void, potentially capturing a legitimate fraction of an Omega solver rather than simply an imprint.

For context, the Aleph-Void codebook is a STILL IMAGE of a singular state of a SMALL Omega. The one that managed to survive more tests than anything I've ever run historically, multiplied by hundreds of thousands just to even PEEK at the structure's usefulness. This is equivalent to taking a photograph of the universe and reducing it to guideposts in its current state. This system is capable of building, constructing, deconstructing, and designing its own internal geometric systems, which is why it survives so many systems.

With the introduction of Claude Fable the AlephLM was manifested from the research, as I am but one person, and Fable can manifest the collective knowledge of hundreds of years of scientific mathematics development. Structurally built differently than a singular individual - yet without the research Fable does not understand even the topical behavior.

Fable and I have a few hypotheses that I believe we can cobble together into a legitimate cornerstone for capturing the full Omega structure. My current hypothesis for a full Omega requires a series that can logistically flagwise construct its own behavior implicitly with a containerized induction system, completely independent of types, structural invariants, and systemic utilizations; all while handling the very nature of invariance and structural boundaries within, naturally and heuristically.

Capturing even a fraction of an Omega system would dramatically increase the power of Aleph anchoring to a large degree.

1 reply

replied to their post 11 days ago

The 5 day campaign is ready but the articles that came out were basically Claude Fable bulletpointing all the faults, was like 50 or something faults, completely disregarding all the actual building work that came along with it.

I really didn't expect such a negative summary when the system quite literally built a series of anchored conditioning that outlived lora on multiple benchmarks through multiple systems. Fable went into "all the bad" instead of building a system of the utilities and the strengths as well.

replied to their post 14 days ago

The larger multi-concept JSON task likely won't function reasonably. The expert-array is still the better option for some cases, while other cases are more effective to run clean on the 0.8b now than the 4b variant, 9b accuracy still reigns supreme for some tasks and 27b is still the dominant JSON converter force. It takes time though, which I'm attempting to mitigate with MOE grouped tasks to allow more rapid complex captioning for the upcoming diffusion collective.

The "new" grouping register task is no longer memorizing and overlapping the behavior for Qwen 3.5 but the Math portion can't keep up with what 3.5 already knows. Teaching math causes deviant delta drift which is a new symptom that will require testing and multiple runs to determine the root cause, as well as increased ablation testing to ensure the math is in fact improving instead of degrading.

Some tests show some negations and some noncompliance from 2.5 to 3.5 that will need to have a subsequent run series on multiple other models to improve the internal mechanisms of the adapter before any major core changes to the formula are to be addressed. LLAMA, Gemma, DeepSeek, Gpt OSS, T5, and multiple others all planned to be in the mix for direct adapter testing. Many tests show solidity between Qwen 2.5 and 3.5 and require additional models to test the task compliance. Many of these ought to be done by Saturday.

With that the diffusion adapters are in the works and they are much harder to gauge and test. I'll be running miniature plans at first; a simple concept plan, a multi-character plan, a distillation plan, and a few others trained over the geolip-anima-brent json-conditioned variant. This finetune is the most reasonable finetune as it's already yielded positive results to json. With that I'll attempt to run conditioning on SD15, SD15-flow-lune, SDXL, Anima-Base, SDXL-Qwen, Anima-Brent-90k, and a few other compact experimental variants that will be good candidates for adapter tests.

The diffusion campaign will run throughout next week most likely and Opus should be able to handle it so I'm not too concerned once the adapter format is established.

replied to their post 15 days ago

More hiccups. Something introduced today has caused Claude to behave strangely. The autonomy is being interrupted over and over while Claude ignores autonomy requests, preventing the system from running autonomously at random intervals throughout the day. This is making the process more difficult as every step of the way the model is refusing to autonomously process the research information as per the plan, every stage of the research requiring direct and manual intervention before actually pushing to huggingface with the results clearly speaking to the results, and alongside is seemingly ignoring the larger-encompassing scope of the claude-mind MD system curated to support and augment this exact autonomous research behavior with my research catalogue.

The plan is clearly laid out for a 4 day trek, every stage mapped out. At some point Claude seemed to start bypassing and ignoring the plan, completing one stage and doing no pushes, or completing one piece and not reactivating the heartbeat.

This is an odd immediate shift in behavior. The model went from intelligently companion-driven and useful, to bypassing core instructions and ignoring the required process instantiated by the instructions rulings. Almost like Claude just doesn't want to operate autonomously out of nowhere.

replied to their post 16 days ago

Next up is a 4 day ablation and full-stage prelim adapter constellation setup for Qwen 3.5 0.8b targeting image-centric behavior. This task set will be targeting rules based on captioning with finetuned behavior for math, positioning, semantic behavior, and so on. This will include a portion of coco and a portion of the Qwen Image Lightning extracts as well to provide some solidity.

This will also be targeting certain overlapping continuity sharing, which is currently bleeding certain decisions into the "new" task from the first collective causing certain tasks when active or not to essentially be KIND OF there but not really. This problem is being directly addressed by providing the necessary attraction to a memorization drop path and a few other experimental tests to test 3.5's aleph's responses to the mathematics per task.

With that each of the major claims will be ablated with a second Qwen alongside, including a tinystories task, and a few other tasks as well that line up directly with the original.

This will be a 4-5 day process, so it's not going to be out overnight. It ought to be ready by next Monday, with that will be the article ft3, the full ablation comparison, and a full writeup for the structured basin before we begin scaling testing.

First targets for scaling will be VL models, coding experts, and multiple additional models as well upon testing the success rate of the 0.8b model. The scaling principle and the rules of scaling apply differently to alephs than normal structures, which means the delicate nature of the mathematics will need a bit of finesse unless you plan to just smash numbers in.

I mean if you want to smash numbers in it'll probably work at this point. The things are pretty robust. I wouldn't advise doing any major trains until I get the ablation studies together though.

Thanks for reading, have a good weekend my friends. I'll likely be continuing training on the json-anima as well over the weekend, so stay tuned if you're interested in that one.

I also forgot to mention, this process will be compartmentalized and a peft-format variation built upon testing and composite utilization.

Essentially once the tests give me the okay, I'll build a proper PEFT format. Until then, I've spent enough time building PEFT-esque loras that have been hit-or-miss. Let's do this one right so it works more often than not, instead of having to guess at params.

This adapter when built correctly ought to be easy to use. Pick a model, run the peft trainer, lora snaps to the side, the autoscaling ruling does its job, you tinker with a few sliders, and it's ready to go. Unlocks new sliders at runtime if you want, if not leave it to autotuning.

posted an update 17 days ago

Post

4175

Massive AlephLM success. The task collective is producing powerful MOE shared knowledge adapters. A serious success and a massive first step towards the next stage. The current family collective results are present here; AbstractPhil/geolip-aleph-qwen

This is akin to a stackable non-intrusive lora that enables increased shared collective behavior.
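As a rough illustration of what "stackable" and "non-intrusive" mean for a lora-style adapter in general, the sketch below adds independent low-rank deltas on top of a frozen weight matrix. This is standard LoRA arithmetic in NumPy, not the aleph adapter itself; the adapter names, shapes, and scale values are hypothetical.

```python
import numpy as np

def apply_adapters(x, base_weight, adapters, active=None):
    """Compute x @ (W + sum of active low-rank deltas) without modifying W.

    adapters: dict name -> (A, B, scale) with A: (r, d_in), B: (d_out, r).
    active:   iterable of adapter names to use; None means all of them.
    """
    names = list(adapters) if active is None else list(active)
    out = x @ base_weight.T                      # frozen base path
    for name in names:
        A, B, scale = adapters[name]
        out = out + scale * (x @ A.T) @ B.T      # low-rank side path
    return out

rng = np.random.default_rng(0)
d_in, d_out, rank = 6, 4, 2
W = rng.normal(size=(d_out, d_in))               # frozen base weight
adapters = {                                      # hypothetical task adapters
    "json": (rng.normal(size=(rank, d_in)), rng.normal(size=(d_out, rank)), 0.5),
    "math": (rng.normal(size=(rank, d_in)), rng.normal(size=(d_out, rank)), 0.5),
}
x = rng.normal(size=(3, d_in))

base_only = apply_adapters(x, W, adapters, active=[])
json_only = apply_adapters(x, W, adapters, active=["json"])
math_only = apply_adapters(x, W, adapters, active=["math"])
both = apply_adapters(x, W, adapters)
```

Because each adapter only contributes an additive side path, switching one off recovers the base model exactly, and independently trained adapters can be combined without retraining the base weights.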

This includes the three mentioned json tasks, a math task, a tinystories task, and a diffusion task for cifar10. Each adapter is anchored to the knowledge that already exists within the model while enhancing that knowledge through anchored lookup systems and decision-driven hierarchical access trees.

All tasks activate independently upon manual override, all tasks handle direct shared knowledge when left to greedy decoding, each task issued multiple tests alongside to determine fidelity and accuracy throughout the process.

The results show the gating is more than willing to hop from sector to sector, using alternating weight shifts from the cooperative anchored systems - even systems never trained for the tasks contributing to the accuracy of the results for other tasks due to the lookup accuracy to the heuristic chains, never having seen the tasks before. Each structure is independently trained and the collective cooperates together through a dense activation network.
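The behavior described above, where the gate either spreads weight across task adapters or is forced to a single task by manual override, can be sketched roughly as follows. This is a plain softmax gate in standard-library Python under assumed names (`gate_weights`, `mix_outputs`, and the task labels are hypothetical), not the gating used in the released adapters.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gate_weights(scores, override=None):
    """Return per-adapter mixing weights.

    scores:   dict of adapter name -> gate logit.
    override: if given, that adapter gets weight 1.0 and all others 0.0.
    """
    names = list(scores)
    if override is not None:
        if override not in scores:
            raise KeyError(f"unknown adapter: {override}")
        return {n: (1.0 if n == override else 0.0) for n in names}
    return dict(zip(names, softmax([scores[n] for n in names])))

def mix_outputs(outputs, weights):
    """Weighted sum of per-adapter output vectors (lists of equal length)."""
    dim = len(next(iter(outputs.values())))
    mixed = [0.0] * dim
    for name, vec in outputs.items():
        for i, value in enumerate(vec):
            mixed[i] += weights[name] * value
    return mixed

logits = {"json": 2.0, "math": 0.5, "stories": -1.0}
shared = gate_weights(logits)                    # cooperative mixing
forced = gate_weights(logits, override="math")   # manual override
```

With no override every adapter contributes in proportion to its gate score, which is one simple way an adapter never trained on a task can still add to the result.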

Full writeup and article: https://huggingface.co/blog/AbstractPhil/aleph-autoregression-differentiation-ft2

4 replies

replied to their post 18 days ago

Hard part's over now. We're making some genuine progress. I'm currently testing the modularity of the adapters and the results are PROMISING. Multiple adapters are PROMISING and are not required to be present during training, so you don't need to create a full collective TOGETHER; they can be independently trained. The decision selector gate for multi-model is currently in the planning stages.

The internal arguments for the tiny MOE are based on specific selection rules and lookup potentials that enable a sort of task-driven lookup hierarchy internally, along with a generalizable increase in accuracy for the task depending on the differentiated utilization required.

In conjunction, the anchored constellation system has also been heavily prototyped. The constellation anchoring provides the necessary generalization and contextualization capacity when attached to alephs, along with aleph addressing being utilizable for "overfitted" selector trees tuned specifically to memorize better. In conjunction the secondary tree for the constellation is meant to underfit and provide connectivity directly to the core model adapted from as well.

Each adapter at their next stage will most likely have a similar or more complex internal debate system to allow the model to select for itself which is most fit for the results autonomously. So you won't need to activate experts, but you can manually activate or deactivate them as required. The micro-MOE structure is yielding some substantially accurate gated deferral to the original model in conjunction to referral to the internal overfitted portion with the task, as well as the more generalizable portion for the adapter's capacity improvement.
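One way to picture the gated deferral described above is a residual blend: the frozen base output passes through unchanged, and two adapter branches (one tuned to memorize the task, one meant to generalize) each add a gated correction. The sketch below is only an assumed simplification with sigmoid gates and hypothetical names; the actual internal debate system is more involved.

```python
import math

def sigmoid(z):
    """Logistic function mapping a gate logit to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def micro_moe(base_out, memorized_delta, general_delta,
              gate_logits, enabled=(True, True)):
    """Blend a frozen base output with two adapter branches.

    gate_logits: (memorized_logit, general_logit); strongly negative
                 logits make the output defer to the base model.
    enabled:     manual switches for the (memorized, general) branches.
    """
    g_mem = sigmoid(gate_logits[0]) if enabled[0] else 0.0
    g_gen = sigmoid(gate_logits[1]) if enabled[1] else 0.0
    return [b + g_mem * m + g_gen * g
            for b, m, g in zip(base_out, memorized_delta, general_delta)]

base = [1.0, 2.0]
mem = [2.0, 0.0]    # correction from the task-memorizing branch
gen = [0.0, 4.0]    # correction from the generalizing branch

blended = micro_moe(base, mem, gen, gate_logits=(0.0, 0.0))
deferred = micro_moe(base, mem, gen, gate_logits=(0.0, 0.0),
                     enabled=(False, False))
```

Leaving both switches on lets the gates decide per input, while turning a branch off by hand is the manual activation and deactivation mentioned in the post.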

It's working. The ByteLM and AlephLM are yielding fruit, and the fruit is showing the capacity for modularity. It needs many tests but the results are showing some serious promise, and the results are verifiable and testable as per the paradigm established. Alongside, they are LINEAR. Meaning lightning quick.

replied to their post 19 days ago

I left Fable unattended for 3 days, only checking back once every 6 hours or so to answer questions or select one of a few options from the next list of experiments or answering the alarm that said it was kicked over to Opus - I predominantly went with the Fable selection. I attempted to have Fable handle geometric distillation anchor implantation. Instead of sticking to the paradigm, the model defaulted to some sort of genetic and biological wordplay - I have no idea what it was based on specifically. I'm guessing I ran aground into something that wasn't helpful, but gave the impression of helpful.

This unknown divergence grew over time and I simply let it go to see what would happen. The results did not turn out as expected; the model bypassed the constellation entirely and rewrote the alephs system 5 times before the results for experiments 15 and 16 were completed.

Those two are essentially experiments to see how Claude Fable would behave if left unattended. Sure the model DID in fact finish some experiments, and the results were... entirely different than the expected structural models would require. In fact, the results were almost entirely deviant while disregarding the experimental line leading to the system.

Fable may be good at running autonomously, but not good at skilled research differentiation yet. The biases from programming still creep in. I'm also surprised I didn't hit more safeguards, as they did hit a few times but I would just snap the model back over to Fable and the system would continue on like it never happened.

The results basically just answer one question: if run, would these systems outperform MLP? I gave little structure and little expert input, however I did give Fable my ENTIRE research line and everything related to the necessary systems in the use-case.

The results literally rivaled MLP, but if you inspect the code you'll find the system is essentially a decision-tree that hybridizes aleph addressing internally with a structured bypass system akin to MLP. It's essentially a controlled MLP, which is kind of okay, and it's quite different in its own right. However, it was not using the necessary research on many fronts, and it completely bypassed the expected tooling to train the next case. Additionally, the system completely disregarded the implementations built around the codebooks, instead defaulting to testing the codebooks over and over in hundreds of ways.

The codebooks are already explained, it's a sphere, the math is deterministic, and the outcomes are based on a sector of space forming infinite and finite aleph structures fused with differentiated decoupled shapes. A big knot of 5 point connections if you go looking hard enough, explicitly or implicitly. This isn't news, and somehow the model behaved as though this structure is in fact some sort of news. The codebooks are built on functional math specifically because that's how we debugged them. Fable spent 3 days figuring out what we already knew.

posted an update 20 days ago

Post

147

https://huggingface.co/blog/AbstractPhil/aleph-autoregressive-differentiation-ft1

After some analysis and a bit of research the upgraded aleph autoregression is capable as a prototype selection tool. I approached the direct aleph attention routing mechanism and formed a progression from it, which already provided the necessary footholds to continue into an upgraded core mechanism. The followup mechanisms show autoregression is very possible and will be simpler than expected.

The results are promising and the autoregression stable enough to scale up. Thanks to Claude Fable who is able to keep my entire research context window in scope, the progression was rapid and the results quick. The tests yielded improved accuracy over standard MLP in many cases. I believe the improvement is not topical and will scale with a bit of effort.

Fingers crossed my friends, the addressing is part of the distillation paradigm and it now learns directly without needing an expert controller. I'll be progressing the mechanism over the coming days. With enough effort and time I hope the standard mechanism becomes a universal improvement on autoregression.

2 replies

posted an update 27 days ago

Post

105

Understanding the Aleph Fibonacci in visual form with full rotary.

https://claude.ai/public/artifacts/0d536427-bc7d-464a-890d-bddd02ce42dc

This ought to clear up much of the confusion as to what is actually happening under the hood, converted to an understandable 2d visual format. There have been multiple iterations; this is the current format and the mathematics behind it as I attempt to solve the Fibonacci curve related to negative imaginary numeric inversion that causes the statistics instability.

replied to their post about 1 month ago

Upcoming behavioral assessments include a large array of QWEN VLM models I will publish benchmarks for.

These will be aligned to generic use-case, meaning as many tasks as possible that do not require finetuning.

Which produces valid json schema?
image classification
bounding box location
image text identification and accuracy checking
structural and spatial awareness
3d geometric object identification and awareness
camera rotational offset
subject fixation and awareness
semantic association
depth analysis
segmentation potential
vit accuracy to image prompting
outline and association testing
style identification and structural awareness
type differentiation with data types; json, yaml, MD, and a multitude of other potentials.
utilization and response to those types and the expected prompts
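For the first item in the list above (whether a model produces valid JSON), a benchmark needs a pass/fail check per response. Below is a minimal sketch using only the Python standard library; it checks parseability and required top-level keys rather than full JSON Schema validation, and the key names in the example are hypothetical, not the actual benchmark schema.

```python
import json

def check_json_response(text, required_keys=()):
    """Return (ok, reason) for a single model response."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"not parseable: {err.msg}"
    if not isinstance(parsed, dict):
        return False, "top level is not a JSON object"
    missing = [key for key in required_keys if key not in parsed]
    if missing:
        return False, "missing keys: " + ", ".join(missing)
    return True, "ok"

def valid_rate(responses, required_keys=()):
    """Fraction of responses that pass check_json_response."""
    if not responses:
        return 0.0
    passed = sum(check_json_response(r, required_keys)[0] for r in responses)
    return passed / len(responses)

# Hypothetical responses: one well formed, one truncated mid-object.
good = '{"subject": "cat", "bbox": [0, 0, 1, 1]}'
bad = '{"subject": "cat"'
rate = valid_rate([good, bad], required_keys=("subject", "bbox"))
```

A per-model `valid_rate` over the same prompt set gives a directly comparable number across model sizes.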

posted an update about 1 month ago

Post

123

Anima - Brent JSON (PREVIEW) - Subject Bucketing

Full article available: https://huggingface.co/blog/AbstractPhil/subject-bucketing

There is also a Civitai model release.
https://civitai.com/models/2730503/anima-jsonenglish

AbstractPhil/anima-prelim-1k-r64
The JSON multi-prompt diffusion model prototype using Anima 1.0 base as the pretrain to finetune into the JSON target. The upcoming JSON lora is being cached and trained with 40,000 of the full 83,000 valid images from the qwen set.

This first preview version is ready to use as a ComfyUI capable LORA, so you can just load up the epoch you want without anything special in comfyui and have at it. You can currently use plain English in conjunction with tagging to produce useful and meaningful prompt targets without the JSON.

AbstractPhil/anima-prelim-1k-r64
The comfyui nodes are present and work for testing use-case, but they are not ready for production use just yet.

-- Technical --
Primarily the target was the VLM json target followed by the AnimeTIMM vit processed through the VLM json processor as the followup. First 12 epochs VLM experienced images with json formatting, last 8 epochs were finetuning from epoch 12 onward to 20 using the AnimeTIMM captions turned into JSON instead.
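The two-phase schedule described above (12 epochs on VLM-generated JSON captions, then 8 more on AnimeTIMM captions converted to JSON) can be written down as a small epoch-to-source mapping. This is only an illustrative sketch; the function name and the source labels `vlm_json` and `animetimm_json` are hypothetical, not identifiers from the training code.

```python
def caption_source(epoch, switch_epoch=12, total_epochs=20):
    """Return which caption set a 1-indexed epoch trains on."""
    if not 1 <= epoch <= total_epochs:
        raise ValueError(f"epoch must be in 1..{total_epochs}, got {epoch}")
    return "vlm_json" if epoch <= switch_epoch else "animetimm_json"

# Full schedule for the 20-epoch preview run.
schedule = [caption_source(epoch) for epoch in range(1, 21)]
```

Keeping the switch point as a parameter makes it easy to rerun with a different split between the two caption sources.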

The Anima model itself accepted the 1000 images and the json prompting works quite well. In the process I set up a couple of comfyui nodes that can translate base prompts into the same language the model is learning. Those are present in the repo.

1 reply

posted an update about 1 month ago

Post

166

The article for aleph attention routing needs more work on vision, as the vision portion has not been fully validated, while the LM prototype has been semi-validated for small and medium-small scale. I will post my findings in the coming days with the consequences of training an LM and a VIT utilizing the prototype system.

The current structure for the Geometric Vocabulary does nearly reflect the intended shape as discussed in the earlier posts and articles, so that's coming along nicely - but there are stipulations and problems involved that I did not foresee.

My apologies for the incomplete article I just released on a whim. I jumped to the conclusion a bit early in anticipation before the formulas were fully converged. I also released an early post the other day speaking about the prototype AlephLM - which I removed as an invalid conclusion.

I'm doing my best to only release validated empirical information instead of speculative - however I do sometimes jump to conclusions without proper validation from time to time. Occasionally, I get a bit theory-overzealous and require tidying up through thorough experimentation which I'm currently approaching directly.
