What is “generalization” ?

I’ve been pondering this question for a while now. If x is a Cat, then x + dx is everything that can still be recognized as a Cat… the bigger dx, the more “general” the inference. dx has to have a maximum, dx_Max: if dx > dx_Max, then x + dx cannot be recognized as a Cat. As far as I can tell this is how “Deep Learning” does generalization, and I believe this is how we do generalization too. How am I trying to do generalization ? Basically the same way, but the sources of error (what changes the value of dx) are multiple… In the end, whatever signal activates a neuron N is a Cat…
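The x + dx picture can be written down as a toy check. This is only a minimal sketch: the feature vectors, the Euclidean distance and the value of `DX_MAX` are all assumptions made for illustration, not anything taken from my simulator.

```python
import math

# Hypothetical 2-D "feature vector" standing in for x, the reference Cat.
CAT = (1.0, 2.0)
# Hypothetical tolerance standing in for dx_Max.
DX_MAX = 0.5


def is_recognized(prototype, signal, dx_max):
    """Return True when the signal lies within dx_max of the prototype.

    dx is measured here as a Euclidean distance, which is an assumption;
    any other measure of deviation could be swapped in.
    """
    dx = math.dist(prototype, signal)
    return dx <= dx_max


near_cat = (1.2, 2.1)   # small dx: still a Cat
not_a_cat = (2.0, 3.0)  # dx larger than dx_Max: no longer a Cat

print(is_recognized(CAT, near_cat, DX_MAX))
print(is_recognized(CAT, not_a_cat, DX_MAX))
```

Raising `DX_MAX` makes the inference more “general” and lowering it makes it more specific, which is all the x + dx statement claims.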

A new year 2023 !

2022 has come and gone with little to show for it. I’ll try to summarize all the problems and progress of the past year:

  1. Synapse kinetics – as far as I can tell this works as intended, but the model is a crude approximation of the glutamate cycle within the synapse and some parts should perhaps be changed; it all depends on whether the approximation is good enough. I don’t have enough information yet to say whether it is, because other parts of the system are not working well, or not at all.
  2. LTP/D – the change in AMPA receptors (or any other change) at the synapse level is still not clear to me. I don’t have a theory of what should be accomplished by this change. The literature is too vague on this and the conditions under which the changes occur have not been determined for sure. The experimental data is clear enough, but the conditions that lead to the changes don’t seem likely to occur under normal neuronal conditions, so it is hard to infer from that data how the change is used for information processing. It is clear to me that a neuron cannot work with synapses of multiple frequencies, regardless of how LTP/D would work. It is not clear to me whether there are multiple frequencies; I envision mechanisms where all frequencies are the same (starting from the amacrine cells) and the only difference is in the phase of the signal. It is clear to me that LTP/D, regardless of how it works specifically, would change the direct correlation between incoming and outgoing frequencies, meaning a high incoming frequency could lead to a (relatively) low firing frequency in the postsynaptic neuron, because of a low gain in AMPA receptors.
  3. Synaptic Connections. I worked under the premise “neurons that fire together, wire together” and implemented 6 different mechanisms (sets of rules) for connecting 2 neurons. In the end, however, there are details that make the whole concept uncertain. I have assumed that a neuron, when it activates, sends a signal into its proximity that promotes axonal growth from nearby neurons, which leads to forming a connection. The problem comes from the following unknowns: how far does that signal spread ? How fast ? How long does it last ? I have assigned equal “probabilities” of forming a connection based on distance, but this approach leads to a problem of symmetry: too many neurons become identical and fail to separate incoming signals. Ignoring the unknowns, there is an additional, fundamental problem. There is a mechanism that leads to breaking a synapse, and that mechanism seems to supersede the mechanism of forming a synapse. So the mechanism of forming a synapse could be totally random and would still work, because the control comes from the synapse-breaking mechanism. Having equal binding probabilities should then have worked too, but it doesn’t because of symmetry, so there still have to be some rules for forming a new connection, but I failed to find anything convincing.
  4. Inhibitory neurons – I believe they are a must, but there are also too many unknowns: should they completely stop a neuron from firing or just modify the firing frequency ? Both mechanisms seem reasonable, but I cannot form any theory of how they should work because of unknown details: do they have the same activation potential ? Do they have the same repolarization time ? Can the repolarization time change in a way that is hard to reverse ? Since they are otherwise regular neurons I still have to deal with all the other problems: LTP/D, synapse connection/breaking. I also have to understand how much to inhibit the other neurons: is the level of inhibition a fixed value ? Can it change, be increased or decreased ?
  5. Feedback mechanism – I have implemented a way of changing the behavior of a synapse when a feedback signal is present, but I have no idea why I should feed back a negative or a positive signal… When or why should I change a synapse through feedback ? I have thought of an abstract reason: just declare one neuron good and one bad, and if the signal reaches either, an appropriate signal should be sent back. But because of all the other problems I could never test this hypothesis.

What are the predictions for 2023 ? Considering all the unknowns, I don’t believe I will make significant progress in 2023. All 5 points above have to work “correctly”, otherwise nothing will work… There are many, many combinations among the 5 and no working theory, so trial and error it is… That takes a lot of time and my motivation is not great either; discovering 10 000 ways of failing may seem fun at the beginning, but after a while it takes a toll on you.

How is color processed ?

This is too broad a question to have a simple answer :). I’ve been trying to make a color separation for the past month or so… That seemed simple enough, but no luck… To separate colors I first needed a clear understanding of the role of LTP/D in information processing. I thought that was where the problem was… but no. I’m quite sure now that a neuron cannot accept inputs of various frequencies at the same time… Neurons that receive different colors can accept only a single color at a time; they are color selective, it seems. I don’t understand how a red line (for example) is seen as continuous when in fact some of its neurons fire at different frequencies (because they may be specific to blue or green)…

So LTP/D does not have the role of synchronizing synapses from neurons receiving different colors. This was one of my working hypotheses for a role of LTP/D. Now I have no role in mind for LTP/D… again, nothing.

Another thing: any application of LTP/D leads to an irreversible alteration… Running pattern 1, then 2, then 1 again => the responses for pattern 1 before and after pattern 2 are not the same. Now, it just happens to be this way, but should it be this way ? Or should I always get the same response for pattern 1 ? I’m not quite sure anymore; I thought I should always get the same response… But even if I don’t get the same response, I get the same relative response… it still fires before a competing pattern. Still, I don’t have sufficient evidence that this would be the case all the time; it’s reasonable to believe that it will not always be the case. But even so, this way of working, where the current result depends on history, may be the correct one… It would be easier to always get the same response for pattern 1, but that does not seem possible… Say there are 3 synapses firing for pattern 1… If synapse one is also part of pattern N, it will get altered, and then when pattern 1 runs again, the end result will be a different answer for pattern 1. Without LTP/D I would always get the same result… so why LTP/D in the first place…
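The 3-synapse example can be reproduced with a toy model. This is a sketch under loud assumptions: the response is just the sum of the active synaptic weights, and “LTP/D” is reduced to a hypothetical multiplicative `GAIN` applied to every synapse that takes part in a run. Neither is how my synapse equations actually work; the point is only the history dependence.

```python
PATTERN_1 = (0, 1, 2)  # three synapses fire for pattern 1
PATTERN_N = (0, 3)     # synapse 0 is shared with pattern N
GAIN = 1.1             # hypothetical plasticity factor, not a measured value


def run(pattern, weights, plastic):
    """Return the summed response, then alter the synapses that took part."""
    response = sum(weights[i] for i in pattern)
    if plastic:
        for i in pattern:
            weights[i] *= GAIN
    return response


def experiment(plastic):
    """Run pattern 1, then pattern N, then pattern 1 again."""
    weights = [1.0, 1.0, 1.0, 1.0]
    first = run(PATTERN_1, weights, plastic)
    run(PATTERN_N, weights, plastic)
    second = run(PATTERN_1, weights, plastic)
    return first, second


print("with plasticity:   ", experiment(plastic=True))
print("without plasticity:", experiment(plastic=False))
```

Without plasticity the two responses for pattern 1 are identical; with it they differ, and part of that difference comes from pattern N touching the shared synapse.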

what is learning ?

I’ve said a while back that learning is changing a variable and I believe that it holds true in the most general abstract sense. Something has to change when something is learned. This definition doesn’t help much though, these are some questions that need answers to be more useful:

  1. The variable that has to change, let’s call it x, has to have an initial value. What should that value be ? Why ? What does it represent ? -> In my system the abstract value x has an initial value that would allow a single synapse to fire a postsynaptic neuron, and it is constrained by some arbitrary values for the Activation Potential and the initial “glutamate” concentration in the postsynaptic axon.
  2. When to change x ? This is not clear at all. The input value that would alter x, in this case the firing frequency of the presynaptic neuron, is variable. That input value rarely “accommodates” the x variable, but how much of a deviation from x should be allowed before changing x ? This is the case where the decision is made locally and relies only on the deviation from x. Relying only on the local environment to change x does not seem to work. Temporarily changing x to alert another “decision node” seems more useful, less random. Yet this is just deferring the decision. Another “decision node” will just be the same problem where, instead of x, we have y. Deferring a decision should end somewhere, at some other variable z that is in fact a constant. Any push to change z should result in a feedback loop that promotes action to change the input.

I see some benefits in having multiple decision nodes that make changes in short feedback loops, and a final decision node that would promote action, yet this is still not enough and not clear. Another way of framing question 2 would be: “What is feedback ?” – while it is the same question, the answer does not seem to be the same. Feedback to what ? When you say “This is not a cat”, should this alter multiple decision nodes or just one ? The reason why it is not a cat can be many things… Then what to change ?

  3. How much to change x ? Looking for a minimum ? That does not seem feasible because it takes a long time to accomplish anything. Adding a single AMPA receptor on the postsynaptic side of a synapse has a big effect on the final output; it is not like just adding a single calcium ion to the mix. I have no clear idea about this. When I change x in my system, the change is an exponential decay, but I have not found a dx that would have a link to the rest of the system.

  4. When to stop changing x ? Just because I detect an increased frequency (increased from the expected value, that is) does not mean that I should change x indefinitely, yet how do I know the change I made is enough ? Still based on the expected value of the next decision node ? That node would change x so that its expected value is met, while asking nodes further away whether it should also change the expected value ?
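Questions 2 to 4 above can at least be made concrete in a toy update loop. This is a sketch, not my implementation: the `tolerance` (how much deviation is allowed before x changes), the `rate` (what fraction of the deviation is corrected per step, which makes the deviation decay exponentially) and the fixed `expected` value are all assumptions. In the real system the expected value would itself come from another decision node.

```python
def adapt(x, expected, tolerance=0.05, rate=0.3, max_steps=100):
    """Move x toward `expected` in exponentially shrinking steps.

    Returns the final x and the list of dx values that were applied.
    """
    steps = []
    for _ in range(max_steps):
        deviation = expected - x
        # Questions 2 and 4: change x only while the deviation exceeds
        # the allowed tolerance, and stop as soon as it does not.
        if abs(deviation) <= tolerance:
            break
        # Question 3: correct a fixed fraction of what is left, so the
        # remaining deviation shrinks by (1 - rate) at every step.
        dx = rate * deviation
        x += dx
        steps.append(dx)
    return x, steps


final_x, applied = adapt(x=1.0, expected=2.0)
print(final_x, len(applied))
```

The sketch answers none of the questions; it only shows where the three unknowns (tolerance, step size, stopping criterion) sit in the loop, and that all of them are currently arbitrary numbers.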

Since my last post I haven’t done any work on the code side of the project. I initially had a theory of how this should work, but that proved to be too simplistic. Then I programmed everything I could think of based on what is known in biology, hoping that by doing so something would eventually start making sense, but that was not the case either. Nothing has become clear. Now I’m again trying to understand the basics of the problem and change the code to fit a certain theory 🙂

what is the role of LTP/LTD ?

Still unknown :(. I have assigned some roles based on what I read in the literature, but it is still unclear what role it plays in information processing. I have two opposite views:

  1. It could be that LTP/LTD enlarge the range of “states” of a synapse and do not play a role in learning, at least not directly. That would imply that both processes should be fast, easy to implement, flexible and equivalent in terms of the default (or most likely) state.
  2. They play a role in learning; as such, the state of a synapse should be difficult to change through LTP/LTD and would have a meaning (it would represent an internal reference frame).

Since I have not decided which view is right, I implemented the algorithms to behave somewhere in between, but that does not seem to accomplish much… In short, I have made no real progress.

Long Term Potentiation/Depression

Any type of modification for synaptic “strength” leads to modification of the firing rates, so much so that a specific pattern cannot be correlated any longer with a firing rate. That means “learning” has to be limited to perfect timing and the formation or breaking of synapses. I’m not sure what to make of it…

I have assigned 2 roles for LTP/D:

  1. Both will contribute to synchronization of firing events by modifying the frequency of firing – this requires a firing event to take place.
  2. Both will adjust the synapse to high/low frequency leading to breaking of a synapse in both cases, very high or very low frequencies. This does not require a firing event.

I have run simulations where 4 synapses linked to a single neuron fire at different frequencies and there are two scenarios:

  1. The neuron will remove synapses with lower frequencies and the synapse with the highest frequency will remain, but there are cases where that synapse is also removed because it cannot activate the neuron by itself any longer.
  2. The neuron will go through LTP/D events that are canceling out, synapses are not removed but the neuron will activate with variable firing rates. As far as I can tell at this time, a neuron firing at variable firing rates cannot form synapses with the next post-synaptic neuron, so I’m inclined to get rid of this option…

There should be a third option where close enough firing rates of the 4 synapses can be accepted by the neuron, resulting in a single firing rate, because they should be within the adjusting power of LTP/D, but I have yet to find such a scenario in practice.
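The variable-firing-rate scenario is easy to reproduce even in a model much cruder than mine. The sketch below is an assumption-heavy stand-in: a neuron that simply counts incoming spikes and fires (then resets) when the count reaches a hypothetical `threshold`, with no leak, no LTP/D and no synapse removal. The input periods are made-up numbers.

```python
def spike_times(periods, threshold=3, steps=200):
    """Return the time steps at which the counting neuron fires.

    Each synapse delivers one spike every `period` steps; the neuron
    fires and resets once it has accumulated `threshold` spikes.
    """
    potential = 0
    fired = []
    for t in range(1, steps + 1):
        potential += sum(1 for p in periods if t % p == 0)
        if potential >= threshold:
            fired.append(t)
            potential = 0
    return fired


def intervals(times):
    """Inter-spike intervals of the output neuron."""
    return [b - a for a, b in zip(times, times[1:])]


same_rate = intervals(spike_times([10, 10, 10, 10]))
mixed_rates = intervals(spike_times([7, 10, 13, 17]))

print("equal input frequencies:", sorted(set(same_rate)))
print("mixed input frequencies:", sorted(set(mixed_rates)))
```

With equal input frequencies the output interval is constant; with mixed frequencies it takes several different values, which is the variable firing rate I consider unusable for the next layer.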

Still many issues to investigate and solve… finding a good implementation for LTP/D is going to take much longer than I anticipated..

Long Term Potentiation/Depression

Both continue to elude me… So far I have 2 preliminary conclusions, or maybe just hypotheses…

  1. Both LTP and LTD serve to bring neurons to firing synchronization, which I actually understand as a way to determine which inputs are part of a pattern and which are not. As long as neurons fire within a certain time frame, that time frame is minimized through LTP, and the minimized time frame, dt, is the uncertainty in recognizing a pattern. LTD also tries to bring the firing timing within that dt frame, or to remove the association entirely.
  2. Both LTP and LTD activate when dealing with high/low frequency. Low frequency should mean: not urgent, don’t propagate fast or through deep layers. High frequency means: this is important, go fast and deep, reach a decision node and see if action is necessary. I also observed that a high (relative) frequency overflows the neuron with too much potential; the net result is that 2 different patterns with high, above-threshold energy cannot be distinguished, which in fact requires an LTD event. When I deal with low frequency (presynaptic activation that does not end up activating the postsynaptic neuron), I delete that synapse, though in multiple steps.
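The overflow observation in point 2 can be illustrated with integer arithmetic. This is a sketch with made-up numbers: `THRESHOLD`, the two pattern rates and the synaptic weights are hypothetical, and the drive per time step is simply rate times weight. LTD is modeled as nothing more than a lower weight.

```python
THRESHOLD = 100  # hypothetical activation threshold, arbitrary units


def steps_to_fire(rate, weight):
    """Number of time steps until the accumulated drive reaches THRESHOLD."""
    drive = rate * weight
    return -(-THRESHOLD // drive)  # ceiling division on integers


PATTERN_A_RATE = 5
PATTERN_B_RATE = 8

# Strong synapse: both patterns overflow the neuron in a single step,
# so the output cannot tell them apart.
strong = (steps_to_fire(PATTERN_A_RATE, 20), steps_to_fire(PATTERN_B_RATE, 20))

# After an LTD-like reduction of the weight the two patterns need a
# different number of steps, so they become distinguishable again.
depressed = (steps_to_fire(PATTERN_A_RATE, 5), steps_to_fire(PATTERN_B_RATE, 5))

print("before LTD:", strong)
print("after LTD: ", depressed)
```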

To me, the second point is difficult to reconcile: why would they be clustered into the same mechanism ? Are they the same mechanism, or do they only appear to be the same while acting on different chemicals / mechanisms…

I found a group in Israel, professor Ido Kanter’s group. Their work seems to be very useful for my project; I could not find that kind of information anywhere else. Very much appreciated. Thank you guys.

still no progress :(

Something is still missing, but I can’t put my finger on it… There is no clear way to decide when a pattern is learned… In the absence of a clear criterion that would stop the learning and preserve the variables, the pattern keeps getting broken down into smaller parts until there is a single pre-synaptic neuron for a single post-synaptic neuron… and even that last link breaks, resulting in no connection between layers… I believe I mentioned this problem more than a year ago, and now I’ve come full circle back to the same problem… The equations for synapses are different, the conditions for initiating or breaking connections are different… yet this problem remains… I’ve looked back through my notebooks and found no notes commenting on this problem. It seems I was not sure this was a real problem… No solution seems obvious at this point. Stopping the learning process at an arbitrary time has many drawbacks, but I see no other way of going forward…

make it or break it – take 2

Yup, that time has come. No more wiggle room. This should be the first step in what I want to build… a new breed of AI. This first step should show how learning is accomplished in such a system. Signals should be classified and recognized as the same when seen again. The first showcase I want to build is 2-color recognition on a 3-layer network: L1 2×2, L2 5×5 and L3 2×2, with a single inhibitory neuron on layers 2 and 3. I’m also considering showing angle recognition on a 3×3, 3×3, 1 network configuration.
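For reference, the two planned configurations can be written down as plain data. This is just a sketch: the `LayerSpec` class and its field names are hypothetical, and I am assuming here that the inhibitory neuron is one extra neuron per layer on top of the grid, and that the final “1” in the angle network is a single-neuron layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerSpec:
    """Hypothetical description of one layer of the network."""
    name: str
    rows: int
    cols: int
    inhibitory: int = 0  # assumed to be in addition to the rows x cols grid

    @property
    def excitatory(self):
        return self.rows * self.cols

    @property
    def total(self):
        return self.excitatory + self.inhibitory


# 2-color recognition: L1 2x2, L2 5x5, L3 2x2, one inhibitory neuron
# on layer 2 and one on layer 3.
COLOR_NET = (
    LayerSpec("L1", 2, 2),
    LayerSpec("L2", 5, 5, inhibitory=1),
    LayerSpec("L3", 2, 2, inhibitory=1),
)

# Angle recognition: 3x3, 3x3, 1.
ANGLE_NET = (
    LayerSpec("L1", 3, 3),
    LayerSpec("L2", 3, 3),
    LayerSpec("L3", 1, 1),
)

for net_name, net in (("color", COLOR_NET), ("angle", ANGLE_NET)):
    print(net_name, [layer.total for layer in net])
```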

I did not add color vision, but I added a simpler simulation where different “pigments” (as in cone cell pigments) result in different firing rates per neuron.

This could take days or months or maybe it won’t work at all.

make it or break it time

I’m nearly there, about two more weeks to get there. I worked a lot, but mostly doing simulations. The problems more or less remained the same:

  • Inter-layer transmission of signal. Because the activation frequency decreases with each layer, in layer 3 I already get to the point where the frequency is too low to activate neurons, and synapses are classified as low frequency and removed. Increasing some sort of synaptic strength did not work because that increase is limited; it can’t be something arbitrary. Same-layer connections should work as an amplifier, but they suffer from bad timing. There are mechanisms that should synchronize the firing events, but they too fail more often than not. I still have hopes of improving on this, though. Another mechanism could be a persistent reinforcing feedback… Every 3 layers or so, the third layer should feed back on the first layer, increasing the frequency for both. It seems far-fetched, but I’m close to implementing this option too.
  • Inhibition – I’m still looking for the proper level of inhibition. How many synapses should act together to overcome inhibition ? (Meaning that, in spite of inhibition being present and limiting synaptic potential, the neuron should still fire by adding many small synaptic signals.) For now I believe that this should be the desired mechanism, rather than the “all or nothing” mechanism I had so far. The all-or-nothing mechanism is still present for fewer activating synapses. But the number of synapses needed to cooperate is still unclear… it has to be related to some minimum frequency.
  • Synaptic frequency – a single synapse seems to be very limited in the values it can take. In theory I could have around 100 values, but in practical terms that number is even lower… around 20 values. Messing with this number is tricky… There is a hard theoretical limit for the highest and lowest values, but if I increase the value, I also increase the computation time and limit the inter-layer transmission. I believe I can find a reasonable compromise, but right now I feel severely limited by this problem.
  • Synaptic connections – this has now become the most pressing issue and the most complicated. Making and breaking synapses on large matrices is slow and requires me to keep track of every connection. A single neuron can connect to a lower layer, the same layer, an upper layer and an inhibitory layer, all at the same time. So far I used a simplification: once removed, a synapse would not form again. But that put serious pressure on the mechanism for timing the connections, meaning a connection had to be the best connection possible the first time. That is not working right. I tried to wait long enough for all connections to form in the first layer and only after that bind to the second layer… There are too many cases to consider. So now I let connections form and break forever. But keeping track of all of them is difficult. I still wonder how two nearby neurons stop forming an infinite number of connections between them. They form many, to be sure, but not a fixed or an infinite number. So when does the process stop, and why ? If I don’t keep track of every synapse, two neurons would keep forming synapses between themselves.
  • Training patterns – I thought they could be anything. I’m not so sure anymore. Signals received from the retina are heterogeneous because there are 3 types of pigments for the cone cells and 2 types of bipolar cells. This results in a complex pattern even when you deal with a very uniform input pattern (say, watching a white piece of paper). A heterogeneous signal works, while a homogeneous signal leads to “bad timing”: nearby neurons receiving exactly the same signal cannot form connections among themselves. I have a workaround for this but I don’t like it, because what I see as input is not the real input (the white sheet of paper, say); I need to imagine the input… that makes understanding very difficult.
  • Small input going into a bigger second layer. I’ve procrastinated on this. I can create layers of different dimensions, but neurons don’t bind with some sort of step. I need to have neurons that don’t get direct connections from the lower layer, or, even if they still bind, I need to have more neurons in the second layer to act as signal amplifiers.
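The inter-layer transmission problem from the first bullet is simple arithmetic once numbers are attached to it. The sketch below uses invented values throughout: the input frequency, the per-layer `attenuation` factor and the minimum activating frequency are assumptions, chosen only so that the signal dies in layer 3 as it does in my simulations.

```python
def layer_frequencies(input_hz, attenuation, n_layers):
    """Firing frequency seen at each layer, assuming a constant loss factor."""
    freqs = []
    f = input_hz
    for _ in range(n_layers):
        freqs.append(f)
        f *= attenuation
    return freqs


def first_silent_layer(freqs, min_hz):
    """1-based index of the first layer whose frequency is below min_hz."""
    for index, f in enumerate(freqs, start=1):
        if f < min_hz:
            return index
    return None


INPUT_HZ = 40.0    # hypothetical input frequency
ATTENUATION = 0.5  # hypothetical loss per layer
MIN_HZ = 15.0      # hypothetical minimum frequency that still activates a neuron

freqs = layer_frequencies(INPUT_HZ, ATTENUATION, n_layers=3)
print(freqs, "-> first silent layer:", first_silent_layer(freqs, MIN_HZ))
```

Any amplification mechanism (same-layer connections, feedback from layer 3 to layer 1) has to push the effective attenuation close enough to 1 that the frequency stays above the minimum at the depth I need.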